Amazon DAS-C01 Online Practice Questions and Exam Preparation
DAS-C01 Exam Details
Exam Code: DAS-C01
Exam Name: AWS Certified Data Analytics - Specialty (DAS-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 285 Q&As
Last Updated: May 26, 2026
Amazon DAS-C01 Online Questions & Answers
Question 171:
A company uses Amazon Redshift for its data warehousing needs. ETL jobs run every night to load data, apply business rules, and create aggregate tables for reporting. The company's data analysis, data science, and business intelligence teams use the data warehouse during regular business hours. The workload management is set to auto, and separate queues exist for each team with the priority set to NORMAL.
Recently, a sudden spike of read queries from the data analysis team has occurred at least twice daily, and queries wait in line for cluster resources. The company needs a solution that enables the data analysis team to avoid query queuing without impacting latency and the query times of other teams.
Which solution meets these requirements?
A. Increase the query priority to HIGHEST for the data analysis queue.
B. Configure the data analysis queue to enable concurrency scaling.
C. Create a query monitoring rule to add more cluster capacity for the data analysis queue when queries are waiting for resources.
D. Use workload management query queue hopping to route the query to the next matching queue.
B. Configure the data analysis queue to enable concurrency scaling.
Explanation/Reference:
Correct answer is B as concurrency scaling adds capacity on demand, so the data analysis team avoids query queuing without impacting the latency and query times of other teams.
With the Concurrency Scaling feature, you can support virtually unlimited concurrent users and concurrent queries, with consistently fast query performance. When you turn on concurrency scaling, Amazon Redshift automatically adds additional cluster capacity to process an increase in both read and write queries. Users see the most current data, whether the queries run on the main cluster or a concurrency-scaling cluster. You're charged for concurrency-scaling clusters only for the time they're actively running queries.
Option D is wrong as workload management query queue hopping works only with manual WLM config.
Option A is wrong as it does not help avoid query queuing.
Option C is wrong as concurrency scaling does the same automatically.
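As a rough illustration of option B, the sketch below builds an automatic WLM queue list in which only the data analysis queue has concurrency scaling turned on. The user group names are hypothetical, and the property names (`queue_type`, `auto_wlm`, `concurrency_scaling`) reflect the `wlm_json_configuration` parameter format as understood here; check them against the Amazon Redshift documentation before use.

```python
import json

# Hypothetical user group names; the question only says each team has its own queue.
TEAM_USER_GROUPS = {
    "data_analysis": "data_analysis_users",
    "data_science": "data_science_users",
    "business_intelligence": "bi_users",
}


def build_wlm_config(scaled_team: str) -> list:
    """Build an automatic-WLM queue list in which only one team's queue
    may send queued queries to concurrency-scaling clusters."""
    queues = []
    for team, user_group in TEAM_USER_GROUPS.items():
        queues.append({
            "user_group": [user_group],
            "priority": "normal",  # priority stays NORMAL for every team
            "queue_type": "auto",
            "auto_wlm": True,
            "concurrency_scaling": "auto" if team == scaled_team else "off",
        })
    # Default queue for queries that match no user group.
    queues.append({"priority": "normal", "queue_type": "auto", "auto_wlm": True})
    return queues


wlm_config = build_wlm_config("data_analysis")
wlm_json = json.dumps(wlm_config)
print(wlm_json)
```

The resulting JSON string would be set as the value of the `wlm_json_configuration` parameter in the cluster's parameter group. The other teams' queues are left unchanged, which is why their latency is not affected.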
Question 172:
A gaming company is collecting clickstream data into multiple Amazon Kinesis data streams. The company uses Amazon Kinesis Data Firehose delivery streams to store the data in JSON format in Amazon S3. Data scientists use Amazon Athena to query the most recent data and derive business insights. The company wants to reduce its Athena costs without having to recreate the data pipeline. The company prefers a solution that will require less management effort.
Which set of actions can the data scientists take immediately to reduce costs?
A. Change the Kinesis Data Firehose output format to Apache Parquet. Provide a custom S3 object YYYYMMDD prefix expression and specify a large buffer size. For the existing data, run an AWS Glue ETL job to combine and convert small JSON files to large Parquet files and add the YYYYMMDD prefix. Use ALTER TABLE ADD PARTITION to reflect the partition on the existing Athena table.
B. Create an Apache Spark job that combines and converts JSON files to Apache Parquet files. Launch an Amazon EMR ephemeral cluster daily to run the Spark job to create new Parquet files in a different S3 location. Use ALTER TABLE SET LOCATION to reflect the new S3 location on the existing Athena table.
C. Create a Kinesis data stream as a delivery target for Kinesis Data Firehose. Run Apache Flink on Amazon Kinesis Data Analytics on the stream to read the streaming data, aggregate it, and save it to Amazon S3 in Apache Parquet format with a custom S3 object YYYYMMDD prefix. Use ALTER TABLE ADD PARTITION to reflect the partition on the existing Athena table.
D. Integrate an AWS Lambda function with Kinesis Data Firehose to convert source records to Apache Parquet and write them to Amazon S3. In parallel, run an AWS Glue ETL job to combine and convert existing JSON files to large Parquet files. Create a custom S3 object YYYYMMDD prefix. Use ALTER TABLE ADD PARTITION to reflect the partition on the existing Athena table.
D. Integrate an AWS Lambda function with Kinesis Data Firehose to convert source records to Apache Parquet and write them to Amazon S3. In parallel, run an AWS Glue ETL job to combine and convert existing JSON files to large Parquet files. Create a custom S3 object YYYYMMDD prefix. Use ALTER TABLE ADD PARTITION to reflect the partition on the existing Athena table.
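Several of the options end with ALTER TABLE ADD PARTITION against a YYYYMMDD prefix. The sketch below shows what such a statement might look like for one day. The table name, bucket, and the partition column `dt` are hypothetical, and it assumes the Athena table is already declared as partitioned by `dt`.

```python
from datetime import date


def add_partition_ddl(table: str, bucket_prefix: str, day: date) -> str:
    """Return an Athena DDL statement that registers one daily partition."""
    stamp = day.strftime("%Y%m%d")  # YYYYMMDD, matching the S3 prefix
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{stamp}') "
        f"LOCATION '{bucket_prefix.rstrip('/')}/{stamp}/'"
    )


# Hypothetical table and bucket names, for illustration only.
ddl = add_partition_ddl("clickstream", "s3://example-bucket/parquet", date(2024, 1, 15))
print(ddl)
```

Partitioning by day lets Athena skip prefixes outside the queried date range, which, together with the columnar Parquet format, reduces the amount of data scanned per query.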
Question 173:
An operations team notices that a few AWS Glue jobs for a given ETL application are failing. The AWS Glue jobs read a large number of small JSON files from an Amazon S3 bucket and write the data to a different S3 bucket in Apache Parquet format with no major transformations. Upon initial investigation, a data engineer notices the following error message in the History tab on the AWS Glue console: "Command Failed with Exit Code 1."
Upon further investigation, the data engineer notices that the driver memory profile of the failed jobs crosses the safe threshold of 50% usage quickly and reaches 90-95% soon after. The average memory usage across all executors continues to be less than 4%.
The data engineer also notices the following error while examining the related Amazon CloudWatch Logs.
What should the data engineer do to solve the failure in the MOST cost-effective way?
A. Change the worker type from Standard to G.2X.
B. Modify the AWS Glue ETL code to use the 'groupFiles': 'inPartition' feature.
C. Increase the fetch size setting by using AWS Glue dynamic frames.
D. Modify maximum capacity to increase the total maximum data processing units (DPUs) used.
B. Modify the AWS Glue ETL code to use the 'groupFiles': 'inPartition' feature.
Explanation/Reference:
Correct answer is B as the key issue is driver memory pressure caused by the AWS Glue job processing a large number of small files. Grouping the files lets each task process a whole group instead of a single file, so the Spark driver stores significantly less state in memory.
In this scenario, a Spark job is reading a large number of small files from Amazon Simple Storage Service (Amazon S3). It converts the files to Apache Parquet format and then writes them out to Amazon S3. The Spark driver is running out of memory. The input Amazon S3 data has more than 1 million files in different Amazon S3 partitions.
You can fix the processing of the multiple files by using the grouping feature in AWS Glue. Grouping is automatically enabled when you use dynamic frames and when the input dataset has a large number of files (more than 50,000). Grouping allows you to coalesce multiple files together into a group, and it allows a task to process the entire group instead of a single file. As a result, the Spark driver stores significantly less state in memory to track fewer tasks.
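The effect of grouping can be sketched in plain Python. The dictionary below shows the S3 connection options that would be passed to `glueContext.create_dynamic_frame.from_options` (the bucket path and the 128 MiB group size are assumptions), and `pack_into_groups` is a hypothetical greedy packer that only illustrates how the number of units to track shrinks. It is not how AWS Glue implements grouping internally.

```python
from typing import Any, Dict, List

# Connection options for an S3 source read through a Glue dynamic frame.
# The path is hypothetical; groupSize is given in bytes, as a string.
GROUPED_S3_OPTIONS: Dict[str, Any] = {
    "paths": ["s3://example-input-bucket/json/"],
    "recurse": True,
    "groupFiles": "inPartition",
    "groupSize": str(128 * 1024 * 1024),  # assumed 128 MiB target per group
}


def pack_into_groups(file_sizes: List[int], group_size: int) -> List[List[int]]:
    """Greedy illustration: close the current group once adding the next
    file would push it past group_size bytes."""
    groups: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in file_sizes:
        if current and current_bytes + size > group_size:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        groups.append(current)
    return groups


small_files = [64 * 1024] * 100_000  # 100,000 files of 64 KiB each
groups = pack_into_groups(small_files, int(GROUPED_S3_OPTIONS["groupSize"]))
print(len(small_files), "files ->", len(groups), "groups")
```

With these assumed sizes, the driver would track a few dozen groups rather than 100,000 individual files, which is the reduction in driver state the explanation refers to.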
Question 174:
A company's data analytics specialist must build a solution to implement quality checks on a dataset before the company uses the data in a sales report. The dataset is stored in an Amazon S3 bucket and is in CSV format.
The data quality checks must include identification of duplicate rows, removal of duplicate rows, and validation of date formats. The solution must run daily and must produce output data in Apache Parquet format in Amazon S3.
Which solution will meet these requirements with the LEAST development effort?
A. Create an AWS Glue ETL job that includes transformation steps to implement data quality checks. Configure the job to write to Amazon S3. Create a schedule-based job within an AWS Glue workflow to run the job daily.
B. Create an AWS Glue DataBrew job that includes data quality recipe steps to implement data quality checks. Configure the job to write to Amazon S3. Create a schedule within the DataBrew job to run the job daily.
C. Create an Amazon EMR cluster. Use an Apache Spark ETL job that includes data processing steps to implement data quality checks. Configure the job to write to Amazon S3. Create an Apache Oozie workflow to run the job daily.
D. Create an AWS Lambda function. Use custom code to implement data quality checks and to write to Amazon S3. Create an Amazon EventBridge rule to run the Lambda function daily.
C. Create an Amazon EMR cluster. Use an Apache Spark ETL job that includes data processing steps to implement data quality checks. Configure the job to write to Amazon S3. Create an Apache Oozie workflow to run the job daily.
Explanation/Reference:
Question 175:
A company wants to collect and process events data from different departments in near-real time. Before storing the data in Amazon S3, the company needs to clean the data by standardizing the format of the address and timestamp columns. The data varies in size based on the overall load at each particular point in time. A single data record can be 100 KB-10 MB.
How should a data analytics specialist design the solution for data ingestion?
A. Use Amazon Kinesis Data Streams. Configure a stream for the raw data. Use a Kinesis Agent to write data to the stream. Create an Amazon Kinesis Data Analytics application that reads data from the raw stream, cleanses it, and stores the output to Amazon S3.
B. Use Amazon Kinesis Data Firehose. Configure a Firehose delivery stream with a preprocessing AWS Lambda function for data cleansing. Use a Kinesis Agent to write data to the delivery stream. Configure Kinesis Data Firehose to deliver the data to Amazon S3.
C. Use Amazon Managed Streaming for Apache Kafka. Configure a topic for the raw data. Use a Kafka producer to write data to the topic. Create an application on Amazon EC2 that reads data from the topic by using the Apache Kafka consumer API, cleanses the data, and writes to Amazon S3.
D. Use Amazon Simple Queue Service (Amazon SQS). Configure an AWS Lambda function to read events from the SQS queue and upload the events to Amazon S3.
C. Use Amazon Managed Streaming for Apache Kafka. Configure a topic for the raw data. Use a Kafka producer to write data to the topic. Create an application on Amazon EC2 that reads data from the topic by using the Apache Kafka consumer API, cleanses the data, and writes to Amazon S3.
Explanation/Reference:
Correct answer is C as, of the options listed, only Amazon Managed Streaming for Apache Kafka allows the maximum record size to be configured up to 10 MB.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data. Amazon MSK provides the control-plane operations, such as those for creating, updating, and deleting clusters. It lets you use Apache Kafka data-plane operations, such as those for producing and consuming data. It runs open-source versions of Apache Kafka. This means existing applications, tooling, and plugins from partners and the Apache Kafka community are supported without requiring changes to application code.
Options A and B are wrong as the maximum size of a record sent to Kinesis Data Streams or Kinesis Data Firehose, before base64-encoding, is 1,000 KiB.
Option D is wrong as the SQS message size limit is 256 KB.
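The elimination above is a simple size comparison, sketched below. The limits are the figures quoted in this explanation and are not re-verified here; treating KB and MB as binary units (KiB and MiB) is an assumption made for the illustration.

```python
KIB = 1024
MIB = 1024 * KIB

# Maximum record sizes as quoted in the explanation (not re-verified here).
MAX_RECORD_BYTES = {
    "Kinesis Data Streams": 1000 * KIB,
    "Kinesis Data Firehose": 1000 * KIB,
    "Amazon SQS": 256 * KIB,
    "Amazon MSK": 10 * MIB,  # configurable upper bound per the explanation
}


def services_that_fit(largest_record_bytes: int) -> list:
    """Return the services whose quoted limit covers the largest record."""
    return sorted(
        name
        for name, limit in MAX_RECORD_BYTES.items()
        if limit >= largest_record_bytes
    )


print(services_that_fit(100 * KIB))  # smallest record in the question
print(services_that_fit(10 * MIB))   # largest record in the question
```

Because the design has to handle the largest record, the 10 MB case is the one that decides the answer.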
Question 176:
A financial institution is building an Amazon QuickSight business intelligence (BI) dashboard to show financial performance and analyze trends. The development team is using an Amazon Redshift database in the development environment and is having difficulty with validating the accuracy of the metrics calculation algorithm due to the lack of quality data. The Redshift production environment database is 500 TB and is in a different AWS account in the same AWS Region as the development environment account. The company needs to use up-to-date production environment data for development purposes.
Which solution MOST cost-effectively meets these requirements?
A. Set up data streaming with Amazon Kinesis Data Streams from the production environment Redshift database to replicate the data to the development environment Redshift database.
B. Create a Redshift datashare to share the production environment data with the development team.
C. Upload the data from Amazon Redshift to Amazon S3. Then load the data directly from Amazon S3 to the development environment Redshift cluster using the COPY command.
D. Create Redshift views that are configured to share all the data between the production and development clusters.
D. Create Redshift views that are configured to share all the data between the production and development clusters.
Question 177:
An event ticketing website has a data lake on Amazon S3 and a data warehouse on Amazon Redshift. Two datasets exist: events data and sales data. Each dataset has millions of records.
The entire events dataset is frequently accessed and is stored in Amazon Redshift. However, only the last 6 months of sales data is frequently accessed and is stored in Amazon Redshift. The rest of the sales data is available only in Amazon S3.
A data analytics specialist must create a report that shows the total revenue that each event has generated in the last 12 months. The report will be accessed thousands of times each week.
Which solution will meet these requirements with the LEAST operational effort?
A. Create an AWS Glue job to access sales data that is older than 6 months from Amazon S3 and to access event and sales data from Amazon Redshift. Load the results into a new table in Amazon Redshift.
B. Create a stored procedure to copy sales data that is older than 6 months and newer than 12 months from Amazon S3 to Amazon Redshift. Create a materialized view with the autorefresh option.
C. Create an AWS Lambda function to copy sales data that is older than 6 months and newer than 12 months to an Amazon Kinesis Data Firehose delivery stream. Specify Amazon Redshift as the destination of the delivery stream. Create a materialized view with the autorefresh option.
D. Create a materialized view in Amazon Redshift with the autorefresh option. Use Amazon Redshift Spectrum to include sales data that is older than 6 months.
A. Create an AWS Glue job to access sales data that is older than 6 months from Amazon S3 and to access event and sales data from Amazon Redshift. Load the results into a new table in Amazon Redshift.
Explanation/Reference:
Question 178:
A company wants to find ways to expand its website business by analyzing customer orders and purchasing trends. To perform data analysis, a pipeline must support daily data ingestion from the production databases into a data lake that is built on Amazon S3. The website uses Amazon DynamoDB to store product details and Amazon Aurora PostgreSQL to store order details in production.
Which solution can be used to accomplish these goals with LEAST operational overhead?
A. Leverage AWS Database Migration Service (AWS DMS) to run two continuous data replication jobs from both Aurora PostgreSQL and DynamoDB into Amazon S3. Leverage AWS Glue for data cataloging.
B. Set up an AWS Lake Formation workflow with blueprints for Aurora PostgreSQL and an AWS Glue ETL job for DynamoDB to ingest data into Amazon S3. Leverage AWS Glue for data cataloging.
C. Create a custom Python script to ingest data from both Aurora PostgreSQL and Amazon DynamoDB into Amazon S3 using the AWS SDK for Python (Boto3) library. Deploy the script on an Amazon EC2 instance and schedule the job to run daily using a cron job. Leverage AWS Glue for data cataloging.
D. Use Amazon EMR to ingest data from both Aurora PostgreSQL and DynamoDB into Amazon S3. Leverage Apache Hive on the same EMR cluster for data cataloging.
A. Leverage AWS Database Migration Service (AWS DMS) to run two continuous data replication jobs from both Aurora PostgreSQL and DynamoDB into Amazon S3. Leverage AWS Glue for data cataloging.
Question 179:
A company collects data from parking garages. Analysts have requested the ability to run reports in near real time about the number of vehicles in each garage.
The company wants to build an ingestion pipeline that loads the data into an Amazon Redshift cluster. The solution must alert operations personnel when the number of vehicles in a particular garage exceeds a specific threshold. The alerting query will use garage threshold values as a static reference. The threshold values are stored in Amazon S3.
What is the MOST operationally efficient solution that meets these requirements?
A. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Create an Amazon Kinesis Data Analytics application that uses the same delivery stream as an input source. Create a reference data source in Kinesis Data Analytics to temporarily store the threshold values from Amazon S3 and to compare the number of vehicles in a particular garage to the corresponding threshold value. Configure an AWS Lambda function to publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.
B. Use an Amazon Kinesis data stream to collect the data. Use an Amazon Kinesis Data Firehose delivery stream to deliver the data to Amazon Redshift. Create another Kinesis data stream to temporarily store the threshold values from Amazon S3. Send the delivery stream and the second data stream to Amazon Kinesis Data Analytics to compare the number of vehicles in a particular garage to the corresponding threshold value. Configure an AWS Lambda function to publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.
C. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Automatically initiate an AWS Lambda function that queries the data in Amazon Redshift. Configure the Lambda function to compare the number of vehicles in a particular garage to the corresponding threshold value from Amazon S3. Configure the Lambda function to also publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.
D. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Create an Amazon Kinesis Data Analytics application that uses the same delivery stream as an input source. Use Kinesis Data Analytics to compare the number of vehicles in a particular garage to the corresponding threshold value that is stored in a table as an in-application stream. Configure an AWS Lambda function as an output for the application to publish an Amazon Simple Queue Service (Amazon SQS) notification if the number of vehicles exceeds the threshold.
A. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Create an Amazon Kinesis Data Analytics application that uses the same delivery stream as an input source. Create a reference data source in Kinesis Data Analytics to temporarily store the threshold values from Amazon S3 and to compare the number of vehicles in a particular garage to the corresponding threshold value. Configure an AWS Lambda function to publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.
Explanation/Reference:
Question 180:
A hospital uses wearable medical sensor devices to collect data from patients. The hospital is architecting a near-real-time solution that can ingest the data securely at scale. The solution should also be able to remove the patient's protected health information (PHI) from the streaming data before storing the data in durable storage.
Which solution meets these requirements with the LEAST operational overhead?
A. Ingest the data by using Amazon Kinesis Data Streams. Process the data by using Amazon EC2 instances that use Amazon Kinesis Client Library (KCL) and custom logic to remove all PHI from the data. Write the data to Amazon S3.
B. Ingest the data by using Amazon Kinesis Data Firehose and write the data to Amazon S3. Have Amazon S3 invoke an AWS Lambda function that removes all PHI.
C. Ingest the data by using Amazon Kinesis Data Streams to write the data to Amazon S3. Have Amazon S3 invoke an AWS Lambda function that removes all PHI.
D. Ingest the data by using Amazon Kinesis Data Firehose. Invoke a Kinesis Data Firehose data transformation by using an AWS Lambda function to remove all PHI. Configure Kinesis Data Firehose so that Amazon S3 is the destination.
D. Ingest the data by using Amazon Kinesis Data Firehose. Invoke a Kinesis Data Firehose data transformation by using an AWS Lambda function to remove all PHI. Configure Kinesis Data Firehose so that Amazon S3 is the destination.
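A minimal sketch of the Lambda transformation in option D follows. It uses the Kinesis Data Firehose transformation contract (each record carries a `recordId` and base64-encoded `data`, and is returned with a `result` status). The PHI field names and the assumption that every record is a flat JSON object are hypothetical.

```python
import base64
import json

# Hypothetical PHI field names; a real deployment would use its own schema.
PHI_FIELDS = {"patient_name", "date_of_birth", "ssn"}


def lambda_handler(event, context):
    """Firehose data-transformation handler: drop PHI fields from each JSON record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            cleaned = {k: v for k, v in payload.items() if k not in PHI_FIELDS}
            data = base64.b64encode((json.dumps(cleaned) + "\n").encode("utf-8"))
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": data.decode("utf-8"),
            })
        except (ValueError, AttributeError):
            # Unparseable or non-object records are marked as failed
            # instead of being delivered with PHI possibly intact.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}


# Local check with one sample sensor reading.
raw = json.dumps({"patient_name": "Jane Doe", "heart_rate": 72}).encode("utf-8")
sample = {"records": [{"recordId": "1", "data": base64.b64encode(raw).decode("utf-8")}]}
result = lambda_handler(sample, None)
print(base64.b64decode(result["records"][0]["data"]).decode("utf-8"))
```

Because the function runs inside the delivery stream, the PHI is removed before anything is written to Amazon S3, which is what separates option D from options B and C.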