Amazon DAS-C01 Online Practice
Questions and Exam Preparation
DAS-C01 Exam Details
Exam Code
:DAS-C01
Exam Name
:AWS Certified Data Analytics - Specialty (DAS-C01)
Certification
:Amazon Certifications
Vendor
:Amazon
Total Questions
:285 Q&As
Last Updated
:May 26, 2026
Amazon DAS-C01 Online Questions &
Answers
Question 111:
An Amazon Redshift database contains sensitive user data. Logging is necessary to meet compliance requirements. The logs must contain database authentication attempts, connections, and disconnections. The logs must also contain each query run against the database and record which database user ran each query.
Which steps will create the required logs?
A. Enable Amazon Redshift Enhanced VPC Routing. Enable VPC Flow Logs to monitor traffic. B. Allow access to the Amazon Redshift database using AWS IAM only. Log access using AWS CloudTrail. C. Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI. D. Enable and download audit reports from AWS Artifact.
C. Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI.
Explanation/Reference:
Amazon Redshift logs information in the following log files:
Connection log — logs authentication attempts, and connections and disconnections.
User log — logs information about changes to database user definitions.
User activity log — logs each query before it is run on the database.
A web retail company wants to implement a near-real-time clickstream analytics solution. The company wants to analyze the data with an open-source package. The analytics application will process the raw data only once, but other applications will need immediate access to the raw data for up to 1 year.
Which solution meets these requirements with the LEAST amount of operational effort?
A. Use Amazon Kinesis Data Streams to collect the data. Use Amazon EMR with Apache Flink to consume and process the data from the Kinesis data stream. Set the retention period of the Kinesis data stream to 8.760 hours. B. Use Amazon Kinesis Data Streams to collect the data. Use Amazon Kinesis Data Analytics with Apache Flink to process the data in real time. Set the retention period of the Kinesis data stream to 8,760 hours. C. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to collect the data. Use Amazon EMR with Apache Flink to consume and process the data from the Amazon MSK stream. Set the log retention hours to 8,760. D. Use Amazon Kinesis Data Streams to collect the data. Use Amazon EMR with Apache Flink to consume and process the data from the Kinesis data stream. Create an Amazon Kinesis Data Firehose delivery stream to store the data in Amazon S3. Set an S3 Lifecycle policy to delete the data after 365 days.
B. Use Amazon Kinesis Data Streams to collect the data. Use Amazon Kinesis Data Analytics with Apache Flink to process the data in real time. Set the retention period of the Kinesis data stream to 8,760 hours.
Explanation/Reference:
The solution that meets the requirements with the least amount of operational effort is Option B: Use Amazon Kinesis Data Streams to collect the data. Use Amazon Kinesis Data Analytics with Apache Flink to process the data in real-time. Set the retention period of the Kinesis data stream to 8,760 hours.
This solution uses Amazon Kinesis Data Streams to collect the data and processes it in real-time using Amazon Kinesis Data Analytics with Apache Flink. This allows for near-real-time clickstream analytics without the need for additional data processing or storage. Additionally, the retention period of the Kinesis data stream can be set to 8,760 hours (1 year), which allows other applications to have immediate access to the raw data for up to 1 year without the need for additional storage or processing. This solution requires the least amount of operational effort as it does not require additional steps for data processing or storage.
Question 113:
A data architect is building an Amazon S3 data lake for a bank. The goal is to provide a single data repository for customer data needs, such as personalized recommendations. The bank uses Amazon Kinesis Data Firehose to ingest customers' personal information bank accounts, and transactions in near-real time from a transactional relational database. The bank requires all personally identifiable information (PII) that is stored in the AWS Cloud to be masked.
Which solution will meet these requirements?
A. Invoke an AWS Lambda function from Kinesis Data Firehose to mask PII before delivering the data into Amazon S3. B. Use Amazon Made, and configure it to discover and mask PII. C. Enable server-side encryption (SSE) in Amazon S3. D. Invoke Amazon Comprehend from Kinesis Data Firehose to detect and mask PII before delivering the data into Amazon S3.
A. Invoke an AWS Lambda function from Kinesis Data Firehose to mask PII before delivering the data into Amazon S3.
Explanation/Reference:
A - CORRECT, Lambda function is used as on-the-fly masking due to the requirement "All PII on the AWS Cloud must be hidden" B - WRONG, the requirement "All PII on the AWS Cloud must be hidden" imposes that masking must be performed on-the-fly, while Macie can be applied to data already in S3. https://aws.amazon.com/blogs/big-data/automate-the-archivaland-deletion-of-sensitive-data-using-amazon-macie/ C - WRONG, encryption != masking D - WRONG, Amazon Comprehend PII API has no integration with KDF
Question 114:
A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files in Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company's requirements?
A. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation. B. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation. C. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed. D. Use an AWS Glue ETL job to partition and convert the data into a row-based data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after the object was last accessed. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after the last date the object was accessed.
A. Use an AWS Glue ETL job to compress, partition, and convert the data into a columnar data format. Use Athena to query the processed dataset. Configure a lifecycle policy to move the processed data into the Amazon S3 Standard-Infrequent Access (S3 Standard-IA) storage class 5 years after object creation. Configure a second lifecycle policy to move the raw data into Amazon S3 Glacier for long-term archival 7 days after object creation.
Explanation/Reference:
Correct answer is A as columnar data format store data efficiently by employing column-wise compression and enables split and parallel processing. Storing processed data in S3 in SA-IA and moving raw data in Glacier would help reduce costs.
Option B and D is wrong as it is recommended to use columnar data format for processing.
Options C is wrong as lifecycle rules are based on Object creation data and not last date when the object was accessed.
Question 115:
A company's system operators and security engineers need to analyze activities within specific date ranges of AWS CloudTrail logs. All log files are stored in an Amazon S3 bucket, and the size of the logs is more than 5 TB. The solution must be cost-effective and maximize query performance.
Which solution meets these requirements?
A. Copy the logs to a new S3 bucket with a prefix structure of . Use the date column as a partition key. Create a table on Amazon Athena based on the objects in the new bucket. Automatically add metadata partitions by using the MSCK REPAIR TABLE command in Athena. Use Athena to query the table and partitions. B. Create a table on Amazon Athena. Manually add metadata partitions by using the ALTER TABLE ADD PARTITION statement, and use multiple columns for the partition key. Use Athena to query the table and partitions. C. Launch an Amazon EMR cluster and use Amazon S3 as a data store for Apache HBase. Load the logs from the S3 bucket to an HBase table on Amazon EMR. Use Amazon Athena to query the table and partitions. D. Create an AWS Glue job to copy the logs from the S3 source bucket to a new S3 bucket and create a table using Apache Parquet file format, Snappy as compression codec, and partition by date. Use Amazon Athena to query the table and partitions.
D. Create an AWS Glue job to copy the logs from the S3 source bucket to a new S3 bucket and create a table using Apache Parquet file format, Snappy as compression codec, and partition by date. Use Amazon Athena to query the table and partitions.
Question 116:
A marketing company has data in Salesforce, MySQL, and Amazon S3. The company wants to use data from these three locations and create mobile dashboards for its users. The company is unsure how it should create the dashboards and needs a solution with the least possible customization and coding.
Which solution meets these requirements?
A. Use Amazon Athena federated queries to join the data sources. Use Amazon QuickSight to generate the mobile dashboards. B. Use AWS Lake Formation to migrate the data sources into Amazon S3. Use Amazon QuickSight to generate the mobile dashboards. C. Use Amazon Redshift federated queries to join the data sources. Use Amazon QuickSight to generate the mobile dashboards. D. Use Amazon QuickSight to connect to the data sources and generate the mobile dashboards.
D. Use Amazon QuickSight to connect to the data sources and generate the mobile dashboards.
Explanation/Reference:
Correct answer is D as QuickSight supports all the above data sources and can help create the dashboards and needs a solution with the least possible customization and coding.
Options A, B and C are wrong as they would not support all data sources.
Question 117:
A data analyst is designing a solution to interactively query datasets with SQL using a JDBC connection. Users will join data stored in Amazon S3 in Apache ORC format with data stored in Amazon OpenSearch Service (Amazon Elasticsearch Service) and Amazon Aurora MySQL.
Which solution will provide the MOST up-to-date results?
A. Use AWS Glue jobs to ETL data from Amazon ES and Aurora MySQL to Amazon S3. Query the data with Amazon Athena. B. Use Amazon DMS to stream data from Amazon ES and Aurora MySQL to Amazon Redshift. Query the data with Amazon Redshift. C. Query all the datasets in place with Apache Spark SQL running on an AWS Glue developer endpoint. D. Query all the datasets in place with Apache Presto running on Amazon EMR.
D. Query all the datasets in place with Apache Presto running on Amazon EMR.
Explanation/Reference:
Correct answer is D as Presto is a fast SQL query engine designed for interactive analytic queries over large datasets from multiple sources.
Option A is wrong as Glue is not ideal for interactive queries but more for batch ETL jobs.
Option B is wrong as it would not provide the up-to-date results as the data needs to copied over to Redshift for querying. Also, it does not cover S3 which would need Redshift Spectrum.
Option C is wrong as Spark SQL does not allow the capability to query multiple data sources. Also, Glue Developer Endpoints help test Glue ETL jobs.
Question 118:
A company wants to research user turnover by analyzing the past 3 months of user activities. With millions of users, 1.5 TB of uncompressed data is generated each day. A 30-node Amazon Redshift cluster with 2.56 TB of solid state drive
(SSD) storage for each node is required to meet the query performance goals.
The company wants to run an additional analysis on a year's worth of historical data to examine trends indicating which features are most popular. This analysis will be done once a week.
What is the MOST cost-effective solution?
A. Increase the size of the Amazon Redshift cluster to 120 nodes so it has enough storage capacity to hold 1 year of data. Then use Amazon Redshift for the additional analysis. B. Keep the data from the last 90 days in Amazon Redshift. Move data older than 90 days to Amazon S3 and store it in Apache Parquet format partitioned by date. Then use Amazon Redshift Spectrum for the additional analysis. C. Keep the data from the last 90 days in Amazon Redshift. Move data older than 90 days to Amazon S3 and store it in Apache Parquet format partitioned by date. Then provision a persistent Amazon EMR cluster and use Apache Presto for the additional analysis. D. Resize the cluster node type to the dense storage node type (DS2) for an additional 16 TB storage capacity on each individual node in the Amazon Redshift cluster. Then use Amazon Redshift for the additional analysis.
B. Keep the data from the last 90 days in Amazon Redshift. Move data older than 90 days to Amazon S3 and store it in Apache Parquet format partitioned by date. Then use Amazon Redshift Spectrum for the additional analysis.
Explanation/Reference:
Correct answer is B as the data can be stored in Redshift for 90 days for analyzing the past 3 months of user activities. Data older than 90 days can be moved to S3 and analyzed using Redshift Spectrum once a week. This provides the most
cost-effective solution.
Using Amazon Redshift Spectrum, you can efficiently query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive
parallelism to run very fast against large datasets. Much of the processing occurs in the Redshift Spectrum layer, and most of the data remains in Amazon S3. Multiple clusters can concurrently query the same dataset in Amazon S3 without
the need to make copies of the data for each cluster.
Options A and D are wrong as increasing the size of the Redshift cluster would not be cost-effective.
Option C is wrong as analyzing data using a persistent EMR cluster would not be cost-effective.
Question 119:
A company wants to provide its data analysts with uninterrupted access to the data in its Amazon Redshift cluster. All data is streamed to an Amazon S3 bucket with Amazon Kinesis Data Firehose. An AWS Glue job that is scheduled to run every 5 minutes issues a COPY command to move the data into Amazon Redshift.
The amount of data delivered is uneven throughout the day, and cluster utilization is high during certain periods. The COPY command usually completes within a couple of seconds. However, when load spike occurs, locks can exist and data can be missed. Currently, the AWS Glue job is configured to run without retries, with timeout at 5 minutes and concurrency at 1.
How should a data analytics specialist configure the AWS Glue job to optimize fault tolerance and improve data availability in the Amazon Redshift cluster?
A. Increase the number of retries. Decrease the timeout value. Increase the job concurrency. B. Keep the number of retries at 0. Decrease the timeout value. Increase the job concurrency. C. Keep the number of retries at 0. Decrease the timeout value. Keep the job concurrency at 1. D. Keep the number of retries at 0. Increase the timeout value. Keep the job concurrency at 1.
A. Increase the number of retries. Decrease the timeout value. Increase the job concurrency.
Explanation/Reference:
B is out - jobs will due to decreased timeout when there is no concurrency. C is out - locks will still exist D is out - Glue job that is scheduled to run every 5 minutes
Question 120:
A company wants to enrich application logs in near-real-time and use the enriched dataset for further analysis. The application is running on Amazon EC2 instances across multiple Availability Zones and storing its logs using Amazon CloudWatch Logs. The enrichment source is stored in an Amazon DynamoDB table.
Which solution meets the requirements for the event collection and enrichment?
A. Use a CloudWatch Logs subscription to send the data to Amazon Kinesis Data Firehose. Use AWS Lambda to transform the data in the Kinesis Data Firehose delivery stream and enrich it with the data in the DynamoDB table. Configure Amazon S3 as the Kinesis Data Firehose delivery destination. B. Export the raw logs to Amazon S3 on an hourly basis using the AWS CLI. Use AWS Glue crawlers to catalog the logs. Set up an AWS Glue connection for the DynamoDB table and set up an AWS Glue ETL job to enrich the data. Store the enriched data in Amazon S3. C. Configure the application to write the logs locally and use Amazon Kinesis Agent to send the data to Amazon Kinesis Data Streams. Configure a Kinesis Data Analytics SQL application with the Kinesis data stream as the source. Join the SQL application input stream with DynamoDB records, and then store the enriched output stream in Amazon S3 using Amazon Kinesis Data Firehose. D. Export the raw logs to Amazon S3 on an hourly basis using the AWS CLI. Use Apache Spark SQL on Amazon EMR to read the logs from Amazon S3 and enrich the records with the data from DynamoDB. Store the enriched data in Amazon S3.
A. Use a CloudWatch Logs subscription to send the data to Amazon Kinesis Data Firehose. Use AWS Lambda to transform the data in the Kinesis Data Firehose delivery stream and enrich it with the data in the DynamoDB table. Configure Amazon S3 as the Kinesis Data Firehose delivery destination.
Explanation/Reference:
Correct answer is A as CloudWatch logs can be integrated with Kinesis Data Firehose using subscription filters, the data can be enriched using Lambda doing a lookup on the DynamoDB tables and data storage to S3.
Options B and D are wrong as exporting the logs would not provide near-real-time data handling.
Option C is wrong as using Kinesis Data Stream for data collection with agents would increase overhead, also Kinesis Data Analytics does not support DynamoDB as a reference data source for enrichment.
Nowadays, the certification exams become more and more important and required by more and more
enterprises when applying for a job. But how to prepare for the exam effectively? How to prepare
for the exam in a short time with less efforts? How to get a ideal result and how to find the
most reliable resources? Here on Vcedump.com, you will find all the answers.
Vcedump.com provide not only Amazon exam questions,
answers and explanations but also complete assistance on your exam preparation and certification
application. If you are confused on your DAS-C01 exam preparations
and Amazon certification application, do not hesitate to visit our
Vcedump.com to find your solutions here.