Amazon DAS-C01 Online Practice Questions and Exam Preparation
DAS-C01 Exam Details
Exam Code: DAS-C01
Exam Name: AWS Certified Data Analytics - Specialty (DAS-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 285 Q&As
Last Updated: May 26, 2026
Amazon DAS-C01 Online Questions & Answers
Question 131:
A retail company is building its data warehouse solution using Amazon Redshift. As a part of that effort, the company is loading hundreds of files into the fact table created in its Amazon Redshift cluster. The company wants the solution to achieve the highest throughput and optimally use cluster resources when loading data into the company's fact table.
How should the company meet these requirements?
A. Use multiple COPY commands to load the data into the Amazon Redshift cluster.
B. Use S3DistCp to load multiple files into the Hadoop Distributed File System (HDFS) and use an HDFS connector to ingest the data into the Amazon Redshift cluster.
C. Use LOAD commands equal to the number of Amazon Redshift cluster nodes and load the data in parallel into each node.
D. Use a single COPY command to load the data into the Amazon Redshift cluster.
D. Use a single COPY command to load the data into the Amazon Redshift cluster.
Explanation/Reference:
Correct answer is D, as a single COPY command loads the data in parallel.
Amazon Redshift can automatically load in parallel from multiple compressed data files.
However, if you use multiple concurrent COPY commands to load one table from multiple files, Amazon Redshift is forced to perform a serialized load. This type of load is much slower and requires a VACUUM process at the end if the table has a sort column defined.
Option A is wrong as multiple COPY commands would force Redshift to perform a serialized load.
Option B is wrong as staging the files in HDFS on Amazon EMR adds an extra hop and extra infrastructure without improving load throughput.
Option C is wrong as there is no LOAD command in Amazon Redshift.
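As a minimal sketch of option D, the snippet below builds one COPY statement that points at a shared S3 key prefix, so that Redshift itself distributes the matching files across the cluster. The table name, bucket, key prefix, and IAM role ARN are hypothetical placeholders, the files are assumed to be gzip-compressed CSV, and `build_copy_statement` is an illustrative helper rather than part of any AWS SDK.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a single COPY statement that loads every file under s3_prefix.

    All argument values used below are hypothetical; the statement assumes
    gzip-compressed CSV input files.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV GZIP;"
    )


# One statement covers all files whose keys start with the prefix.
copy_sql = build_copy_statement(
    table="sales_fact",
    s3_prefix="s3://example-bucket/sales/2024/part-",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftLoadRole",
)
print(copy_sql)
```

The point of the sketch is that the file list is expressed through the prefix, not through one COPY per file.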
Question 132:
An airline has been collecting metrics on flight activities for analytics. A recently completed proof of concept demonstrates how the company provides insights to data analysts to improve on-time departures. The proof of concept used objects in Amazon S3, which contained the metrics in .csv format, and used Amazon Athena for querying the data. As the amount of data increases, the data analyst wants to optimize the storage solution to improve query performance.
Which options should the data analyst use to improve performance as the data lake grows? (Choose three.)
A. Add a randomized string to the beginning of the keys in S3 to get more throughput across partitions.
B. Use an S3 bucket in the same account as Athena.
C. Compress the objects to reduce the data transfer I/O.
D. Use an S3 bucket in the same Region as Athena.
E. Preprocess the .csv data to JSON to reduce I/O by fetching only the document keys needed by the query.
F. Preprocess the .csv data to Apache Parquet to reduce I/O by fetching only the data blocks needed for predicates.
C. Compress the objects to reduce the data transfer I/O.
D. Use an S3 bucket in the same Region as Athena.
F. Preprocess the .csv data to Apache Parquet to reduce I/O by fetching only the data blocks needed for predicates.
Explanation/Reference:
Correct answers are C, D and F.
Options C and F are correct as compression and a columnar data format reduce the amount of data scanned, which improves query performance and optimizes storage.
Option D is correct as keeping Athena and the S3 bucket in the same Region helps with query performance and cost.
Option A is wrong as S3 now scales request throughput automatically, so randomized key prefixes are no longer needed.
Option B is wrong as using the same account does not help in optimizing the cost or query performance.
Option E is wrong as JSON, like CSV, is a row-based text format, so it does not help in optimizing the cost or query performance.
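One way to apply options C and F together is an Athena CREATE TABLE AS SELECT (CTAS) statement that rewrites the CSV-backed table as compressed Parquet. The sketch below only builds the statement text; the table names and the S3 location are hypothetical, and `build_parquet_ctas` is an illustrative helper, not an AWS API.

```python
def build_parquet_ctas(source_table: str, target_table: str, target_location: str) -> str:
    """Return an Athena CTAS statement that copies a CSV-backed table into
    Snappy-compressed Parquet. All names and the location are hypothetical."""
    return (
        f"CREATE TABLE {target_table}\n"
        "WITH (\n"
        "    format = 'PARQUET',\n"
        "    parquet_compression = 'SNAPPY',\n"
        f"    external_location = '{target_location}'\n"
        ") AS\n"
        f"SELECT * FROM {source_table}"
    )


ctas_sql = build_parquet_ctas(
    source_table="flight_metrics_csv",
    target_table="flight_metrics_parquet",
    target_location="s3://example-bucket/flight-metrics-parquet/",
)
print(ctas_sql)
```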
Question 133:
An education provider's learning management system (LMS) is hosted in a 100 TB data lake that is built on Amazon S3. The provider's LMS supports hundreds of schools. The provider wants to build an advanced analytics reporting platform using Amazon Redshift to handle complex queries with optimal performance. System users will query the most recent 4 months of data 95% of the time while 5% of the queries will leverage data from the previous 12 months.
Which solution meets these requirements in the MOST cost-effective way?
A. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Use S3 lifecycle management rules to store data from the previous 12 months in Amazon S3 Glacier storage.
B. Leverage DS2 nodes for the Amazon Redshift cluster. Migrate all data from Amazon S3 to Amazon Redshift. Decommission the data lake.
C. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Ensure the S3 Standard storage class is in use with objects in the data lake.
D. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift federated queries to join cluster data with the data lake to reduce costs. Ensure the S3 Standard storage class is in use with objects in the data lake.
C. Store the most recent 4 months of data in the Amazon Redshift cluster. Use Amazon Redshift Spectrum to query data in the data lake. Ensure the S3 Standard storage class is in use with objects in the data lake.
Explanation/Reference:
Correct answer is C. Because 95% of the queries need only the most recent 4 months of data, that data can be stored in the Redshift cluster. The data covering the previous 12 months can stay in S3 and be queried using Redshift Spectrum, giving a mix of S3 and Redshift.
Option A is wrong as Redshift Spectrum cannot be used to query S3 Glacier storage data.
Option B is wrong as using Redshift for all the data is not cost-effective.
Option D is wrong as Redshift federated queries are designed for live data in operational databases such as Amazon RDS and Aurora, and for the 5% of queries that need older data it is more cost-effective to query S3 directly with Redshift Spectrum.
Question 134:
A company hosts its analytics solution on premises. The analytics solution includes a server that collects log files. The analytics solution uses an Apache Hadoop cluster to analyze the log files hourly and to produce output files. All the files are archived to another server for a specified duration.
The company is expanding globally and plans to move the analytics solution to multiple AWS Regions in the AWS Cloud. The company must adhere to the data archival and retention requirements of each country where the data is stored.
Which solution will meet these requirements?
A. Create an Amazon S3 bucket in one Region to collect the log files. Use S3 event notifications to invoke an AWS Glue job for log analysis. Store the output files in the target S3 bucket. Use S3 Lifecycle rules on the target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
B. Create a Hadoop Distributed File System (HDFS) file system on an Amazon EMR cluster in one Region to collect the log files. Set up a bootstrap action on the EMR cluster to run an Apache Spark job. Store the output files in a target Amazon S3 bucket. Schedule a job on one of the EMR nodes to delete files that no longer need to be retained.
C. Create an Amazon S3 bucket in each Region to collect log files. Create an Amazon EMR cluster. Submit steps on the EMR cluster for analysis. Store the output files in a target S3 bucket in each Region. Use S3 Lifecycle rules on each target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
D. Create an Amazon Kinesis Data Firehose delivery stream in each Region to collect log data. Specify an Amazon S3 bucket in each Region as the destination. Use S3 Storage Lens for data analysis. Use S3 Lifecycle rules on each destination S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
C. Create an Amazon S3 bucket in each Region to collect log files. Create an Amazon EMR cluster. Submit steps on the EMR cluster for analysis. Store the output files in a target S3 bucket in each Region. Use S3 Lifecycle rules on each target S3 bucket to set an expiration period that meets the retention requirements of the country that contains the Region.
Explanation/Reference:
Question 135:
An educational technology company is running an online assessment application that allows thousands of students to concurrently take assessments on the company's platform. The application uses a combination of relational databases running on an Amazon Aurora PostgreSQL DB cluster and Amazon DynamoDB tables for storing data. Users reported issues with application performance during a recent large-scale online assessment. As a result, the company wants to design a solution that captures metrics from all databases in a centralized location and queries the metrics to identify issues with performance.
How can this solution be designed with the LEAST operational overhead?
A. Configure AWS Database Migration Service (AWS DMS) to copy the database logs to an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.
B. Configure an Amazon CloudWatch metric stream with an Amazon Kinesis Firehose delivery stream destination that stores the data in an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.
C. Create an Apache Kafka cluster on Amazon EC2. Configure a Java Database Connectivity (JDBC) connector for Kafka Connect on each database to capture and stream the logs to a single Amazon CloudWatch log group. Query the CloudWatch log group with Amazon Athena.
D. Install a server on Amazon EC2 to capture logs from Amazon RDS and DynamoDB by using Java Database Connectivity (JDBC) connectors. Stream the logs to an Amazon Kinesis Data Firehose delivery stream that stores the data in an Amazon S3 bucket. Query the output logs in the S3 bucket by using Amazon Athena.
B. Configure an Amazon CloudWatch metric stream with an Amazon Kinesis Firehose delivery stream destination that stores the data in an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.
Explanation/Reference:
Question 136:
A company wants to run analytics on its Elastic Load Balancing logs stored in Amazon S3. A data analyst needs to be able to query all data from a desired year, month, or day. The data analyst should also be able to query a subset of the columns. The company requires minimal operational overhead and the most cost-effective solution.
Which approach meets these requirements for optimizing and querying the log data?
A. Use an AWS Glue job nightly to transform new log files into .csv format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
B. Launch a long-running Amazon EMR cluster that continuously transforms new log files from Amazon S3 into its Hadoop Distributed File System (HDFS) storage and partitions by year, month, and day. Use Apache Presto to query the optimized format.
C. Launch a transient Amazon EMR cluster nightly to transform new log files into Apache ORC format and partition by year, month, and day. Use Amazon Redshift Spectrum to query the data.
D. Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
D. Use an AWS Glue job nightly to transform new log files into Apache Parquet format and partition by year, month, and day. Use AWS Glue crawlers to detect new partitions. Use Amazon Athena to query data.
Explanation/Reference:
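Option A is wrong as .csv is a row-based format, so Athena must scan every column even when a query needs only a few.
Option B is wrong as a long-running EMR cluster is not cost-effective and adds the operational overhead of cluster management.
Option C is wrong as it needs both EMR and a running Amazon Redshift cluster for Redshift Spectrum, which is not cost-effective.
Correct answer is D as it is low-cost and has minimal operational overhead. The cost of data scanned by Athena is minimized by partition pruning and by reading only the required columns from the Parquet files.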
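To illustrate how partition pruning and column selection combine in an Athena query, here is a minimal sketch that builds a query for a given year, month, or day. The table name, the column names, and the assumption that the crawler registered `year`, `month`, and `day` as string partition columns are all hypothetical, and `build_partition_query` is an illustrative helper.

```python
from typing import Optional, Sequence


def build_partition_query(
    table: str,
    columns: Sequence[str],
    year: int,
    month: Optional[int] = None,
    day: Optional[int] = None,
) -> str:
    """Build an Athena query that reads only the named columns and filters on
    the (assumed) string partition columns year, month, and day."""
    if day is not None and month is None:
        raise ValueError("a day filter requires a month filter")
    filters = [f"year = '{year:04d}'"]
    if month is not None:
        filters.append(f"month = '{month:02d}'")
    if day is not None:
        filters.append(f"day = '{day:02d}'")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE {' AND '.join(filters)}"


# Hypothetical table and column names for the ELB logs.
query = build_partition_query("elb_logs", ["request_url", "elb_status_code"], 2024, 3)
print(query)
```

Filtering on the partition columns limits which S3 prefixes Athena reads, while the explicit column list limits which Parquet column chunks it reads.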
Question 137:
A company's marketing team has asked for help in identifying a high performing long-term storage service for their data based on the following requirements:
The data size is approximately 32 TB uncompressed.
There is a low volume of single-row inserts each day.
There is a high volume of aggregation queries each day.
Multiple complex joins are performed.
The queries typically involve a small subset of the columns in a table.
Which storage service will provide the MOST performant solution?
A. Amazon Aurora MySQL
B. Amazon Redshift
C. Amazon Neptune
D. Amazon Elasticsearch
B. Amazon Redshift
Explanation/Reference:
Correct answer is B as Amazon Redshift is designed for OLAP processing and meets all the requirements.
Amazon Redshift uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes using AWS-designed hardware and machine learning to deliver the best price-performance at any scale.
Option A is wrong as Amazon Aurora MySQL is ideal for OLTP solutions and not OLAP.
Option C is wrong as Amazon Neptune is a fully managed graph database service, which is not designed for aggregation queries with complex joins over tabular data.
Option D is wrong as Amazon Elasticsearch is a search and log analytics service and is not suited to queries with multiple complex joins.
Question 138:
A company has developed an Apache Hive script to batch process data stored in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster.
Which solution is the MOST cost-effective for scheduling and executing the script?
A. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.
B. Use the AWS Management Console to spin up an Amazon EMR cluster with Python, Hue, Hive, and Apache Oozie. Set the termination protection flag to true and use Spot Instances for the core nodes of the cluster. Configure an Oozie workflow in the cluster to invoke the Hive script daily.
C. Create an AWS Glue job with the Hive script to perform the batch operation. Configure the job to run once a day using a time-based schedule.
D. Use AWS Lambda layers and load the Hive runtime to AWS Lambda and copy the Hive script. Schedule the Lambda function to run daily by creating a workflow using AWS Step Functions.
A. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.
Explanation/Reference:
Option B is wrong because core nodes should not run on Spot Instances (only task nodes should), and it is more expensive because scheduling with Oozie requires the cluster to be up all the time.
Option C is wrong because AWS Glue cannot run Hive scripts.
Option D is wrong because AWS Lambda cannot run Hive scripts either.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/RunLambdaSchedule.html
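The transient cluster in option A can be sketched as the request a Lambda function might pass to the EMR `run_job_flow` API through boto3. The sketch below only builds the request dictionary so it runs without AWS credentials. The cluster name, release label, instance types, instance counts, S3 URIs, and the use of the default EMR roles are all assumptions for illustration, and `build_job_flow_request` is a hypothetical helper.

```python
def build_job_flow_request(script_s3_uri: str, log_s3_uri: str) -> dict:
    """Build keyword arguments for the EMR run_job_flow API.

    Every concrete value here (names, release label, instance types, roles)
    is an assumption for illustration only.
    """
    return {
        "Name": "daily-hive-batch",
        "ReleaseLabel": "emr-6.15.0",
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # The cluster terminates itself once the Hive step has finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "run-hive-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["hive-script", "--run-hive-script",
                             "--args", "-f", script_s3_uri],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_job_flow_request(
    script_s3_uri="s3://example-bucket/scripts/daily.q",
    log_s3_uri="s3://example-bucket/emr-logs/",
)
# Inside the scheduled Lambda handler, the call would look like:
#   boto3.client("emr").run_job_flow(**request)
print(request["Instances"]["KeepJobFlowAliveWhenNoSteps"])
```

Because the cluster exists only for the duration of the step, the company pays for roughly the run time of the script each day.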
Question 139:
A company has collected more than 100 TB of log files in the last 24 months. The files are stored as raw text in a dedicated Amazon S3 bucket. Each object has a key of the form year-month-day_log_HHmmss.txt where HHmmss represents the time the log file was initially created. A table was created in Amazon Athena that points to the S3 bucket. One-time queries are run against a subset of columns in the table several times an hour.
A data analyst must make changes to reduce the cost of running these queries. Management wants a solution with minimal maintenance overhead.
Which combination of steps should the data analyst take to meet these requirements? (Choose three.)
A. Convert the log files to Apache Avro format.
B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
C. Convert the log files to Apache Parquet format.
D. Add a key prefix of the form year-month-day/ to the S3 objects to partition the data.
E. Drop and recreate the table with the PARTITIONED BY clause. Run the ALTER TABLE ADD PARTITION statement.
F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
B. Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
C. Convert the log files to Apache Parquet format.
F. Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
Explanation/Reference:
Option B: Add a key prefix of the form date=year-month-day/ to the S3 objects to partition the data.
Explanation: The Hive-style date=year-month-day/ prefix lets Athena treat each day as a partition and scan only the partitions a query needs. It is also the form that MSCK REPAIR TABLE can discover automatically, which is why option B is preferred over option D. In addition, an Amazon S3 bucket can support 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix, so each additional prefix adds request capacity, which helps with a large data set.
Option C: Convert the log files to Apache Parquet format.
Explanation: Parquet is a columnar format, so Athena reads only the columns a query references, which reduces the data scanned and improves query performance.
Option F: Drop and recreate the table with the PARTITIONED BY clause. Run the MSCK REPAIR TABLE statement.
Explanation: MSCK REPAIR TABLE compares the partitions in the table metadata and the partitions in S3. If new partitions are present in the S3 location that you specified when you created the table, it adds those partitions to the metadata and to the Athena table.
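The key change in option B can be sketched as a pure function that maps an existing object key to its Hive-style partitioned key. The sample key is hypothetical and `partitioned_key` is an illustrative helper; actually copying the objects in S3 and converting them to Parquet are outside this sketch.

```python
import re

# Matches keys of the form year-month-day_log_HHmmss.txt described in the question.
_KEY_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})_log_\d{6}\.txt$")


def partitioned_key(original_key: str) -> str:
    """Return the key with a Hive-style date=year-month-day/ prefix added."""
    match = _KEY_PATTERN.match(original_key)
    if match is None:
        raise ValueError(f"unexpected key format: {original_key!r}")
    return f"date={match.group(1)}/{original_key}"


print(partitioned_key("2023-01-15_log_103000.txt"))
# prints: date=2023-01-15/2023-01-15_log_103000.txt
```

With keys in this form, a table recreated with a PARTITIONED BY clause on the date column can pick up every day with one MSCK REPAIR TABLE statement.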
Question 140:
A company creates daily and monthly business metrics from data that partners provide. Each day, the partners deliver JSON data files to an Amazon S3 bucket that the company owns. The S3 object keys use Apache Hive style date partitions. The company uses an Amazon EventBridge rule to invoke an AWS Lambda function that reads all objects in the S3 bucket to aggregate the daily and monthly metrics.
The company performs occasional analysis that requires access to historical data. As more data has accumulated, the Lambda function is timing out frequently. A data analytics specialist must prevent the Lambda function timeouts.
Which solution will meet these requirements with the LEAST operational overhead?
A. Update the EventBridge rule to invoke AWS Step Functions to retry the Lambda function if the function fails.
B. Modify the Lambda function to delete older S3 objects during the daily processing.
C. Modify the Lambda function to query the S3 objects by using Amazon Athena with date filters.
D. Create an AWS Glue job to invoke the Lambda function. Update the EventBridge rule to invoke the AWS Glue job.
C. Modify the Lambda function to query the S3 objects by using Amazon Athena with date filters.