Amazon DAS-C01 Online Practice Questions and Exam Preparation
DAS-C01 Exam Details
Exam Code: DAS-C01
Exam Name: AWS Certified Data Analytics - Specialty (DAS-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 285 Q&As
Last Updated: May 26, 2026
Amazon DAS-C01 Online Questions & Answers
Question 191:
A company is building a data lake and needs to ingest data from a relational database that has time-series data. The company wants to use managed services to accomplish this. The process needs to be scheduled daily and bring incremental data only from the source into Amazon S3.
What is the MOST cost-effective approach to meet these requirements?
A. Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.
B. Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an Amazon DynamoDB table and ingest the data using the updated key as a filter.
C. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the entire dataset. Use appropriate Apache Spark libraries to compare the dataset, and find the delta.
D. Use AWS Glue to connect to the data source using JDBC Drivers and ingest the full data. Use AWS DataSync to ensure the delta only is written into Amazon S3.
Correct Answer: A. Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.
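The idea behind incremental ingestion with a bookmark can be sketched in plain Python. This is only an illustration of the concept, not the AWS Glue API: the Bookmark class, the incremental_run function, and the integer "id" key column are all hypothetical, and the sketch assumes the key increases monotonically as rows are added.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Hypothetical stand-in for the state a bookmark persists between runs."""
    last_key: int = 0


def incremental_run(source_rows, bookmark):
    """Return only rows with a key above the bookmarked key, then advance the bookmark."""
    new_rows = [row for row in source_rows if row["id"] > bookmark.last_key]
    if new_rows:
        bookmark.last_key = max(row["id"] for row in new_rows)
    return new_rows


# Hypothetical source table with a monotonically increasing "id" column.
source = [{"id": 1, "reading": 20.1}, {"id": 2, "reading": 20.4}]
bookmark = Bookmark()

day1 = incremental_run(source, bookmark)    # first run picks up both rows
source.append({"id": 3, "reading": 21.0})   # a new row arrives in the source
day2 = incremental_run(source, bookmark)    # second run picks up only the new row
```

Because the state is kept by the job itself, no extra store such as the DynamoDB table in option B is needed, and no full extract and comparison as in options C and D.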
Question 192:
A university intends to use Amazon Kinesis Data Firehose to collect JSON-formatted batches of water quality readings in Amazon S3. The readings are from 50 sensors scattered across a local lake. Students will query the stored data using Amazon Athena to observe changes in a captured metric over time, such as water temperature or acidity. Interest has grown in the study, prompting the university to reconsider how data will be stored.
Which data format and partitioning choices will MOST significantly reduce costs? (Choose two.)
A. Store the data in Apache Avro format using Snappy compression.
B. Partition the data by year, month, and day.
C. Store the data in Apache ORC format using no compression.
D. Store the data in Apache Parquet format using Snappy compression.
E. Partition the data by sensor, year, month, and day.
Correct Answer: B. Partition the data by year, month, and day. D. Store the data in Apache Parquet format using Snappy compression.
Explanation/Reference:
B and D are the right answers.
Some background: Snappy compression reduces I/O and achieves roughly the same compression ratio for both Parquet and Avro. Avro stores data in a row-based format and does not compress it on its own. Parquet, however, is a columnar store; even without an additional compression algorithm such as Snappy, it natively compresses the data by 2x to 5x on average. Because Athena charges by the amount of data scanned, a columnar format also lets a query on a single metric read only that column.
A) Incorrect, because Parquet compresses better than Avro.
B) Correct, because the partition keys (year, month, day) have medium cardinality.
C) ORC and Parquet are both columnar storage formats supported by Athena, but this option uses no compression, so it can be ruled out.
D) Correct, because Parquet with Snappy is a better choice than ORC with no compression.
E) Adding the sensor ID to the partition keys creates high cardinality and may produce many small files under each partition, which slows down queries. B is the better option because the data from all 50 sensors for a day can be kept in a single file.
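The cardinality argument in B and E can be checked with a quick count. The sketch below is an illustration only: it assumes Hive-style key=value prefixes, one year of data starting on a made-up date, and sensor IDs numbered 1 to 50; none of these details come from the question.

```python
from datetime import date, timedelta


def date_prefix(d):
    """Hive-style prefix for one day (the layout is an assumption)."""
    return f"year={d.year}/month={d.month:02d}/day={d.day:02d}/"


def partition_prefixes(days, sensors=None):
    """Distinct partition prefixes, optionally with a leading sensor key."""
    prefixes = set()
    for d in days:
        if sensors is None:
            prefixes.add(date_prefix(d))
        else:
            for s in sensors:
                prefixes.add(f"sensor={s}/" + date_prefix(d))
    return prefixes


# Hypothetical study period: 365 consecutive days, 50 sensors.
days = [date(2024, 1, 1) + timedelta(days=i) for i in range(365)]
sensors = range(1, 51)

by_date = partition_prefixes(days)
by_sensor_and_date = partition_prefixes(days, sensors)
print(len(by_date), len(by_sensor_and_date))  # 365 18250
```

Under these assumptions, adding the sensor key multiplies the number of partitions by 50, and each partition holds only one sensor's readings for a day, which is where the small-file problem comes from.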
Question 193:
A large fashion retailer wants to transform a source dataset to a consumable format. The retailer is building an ETL pipeline and needs to deduplicate the data because the retailer's various departments share similar customer and stock information. The retailer wants to build a data lake in Amazon S3 after the transformation and deduplication processes are completed.
Which solution MOST cost-effectively meets these requirements?
A. Load the data into Amazon Redshift and build custom deduplication scripts by using SQL. Use the UNLOAD command in Amazon Redshift to store the data in Amazon S3.
B. Use AWS Glue to transform the data and use FindMatches to deduplicate the data. Store the output in Amazon S3.
C. Use Amazon EMR to transform the data. Deduplicate the data by using custom Spark SQL scripts and use EMRFS to store the output in Amazon S3.
D. Use an Amazon Athena federated query to load the data from the sources. Build custom Athena SQL scripts to deduplicate and store the output to Amazon S3.
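Correct Answer: B. Use AWS Glue to transform the data and use FindMatches to deduplicate the data. Store the output in Amazon S3.
Explanation/Reference:
AWS Glue is serverless, and its FindMatches ML transform identifies duplicate or matching records without custom deduplication code. This avoids the cluster and development costs of the Amazon Redshift and Amazon EMR options.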
Question 194:
An airline has .csv-formatted data stored in Amazon S3 with an AWS Glue Data Catalog. Data analysts want to join this data with call center data stored in Amazon Redshift as part of a daily batch process. The Amazon Redshift cluster is already under a heavy load. The solution must be managed, serverless, well-functioning, and minimize the load on the existing Amazon Redshift cluster. The solution should also require minimal effort and development activity.
Which solution meets these requirements?
A. Unload the call center data from Amazon Redshift to Amazon S3 using an AWS Lambda function. Perform the join with AWS Glue ETL scripts.
B. Export the call center data from Amazon Redshift using a Python shell in AWS Glue. Perform the join with AWS Glue ETL scripts.
C. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
D. Export the call center data from Amazon Redshift to Amazon EMR using Apache Sqoop. Perform the join with Apache Hive.
Correct Answer: C. Create an external table using Amazon Redshift Spectrum for the call center data and perform the join with Amazon Redshift.
Explanation/Reference:
Redshift Spectrum is serverless as well. The question also asks for minimal development effort, and option A requires developing both Lambda and Glue code.
Question 195:
A large company has several independent business units. Each business unit is responsible for its own data, but needs to share data with other units for collaboration. Each unit stores data in an Amazon S3 data lake created with AWS Lake Formation. To create dashboard reports, the marketing team wants to join its data stored in an Amazon Redshift cluster with the sales team customer table stored in the data lake. The sales team has a large number of tables and schemas, but the marketing team should only have access to the customer table. The solution must be secure and scalable.
Which set of actions meets these requirements?
A. The sales team shares the AWS Glue Data Catalog customer table with the marketing team in read-only mode using the named resource method. The marketing team accepts the datashare using AWS Resource Access Manager (AWS RAM) and creates a resource link to the shared customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
B. The marketing team creates an S3 cross-account replication between the sales team's S3 bucket as the source and the marketing team's S3 bucket as the destination. The marketing team runs an AWS Glue crawler on the replicated data in its AWS account to create an AWS Glue Data Catalog customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
C. The marketing team creates an AWS Lambda function in the sales team's account to replicate data between the sales team's S3 bucket as the source and the marketing team's S3 bucket as the destination. The marketing team runs an AWS Glue crawler on the replicated data in its AWS account to create an AWS Glue Data Catalog customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
D. The sales team shares the AWS Glue Data Catalog customer table with the marketing team in read-only mode using the Lake Formation tag-based access control (LF-TBAC) method. The sales team updates the AWS Glue Data Catalog resource policy to add relevant permissions for the marketing team. The marketing team creates a resource link to the shared customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
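Correct Answer: A. The sales team shares the AWS Glue Data Catalog customer table with the marketing team in read-only mode using the named resource method. The marketing team accepts the datashare using AWS Resource Access Manager (AWS RAM) and creates a resource link to the shared customer table. The marketing team joins its data with the customer table using Amazon Redshift Spectrum.
Explanation/Reference:
Sharing a single table with the named resource method gives the marketing team access to the customer table only, without copying data. The replication approaches in B and C duplicate data and do not restrict access at the table level.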
Question 196:
A company uses an Amazon EMR cluster with 50 nodes to process operational data and make the data available for data analysts. These jobs run nightly, use Apache Hive with the Apache Tez framework as a processing model, and write results to Hadoop Distributed File System (HDFS). In the last few weeks, jobs are failing and are producing the following error message:
"File could only be replicated to 0 nodes instead of 1".
A data analytics specialist checks the DataNode logs, the NameNode logs, and network connectivity for potential issues that could have prevented HDFS from replicating data. The data analytics specialist rules out these factors as causes for the issue.
Which solution will prevent the jobs from failing?
A. Monitor the HDFSUtilization metric. If the value crosses a user-defined threshold, add task nodes to the EMR cluster.
B. Monitor the HDFSUtilization metric. If the value crosses a user-defined threshold, add core nodes to the EMR cluster.
C. Monitor the MemoryAllocatedMB metric. If the value crosses a user-defined threshold, add task nodes to the EMR cluster.
D. Monitor the MemoryAllocatedMB metric. If the value crosses a user-defined threshold, add core nodes to the EMR cluster.
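Correct Answer: B. Monitor the HDFSUtilization metric. If the value crosses a user-defined threshold, add core nodes to the EMR cluster.
Explanation/Reference:
With the logs and network ruled out, the error points to HDFS running out of space for new blocks. Only core nodes run DataNodes and store HDFS data, so adding core nodes adds capacity; task nodes add none.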
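The HDFSUtilization options turn on which node type adds HDFS capacity. In Amazon EMR, core nodes store HDFS data and task nodes do not, so a rough sizing check only counts core nodes. The sketch below is a simplified illustration with made-up figures: the function names are hypothetical, and used_gb is assumed to be raw storage consumed, replicas included.

```python
import math


def hdfs_utilization_pct(used_gb, core_nodes, gb_per_core_node):
    """Percentage of HDFS capacity in use; only core nodes contribute capacity."""
    return 100.0 * used_gb / (core_nodes * gb_per_core_node)


def core_nodes_for_threshold(used_gb, gb_per_core_node, threshold_pct):
    """Smallest core-node count that keeps utilization at or below the threshold."""
    return math.ceil(100.0 * used_gb / (threshold_pct * gb_per_core_node))


# Hypothetical figures: 9,000 GB stored on 10 core nodes with 1,000 GB each.
current = hdfs_utilization_pct(9000, 10, 1000)      # 90.0
needed = core_nodes_for_threshold(9000, 1000, 80)   # 12
after = hdfs_utilization_pct(9000, needed, 1000)    # 75.0
print(current, needed, after)
```

Adding task nodes would leave the denominator unchanged, so utilization would stay at 90 percent in this example.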
Question 197:
A manufacturing company has been collecting IoT sensor data from devices on its factory floor for a year and is storing the data in Amazon Redshift for daily analysis. A data analyst has determined that, at an expected ingestion rate of about 2 TB per day, the cluster will be undersized in less than 4 months. A long-term solution is needed. The data analyst has indicated that most queries only reference the most recent 13 months of data, yet there are also quarterly reports that need to query all the data generated from the past 7 years. The chief technology officer (CTO) is concerned about the costs, administrative effort, and performance of a long-term solution.
Which solution should the data analyst use to meet these requirements?
A. Create a daily job in AWS Glue to UNLOAD records older than 13 months to Amazon S3 and delete those records from Amazon Redshift. Create an external table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum to join to data that is older than 13 months.
B. Take a snapshot of the Amazon Redshift cluster. Restore the cluster to a new cluster using dense storage nodes with additional storage capacity.
C. Execute a CREATE TABLE AS SELECT (CTAS) statement to move records that are older than 13 months to quarterly partitioned data in Amazon Redshift Spectrum backed by Amazon S3.
D. Unload all the tables in Amazon Redshift to an Amazon S3 bucket using S3 Intelligent-Tiering. Use AWS Glue to crawl the S3 bucket location to create external tables in an AWS Glue Data Catalog. Create an Amazon EMR cluster using Auto Scaling for any daily analytics needs, and use Amazon Athena for the quarterly reports, with both using the same AWS Glue Data Catalog.
Correct Answer: A. Create a daily job in AWS Glue to UNLOAD records older than 13 months to Amazon S3 and delete those records from Amazon Redshift. Create an external table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum to join to data that is older than 13 months.
Explanation/Reference:
B is not correct because taking a snapshot may save costs but does not solve the problem of the cluster being undersized.
C is not correct because CTAS is not used to move data to Amazon S3 through Redshift Spectrum. CTAS creates a new table based on a query, and the owner of that table is the user who issues the command.
D is incorrect because EMR cannot be used as a data warehouse solution, and the company does not need interactive queries with Athena.
A is correct because it specifies exactly how to move data to Redshift Spectrum and reduce the space used on the cluster: https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html
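A back-of-the-envelope calculation shows how much data the hot and cold tiers would hold. The sketch assumes a constant 2 TB per day, a 30-day month, seven full years retained, and no compression; these are simplifications for illustration, not figures from AWS.

```python
TB_PER_DAY = 2
DAYS_PER_MONTH = 30   # rough assumption
HOT_MONTHS = 13
RETENTION_YEARS = 7

hot_tb = TB_PER_DAY * DAYS_PER_MONTH * HOT_MONTHS   # kept in Amazon Redshift
total_tb = TB_PER_DAY * 365 * RETENTION_YEARS       # everything the quarterly reports touch
cold_tb = total_tb - hot_tb                         # candidates for Amazon S3
cold_share = cold_tb / total_tb

print(hot_tb, total_tb, cold_tb, round(cold_share, 2))  # 780 5110 4330 0.85
```

Under these assumptions roughly 85 percent of the data is only needed for the quarterly reports, which is why moving it to Amazon S3 and querying it through Redshift Spectrum addresses the CTO's cost concern.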
Question 198:
A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream partitioned on user_id. An AWS Lambda function retrieves the records and validates the content before loading the posts into an Amazon Elasticsearch cluster. The validation process needs to receive the posts for a given user in the order they were received. A data analyst has noticed that, during peak hours, the social media platform posts take more than an hour to appear in the Elasticsearch cluster.
What should the data analyst do to reduce this latency?
A. Migrate the validation process to Amazon Kinesis Data Firehose.
B. Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.
C. Increase the number of shards in the stream.
D. Configure multiple Lambda functions to process the stream.
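Correct Answer: C. Increase the number of shards in the stream.
Explanation/Reference:
By default, Lambda processes each shard with one concurrent invocation, and records with the same partition key always land in the same shard. Adding shards therefore raises parallelism while preserving the per-user order. Multiple Lambda functions would each receive every record and would not divide the work.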
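How a partition key such as user_id maps to a shard can be sketched as follows. Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains it. The sketch assumes the shards split the hash space evenly, which is typical but not guaranteed; the shard_for function and the user names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # size of the 128-bit MD5 hash key space


def shard_for(partition_key, shard_count):
    """Index of the shard that would receive this key, assuming evenly split ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE


users = [f"user-{i}" for i in range(1000)]   # hypothetical user_id values
before = {u: shard_for(u, 2) for u in users}
after = {u: shard_for(u, 8) for u in users}

# Every record for a given user goes to one shard, so per-user order is kept,
# while the users as a whole spread across more shards.
print(len(set(before.values())), len(set(after.values())))
```

With more shards, the same users are divided among more shard-level consumers, so each consumer has fewer records to validate during peak hours.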
Question 199:
A company needs to launch an Amazon EMR cluster in a VPC. The EMR cluster must not have any access to the internet. Additionally, the EMR cluster's access to other AWS services must not be through the internet.
Which solution will meet these requirements?
A. Launch the EMR cluster in a private subnet. Configure a NAT gateway for access to other AWS services.
B. Launch the EMR cluster in a private subnet. Configure a NAT instance for access to other AWS services.
C. Launch the EMR cluster in a private subnet. Configure a VPC endpoint for access to other AWS services.
D. Launch the EMR cluster in a public subnet. Configure a VPC endpoint for access to other AWS services.
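Correct Answer: C. Launch the EMR cluster in a private subnet. Configure a VPC endpoint for access to other AWS services.
Explanation/Reference:
A NAT gateway or NAT instance gives the cluster a route to the internet, which the requirements forbid. VPC endpoints keep traffic to other AWS services on the AWS network, and a private subnet has no direct internet route.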
Question 200:
A company's data analyst needs to ensure that queries run in Amazon Athena cannot scan more than a prescribed amount of data for cost control purposes. Queries that exceed the prescribed threshold must be canceled immediately.
What should the data analyst do to achieve this?
A. Configure Athena to invoke an AWS Lambda function that terminates queries when the prescribed threshold is crossed.
B. For each workgroup, set the control limit for each query to the prescribed threshold.
C. Enforce the prescribed threshold on all Amazon S3 bucket policies.
D. For each workgroup, set the workgroup-wide data usage control limit to the prescribed threshold.
Correct Answer: B. For each workgroup, set the control limit for each query to the prescribed threshold.
Explanation/Reference:
The correct answer is B. Athena workgroups support data usage control limits, and the per-query control limit specifies a threshold that causes a query to be canceled as soon as it exceeds the limit.
A is wrong because Athena cannot be configured to invoke a Lambda function for this purpose.
C is incorrect because you can't set a threshold in Athena using S3 bucket policies.
D is incorrect because the workgroup-wide data usage control limit applies to the total amount of data scanned by all queries that run in the workgroup, not to a specific query. Remember that the requirement is to immediately cancel queries that exceed the prescribed threshold.
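One way to set the per-query limit programmatically is through the Athena UpdateWorkGroup API, where the limit is the BytesScannedCutoffPerQuery field. The sketch below only builds the request so that it runs without AWS credentials. The helper name, the workgroup name "analysts", and the 1 TB limit are hypothetical, and the 10 MB minimum reflects the documented lower bound for this field, which should be checked against the current API reference.

```python
MIN_CUTOFF_BYTES = 10_000_000  # documented minimum for the per-query cutoff (10 MB)


def per_query_limit_request(workgroup, limit_bytes):
    """Build keyword arguments for Athena's update_work_group call."""
    if limit_bytes < MIN_CUTOFF_BYTES:
        raise ValueError("per-query limit must be at least 10 MB")
    return {
        "WorkGroup": workgroup,
        "ConfigurationUpdates": {"BytesScannedCutoffPerQuery": limit_bytes},
    }


# Hypothetical workgroup name and a 1 TB limit.
request = per_query_limit_request("analysts", 1_000_000_000_000)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   boto3.client("athena").update_work_group(**request)
print(request)
```

The request has to be repeated for each workgroup, which matches the "for each workgroup" wording in option B.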