Exam Details

  • Exam Code: DAS-C01
  • Exam Name: AWS Certified Data Analytics - Specialty (DAS-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 285 Q&As
  • Last Updated: Apr 27, 2025

Amazon Certifications DAS-C01 Questions & Answers

  • Question 61:

    An online retail company stores Application Load Balancer (ALB) access logs in an Amazon S3 bucket. The analytics team wants to query the logs by using Amazon Athena to analyze traffic distribution and patterns. An unpartitioned table has been created in Athena. The most common query obtains log information by year, month, and day. However, as the size of the data increases, the response time for the same query increases.

    Which solution should improve the query performance in Athena with the LEAST operational effort?

    A. Create an AWS Glue job with a transform that infers the schema of all ALB access logs and populates the partition metadata by year, month, and day to the AWS Glue Data Catalog.

    B. Create an AWS Glue crawler with a classifier that infers the schema of all ALB access logs and populates the partition metadata by year, month, and day to the AWS Glue Data Catalog.

    C. Create an AWS Lambda function to transform all ALB access logs. Save the results to Amazon S3 in Apache Parquet format and partition the results by year, month, and day. Use Athena to query the transformed data.

    D. Use AWS Data Pipeline to periodically start a transient Amazon EMR cluster with a Spark step. Use PySpark to create files in Amazon S3 and partition the files by year, month, and day.
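
    Note: a minimal boto3 sketch of the crawler-based approach described in option B, assuming hypothetical names for the crawler, IAM role, Glue database, and S3 log path; the crawler infers the schema and registers year/month/day partition metadata from the S3 prefix layout of the ALB access logs.

        import boto3

        glue = boto3.client("glue")

        # Hypothetical names; the crawler infers the schema and adds
        # year/month/day partitions to the AWS Glue Data Catalog.
        glue.create_crawler(
            Name="alb-access-log-crawler",
            Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
            DatabaseName="alb_logs_db",
            Targets={"S3Targets": [{"Path": "s3://example-alb-logs/AWSLogs/"}]},
            Schedule="cron(0 1 * * ? *)",  # run daily so new partitions are picked up
        )
        glue.start_crawler(Name="alb-access-log-crawler")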

  • Question 62:

    A telecommunications company needs to send customer call data records from its on-premises database to AWS to generate near-real-time insights. The solution captures and loads continuously changing updates from the operational data stores that run on PostgreSQL databases. A data analyst has configured an AWS Database Migration Service (AWS DMS) ongoing replication task to read changes in near-real time from the PostgreSQL source database transaction logs for each table and send the data to an Amazon Redshift cluster for further processing.

    The data analytics team has reported latency issues during the change data capture (CDC) of the AWS DMS task. The team thinks that the PostgreSQL databases are causing the high latency.

    Which set of actions will confirm that the PostgreSQL databases are the source of high latency?

    A. Enable Amazon CloudWatch for the AWS DMS task and look for the CDCIncomingChanges metric to identify delays in capturing the changes from the source database.

    B. Verify that logical replication is configured for the source database using the postgresql.conf configuration file.

    C. Enable Amazon CloudWatch Logs for the AWS DMS endpoint of the source database and check for error messages.

    D. Enable Amazon CloudWatch for the AWS DMS task and look for the CDCLatencySource metric to identify delays in capturing the changes from the source database.
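
    Note: a minimal boto3 sketch of how the CDCLatencySource metric could be inspected, assuming hypothetical replication instance and task identifiers; CDCLatencySource reports the gap, in seconds, between the last change captured from the source and the current time, so sustained high values point at the source database.

        import boto3
        from datetime import datetime, timedelta, timezone

        cloudwatch = boto3.client("cloudwatch")

        now = datetime.now(timezone.utc)
        response = cloudwatch.get_metric_statistics(
            Namespace="AWS/DMS",
            MetricName="CDCLatencySource",
            Dimensions=[
                {"Name": "ReplicationInstanceIdentifier", "Value": "dms-instance-1"},
                {"Name": "ReplicationTaskIdentifier", "Value": "pg-to-redshift-task"},
            ],
            StartTime=now - timedelta(hours=1),
            EndTime=now,
            Period=300,
            Statistics=["Average", "Maximum"],
        )
        # Print the latency datapoints in time order for a quick look at trends.
        for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
            print(point["Timestamp"], point["Average"], point["Maximum"])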

  • Question 63:

    A company needs a solution to control data access for the company's Amazon S3 data lake. The company expects the number of data sources in the data lake and the number of users that access the data to increase rapidly. All the data in the data lake is cataloged in an AWS Glue Data Catalog. Users access the data by using Amazon Athena and Amazon QuickSight.

    A data analytics specialist must implement a solution that controls which users can ingest new data into the data lake. The solution also must restrict access to data at the column level and must provide audit capabilities.

    Which solution will meet these requirements?

    A. Use IAM resource-based policies to allow access to required S3 prefixes only. Use AWS CloudTrail for audit logs.

    B. Use AWS Lake Formation access controls for the data in the data lake. Use AWS CloudTrail for audit logs.

    C. Use IAM identity-based policies to allow access to authorized users only. Use Amazon CloudWatch for audit logs.

    D. Use Athena federated queries to access the data in the data lake. Use S3 server access logs for audit logs.
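
    Note: a minimal boto3 sketch of a Lake Formation column-level grant, assuming hypothetical principal, database, table, and column names; Lake Formation can grant SELECT on an explicit column list, which is how column-level restrictions are enforced for Athena and QuickSight users, while CloudTrail records the data access for auditing.

        import boto3

        lakeformation = boto3.client("lakeformation")

        # Grant SELECT on only the columns this analyst role is allowed to see.
        lakeformation.grant_permissions(
            Principal={
                "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/AnalystRole"
            },
            Resource={
                "TableWithColumns": {
                    "DatabaseName": "sales_db",
                    "Name": "orders",
                    "ColumnNames": ["order_id", "order_date", "total_amount"],
                }
            },
            Permissions=["SELECT"],
        )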

  • Question 64:

    A company collects data from parking garages. Analysts have requested the ability to run reports in near real time about the number of vehicles in each garage.

    The company wants to build an ingestion pipeline that loads the data into an Amazon Redshift cluster. The solution must alert operations personnel when the number of vehicles in a particular garage exceeds a specific threshold. The alerting query will use garage threshold values as a static reference. The threshold values are stored in Amazon S3.

    What is the MOST operationally efficient solution that meets these requirements?

    A. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Create an Amazon Kinesis Data Analytics application that uses the same delivery stream as an input source. Create a reference data source in Kinesis Data Analytics to temporarily store the threshold values from Amazon S3 and to compare the number of vehicles in a particular garage to the corresponding threshold value. Configure an AWS Lambda function to publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.

    B. Use an Amazon Kinesis data stream to collect the data. Use an Amazon Kinesis Data Firehose delivery stream to deliver the data to Amazon Redshift. Create another Kinesis data stream to temporarily store the threshold values from Amazon S3. Send the delivery stream and the second data stream to Amazon Kinesis Data Analytics to compare the number of vehicles in a particular garage to the corresponding threshold value. Configure an AWS Lambda function to publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.

    C. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Automatically initiate an AWS Lambda function that queries the data in Amazon Redshift. Configure the Lambda function to compare the number of vehicles in a particular garage to the corresponding threshold value from Amazon S3. Configure the Lambda function to also publish an Amazon Simple Notification Service (Amazon SNS) notification if the number of vehicles exceeds the threshold.

    D. Use an Amazon Kinesis Data Firehose delivery stream to collect the data and to deliver the data to Amazon Redshift. Create an Amazon Kinesis Data Analytics application that uses the same delivery stream as an input source. Use Kinesis Data Analytics to compare the number of vehicles in a particular garage to the corresponding threshold value that is stored in a table as an in-application stream. Configure an AWS Lambda function as an output for the application to publish an Amazon Simple Queue Service (Amazon SQS) notification if the number of vehicles exceeds the threshold.
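
    Note: a minimal boto3 sketch of registering an S3 object as a Kinesis Data Analytics reference data source, as described in option A; the application name, bucket, IAM role, and column schema are hypothetical. The thresholds file becomes an in-application reference table that the alerting SQL can join against the streaming vehicle counts.

        import boto3

        kda = boto3.client("kinesisanalytics")

        kda.add_application_reference_data_source(
            ApplicationName="garage-occupancy-app",
            CurrentApplicationVersionId=1,
            ReferenceDataSource={
                "TableName": "GARAGE_THRESHOLDS",
                "S3ReferenceDataSource": {
                    "BucketARN": "arn:aws:s3:::example-garage-config",
                    "FileKey": "thresholds.csv",
                    "ReferenceRoleARN": "arn:aws:iam::123456789012:role/KdaReferenceRole",
                },
                "ReferenceSchema": {
                    "RecordFormat": {
                        "RecordFormatType": "CSV",
                        "MappingParameters": {
                            "CSVMappingParameters": {
                                "RecordRowDelimiter": "\n",
                                "RecordColumnDelimiter": ",",
                            }
                        },
                    },
                    "RecordColumns": [
                        {"Name": "garage_id", "SqlType": "VARCHAR(16)"},
                        {"Name": "threshold", "SqlType": "INTEGER"},
                    ],
                },
            },
        )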

  • Question 65:

    A data analytics specialist needs to prepare an inventory report for a company's online store. The data for the report is contained in MySQL databases, PostgreSQL databases, Amazon DynamoDB tables, and Amazon S3 buckets.

    How can the data analytics specialist prepare the report with the LEAST operational overhead?

    A. Use an AWS Glue crawler to catalog the databases. Configure an AWS Glue ETL job to transfer the required data from the databases to Amazon S3. Query the data in Amazon Athena to generate the report.

    B. Launch an Amazon EMR cluster to transfer the required data from the databases to Amazon S3. Query the data in Amazon Athena to generate the report.

    C. Deploy Amazon Athena data source connectors for MySQL, PostgreSQL, and DynamoDB. Use Amazon Athena Federated Query to generate the report.

    D. Use AWS Database Migration Service (AWS DMS) to transfer the required data from the databases to Amazon S3. Query the data in Amazon Athena to generate the report.
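
    Note: a minimal boto3 sketch of an Athena Federated Query in the spirit of option C; the catalog, database, table, and output-location names are hypothetical. Each non-S3 source is reached through an Athena data source connector (a Lambda-backed catalog), so a single query can join MySQL, PostgreSQL, DynamoDB, and S3 data.

        import boto3

        athena = boto3.client("athena")

        # Hypothetical federated catalogs registered for each connector.
        query = """
        SELECT p.sku, p.name, w.quantity_on_hand, o.open_order_count
        FROM "mysql_catalog"."store"."products" AS p
        JOIN "postgres_catalog"."warehouse"."stock" AS w ON p.sku = w.sku
        JOIN "dynamodb_catalog"."default"."open_orders" AS o ON p.sku = o.sku
        """
        athena.start_query_execution(
            QueryString=query,
            ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
        )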

  • Question 66:

    A large ecommerce company uses Amazon DynamoDB with provisioned read capacity and auto scaled write capacity to store its product catalog. The company uses Apache HiveQL statements on an Amazon EMR cluster to query the DynamoDB table. After the company announced a sale on all of its products, wait times for each query have increased. The data analyst has determined that the longer wait times are being caused by throttling when querying the table.

    Which solution will solve this issue?

    A. Increase the size of the EMR nodes that are provisioned.

    B. Increase the number of EMR nodes that are in the cluster.

    C. Increase the DynamoDB table's provisioned write throughput.

    D. Increase the DynamoDB table's provisioned read throughput.
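
    Note: a minimal boto3 sketch of raising the table's provisioned read throughput, as in option D; the table name and capacity values are hypothetical. Because the Hive queries only read the table, it is the read capacity that relieves the throttling.

        import boto3

        dynamodb = boto3.client("dynamodb")

        dynamodb.update_table(
            TableName="product_catalog",
            ProvisionedThroughput={
                "ReadCapacityUnits": 2000,   # increased to absorb the query load
                "WriteCapacityUnits": 500,   # keep the existing write setting
            },
        )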

  • Question 67:

    A large media company is looking for a cost-effective storage and analysis solution for its daily media recordings, which are formatted with embedded metadata. Daily data sizes range from 10 TB to 12 TB, with stream analysis required on timestamps, video resolutions, file sizes, closed captioning, audio languages, and more. Based on the analysis, processing the datasets is estimated to take between 30 and 180 minutes depending on the underlying framework selection. The analysis will be done by using business intelligence (BI) tools that can be connected to data sources with AWS or Java Database Connectivity (JDBC) connectors.

    Which solution meets these requirements?

    A. Store the video files in Amazon DynamoDB and use AWS Lambda to extract the metadata from the files and load it to DynamoDB. Use DynamoDB to provide the data to be analyzed by the BI tools.

    B. Store the video files in Amazon S3 and use AWS Lambda to extract the metadata from the files and load it to Amazon S3. Use Amazon Athena to provide the data to be analyzed by the BI tools.

    C. Store the video files in Amazon DynamoDB and use Amazon EMR to extract the metadata from the files and load it to Apache Hive. Use Apache Hive to provide the data to be analyzed by the BI tools.

    D. Store the video files in Amazon S3 and use AWS Glue to extract the metadata from the files and load it to Amazon Redshift. Use Amazon Redshift to provide the data to be analyzed by the BI tools.
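
    Note: a minimal AWS Glue job sketch of the metadata load into Amazon Redshift described in option D, assuming a Glue Data Catalog table over the extracted metadata and a pre-created Glue connection to Redshift; all names are hypothetical.

        import sys

        from awsglue.context import GlueContext
        from awsglue.job import Job
        from awsglue.utils import getResolvedOptions
        from pyspark.context import SparkContext

        args = getResolvedOptions(sys.argv, ["JOB_NAME"])
        glue_context = GlueContext(SparkContext.getOrCreate())
        job = Job(glue_context)
        job.init(args["JOB_NAME"], args)

        # Metadata records previously extracted from the media files in S3
        # and registered in the Glue Data Catalog (hypothetical names).
        metadata = glue_context.create_dynamic_frame.from_catalog(
            database="media_db", table_name="recording_metadata"
        )

        # Load the metadata into Amazon Redshift so BI tools can query it over JDBC.
        glue_context.write_dynamic_frame.from_jdbc_conf(
            frame=metadata,
            catalog_connection="redshift-connection",
            connection_options={"dbtable": "recording_metadata", "database": "analytics"},
            redshift_tmp_dir="s3://example-glue-temp/redshift/",
        )
        job.commit()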

  • Question 68:

    An invoice tracking application stores invoice images within an Amazon S3 bucket. After invoice images are uploaded, they are accessed often by application users for 30 days. After 30 days, the invoice images are rarely accessed. The application guarantees that uploaded images will never be deleted and will be immediately available upon request by users. The application has 1 million users and handles 20,000 read requests each second during peak usage.

    Which combination of storage solutions MOST cost-effectively meets these requirements? (Choose two.)

    A. Store the invoice images by using the S3 Standard storage class. Apply a lifecycle policy to transition the images to the S3 Standard-Infrequent Access (S3 Standard-IA) storage class 30 days after upload.

    B. Create one S3 key prefix for each user in the S3 bucket and store the invoice images under the user-specific prefix.

    C. Store the invoice images by using the S3 Standard storage class. Apply a lifecycle policy to transition the images to the S3 Glacier Instant Retrieval storage class 30 days after upload.

    D. Store the invoice images by using the S3 Standard storage class. Apply a lifecycle policy to transition the images to the S3 One Zone-Infrequent Access (S3 One Zone-IA) storage class 30 days after upload.

    E. Create one S3 key prefix for each day in the S3 bucket and store the invoice images under the upload date-specific prefix.
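
    Note: a minimal boto3 sketch of the lifecycle transition described in option A, with a hypothetical bucket name; images move to S3 Standard-IA 30 days after upload while remaining immediately retrievable on request.

        import boto3

        s3 = boto3.client("s3")

        s3.put_bucket_lifecycle_configuration(
            Bucket="example-invoice-images",
            LifecycleConfiguration={
                "Rules": [
                    {
                        "ID": "invoices-to-standard-ia",
                        "Status": "Enabled",
                        "Filter": {"Prefix": ""},  # apply to every object in the bucket
                        "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
                    }
                ]
            },
        )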

  • Question 69:

    An educational technology company is running an online assessment application that allows thousands of students to concurrently take assessments on the company's platform. The application uses a combination of relational databases running on an Amazon Aurora PostgreSQL DB cluster and Amazon DynamoDB tables for storing data. Users reported issues with application performance during a recent large-scale online assessment. As a result, the company wants to design a solution that captures metrics from all databases in a centralized location and queries the metrics to identify issues with performance.

    How can this solution be designed with the LEAST operational overhead?

    A. Configure AWS Database Migration Service (AWS DMS) to copy the database logs to an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.

    B. Configure an Amazon CloudWatch metric stream with an Amazon Kinesis Data Firehose delivery stream destination that stores the data in an Amazon S3 bucket. Schedule an AWS Glue crawler to periodically populate an AWS Glue table. Query the AWS Glue table with Amazon Athena.

    C. Create an Apache Kafka cluster on Amazon EC2. Configure a Java Database Connectivity (JDBC) connector for Kafka Connect on each database to capture and stream the logs to a single Amazon CloudWatch log group. Query the CloudWatch log group with Amazon Athena.

    D. Install a server on Amazon EC2 to capture logs from Amazon RDS and DynamoDB by using Java Database Connectivity (JDBC) connectors. Stream the logs to an Amazon Kinesis Data Firehose delivery stream that stores the data in an Amazon S3 bucket. Query the output logs in the S3 bucket by using Amazon Athena.
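
    Note: a minimal boto3 sketch of the metric stream described in option B, assuming hypothetical stream, Firehose, and IAM role names; the stream pushes Aurora (RDS) and DynamoDB metrics through Kinesis Data Firehose into S3, where the Glue crawler and Athena take over.

        import boto3

        cloudwatch = boto3.client("cloudwatch")

        cloudwatch.put_metric_stream(
            Name="database-metrics-stream",
            FirehoseArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/db-metrics",
            RoleArn="arn:aws:iam::123456789012:role/MetricStreamRole",
            OutputFormat="json",
            IncludeFilters=[
                {"Namespace": "AWS/RDS"},       # Aurora PostgreSQL metrics
                {"Namespace": "AWS/DynamoDB"},  # DynamoDB table metrics
            ],
        )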

  • Question 70:

    A data architect at a large financial institution is building a data platform on AWS with the intent of implementing fraud detection by identifying duplicate customer accounts. The fraud detection algorithm will run in batch mode to identify when a newly created account matches an account for a user who was previously fraudulent.

    Which approach MOST cost-effectively meets these requirements?

    A. Build a custom deduplication script by using Apache Spark on an Amazon EMR cluster. Use PySpark to compare the data frames that represent the new customers and the fraudulent customer set to identify matches.

    B. Load the data to an Amazon Redshift cluster. Use custom SQL to build deduplication logic.

    C. Load the data to Amazon S3 to form the basis of a data lake. Use Amazon Athena to build a deduplication script.

    D. Load the data to Amazon S3. Use the AWS Glue FindMatches transform to implement deduplication logic.
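
    Note: a minimal boto3 sketch of creating a Glue FindMatches ML transform, as in option D; the database, table, role, and parameter values are hypothetical. The transform is trained on labeled examples and then scores new accounts against the known-fraudulent set without custom matching code.

        import boto3

        glue = boto3.client("glue")

        glue.create_ml_transform(
            Name="duplicate-account-finder",
            Role="arn:aws:iam::123456789012:role/GlueMLTransformRole",
            InputRecordTables=[
                {"DatabaseName": "customers_db", "TableName": "accounts"}
            ],
            Parameters={
                "TransformType": "FIND_MATCHES",
                "FindMatchesParameters": {
                    "PrimaryKeyColumnName": "account_id",
                    "PrecisionRecallTradeoff": 0.9,  # favor precision for fraud review
                },
            },
            GlueVersion="2.0",
        )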

Tips on How to Prepare for the Exams

Nowadays, certification exams have become more and more important and are required by more and more enterprises when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your DAS-C01 exam preparation or your Amazon certification application, do not hesitate to visit Vcedump.com to find your solutions.