DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code: DATA-ENGINEER-ASSOCIATE
  • Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 263 Q&As
  • Last Updated: Jan 12, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 1:

    A gaming company uses AWS Glue to perform read and write operations on Apache Iceberg tables for real-time streaming data. The data in the Iceberg tables is in Apache Parquet format. The company is experiencing slow query performance.

    Which solutions will improve query performance? (Choose two.)

    A. Use AWS Glue Data Catalog to generate column-level statistics for the Iceberg tables on a schedule.
    B. Use AWS Glue Data Catalog to automatically compact the Iceberg tables.
    C. Use AWS Glue Data Catalog to automatically optimize indexes for the Iceberg tables.
    D. Use AWS Glue Data Catalog to enable copy-on-write for the Iceberg tables.
    E. Use AWS Glue Data Catalog to generate views for the Iceberg tables.
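
    For context on options A and B: the AWS Glue Data Catalog does offer managed compaction and column-statistics generation for Iceberg tables, and both can be switched on through the Glue API. A minimal boto3 sketch, with illustrative database, table, account, and role names:

        import boto3

        glue = boto3.client("glue")

        # Enable automatic compaction for an Iceberg table (option B).
        glue.create_table_optimizer(
            CatalogId="123456789012",
            DatabaseName="iceberg_db",
            TableName="game_events",
            Type="compaction",
            TableOptimizerConfiguration={
                "roleArn": "arn:aws:iam::123456789012:role/GlueOptimizerRole",
                "enabled": True,
            },
        )

        # Kick off a column-statistics task run (option A); in practice this
        # could be triggered on a schedule, e.g. from EventBridge Scheduler.
        glue.start_column_statistics_task_run(
            DatabaseName="iceberg_db",
            TableName="game_events",
            Role="arn:aws:iam::123456789012:role/GlueStatsRole",
        )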

  • Question 2:

    A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. The company merges with a second company. The original company modifies a query that aggregates sales revenue data to join sales tables from both companies. The sales table for the first company is named Table S1. The sales table for the second company is named Table S2. Table S1 contains 10 billion records. Table S2 contains 900 million records. The query becomes slow after the modification. A data engineer must improve the query performance.

    Which solutions will meet these requirements? (Choose two.)

    A. Use the KEY distribution style for both sales tables. Select a low cardinality column to use for the join.
    B. Use the KEY distribution style for both sales tables. Select a high cardinality column to use for the join.
    C. Use the EVEN distribution style for Table S1. Use the ALL distribution style for Table S2.
    D. Use the Amazon Redshift query optimizer to review and select optimizations to implement.
    E. Use Amazon Redshift Advisor to review and select optimizations to implement.
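
    For context on the KEY distribution options: distributing both tables on the join column keeps matching rows on the same slice, which avoids redistributing Table S1's 10 billion rows at query time; a high-cardinality key spreads rows evenly, while a low-cardinality key causes skew. A sketch using the Redshift Data API, with illustrative cluster, table, and column names:

        import boto3

        rsd = boto3.client("redshift-data")

        # Switch both sales tables to KEY distribution on the same
        # high-cardinality join column so matching rows are co-located.
        for table in ("s1_sales", "s2_sales"):
            rsd.execute_statement(
                ClusterIdentifier="merged-co-cluster",
                Database="dev",
                DbUser="admin",
                Sql=f"ALTER TABLE {table} ALTER DISTSTYLE KEY DISTKEY customer_id;",
            )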

  • Question 3:

    A company uses a data stream in Amazon Kinesis Data Streams to collect transactional data from multiple sources. The company uses an AWS Glue extract, transform, and load (ETL) pipeline to look for outliers in the data from the stream. When the workflow detects an outlier, it sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic. The SNS topic initiates a second workflow to retrieve logs for the outliers and stores the logs in an Amazon S3 bucket.

    The company experiences delays in the notifications to the SNS topic during periods when the data stream is processing a high volume of data. When the company examines Amazon CloudWatch logs, the company notices a high value for the glue.driver.BlockManager.disk.diskSpaceUsed_MB metric when the traffic is high. The company must resolve this issue.

    Which solution will meet this requirement with the LEAST operational effort?

    A. Increase the number of data processing units (DPUs) in AWS Glue ETL jobs.
    B. Use Amazon EMR to manage the ETL pipeline instead of AWS Glue.
    C. Use AWS Step Functions to orchestrate a parallel workflow state.
    D. Enable auto scaling for the AWS Glue ETL jobs.
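
    For context on option D: a high diskSpaceUsed value indicates executors spilling to disk under load, and on AWS Glue 3.0 and later, auto scaling lets the job add workers per stage by setting the --enable-auto-scaling job argument to true. A boto3 sketch with an illustrative job name; note that UpdateJob replaces the entire job definition, so the current definition is fetched and carried over first:

        import boto3

        glue = boto3.client("glue")

        job_name = "outlier-detection-etl"  # illustrative

        # UpdateJob replaces the whole definition, so start from the current one.
        current = glue.get_job(JobName=job_name)["Job"]
        args = current.get("DefaultArguments", {})
        args["--enable-auto-scaling"] = "true"  # let Glue add/remove workers per stage

        glue.update_job(
            JobName=job_name,
            JobUpdate={
                "Role": current["Role"],
                "Command": current["Command"],
                "DefaultArguments": args,
                "GlueVersion": current.get("GlueVersion", "4.0"),
                "WorkerType": "G.1X",   # worker size
                "NumberOfWorkers": 20,  # upper bound auto scaling may use
            },
        )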

  • Question 4:

    A company stores information about its subscribers in an Amazon S3 bucket. The company runs an analysis every time a subscriber ends their subscription. The company uses three AWS Lambda functions that respond to events from the S3 bucket and perform the analyses.

    The Lambda functions clean data from the S3 bucket and initiate an AWS Glue workflow. The Lambda functions have 128 MB of memory and 512 MB of ephemeral storage. The Lambda functions have a timeout of 15 seconds. All three functions successfully finish running. However, CPU usage is often near 100%, which causes slow performance. The company wants to improve the performance of the functions and reduce the total runtime of the pipeline.

    Which solution will meet these requirements?

    A. Increase the memory of the Lambda functions to 512 MB.
    B. Increase the number of retries by using the Maximum Retry Attempts setting.
    C. Configure the Lambda functions to run in the company's VPC.
    D. Increase the timeout value for the Lambda functions from 15 seconds to 30 seconds.
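
    For context on option A: Lambda allocates CPU in proportion to configured memory, so raising memory from 128 MB to 512 MB also grants roughly four times the CPU share, which directly addresses the near-100% CPU usage. A boto3 sketch with illustrative function names:

        import boto3

        lam = boto3.client("lambda")

        # CPU scales with memory, so 512 MB brings roughly 4x the CPU
        # share of the current 128 MB configuration.
        for fn in ("clean-subscribers", "start-glue-workflow"):  # illustrative
            lam.update_function_configuration(FunctionName=fn, MemorySize=512)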

  • Question 5:

    A ride-sharing company stores records for all rides in an Amazon DynamoDB table. The table includes the following columns and types of values:

    RideID   RiderID   DriverID   RideStatus   TripStartTime            TripEndTime
    XA1231   AXEF1     BN123      Active       2025-02-11 12:23:34:00   NULL
    XA1232   AXEF2     BN124      Completed    2025-02-11 08:36:12:00   2025-02-11 08:55:02:00

    The table currently contains billions of items. The table is partitioned by RideID and uses TripStartTime as the sort key. The company wants to use the data to build a personal interface that gives each driver the ability to view the rides that the driver has completed, based on RideStatus. The solution must access the necessary data without scanning the entire table.

    Which solution will meet these requirements?

    A. Create a local secondary index (LSI) on DriverID.
    B. Create a global secondary index (GSI) that uses RiderID as the partition key and RideStatus as the sort key.
    C. Create a global secondary index (GSI) that uses DriverID as the partition key and RideStatus as the sort key.
    D. Create a filter expression that uses RiderID and RideStatus.
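
    For context on option C: a global secondary index keyed on DriverID lets each driver's rides be fetched with a Query instead of a full table Scan, with RideStatus as the sort key to narrow to completed rides. A boto3 sketch, assuming the table uses on-demand capacity (a provisioned table would also need ProvisionedThroughput in the Create block); the table and index names are illustrative:

        import boto3

        ddb = boto3.client("dynamodb")

        # Create a GSI so each driver's rides can be Queried directly,
        # using RideStatus as the sort key to narrow to completed rides.
        ddb.update_table(
            TableName="Rides",  # illustrative
            AttributeDefinitions=[
                {"AttributeName": "DriverID", "AttributeType": "S"},
                {"AttributeName": "RideStatus", "AttributeType": "S"},
            ],
            GlobalSecondaryIndexUpdates=[
                {
                    "Create": {
                        "IndexName": "DriverID-RideStatus-index",
                        "KeySchema": [
                            {"AttributeName": "DriverID", "KeyType": "HASH"},
                            {"AttributeName": "RideStatus", "KeyType": "RANGE"},
                        ],
                        "Projection": {"ProjectionType": "ALL"},
                    }
                }
            ],
        )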

  • Question 6:

    A data engineer is building a new data pipeline that stores metadata in an Amazon DynamoDB table. The data engineer must ensure that all items that are older than a specified age are removed from the DynamoDB table daily.

    Which solution will meet this requirement with the LEAST configuration effort?

    A. Enable DynamoDB TTL on the DynamoDB table. Adjust the application source code to set the TTL attribute appropriately.
    B. Create an Amazon EventBridge rule that uses a daily cron expression to trigger an AWS Lambda function to delete items that are older than the specified age.
    C. Add a lifecycle configuration to the DynamoDB table that deletes items that are older than the specified age.
    D. Create a DynamoDB stream that has an AWS Lambda function that reacts to data modifications. Configure the Lambda function to delete items that are older than the specified age.
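
    For context on option A: DynamoDB TTL deletes expired items automatically at no extra cost once a numeric epoch-seconds attribute is designated. A boto3 sketch with illustrative table and attribute names:

        import time

        import boto3

        ddb = boto3.client("dynamodb")

        # Designate the attribute that holds each item's expiry time.
        ddb.update_time_to_live(
            TableName="pipeline-metadata",  # illustrative
            TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
        )

        # The application stamps each item with an epoch-seconds expiry,
        # e.g. 30 days out; DynamoDB removes expired items in the background.
        ddb.put_item(
            TableName="pipeline-metadata",
            Item={
                "run_id": {"S": "run-0001"},
                "expires_at": {"N": str(int(time.time()) + 30 * 24 * 3600)},
            },
        )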

  • Question 7:

    A data engineer is building a solution to detect sensitive information that is stored in a data lake across multiple Amazon S3 buckets. The solution must detect personally identifiable information (PII) that is in a proprietary data format.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use the AWS Glue Detect PII transform with specific patterns.
    B. Use Amazon Macie with managed data identifiers.
    C. Use an AWS Lambda function with custom regular expressions.
    D. Use Amazon Athena with a SQL query to match the custom formats.
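
    For reference, whichever option is keyed as correct, Amazon Macie can also be taught a proprietary format by pairing a custom data identifier with a classification job. A boto3 sketch; the regex, names, and account ID are placeholders:

        import boto3

        macie = boto3.client("macie2")

        # A custom data identifier teaches Macie the proprietary format;
        # the regex below is a placeholder, not a real PII pattern.
        cdi = macie.create_custom_data_identifier(
            name="proprietary-subscriber-id",
            regex=r"SUB-[0-9]{8}-[A-Z]{2}",
        )

        # One-time job scanning the data-lake buckets with that identifier.
        macie.create_classification_job(
            jobType="ONE_TIME",
            name="pii-scan-data-lake",
            customDataIdentifierIds=[cdi["customDataIdentifierId"]],
            s3JobDefinition={
                "bucketDefinitions": [
                    {"accountId": "123456789012", "buckets": ["lake-raw", "lake-curated"]}
                ]
            },
        )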

  • Question 8:

    A data engineer must implement a data cataloging solution to track schema changes in an Amazon Redshift table.

    Which solution will meet these requirements?

    A. Schedule an AWS Glue crawler to run every day on the table by using the Java Database Connectivity (JDBC) driver. Configure the crawler to update an AWS Glue Data Catalog.
    B. Use AWS DataSync to log the table metadata to an AWS Glue Data Catalog. Use an AWS Glue crawler to update the Data Catalog every day.
    C. Use the AWS Schema Conversion Tool (AWS SCT) to log the table metadata to an Apache Hive metastore. Use Amazon EventBridge Scheduler to update the metastore every day.
    D. Schedule an AWS Glue crawler to run every day on the table. Configure the crawler to update an Apache Hive metastore.
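
    For context on option A: a Glue crawler can reach a Redshift table through a JDBC connection and keep the Data Catalog current on a daily schedule. A boto3 sketch with illustrative connection, role, database, and path names:

        import boto3

        glue = boto3.client("glue")

        # Crawl the Redshift table daily over JDBC; schema changes surface
        # as updates to the corresponding Data Catalog table.
        glue.create_crawler(
            Name="redshift-schema-tracker",
            Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
            DatabaseName="redshift_catalog",
            Targets={
                "JdbcTargets": [
                    {"ConnectionName": "redshift-jdbc", "Path": "dev/public/sales"}
                ]
            },
            Schedule="cron(0 3 * * ? *)",  # every day at 03:00 UTC
            SchemaChangePolicy={
                "UpdateBehavior": "UPDATE_IN_DATABASE",
                "DeleteBehavior": "LOG",
            },
        )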

  • Question 9:

    A company adds new data to a large CSV file in an Amazon S3 bucket every day. The file contains company sales data from the previous 5 years. The file currently includes more than 5,000 rows. The CSV file structure is shown below with sample data:

    ID   SALE_DATE     ITEM_SOLD    SALE_PRICE   SALES_REP   STORE_NAME
         01-Jan-2024   TV                        Terry       Alaska
         02-Jan-2024   DVD player                Diego       Boston

    The company needs to use Amazon Athena to run queries on the CSV file to fetch data from a specific time period.

    Which solution will meet this requirement MOST cost-effectively?

    A. Write an Apache Spark script to convert the CSV data to JSON format. Create an AWS Glue job to run the script every day. Catalog the JSON data in AWS Glue. Run the Athena queries on the JSON data.
    B. Use prefixes to partition the data in the S3 bucket. Use the SALE_DATE column to create a partition for each day. Catalog the data in AWS Glue and ensure that the partitions are added. Update the Athena queries to use the new partitions.
    C. Launch an Amazon EMR cluster. Specify AWS Glue Data Catalog as the default Apache Hive metastore. Use Trino (Presto) to run queries on the data.
    D. Create an Amazon RDS database. Create a table named SALES that matches the schema of the CSV file. Create an index on the SALE_DATE column. Create an AWS Lambda function to load the CSV data into the RDS database. Use S3 Event Notifications to invoke the Lambda function.
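
    For context on option B: once the data is laid out under date-based S3 prefixes and the partitions are cataloged, Athena prunes partitions so a time-range query scans (and bills for) only the matching days. A boto3 sketch, assuming a table already cataloged with sale_date as a partition key; names and locations are illustrative:

        import boto3

        athena = boto3.client("athena")

        # Athena prunes partitions, so only the prefixes for the requested
        # dates are scanned and billed.
        athena.start_query_execution(
            QueryString=(
                "SELECT item_sold, sale_price, sales_rep "
                "FROM sales_db.sales "
                "WHERE sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-01-31'"
            ),
            QueryExecutionContext={"Database": "sales_db"},
            ResultConfiguration={"OutputLocation": "s3://query-results-bucket/athena/"},
        )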

  • Question 10:

    A company wants to use Apache Spark jobs that run on an Amazon EMR cluster to process streaming data. The Spark jobs will transform and store the data in an Amazon S3 bucket. The company will use Amazon Athena to perform analysis. The company needs to optimize the data format for analytical queries.

    Which solutions will meet these requirements with the SHORTEST query times? (Choose two.)

    A. Use Avro format. Use AWS Glue Data Catalog to track schema changes.
    B. Use ORC format. Use AWS Glue Data Catalog to track schema changes.
    C. Use Apache Parquet format. Use an external Amazon DynamoDB table to track schema changes.
    D. Use Apache Parquet format. Use AWS Glue Data Catalog to track schema changes.
    E. Use ORC format. Store schema definitions in separate files in Amazon S3.
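
    For context on the format options: Parquet and ORC are both columnar formats that Athena reads efficiently, and pairing either with the AWS Glue Data Catalog keeps schemas discoverable and queryable. A simplified batch PySpark sketch that writes Parquet with date partitions and registers the table, assuming the EMR cluster is configured to use the Glue Data Catalog as its Hive metastore; paths and table names are illustrative:

        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder.appName("stream-to-parquet")
            # On EMR this is typically preconfigured when the Glue Data
            # Catalog is selected as the Hive metastore.
            .enableHiveSupport()
            .getOrCreate()
        )

        df = spark.read.json("s3://raw-events/")  # illustrative source

        # Columnar Parquet plus date partitions keeps Athena scans small.
        (
            df.write.mode("append")
            .format("parquet")
            .partitionBy("event_date")
            .saveAsTable("analytics.events")  # lands in the Glue Data Catalog
        )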

Tips on How to Prepare for the Exams

Nowadays, certification exams have become more and more important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your DATA-ENGINEER-ASSOCIATE exam preparation or your Amazon certification application, do not hesitate to visit Vcedump.com to find your solutions.