DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code: DATA-ENGINEER-ASSOCIATE
  • Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 403 Q&As
  • Last Updated: May 29, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 251:

    A ride-sharing company stores records for all rides in an Amazon DynamoDB table. The table includes the following columns and types of values:

    The table currently contains billions of items. The table is partitioned by RideID and uses TripStartTime as the sort key. The company wants to use the data to build a personal interface to give drivers the ability to view the rides that each driver has completed, based on RideStatus. The solution must access the necessary data without scanning the entire table.

    Which solution will meet these requirements?

    A. Create a local secondary index (LSI) on DriverID.
    B. Create a global secondary index (GSI) that uses RiderID as the partition key and RideStatus as the sort key.
    C. Create a global secondary index (GSI) that uses DriverID as the partition key and RideStatus as the sort key.
    D. Create a filter expression that uses RiderID and RideStatus.
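
    As a rough illustration of how a query against a GSI keyed on DriverID and RideStatus avoids a table scan, the sketch below builds the request parameters for a DynamoDB Query call. The table name, index name, and the assumption that both attributes are strings are hypothetical, since the column list is not shown above.

```python
from typing import Any, Dict


def build_driver_rides_query(driver_id: str, ride_status: str) -> Dict[str, Any]:
    """Build keyword arguments for a DynamoDB Query against a GSI.

    The table, index, and attribute types are hypothetical placeholders.
    """
    return {
        "TableName": "Rides",                      # hypothetical table name
        "IndexName": "DriverID-RideStatus-index",  # hypothetical GSI name
        # Both conditions are key conditions on the index, so DynamoDB reads
        # only the matching index partition instead of scanning the table.
        "KeyConditionExpression": "DriverID = :driver AND RideStatus = :status",
        "ExpressionAttributeValues": {
            ":driver": {"S": driver_id},   # assumes a string attribute
            ":status": {"S": ride_status},  # assumes a string attribute
        },
    }


params = build_driver_rides_query("driver-123", "COMPLETED")
print(params["KeyConditionExpression"])
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("dynamodb").query(**params)
```

    A filter expression, by contrast, is applied only after items have been read, so it does not reduce the amount of data DynamoDB has to read.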

  • Question 252:

    A company has a data warehouse that contains a table that is named Sales. The company stores the table in Amazon Redshift. The table includes a column that is named city_name.

    The company wants to query the table to find all rows that have a city_name that starts with "San" or "El."

    Which SQL query will meet this requirement?

    A. Select * from Sales where city_name ~ '$(San|El)*';
    B. Select * from Sales where city_name ~ '^(San|El)*';
    C. Select * from Sales where city_name ~ '$(San&El)*';
    D. Select * from Sales where city_name ~ '^(San&El)*';
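
    The role of the `^` anchor and the `|` alternation can be tried out locally. The sketch below uses Python's `re` module as a stand-in for Redshift's POSIX `~` operator, which is an assumption; the two engines agree on these simple constructs, but they are not identical in general. The city names are made-up sample values.

```python
import re

# ^ pins the match to the start of the string, and | separates the
# two alternative prefixes.
PREFIX = re.compile(r"^(San|El)")


def starts_with_san_or_el(city_name: str) -> bool:
    """Return True when city_name begins with 'San' or 'El'."""
    return PREFIX.search(city_name) is not None


cities = ["San Diego", "El Paso", "Boston", "Kansan", "Santa Fe"]  # sample data
matches = [c for c in cities if starts_with_san_or_el(c)]
print(matches)

# A trailing * makes the group optional (zero or more repetitions), so in
# this engine the starred pattern also matches strings without the prefix.
starred_matches_boston = re.search(r"^(San|El)*", "Boston") is not None
print(starred_matches_boston)
```

    In this engine `$` anchors the end of the string rather than the start, and `&` is a literal character rather than an "and" operator.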

  • Question 253:

    A company stores a large dataset in an Amazon S3 bucket. A data engineer frequently runs complex queries on the dataset by using Amazon Athena. The data engineer needs to optimize query performance and optimize costs for queries that are run multiple times with the same parameters.

    Which solution will meet these requirements?

    A. Convert the dataset to JSON format before running Athena queries.
    B. Use Amazon EMR to pre-process the data before running Athena queries.
    C. Configure query result reuse settings in the Athena workgroup.
    D. Use Amazon Redshift Spectrum to query the data in Amazon S3.
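
    For reference, Athena query result reuse can also be requested per query through the API. The sketch below only builds the request parameters for `start_query_execution`; the SQL text, workgroup name, and the 60-minute maximum age are illustrative assumptions.

```python
from typing import Any, Dict


def build_reusable_query_request(
    sql: str, workgroup: str, max_age_minutes: int = 60
) -> Dict[str, Any]:
    """Build keyword arguments for Athena start_query_execution with result reuse."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        # When enabled, Athena may return a previous result for an identical
        # query if that result is no older than MaxAgeInMinutes.
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


request = build_reusable_query_request(
    "SELECT region, count(*) FROM events GROUP BY region",  # hypothetical query
    "analytics",                                            # hypothetical workgroup
)
print(request["ResultReuseConfiguration"])
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("athena").start_query_execution(**request)
```

    A reused result does not scan the underlying data again, which is where both the speed and the cost benefit come from.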

  • Question 254:

    A company uses an Amazon S3 bucket to integrate multiple data sources into a central data lake. The company needs to perform multiple transformations and data cleaning processes on the data to make the data accessible to business partners.

    The company needs a solution that will give multiple business partners the ability to run SQL queries on the central data lake during normal business hours.

    Which solution will meet these requirements MOST cost-effectively?

    A. Use a provisioned Amazon EMR cluster after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
    B. Use an AWS Glue Flex job after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
    C. Use an AWS Lambda function after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.
    D. Use an AWS Glue Flex job after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.

  • Question 255:

    A data pipeline has three stages. The second stage must run only after the first stage succeeds. If the second stage fails, the pipeline must retry twice and then send a notification before stopping.

    Which service should a data engineer use to coordinate this workflow with built-in state transitions and error handling?

    A. AWS Step Functions
    B. AWS Glue crawler
    C. Amazon S3 Event Notifications only
    D. Amazon Redshift materialized views
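
    The retry-then-notify behavior described in this question maps onto the `Retry` and `Catch` fields of the Amazon States Language. The sketch below builds such a definition as a Python dictionary and serializes it to JSON. All ARNs, state names, and the retry interval are hypothetical, and using Lambda functions for the stages is an assumption.

```python
import json

ACCOUNT = "111122223333"  # placeholder account ID


def lambda_arn(name: str) -> str:
    """Return a hypothetical Lambda function ARN for a pipeline stage."""
    return f"arn:aws:lambda:us-east-1:{ACCOUNT}:function:{name}"


state_machine = {
    "Comment": "Three-stage pipeline sketch; all ARNs are hypothetical",
    "StartAt": "StageOne",
    "States": {
        # If StageOne fails, the execution fails and StageTwo never starts.
        "StageOne": {"Type": "Task", "Resource": lambda_arn("stage-one"), "Next": "StageTwo"},
        "StageTwo": {
            "Type": "Task",
            "Resource": lambda_arn("stage-two"),
            # Retry twice on any error before giving up.
            "Retry": [{"ErrorEquals": ["States.ALL"], "MaxAttempts": 2,
                       "IntervalSeconds": 30, "BackoffRate": 2.0}],
            # Once retries are exhausted, route to the notification state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "StageThree",
        },
        "StageThree": {"Type": "Task", "Resource": lambda_arn("stage-three"), "End": True},
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": f"arn:aws:sns:us-east-1:{ACCOUNT}:pipeline-alerts",
                "Message": "Stage two failed after two retries",
            },
            "Next": "PipelineFailed",
        },
        "PipelineFailed": {"Type": "Fail", "Error": "StageTwoFailed"},
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```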

  • Question 256:

    Which AWS service most cost-effectively orchestrates an AWS Glue ETL pipeline that crawls Microsoft SQL Server and loads data to S3?

    A. AWS Step Functions
    B. AWS Glue workflows
    C. AWS Glue Studio
    D. Amazon MWAA

  • Question 257:

    A company is building a governed data lake on AWS. The solution must store raw and curated datasets in object storage, support SQL queries without provisioning database servers, and enforce centralized fine-grained access policies.

    Which combination of services should the data engineer choose? (Choose three.)

    A. Amazon S3
    B. Amazon Athena
    C. AWS Lake Formation
    D. AWS Shield
    E. Amazon CloudFront
    F. Amazon ECR

  • Question 258:

    A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.

    The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.

    Which solution will meet this requirement with the LEAST coding effort?

    A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
    B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
    C. Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.
    D. Configure the ETL jobs to delete processed objects from Amazon S3 after each run.
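
    For context, AWS Glue job bookmarks are switched on through the `--job-bookmark-option` job argument. The sketch below shows a small helper that adds that argument to a job's default arguments; the helper name and the existing arguments are hypothetical.

```python
from typing import Dict


def with_job_bookmarks(default_arguments: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the job arguments with job bookmarks enabled.

    Inside the Glue script itself, bookmarks also rely on each DynamicFrame
    read passing a transformation_ctx and on the script calling job.commit().
    """
    updated = dict(default_arguments)  # leave the caller's dict untouched
    updated["--job-bookmark-option"] = "job-bookmark-enable"
    return updated


existing = {"--TempDir": "s3://example-bucket/tmp/"}  # hypothetical arguments
arguments = with_job_bookmarks(existing)
print(arguments)
# These arguments could then be supplied as DefaultArguments when the job is
# created or updated, or as Arguments on an individual job run.
```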

  • Question 259:

    A manufacturing company is setting up an IoT monitoring system that generates large, complex data streams. The company wants to store the data in an Amazon S3 data lake for real-time and historical analysis. The company needs a solution that can process data quickly, provide short query times, and use resources efficiently without slowing down data ingestion.

    The solution must use a Spark streaming extract, transform, and load (ETL) job on Amazon EMR that is configured to write data to an Iceberg table.

    Which solution will meet these requirements?

    A. Use Amazon Kinesis Data Streams to ingest the data. Configure the Iceberg table with copy on write (CoW) mode. Enable the AWS Glue Data Catalog compaction optimizer.
    B. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Configure the Iceberg table with copy on write (CoW) mode. Schedule an AWS Glue job for compaction to optimize the Iceberg table.
    C. Use Amazon Kinesis Data Streams to ingest the data. Configure the Iceberg table with merge on read (MoR) mode. Enable the AWS Glue Data Catalog compaction optimizer.
    D. Use Amazon Data Firehose to ingest the data. Use an AWS Lambda function to handle nested schema. Write the data to an Iceberg table with merge on read (MoR) mode in an Amazon S3 table bucket.
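
    The copy on write and merge on read modes named in the options correspond to Apache Iceberg table properties. The sketch below generates a Spark SQL statement that sets the merge on read variants; the catalog and table names are hypothetical, and the statement is only printed here, not executed against a cluster.

```python
def iceberg_mor_ddl(table: str) -> str:
    """Build a Spark SQL statement that sets merge-on-read write modes."""
    props = {
        "format-version": "2",  # row-level deletes need Iceberg format v2
        "write.delete.mode": "merge-on-read",
        "write.update.mode": "merge-on-read",
        "write.merge.mode": "merge-on-read",
    }
    prop_sql = ",\n  ".join(f"'{key}' = '{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {prop_sql}\n)"


ddl = iceberg_mor_ddl("glue_catalog.iot.sensor_readings")  # hypothetical table
print(ddl)
# In a Spark session configured for Iceberg, this could be run as:
#   spark.sql(ddl)
```

    With merge on read, writers record changes in separate delete files instead of rewriting data files, which keeps writes fast but leaves more small files for readers to merge. That is why compaction usually accompanies this mode.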

  • Question 260:

    A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.

    The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically.
    B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.
    C. Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the DynamoDB data catalog. Schedule the Lambda functions to run periodically.
    D. Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue crawlers for data that is in Amazon S3 to infer the schema and to automatically update the Data Catalog.
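
    To make the crawler-based approach more concrete, the sketch below builds the request parameters for a scheduled AWS Glue crawler that covers both JDBC and Amazon S3 sources. The crawler name, role ARN, connection names, paths, and the 02:00 UTC schedule are all hypothetical.

```python
from typing import Any, Dict


def build_crawler_request() -> Dict[str, Any]:
    """Build keyword arguments for the AWS Glue create_crawler API call."""
    return {
        "Name": "central-catalog-crawler",
        "Role": "arn:aws:iam::111122223333:role/GlueCrawlerRole",
        "DatabaseName": "central_catalog",
        "Targets": {
            # Semistructured files in Amazon S3.
            "S3Targets": [{"Path": "s3://example-bucket/semistructured/"}],
            # Structured sources reached through Glue connections.
            "JdbcTargets": [
                {"ConnectionName": "rds-connection", "Path": "salesdb/%"},
                {"ConnectionName": "redshift-connection", "Path": "dev/public/%"},
            ],
        },
        # Run every day at 02:00 UTC.
        "Schedule": "cron(0 2 * * ? *)",
        # Apply detected schema changes to the Data Catalog tables.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


crawler_request = build_crawler_request()
print(crawler_request["Schedule"])
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("glue").create_crawler(**crawler_request)
```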
