DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code: DATA-ENGINEER-ASSOCIATE
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 403 Q&As
Last Updated: Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 251:

A ride-sharing company stores records for all rides in an Amazon DynamoDB table. The table includes the following columns and types of values:
The table currently contains billions of items. The table is partitioned by RideID and uses TripStartTime as the sort key. The company wants to use the data to build a personal interface to give drivers the ability to view the rides that each driver has completed, based on RideStatus. The solution must access the necessary data without scanning the entire table.
Which solution will meet these requirements?
A. Create a local secondary index (LSI) on DriverID.
B. Create a global secondary index (GSI) that uses RiderID as the partition key and RideStatus as the sort key.
C. Create a global secondary index (GSI) that uses DriverID as the partition key and RideStatus as the sort key.
D. Create a filter expression that uses RiderID and RideStatus.

C. Create a global secondary index (GSI) that uses DriverID as the partition key and RideStatus as the sort key.
Explanation
To let drivers efficiently query only their completed rides, you need a global secondary index (GSI) with DriverID as the partition key (so each query targets a single driver) and RideStatus as the sort key (so the query can select "Completed" rides without scanning the full table). This avoids costly scans and supports fast, targeted lookups at scale.
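As an illustration of the answer above, the following minimal Python sketch builds the request parameters for a DynamoDB Query against such a GSI. The table name `Rides`, the index name `DriverID-RideStatus-index`, the string attribute types, and the status value `Completed` are assumptions made for the example; none of them comes from the question. The dictionary has the shape expected by the boto3 low-level client's `query` call, which is left out so the sketch runs without AWS credentials.

```python
def build_driver_rides_query(driver_id, status="Completed"):
    """Build Query parameters that target one driver's rides through the GSI."""
    return {
        "TableName": "Rides",                      # hypothetical table name
        "IndexName": "DriverID-RideStatus-index",  # hypothetical GSI name
        # Partition key equality plus sort key equality: a Query, not a Scan.
        "KeyConditionExpression": "DriverID = :driver AND RideStatus = :status",
        "ExpressionAttributeValues": {
            ":driver": {"S": driver_id},  # assumes DriverID is a string attribute
            ":status": {"S": status},     # assumes RideStatus is a string attribute
        },
    }


params = build_driver_rides_query("driver-123")
print(params["KeyConditionExpression"])
```

Because both key attributes appear in the key condition, DynamoDB reads only the matching items in the index instead of filtering after a full read.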
Question 252:

A company has a data warehouse that contains a table that is named Sales. The company stores the table in Amazon Redshift. The table includes a column that is named city_name.
The company wants to query the table to find all rows that have a city_name that starts with "San" or "El."
Which SQL query will meet this requirement?
A. Select * from Sales where city_name ~ '$(San|El)';
B. Select * from Sales where city_name ~ '^(San|El)*';
C. Select * from Sales where city_name ~ '$(San&El)*';
D. Select * from Sales where city_name ~ '^(San&El)*';

B. Select * from Sales where city_name ~ '^(San|El)*';
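The anchoring and alternation in these patterns can be checked with a short Python sketch. It uses Python's `re` module rather than the Redshift POSIX operator, so treating the two as equivalent for these simple patterns is an assumption, and the city list is invented for the example. Read literally, the trailing `*` makes the preceding group optional (zero or more repetitions), so `^(San|El)*` also matches strings that start with neither prefix; the form without the `*` is the one that strictly filters by prefix.

```python
import re

# Hypothetical sample values; not taken from the question.
cities = ["San Diego", "El Paso", "Denver", "Elgin", "Boston"]

# ^ anchors at the start of the string; | means "San" or "El".
anchored = [c for c in cities if re.search(r"^(San|El)", c)]

# With a trailing *, zero repetitions are allowed, so every string matches.
starred = [c for c in cities if re.search(r"^(San|El)*", c)]

# $ anchors at the end of the string, so nothing can follow it.
end_anchored = [c for c in cities if re.search(r"$(San|El)", c)]

print(anchored)
print(starred)
print(end_anchored)
```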
Question 253:

A company stores a large dataset in an Amazon S3 bucket. A data engineer frequently runs complex queries on the dataset by using Amazon Athena. The data engineer needs to optimize query performance and optimize costs for queries that are run multiple times with the same parameters.
Which solution will meet these requirements?
A. Convert the dataset to JSON format before running Athena queries.
B. Use Amazon EMR to pre-process the data before running Athena queries.
C. Configure query result reuse settings in the Athena workgroup.
D. Use Amazon Redshift Spectrum to query the data in Amazon S3.

C. Configure query result reuse settings in the Athena workgroup.
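One way to request result reuse from code is the `ResultReuseConfiguration` parameter of the Athena `StartQueryExecution` API. The sketch below only builds the request dictionary; the workgroup name, the SQL text, and the 60-minute maximum age are assumptions for illustration, and the workgroup is assumed to run an Athena engine version that supports result reuse.

```python
def build_athena_request(sql, workgroup, max_age_minutes=60):
    """Build StartQueryExecution parameters that allow cached results to be reused."""
    return {
        "QueryString": sql,
        "WorkGroup": workgroup,
        "ResultReuseConfiguration": {
            "ResultReuseByAgeConfiguration": {
                "Enabled": True,
                # Reuse a previous result only if it is at most this old.
                "MaxAgeInMinutes": max_age_minutes,
            }
        },
    }


request = build_athena_request(
    "SELECT region, COUNT(*) FROM events GROUP BY region",  # hypothetical query
    "analytics-wg",                                          # hypothetical workgroup
)
print(request["ResultReuseConfiguration"])
```

When an identical query string has a recent enough result, Athena can return it without rescanning the data, which is where the cost saving for repeated queries comes from.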
Question 254:

A company uses an Amazon S3 bucket to integrate multiple data sources into a central data lake. The company needs to perform multiple transformations and data cleaning processes on the data to make the data accessible to business partners.
The company needs a solution that will give multiple business partners the ability to run SQL queries on the central data lake during normal business hours.
Which solution will meet these requirements MOST cost-effectively?
A. Use a provisioned Amazon EMR cluster after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
B. Use an AWS Glue Flex job after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
C. Use an AWS Lambda function after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.
D. Use an AWS Glue Flex job after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into an Amazon Redshift provisioned cluster.

B. Use an AWS Glue Flex job after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.
Question 255:

A data pipeline has three stages. The second stage must run only after the first stage succeeds. If the second stage fails, the pipeline must retry twice and then send a notification before stopping.
Which service should a data engineer use to coordinate this workflow with built-in state transitions and error handling?
A. AWS Step Functions
B. AWS Glue crawler
C. Amazon S3 Event Notifications only
D. Amazon Redshift materialized views

A. AWS Step Functions
Explanation
Step Functions state machines coordinate ordered steps and provide retry, catch, and branching behavior for workflows. A Glue crawler updates catalog metadata but does not coordinate multi-stage error handling. S3 Event Notifications can trigger a target when an object changes, but they do not model the whole state machine. Redshift materialized views optimize queries and are not workflow orchestrators.
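The retry, catch, and notification behavior described in the question could be expressed in Amazon States Language roughly as follows. This is a sketch under assumptions: the state names, the task ARNs, the SNS topic ARN, and the retry interval are placeholders invented for the example, and the definition is built as a Python dictionary so it can be inspected without deploying anything.

```python
import json


def build_pipeline_definition(stage_arns, topic_arn):
    """Return a three-stage state machine; stage 2 retries twice, then notifies and fails."""
    stage1, stage2, stage3 = stage_arns
    return {
        "StartAt": "Stage1",
        "States": {
            "Stage1": {"Type": "Task", "Resource": stage1, "Next": "Stage2"},
            "Stage2": {
                "Type": "Task",
                "Resource": stage2,
                # Two retries after the first failed attempt.
                "Retry": [{
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 30,  # assumed wait between attempts
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }],
                # Once retries are exhausted, route to the notification state.
                "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
                "Next": "Stage3",
            },
            "Stage3": {"Type": "Task", "Resource": stage3, "End": True},
            "NotifyFailure": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sns:publish",
                "Parameters": {
                    "TopicArn": topic_arn,
                    "Message": "Stage 2 failed after two retries",
                },
                "Next": "PipelineFailed",
            },
            "PipelineFailed": {"Type": "Fail", "Error": "Stage2Failed"},
        },
    }


definition = build_pipeline_definition(
    # Placeholder ARNs; replace with real task resources.
    ["arn:aws:lambda:us-east-1:111122223333:function:stage%d" % i for i in (1, 2, 3)],
    "arn:aws:sns:us-east-1:111122223333:pipeline-alerts",
)
print(json.dumps(definition, indent=2))
```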
Question 256:

Which AWS service most cost-effectively orchestrates an AWS Glue ETL pipeline that crawls Microsoft SQL Server and loads data to S3?
A. AWS Step Functions
B. AWS Glue workflows
C. AWS Glue Studio
D. Amazon MWAA

B. AWS Glue workflows
Question 257:

A company is building a governed data lake on AWS. The solution must store raw and curated datasets in object storage, support SQL queries without provisioning database servers, and enforce centralized fine-grained access policies.
Which combination of services should the data engineer choose? (Choose three.)
A. Amazon S3
B. Amazon Athena
C. AWS Lake Formation
D. AWS Shield
E. Amazon CloudFront
F. Amazon ECR

A. Amazon S3
B. Amazon Athena
C. AWS Lake Formation
Explanation
Amazon S3 is the object storage foundation for the data lake. Athena provides serverless SQL access to S3 data. Lake Formation centralizes data lake permissions and fine-grained governance. Shield and CloudFront address edge protection and content delivery, not data lake query governance. ECR stores container images.
Question 258:

A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.
The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.
Which solution will meet this requirement with the LEAST coding effort?
A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.
B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
C. Enable job metrics for the ETL jobs to help keep track of processed objects in Amazon CloudWatch.
D. Configure the ETL jobs to delete processed objects from Amazon S3 after each run.

B. Enable job bookmarks for the ETL jobs to update the state after a run to keep track of previously processed data.
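Job bookmarks are switched on through the `--job-bookmark-option` job argument. The sketch below builds the parameters for a Glue job definition with that argument set; the role ARN and script location are placeholders, and the helper name is invented for the example. In the job script itself, bookmarks also rely on a `transformation_ctx` value on each DynamicFrame read and on calling `job.commit()` at the end, which is not shown here.

```python
def build_bookmarked_job(role_arn, script_location):
    """Build Glue job parameters with job bookmarks enabled."""
    return {
        "Role": role_arn,
        "Command": {"Name": "glueetl", "ScriptLocation": script_location},
        "DefaultArguments": {
            # Other accepted values: job-bookmark-disable, job-bookmark-pause.
            "--job-bookmark-option": "job-bookmark-enable",
        },
    }


job = build_bookmarked_job(
    "arn:aws:iam::111122223333:role/GlueEtlRole",  # placeholder role
    "s3://example-bucket/scripts/daily_load.py",   # placeholder script path
)
print(job["DefaultArguments"])
```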
Question 259:

A manufacturing company is setting up an IoT monitoring system that generates large, complex data streams. The company wants to store the data in an Amazon S3 data lake for real-time and historical analysis. The company needs a solution that can process data quickly, provide short query times, and use resources efficiently without slowing down data ingestion.
The solution must use a Spark streaming extract, transform, and load (ETL) job on Amazon EMR that is configured to write data to an Iceberg table.
Which solution will meet these requirements?
A. Use Amazon Kinesis Data Streams to ingest the data. Configure the Iceberg table with copy on write (CoW) mode. Enable the AWS Glue Data Catalog compaction optimizer.
B. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) to ingest the data. Configure the Iceberg table with copy on write (CoW) mode. Schedule an AWS Glue job for compaction to optimize the Iceberg table.
C. Use Amazon Kinesis Data Streams to ingest the data. Configure the Iceberg table with merge on read (MoR) mode. Enable the AWS Glue Data Catalog compaction optimizer.
D. Use Amazon Data Firehose to ingest the data. Use an AWS Lambda function to handle nested schema. Write the data to an Iceberg table with merge on read (MoR) mode in an Amazon S3 table bucket.

C. Use Amazon Kinesis Data Streams to ingest the data. Configure the Iceberg table with merge on read (MoR) mode. Enable the AWS Glue Data Catalog compaction optimizer.
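For reference, merge on read is selected per table through Iceberg write properties. The sketch below generates a Spark SQL statement that sets those properties; the table identifier is a placeholder, and the helper is only an illustration of the configuration, not part of the question. These modes govern row-level deletes, updates, and merges, so how much they matter depends on the operations that the streaming job performs.

```python
# Iceberg table properties that select merge-on-read for row-level operations.
MOR_PROPERTIES = {
    "write.delete.mode": "merge-on-read",
    "write.update.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}


def mor_alter_statement(table):
    """Return a Spark SQL statement that switches an Iceberg table to merge-on-read."""
    props = ", ".join(f"'{key}'='{value}'" for key, value in MOR_PROPERTIES.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({props})"


# Placeholder catalog, database, and table names.
print(mor_alter_statement("glue_catalog.iot.sensor_readings"))
```

Merge on read keeps writes cheap by recording changes in small delete and data files, which is why it is paired with compaction to keep query times short.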
Question 260:

A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically.
B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.
C. Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the DynamoDB data catalog. Schedule the Lambda functions to run periodically.
D. Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue crawlers for data that is in Amazon S3 to infer the schema and to automatically update the Data Catalog.

B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.
Explanation
This solution will meet the requirements with the least operational overhead because it uses the AWS Glue Data Catalog as the central metadata repository for data sources that run in the AWS Cloud. The AWS Glue Data Catalog is a fully managed service that provides a unified view of your data assets across AWS and on-premises data sources. It stores the metadata of your data in tables, partitions, and columns, and enables you to access and query your data using various AWS services, such as Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum. You can use AWS Glue crawlers to connect to multiple data stores, such as Amazon RDS, Amazon Redshift, and Amazon S3, and to update the Data Catalog with metadata changes. AWS Glue crawlers can automatically discover the schema and partition structure of your data, and create or update the corresponding tables in the Data Catalog. You can schedule the crawlers to run periodically to update the metadata catalog, and configure them to detect changes to the source metadata, such as new columns, tables, or partitions.
The other options are not optimal for the following reasons:
Option A: Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically. This option is not recommended, as it would require more operational overhead to create and manage an Amazon Aurora database as the data catalog, and to write and maintain AWS Lambda functions to gather and update the metadata information from multiple sources. Moreover, this option would not leverage the benefits of the AWS Glue Data Catalog, such as data cataloging, data transformation, and data governance.
Option C: Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the DynamoDB data catalog. Schedule the Lambda functions to run periodically. This option is also not recommended, as it would require more operational overhead to create and manage an Amazon DynamoDB table as the data catalog, and to write and maintain AWS Lambda functions to gather and update the metadata information from multiple sources. Moreover, this option would not leverage the benefits of the AWS Glue Data Catalog, such as data cataloging, data transformation, and data governance.
Option D: Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue crawlers for data that is in Amazon S3 to infer the schema and to automatically update the Data Catalog. This option is not optimal, as it would require more manual effort to extract the schema for Amazon RDS and Amazon Redshift sources, and to build the Data Catalog. This option would not take advantage of the AWS Glue crawlers' ability to automatically discover the schema and partition structure of your data from various data sources, and to create or update the corresponding tables in the Data Catalog.
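A scheduled crawler that covers both JDBC and S3 sources, as option B describes, could be defined with parameters like the following. This is a sketch only: the crawler name, role, database, connection name, paths, and schedule are all placeholders, and the dictionary mirrors the shape of the Glue `CreateCrawler` request without calling AWS.

```python
def build_crawler_request(name, role_arn, database):
    """Build CreateCrawler parameters for a nightly crawl of JDBC and S3 sources."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            # Placeholder Glue connection and include path for a JDBC source.
            "JdbcTargets": [{"ConnectionName": "rds-sales-connection", "Path": "salesdb/%"}],
            # Placeholder S3 prefix holding the JSON and XML files.
            "S3Targets": [{"Path": "s3://example-data-lake/raw/"}],
        },
        # Run every day at 02:00 UTC (assumed schedule).
        "Schedule": "cron(0 2 * * ? *)",
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",  # apply detected schema changes
            "DeleteBehavior": "LOG",                 # only log objects that disappeared
        },
    }


crawler = build_crawler_request(
    "nightly-catalog-crawler",
    "arn:aws:iam::111122223333:role/GlueCrawlerRole",
    "central_catalog",
)
print(crawler["Schedule"])
```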

Tips on How to Prepare for the Exams

Certification exams are becoming increasingly important, and more and more employers require them of job applicants. But how can you prepare for the exam effectively, in a short time and with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATA-ENGINEER-ASSOCIATE exam preparation or your Amazon certification application, do not hesitate to visit Vcedump.com to find your solutions.

Related Exams:

AIF-C01

AIP-C01

ANS-C00

ANS-C01

AXS-C01

BDS-C00

CLF-C02

DAS-C01

DATA-ENGINEER-ASSOCIATE

DBS-C01
