DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code: DATA-ENGINEER-ASSOCIATE
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 403 Q&As
Last Updated: Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 141:

A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data.
Which solution will meet these requirements with the LEAST operational overhead?
A. Configure an AWS Lambda function to load data from the S3 bucket into a pandas DataFrame. Write a SQL SELECT statement on the DataFrame to query the required column.
B. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.
C. Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.
D. Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.

B. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.
Explanation
Option B is the best solution to meet the requirements with the least operational overhead because S3 Select is a feature that allows you to retrieve only a subset of data from an S3 object by using simple SQL expressions. S3 Select works on objects stored in CSV, JSON, or Parquet format. By using S3 Select, you can avoid the need to download and process the entire S3 object, which reduces the amount of data transferred and the computation time. S3 Select is also easy to use and does not require any additional services or resources.
Option A is not a good solution because it involves writing custom code and configuring an AWS Lambda function to load data from the S3 bucket into a pandas dataframe and query the required column. This option adds complexity and latency to the data retrieval process and requires additional resources and configuration. Moreover, AWS Lambda has limitations on the execution time, memory, and concurrency, which may affect the performance and reliability of the data retrieval process.
Option C is not a good solution because it involves creating and running an AWS Glue DataBrew project to consume the S3 objects and query the required column. AWS Glue DataBrew is a visual data preparation tool that allows you to clean, normalize, and transform data without writing code. However, in this scenario, the data is already in Parquet format, which is a columnar storage format that is optimized for analytics. Therefore, there is no need to use AWS Glue DataBrew to prepare the data. Moreover, AWS Glue DataBrew adds extra time and cost to the data retrieval process and requires additional resources and configuration.
Option D is not a good solution because it involves running an AWS Glue crawler on the S3 objects and using a SQL SELECT statement in Amazon Athena to query the required column. An AWS Glue crawler is a service that can scan data sources and create metadata tables in the AWS Glue Data Catalog. The Data Catalog is a central repository that stores information about the data sources, such as schema, format, and location. Amazon Athena is a serverless interactive query service that allows you to analyze data in S3 using standard SQL. However, in this scenario, the schema and format of the data are already known and fixed, so there is no need to run a crawler to discover them. Moreover, running a crawler and using Amazon Athena adds extra time and cost to the data retrieval process and requires additional services and configuration.
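A minimal sketch of the S3 Select call from option B, assuming the boto3 SDK for Python. The helpers only build the keyword arguments for the S3 client's `select_object_content` call and collect the rows from its event stream, so they run without AWS credentials; the bucket, key, and column names in the usage comment are hypothetical placeholders. Note that AWS closed S3 Select to new customers in July 2024, so the call may be unavailable in accounts that have not used the feature before.

```python
from typing import Any, Iterable


def build_select_request(bucket: str, key: str, column: str) -> dict[str, Any]:
    """Build kwargs for boto3's S3 client select_object_content call.

    Selects a single column from a Parquet object and returns it as CSV.
    """
    # Double quotes let column names with mixed case or special characters resolve.
    expression = f'SELECT s."{column}" FROM S3Object s'
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": {"CSV": {}},
    }


def collect_records(payload: Iterable[dict[str, Any]]) -> str:
    """Concatenate the Records events from a select_object_content event stream."""
    chunks = []
    for event in payload:
        # Other event types (Stats, Progress, Cont, End) carry no row data.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"].decode("utf-8"))
    return "".join(chunks)


# Usage against a real bucket (requires boto3 and credentials; names are hypothetical):
#   import boto3
#   s3 = boto3.client("s3")
#   response = s3.select_object_content(
#       **build_select_request("example-bucket", "data/part-0000.parquet", "price"))
#   print(collect_records(response["Payload"]))
```

Only the requested column leaves S3, which is the property the explanation relies on.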
References:
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide
S3 Select and Glacier Select - Amazon Simple Storage Service
AWS Lambda - FAQs
What Is AWS Glue DataBrew? - AWS Glue DataBrew
Populating the AWS Glue Data Catalog - AWS Glue
What is Amazon Athena? - Amazon Athena
Question 142:

An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously.
A data engineer needs to deduplicate the semi-structured data by removing duplicate records, including duplicates that differ only by common misspellings.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use the FindMatches feature of AWS Glue to remove duplicate records.
B. Use non-window functions in Amazon Athena to remove duplicate records.
C. Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
D. Use the global tables feature of Amazon DynamoDB to prevent duplicate data.

A. Use the FindMatches feature of AWS Glue to remove duplicate records.
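FindMatches is an AWS Glue ML transform that learns to match records referring to the same entity even when their fields differ, for example by a misspelling. A minimal sketch, assuming boto3: the helper builds the arguments for the Glue client's `create_ml_transform` call. The transform name, database, table, key column, role ARN, and tradeoff value are hypothetical, and the transform still has to be trained with labeled examples before it produces useful matches.

```python
from typing import Any


def build_find_matches_transform(
    name: str,
    database: str,
    table: str,
    primary_key: str,
    role_arn: str,
    precision_recall: float = 0.9,
) -> dict[str, Any]:
    """Build kwargs for boto3's Glue client create_ml_transform call."""
    if not 0.0 <= precision_recall <= 1.0:
        raise ValueError("precision_recall must be between 0.0 and 1.0")
    return {
        "Name": name,
        # Data Catalog table that holds the semi-structured records to deduplicate.
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                "PrimaryKeyColumnName": primary_key,
                # Values closer to 1.0 favor precision (fewer false matches).
                "PrecisionRecallTradeoff": precision_recall,
                "EnforceProvidedLabels": False,
            },
        },
        "Role": role_arn,
    }


# Usage (requires boto3 and credentials; all names are hypothetical):
#   import boto3
#   boto3.client("glue").create_ml_transform(**build_find_matches_transform(
#       "dedupe-trades", "finance", "trades_raw", "record_id",
#       "arn:aws:iam::123456789012:role/GlueFindMatchesRole"))
```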
Question 143:

A company has a data processing pipeline that includes several dozen steps. The data processing pipeline needs to send alerts in real time when a step fails or succeeds. The data processing pipeline uses a combination of Amazon S3 buckets, AWS Lambda functions, and AWS Step Functions state machines.
A data engineer needs to create a solution to monitor the entire pipeline.
Which solution will meet these requirements?
A. Configure the Step Functions state machines to store notifications in an Amazon S3 bucket when the state machines finish running. Enable S3 event notifications on the S3 bucket.
B. Configure the AWS Lambda functions to store notifications in an Amazon S3 bucket when the state machines finish running. Enable S3 event notifications on the S3 bucket.
C. Use AWS CloudTrail to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications when a state machine run fails or succeeds.
D. Configure an Amazon EventBridge rule to react when the execution status of a state machine changes. Configure the rule to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications.

D. Configure an Amazon EventBridge rule to react when the execution status of a state machine changes. Configure the rule to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications.
Explanation
Amazon EventBridge can monitor the execution status of AWS Step Functions state machines and trigger specific actions based on state changes, such as when a state machine succeeds or fails. By configuring an EventBridge rule to capture these execution status changes and forward the details to an Amazon SNS topic, real-time notifications can be sent to alert users. This solution ensures a scalable and event-driven approach to monitoring the entire data processing pipeline in real time.
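The rule in option D can be expressed as an EventBridge event pattern. Step Functions emits events with the source `aws.states` and the detail type `Step Functions Execution Status Change`; the pattern below matches only the FAILED and SUCCEEDED statuses (TIMED_OUT and ABORTED could be added if those should alert too). A minimal sketch, assuming boto3: the helper builds the `put_rule` and `put_targets` arguments, and the rule name and topic ARN are hypothetical. The local `matches` function is a simplified stand-in that handles only exact-value matching, not EventBridge's full filter syntax. The SNS topic's access policy must also allow `events.amazonaws.com` to publish.

```python
import json
from typing import Any

# Event pattern: Step Functions execution status changes that end in FAILED or SUCCEEDED.
EXECUTION_STATUS_PATTERN: dict[str, Any] = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["FAILED", "SUCCEEDED"]},
}


def build_rule_requests(rule_name: str, topic_arn: str) -> tuple[dict, dict]:
    """Return kwargs for boto3 EventBridge put_rule and put_targets."""
    put_rule = {"Name": rule_name, "EventPattern": json.dumps(EXECUTION_STATUS_PATTERN)}
    put_targets = {"Rule": rule_name, "Targets": [{"Id": "sns-alerts", "Arn": topic_arn}]}
    return put_rule, put_targets


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Simplified local matcher: lists mean 'any of these values', dicts recurse.

    Supports only exact-value matching, not prefix, numeric, or other content filters.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True
```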
Question 144:

A company is uploading log files from on-premises servers to an Amazon S3 bucket. The company needs to validate that the logs from the on-premises servers are the same as the logs that are stored in the S3 bucket.
Which solution will meet this requirement?
A. Use the AWS SDK to automatically compute CRC32 checksums during the upload. Store the checksums in S3 object metadata.
B. Create an AWS Lambda function to calculate SHA-256 checksums. Store results in a separate metadata table. Validate the logs after the upload.
C. Enable S3 Object Lock in compliance mode on the S3 bucket. Upload the objects to the bucket.
D. After uploading the objects to the S3 bucket, enable S3 Object Lock in governance mode on the S3 objects.

A. Use the AWS SDK to automatically compute CRC32 checksums during the upload. Store the checksums in S3 object metadata.
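With option A, the uploader asks S3 to store a CRC32 checksum (for example, by passing `ChecksumAlgorithm="CRC32"` to boto3's `put_object`), and S3 rejects the upload if the bytes it receives do not match. Later validation can compare a locally computed value with the one returned by `head_object(..., ChecksumMode="ENABLED")`. The sketch below computes the checksum in the base64 form S3 uses for `ChecksumCRC32`. It assumes single-part uploads, because multipart uploads may report a composite checksum (a checksum of the part checksums) instead; the bucket, key, and file path are hypothetical.

```python
import base64
import zlib
from pathlib import Path


def s3_style_crc32(data: bytes) -> str:
    """Return CRC32 as S3 reports it: base64 of the 4-byte big-endian value."""
    return base64.b64encode(zlib.crc32(data).to_bytes(4, "big")).decode("ascii")


def file_crc32(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the same checksum for a local log file without loading it all at once."""
    crc = 0
    with path.open("rb") as handle:
        # zlib.crc32 accepts the running value, so chunks can be fed incrementally.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")


def logs_match(local_path: Path, s3_checksum: str) -> bool:
    """Compare a local log file with the ChecksumCRC32 value returned by S3."""
    return file_crc32(local_path) == s3_checksum


# Usage (requires boto3 and credentials; bucket, key, and path are hypothetical):
#   import boto3
#   head = boto3.client("s3").head_object(
#       Bucket="example-log-bucket", Key="logs/app.log", ChecksumMode="ENABLED")
#   print(logs_match(Path("/var/log/app.log"), head["ChecksumCRC32"]))
```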
Question 145:

A company wants to use machine learning (ML) to perform analytics on data that is in an Amazon S3 data lake. The company has two data transformation requirements that will give consumers within the company the ability to create reports.
The company must perform daily transformations on 300 GB of data in a variety of formats that must arrive in Amazon S3 at a scheduled time. The company must perform one-time transformations of terabytes of archived data that is in the S3 data lake. The company uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) directed acyclic graphs (DAGs) to orchestrate processing.
Which combination of tasks should the company schedule in the Amazon MWAA DAGs to meet these requirements MOST cost-effectively? (Choose two.)
A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
B. For daily incoming data, use Amazon Athena to scan and identify the schema.
C. For daily incoming data, use Amazon Redshift to perform transformations.
D. For daily and archived data, use Amazon EMR to perform data transformations.
E. For archived data, use Amazon SageMaker to perform data transformations.

A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
D. For daily and archived data, use Amazon EMR to perform data transformations.
Question 146:

A company uses Amazon SageMaker AI for its machine learning (ML) workflows. The company is organized into several project groups that use sensitive data. The company needs to give the project groups the ability to discover available datasets across different AWS accounts. The solution must maintain access controls and track all data access for compliance purposes.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon SageMaker Assets to publish, discover, and request access to datasets through the asset catalog with approval workflows that track data access.
B. Set up Amazon SageMaker Feature Store with cross-account access policies to automatically share data between AWS accounts without requiring approval workflows.
C. Set up IAM roles for each project group with permissions to access all datasets across all AWS accounts. Use AWS CloudTrail to record data access activity.
D. Create separate Amazon SageMaker Studio domains for each project group with isolated environments and no ability to share data between domains.

A. Use Amazon SageMaker Assets to publish, discover, and request access to datasets through the asset catalog with approval workflows that track data access.
Question 147:

A data engineer needs to create a new empty table in Amazon Athena that has the same schema as an existing table named old_table.
Which SQL statement should the data engineer use to meet this requirement?
A. CREATE TABLE new_table AS SELECT * FROM old_table;
B. INSERT INTO new_table SELECT * FROM old_table;
C. CREATE TABLE new_table (LIKE old_table);
D. CREATE TABLE new_table AS (SELECT * FROM old_table) WITH NO DATA;

D. CREATE TABLE new_table AS (SELECT * FROM old_table) WITH NO DATA;
Explanation
Problem Analysis:
The goal is to create a new empty table in Athena with the same schema as an existing table (old_table).
The solution must avoid copying any data.
Key Considerations:
CREATE TABLE AS (CTAS) is commonly used in Athena to create new tables based on an existing table.
Adding the WITH NO DATA clause ensures only the schema is copied, without transferring any data.
Solution Analysis:
Option A: Copies both schema and data. Does not meet the requirement for an empty table.
Option B: Inserts data into an existing table, which does not create a new table.
Option C: Athena does not support the LIKE clause in CREATE TABLE, so this statement does not create the table.
Option D: Creates a new table with the same schema and ensures it is empty by using WITH NO DATA.
Final Recommendation:
Use D. CREATE TABLE new_table AS (SELECT * FROM old_table) WITH NO DATA to create an empty table with the same schema.
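A minimal sketch of running the option D statement programmatically, assuming boto3's Athena client. The helpers build the CTAS statement and the arguments for `start_query_execution`; the identifier check is a hypothetical, deliberately conservative policy, and the database name and S3 output location are placeholders.

```python
import re
from typing import Any

# Conservative identifier check (hypothetical policy): letters, digits, underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def empty_copy_statement(new_table: str, old_table: str) -> str:
    """CTAS statement that copies the schema of old_table but no rows."""
    for name in (new_table, old_table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return f"CREATE TABLE {new_table} AS (SELECT * FROM {old_table}) WITH NO DATA"


def build_query_request(sql: str, database: str, output_location: str) -> dict[str, Any]:
    """Build kwargs for boto3 Athena start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Usage (requires boto3 and credentials; names are hypothetical):
#   import boto3
#   boto3.client("athena").start_query_execution(**build_query_request(
#       empty_copy_statement("new_table", "old_table"),
#       "analytics", "s3://example-athena-results/"))
```

Rejecting a name such as `old-table` up front avoids sending a statement that Athena would fail to parse.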
References:
Athena CTAS Queries
CREATE TABLE Statement in Athena
Question 148:

A company stores CSV files in an Amazon S3 bucket. A data engineer needs to process the data in the CSV files and store the processed data in a new S3 bucket.
The process needs to rename a column, remove specific columns, ignore the second row of each file, create a new column based on the values of the first row of the data, and filter the results by a numeric value of a column.
Which solution will meet these requirements with the LEAST development effort?
A. Use AWS Glue Python jobs to read and transform the CSV files.
B. Use an AWS Glue custom crawler to read and transform the CSV files.
C. Use an AWS Glue workflow to build a set of jobs to crawl and transform the CSV files.
D. Use AWS Glue DataBrew recipes to read and transform the CSV files.

D. Use AWS Glue DataBrew recipes to read and transform the CSV files.
Explanation
The requirement involves transforming CSV files by renaming columns, removing rows, and other operations with minimal development effort. AWS Glue DataBrew is the best solution here because it allows you to visually create transformation recipes without writing extensive code.
Option D: Use AWS Glue DataBrew recipes to read and transform the CSV files. DataBrew provides a visual interface where you can build transformation steps (for example, renaming columns, filtering rows, and creating new columns) as a "recipe" that can be applied to datasets, making it easy to handle complex transformations on CSV files with minimal coding.
Other options (A, B, C) involve more manual development and configuration effort (e.g., writing Python jobs or creating custom workflows in Glue) compared to the low-code/no-code approach of DataBrew.
References:
AWS Glue DataBrew Documentation
Question 149:

A data engineer is building a solution to detect sensitive information that is stored in a data lake across multiple Amazon S3 buckets. The solution must detect personally identifiable information (PII) that is in a proprietary data format.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use the AWS Glue Detect PII transform with specific patterns.
B. Use Amazon Macie with managed data identifiers.
C. Use an AWS Lambda function with custom regular expressions.
D. Use Amazon Athena with a SQL query to match the custom formats.

A. Use the AWS Glue Detect PII transform with specific patterns.
Explanation
The AWS Glue Detect PII transform is a built-in feature that can automatically identify personally identifiable information (PII) by using predefined patterns or custom regular expressions. It runs directly within AWS Glue jobs, so custom patterns can target proprietary data formats, and it integrates with the AWS Glue Data Catalog. This makes it a managed, low-overhead solution compared with building and maintaining custom Lambda-based or SQL-based detection.
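To illustrate what a "specific pattern" for a proprietary format looks like, the sketch below applies a custom regular expression to records in plain Python. It is not the AWS Glue API, only the pattern-matching idea behind it; the `ACCT-` account-number format, the entity name, and the column names are hypothetical stand-ins for a proprietary format.

```python
import re
from typing import Iterable

# Hypothetical proprietary identifier: "ACCT-" followed by exactly 8 digits.
CUSTOM_PATTERNS = {
    "INTERNAL_ACCOUNT_ID": re.compile(r"\bACCT-\d{8}\b"),
}


def detect_pii(records: Iterable[dict[str, str]]) -> list[tuple[int, str, str]]:
    """Return (record index, column, entity type) for every custom-pattern hit."""
    findings = []
    for index, record in enumerate(records):
        for column, value in record.items():
            for entity, pattern in CUSTOM_PATTERNS.items():
                if pattern.search(value):
                    findings.append((index, column, entity))
    return findings


rows = [
    {"note": "customer ACCT-12345678 called", "city": "Austin"},
    {"note": "ACCT-123 is too short", "city": "Reno"},
]
print(detect_pii(rows))  # [(0, 'note', 'INTERNAL_ACCOUNT_ID')]
```

In the managed transform, the equivalent pattern is supplied as configuration rather than code, which is where the lower operational overhead comes from.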
Question 150:

A company uses Amazon Redshift as a data warehouse solution. One of the datasets that the company stores in Amazon Redshift contains data for a vendor.
Recently, the vendor asked the company to transfer the vendor's data into the vendor's Amazon S3 bucket once each week.
Which solution will meet this requirement?
A. Create an AWS Lambda function to connect to the Redshift data warehouse. Configure the Lambda function to use the Redshift COPY command to copy the required data to the vendor's S3 bucket on a schedule.
B. Create an AWS Glue job to connect to the Redshift data warehouse. Configure the AWS Glue job to use the Redshift UNLOAD command to load the required data to the vendor's S3 bucket on a schedule.
C. Use the Amazon Redshift data sharing feature. Set the vendor's S3 bucket as the destination. Configure the source to be a custom SQL query that selects the required data.
D. Configure Amazon Redshift Spectrum to use the vendor's S3 bucket as the destination. Enable data querying in both directions.

B. Create an AWS Glue job to connect to the Redshift data warehouse. Configure the AWS Glue job to use the Redshift UNLOAD command to load the required data to the vendor's S3 bucket on a schedule.
Explanation
An AWS Glue job can connect to your Amazon Redshift cluster and execute the UNLOAD command to export the specified vendor data directly into the vendor's S3 bucket. Scheduling the Glue job to run weekly requires minimal operational effort, and UNLOAD is optimized for exporting large result sets efficiently.
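A minimal sketch of the weekly export in option B, assuming the Glue job submits SQL through the Amazon Redshift Data API with boto3. The helpers build the UNLOAD statement (doubling single quotes inside the query, as UNLOAD requires for embedded literals) and the `execute_statement` arguments. The table, bucket prefix, role ARN, cluster, database, and secret names are hypothetical, and the IAM role must be allowed to write to the vendor's bucket.

```python
from typing import Any


def build_unload_sql(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Wrap a SELECT in an UNLOAD statement that writes Parquet files to S3."""
    # UNLOAD takes the query as a string literal, so embedded quotes are doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET"
    )


def build_data_api_request(sql: str, cluster_id: str, database: str, secret_arn: str) -> dict[str, Any]:
    """Build kwargs for boto3 redshift-data execute_statement."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "SecretArn": secret_arn,
        "Sql": sql,
    }


# Usage inside the Glue job (requires boto3 and credentials; names are hypothetical):
#   import boto3
#   sql = build_unload_sql("SELECT * FROM sales WHERE vendor_id = 'v42'",
#                          "s3://vendor-bucket/exports/2026-07-13/",
#                          "arn:aws:iam::123456789012:role/UnloadRole")
#   boto3.client("redshift-data").execute_statement(**build_data_api_request(
#       sql, "dw-cluster", "dev", "arn:aws:secretsmanager:us-east-1:123456789012:secret:rs"))
```

By default, UNLOAD fails if it would overwrite existing files, so a date-stamped prefix per weekly run (or the ALLOWOVERWRITE option) keeps the schedule from breaking. A scheduled Glue trigger with a weekly `cron(...)` expression can start the job.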

Tips on How to Prepare for the Exams

Certification exams are becoming increasingly important, and more and more employers require them when hiring. But how can you prepare for an exam effectively, in a short time and with less effort? How can you get an ideal result, and where can you find the most reliable resources? Here on Vcedump.com, you will find the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATA-ENGINEER-ASSOCIATE exam preparation or Amazon certification application, visit Vcedump.com to find solutions.
