DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code
:DATA-ENGINEER-ASSOCIATE
Exam Name
:AWS Certified Data Engineer - Associate (DEA-C01)
Certification
:Amazon Certifications
Vendor
:Amazon
Total Questions
:403 Q&As
Last Updated
:Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 291:

A company has a data pipeline that uses an Amazon RDS instance, AWS Glue jobs, and an Amazon S3 bucket. The RDS instance and AWS Glue jobs run in a private subnet of a VPC and in the same security group. A user made a change to the security group that prevents the AWS Glue jobs from connecting to the RDS instance. After the change, the security group contains a single rule that allows inbound SSH traffic from a specific IP address.
The company must resolve the connectivity issue.
Which solution will meet this requirement?
A. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the security group as the source.
B. Add an inbound rule that allows all TCP traffic on all UDP ports. Set the private IP address of the RDS instance as the source.
C. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the DNS name of the RDS instance as the source.
D. Replace the source of the existing SSH rule with the private IP address of the RDS instance. Create an outbound rule with the same source, destination, and protocol as the inbound SSH rule.

A. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the security group as the source.
Explanation
Since both the RDS instance and the Glue jobs share the same security group, allowing that security group as the source for inbound TCP traffic (on all ports) immediately restores connectivity from the Glue-attached ENIs to the database.
This single rule change requires no additional infrastructure and incurs no extra maintenance.
Question 292:

A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:
Which solution will meet this requirement with the LEAST coding effort?
A. Use AWS Glue DataBrew to read the files. Use the NEST TO ARRAY transformation to create the new column.
B. Use AWS Glue DataBrew to read the files. Use the NEST TO MAP transformation to create the new column.
C. Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
D. Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.

B. Use AWS Glue DataBrew to read the files. Use the NEST TO MAP transformation to create the new column.
Explanation
The NEST TO MAP transformation allows you to combine multiple columns into a single column that contains a JSON object with key-value pairs. This is the easiest way to achieve the desired format for the physical address data, as you can simply select thecolumns to nest and specify the keys for each column.
The NEST TO ARRAY transformation creates a single column that contains an array of values, which is not the same as the JSON object format. The PIVOT transformation reshapes the data by creating new columns from unique values in a selected column, which is not applicable for this use case. Writing a Lambda function in Python requires more coding effort than using AWS Glue DataBrew, which provides a visual and interactive interface for data transformations.
Question 293:

An auditor needs to investigate who changed encryption settings for data pipeline resources last month.
The team wants a managed AWS service that records API activity for the account and supports audit review.
Which service should the team use as the primary source of API activity?
A. AWS CloudTrail
B. Amazon CloudWatch metrics
C. AWS Glue Data Catalog
D. Amazon S3 Inventory

A. AWS CloudTrail
Explanation
CloudTrail records AWS API calls and is the primary service for auditing who made account and resource changes. CloudWatch metrics track operational metrics but do not provide the same API event history.
Glue Data Catalog stores metadata about data assets. S3 Inventory lists object metadata and does not show who changed resource configuration.
Question 294:

A company stores raw clickstream data in an Amazon S3 bucket. The company needs a solution to process the data every day by using complex PySpark transformations that rely on custom internal libraries. After the data is transformed, the company must store the data in Amazon Redshift for analytics.
The solution must be highly scalable to handle large data workloads.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS Glue Studio to build and schedule PySpark jobs. Configure an AWS Glue data connection that includes the custom libraries.
B. Use Amazon EC2 Auto Scaling groups with a custom AMI that contains the custom libraries to run a PySpark application.
C. Use Amazon EMR to run PySpark jobs. Use bootstrap actions to install the custom libraries
D. Use Amazon SageMaker Processing jobs to run PySpark code that uses native SageMaker libraries.

A. Use AWS Glue Studio to build and schedule PySpark jobs. Configure an AWS Glue data connection that includes the custom libraries.
Question 295:

A company is developing machine learning (ML) models. A data engineer needs to apply data quality rules to training data. The company stores the training data in an Amazon S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create an AWS Lambda function to check data quality and to raise exceptions in the code. Run the function when data is added to the S3 bucket. Create an Amazon CloudWatch alarm for exceptions in the code.
B. Create an AWS Glue DataBrew project for the data in the S3 bucket. Create a ruleset for the data quality rules. Create a profile job to run the data quality rules. Use Amazon EventBridge to run the profile job when data is added to the S3 bucket.
C. Create an Amazon EMR provisioned cluster. Add a Python open source data quality package to the EMR cluster. Use the Python package to write code for data quality rules and to copy the data from the S3 bucket to the EMR cluster. Copy the data from the S3 bucket to the EMR cluster. Run the data quality rules.
D. Create AWS Lambda functions to evaluate data quality rules. Use AWS Step Functions to orchestrate a workflow that publishes notifications when the data fails to meet data quality rules.

B. Create an AWS Glue DataBrew project for the data in the S3 bucket. Create a ruleset for the data quality rules. Create a profile job to run the data quality rules. Use Amazon EventBridge to run the profile job when data is added to the S3 bucket.
Question 296:

A company uses Amazon EMR as an extract, transform, and load (ETL) pipeline to transform data that comes from multiple sources. A data engineer must orchestrate the pipeline to maximize performance.
Which AWS service will meet this requirement MOST cost effectively?
A. Amazon EventBridge
B. Amazon Managed workflows for Apache Airflow (Amazon MWAA)
C. AWS Step Functions
D. AWS Glue workflows

C. AWS Step Functions
Question 297:

An online retail company has an application that runs on Amazon EC2 instances that are in a VPC. The company wants to collectflow logs for the VPC and analyze network traffic.
Which solution will meet these requirements MOST cost-effectively?
A. Publishflow logs to Amazon CloudWatch Logs. Use Amazon Athena for analytics.
B. Publishflow logs to Amazon CloudWatch Logs. Use an Amazon OpenSearch Service cluster for analytics.
C. Publishflow logs to Amazon S3 in text format. Use Amazon Athena for analytics.
D. Publishflow logs to Amazon S3 in Apache Parquet format. Use Amazon Athena for analytics.

D. Publishflow logs to Amazon S3 in Apache Parquet format. Use Amazon Athena for analytics.
Question 298:

A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks.
The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster.
The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster.
Which solution will meet these requirements?
A. Set up the sales team Bl cluster as a consumer of the ETL cluster by using Redshift data sharing.
B. Create materialized views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.
C. Create database views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.
D. Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.

A. Set up the sales team Bl cluster as a consumer of the ETL cluster by using Redshift data sharing.
Explanation
Redshift data sharing is a feature that enables you to share live data across different Redshift clusters without the need to copy or move data. Data sharing provides secure and governed access to data, while preserving the performance and concurrency benefits of Redshift. By setting up the sales team BI cluster as a consumer of the ETL cluster, the company can share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution also minimizes the usage of the computing resources of the ETL cluster, as the data sharing does not consume any storage space or compute resources from the producer cluster. The other options are either not feasible or not efficient. Creating materialized views or database views would require the sales team to have direct access to the ETL cluster, which could interfere with the critical analysis tasks. Unloading a copy of the data from the ETL cluster to an Amazon
S3 bucket every week would introduce additional latency and cost, as well as create data inconsistency issues.
Question 299:

A retail company stores point-of-sale transaction data in an Amazon RDS for MySQL database. The company maintains historical sales analytics in Amazon Redshift. The company needs to create daily reports that combine the current day's transactions with historical sales patterns for trend analysis. The company requires a solution that provides near real-time insights while minimizing data transfer costs and maintenance overhead.
Which solution will meet these requirements?
A. Configure AWS Database Migration Service (AWS DMS) to continuously replicate data from RDS for MySQL to Amazon Redshift. Use Redshift queries to create consolidated reports.
B. Implement Amazon Redshift federated queries to directly access RDS for MySQL data and join it with existing Redshift tables in a single query.
C. Use AWS Glue to create an extract, transform, an d load (ETL) pipeline that runs every hour to copy incremental data from RDS for MySQL to Amazon Redshift. Generate reports.
D. Export RDS for MySQL datato an Amazon S3 bucket on a regular schedule. Use the COPY command to load the data into Amazon Redshift staging tables. Join the data with historical data.

B. Implement Amazon Redshift federated queries to directly access RDS for MySQL data and join it with existing Redshift tables in a single query.
Question 300:

An application consumes messages from an Amazon Simple Queue Service (Amazon SQS) queue. The application experiences occasional downtime. As a result of the downtime, messages within the queue expire and are deleted after 1 day. The message deletions cause data loss for the application.
Which solutions will minimize data loss for the application? (Choose two.)
A. Increase the message retention period
B. Increase the visibility timeout.
C. Attach a dead-letter queue (DLQ) to the SQS queue.
D. Use a delay queue to delay message delivery
E. Reduce message processing time.

A. Increase the message retention period
C. Attach a dead-letter queue (DLQ) to the SQS queue.

Related Exams:

Tips on How to Prepare for the Exams

Nowadays, the certification exams become more and more important and required by more and more enterprises when applying for a job. But how to prepare for the exam effectively? How to prepare for the exam in a short time with less efforts? How to get a ideal result and how to find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provide not only Amazon exam questions, answers and explanations but also complete assistance on your exam preparation and certification application. If you are confused on your DATA-ENGINEER-ASSOCIATE exam preparations and Amazon certification application, do not hesitate to visit our Vcedump.com to find your solutions here.

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code

Exam Name

Certification

Vendor

Total Questions

Last Updated

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 291:

Question 292:

Question 293:

Question 294:

Question 295:

Question 296:

Question 297:

Question 298:

Question 299:

Question 300:

Related Exams:

AIF-C01

AIP-C01

ANS-C00

ANS-C01

AXS-C01

BDS-C00

CLF-C02

DAS-C01

DATA-ENGINEER-ASSOCIATE

DBS-C01

Tips on How to Prepare for the Exams

Amazon DATA-ENGINEER-ASSOCIATE Online Practice Questions and Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code

Exam Name

Certification

Vendor

Total Questions

Last Updated

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 291:

Question 292:

Question 293:

Question 294:

Question 295:

Question 296:

Question 297:

Question 298:

Question 299:

Question 300:

Related Exams:

Tips on How to Prepare for the Exams