DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code: DATA-ENGINEER-ASSOCIATE
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 403 Q&As
Last Updated: Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 301:

A mobile gaming company wants to capture data from its gaming app. The company wants to make the data available to three internal consumers of the data. The data records are approximately 20 KB in size.
The company wants to achieve optimal throughput from each device that runs the gaming app.
Additionally, the company wants to develop an application to process data streams. The stream-processing application must have dedicated throughput for each internal consumer.
Which solution will meet these requirements?
A. Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature with a stream for each internal consumer.
B. Configure the mobile app to call the PutRecordBatch API operation to send data to Amazon Data Firehose. Submit an AWS Support case to turn on dedicated throughput for the company's AWS account. Allow each internal consumer to access the stream.
C. Configure the mobile app to use the Amazon Kinesis Producer Library (KPL) to send data to Amazon Data Firehose. Use the enhanced fan-out feature with a stream for each internal consumer.
D. Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Host the stream-processing application for each internal consumer on Amazon EC2 instances. Configure auto scaling for the EC2 instances.

A. Configure the mobile app to call the PutRecords API operation to send data to Amazon Kinesis Data Streams. Use the enhanced fan-out feature with a stream for each internal consumer.
Explanation
Problem Analysis:
Input Requirements: The gaming app generates records of approximately 20 KB, which must be ingested and made available to three internal consumers with dedicated throughput.
Key Requirements:
High throughput for ingestion from each device.
Dedicated processing bandwidth for each consumer.
Key Considerations:
Amazon Kinesis Data Streams supports high-throughput ingestion, and the PutRecords API writes multiple records in a single request.
The Enhanced Fan-Out feature provides dedicated throughput to each consumer, avoiding bandwidth contention.
This solution avoids bottlenecks and ensures optimal throughput for the gaming application and consumers.
Solution Analysis:
Option A: Kinesis Data Streams + Enhanced Fan-Out
The PutRecords API is designed for batch writes, improving ingestion performance.
Enhanced Fan-Out allows each consumer to process the stream independently with dedicated throughput.
Option B: Data Firehose + Dedicated Throughput Request
Firehose is not designed for real-time stream processing or fan-out. It delivers data to destinations like S3, Redshift, or OpenSearch, not multiple independent consumers.
Option C: Data Firehose + Enhanced Fan-Out
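Firehose does not support enhanced fan-out, which is a Kinesis Data Streams feature. In addition, the KPL is a producer library for Kinesis Data Streams. This option is invalid.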
Option D: Kinesis Data Streams + EC2 Instances
Hosting stream-processing applications on EC2 increases operational overhead, and without enhanced fan-out the consumers share each shard's 2 MB/s read throughput instead of receiving dedicated throughput.
Final Recommendation:
Use Kinesis Data Streams with Enhanced Fan-Out for high-throughput ingestion and dedicated consumer bandwidth.
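A minimal sketch of the producer and consumer setup, assuming Python and boto3. The helpers only build request payloads, so they run without AWS credentials; the stream name, stream ARN, partition-key field (device_id), and consumer names are hypothetical. Each payload is meant to be passed to the boto3 Kinesis calls put_records and register_stream_consumer.

```python
import json
from typing import Iterator

MAX_RECORDS_PER_PUT = 500  # PutRecords accepts at most 500 records per request


def build_put_records_batches(events: list[dict], stream_name: str,
                              partition_key_field: str = "device_id") -> Iterator[dict]:
    """Yield PutRecords request payloads of at most 500 records each."""
    for start in range(0, len(events), MAX_RECORDS_PER_PUT):
        chunk = events[start:start + MAX_RECORDS_PER_PUT]
        yield {
            "StreamName": stream_name,
            "Records": [
                {
                    "Data": json.dumps(event).encode("utf-8"),
                    # Records that share a partition key are routed to the same shard
                    "PartitionKey": str(event[partition_key_field]),
                }
                for event in chunk
            ],
        }


def build_consumer_registrations(stream_arn: str, consumer_names: list[str]) -> list[dict]:
    """One RegisterStreamConsumer payload per internal consumer (enhanced fan-out)."""
    return [{"StreamARN": stream_arn, "ConsumerName": name} for name in consumer_names]
```

In practice, each put_records response should be checked for a nonzero FailedRecordCount so that only the failed entries are resent. Each registered consumer then reads with SubscribeToShard and gets its own dedicated read throughput per shard.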
Question 302:

A data engineer needs a fully automated solution to check for new data in multiple databases and process data that the solution finds. The solution must run every hour. The solution must be compatible with Amazon RDS, Amazon DynamoDB, and Amazon OpenSearch Service. The solution must be able to process up to 10 MB of data at one time. The solution must be optimized for costs and operational overhead. The solution must have robust error handling capabilities.
Which solution will meet these requirements?
A. Use Amazon EventBridge to invoke AWS Step Functions every hour to deploy an AWS Lambda function to check for data. Configure Step Functions steps to process data that the Lambda function finds. Implement error handling in each state.
B. Use Amazon EventBridge to invoke an AWS Lambda function every hour to check for data. Configure the function to send a message to an Amazon Simple Queue Service (Amazon SQS) queue when the function finds new data. Use a second Lambda function to read the queue and perform the processing.
C. Configure an Apache Spark application to run on Amazon EMR to check for data. Implement error handling in the application. Use Amazon EventBridge to invoke the application every hour.
D. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create a workflow that runs a directed acyclic graph (DAG) every hour to check for data. Configure the DAG to process identified data. Implement error handling in a Python operator.

A. Use Amazon EventBridge to invoke AWS Step Functions every hour to deploy an AWS Lambda function to check for data. Configure Step Functions steps to process data that the Lambda function finds. Implement error handling in each state.
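A minimal sketch of the per-state error handling in option A, assuming Python and two hypothetical Lambda functions: one that checks the databases and returns {"hasNewData": true or false}, and one that processes the data. It builds an Amazon States Language definition in which each Task state has its own Retry and Catch blocks; the state names and retry values are illustrative. An EventBridge schedule using rate(1 hour) would start the state machine.

```python
import json

# EventBridge schedule expression for the hourly trigger
SCHEDULE_EXPRESSION = "rate(1 hour)"


def build_state_machine_definition(check_fn_arn: str, process_fn_arn: str) -> str:
    """Return an ASL definition where every Task state retries, then catches errors."""
    retry = [{
        "ErrorEquals": ["States.TaskFailed", "Lambda.ServiceException"],
        "IntervalSeconds": 5,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,
    }]
    # After retries are exhausted, record the error and route to a failure state
    catch = [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error", "Next": "NotifyFailure"}]
    definition = {
        "Comment": "Hourly check for new data, then process it",
        "StartAt": "CheckForNewData",
        "States": {
            "CheckForNewData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": check_fn_arn, "Payload.$": "$"},
                # Keep only the flag returned by the (hypothetical) check function
                "ResultSelector": {"hasNewData.$": "$.Payload.hasNewData"},
                "Retry": retry,
                "Catch": catch,
                "Next": "HasNewData",
            },
            "HasNewData": {
                "Type": "Choice",
                "Choices": [{"Variable": "$.hasNewData", "BooleanEquals": True, "Next": "ProcessData"}],
                "Default": "NothingToDo",
            },
            "ProcessData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": process_fn_arn, "Payload.$": "$"},
                "Retry": retry,
                "Catch": catch,
                "End": True,
            },
            "NothingToDo": {"Type": "Succeed"},
            "NotifyFailure": {"Type": "Fail", "Error": "PipelineFailed",
                              "Cause": "A task failed after all retries"},
        },
    }
    return json.dumps(definition, indent=2)
```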
Question 303:

A retail company stores order information in an Amazon Aurora table named Orders. The company needs to create operational reports from the Orders table with minimal latency. The Orders table contains billions of rows, and over 100,000 transactions can occur each second.
A marketing team needs to join the Orders data with an Amazon Redshift table named Campaigns in the marketing team's data warehouse. The operational Aurora database must not be affected.
Which solution will meet these requirements with the LEAST operational effort?
A. Use AWS Database Migration Service (AWS DMS) Serverless to replicate the Orders table to Amazon Redshift. Create a materialized view in Amazon Redshift to join with the Campaigns table.
B. Use the Aurora zero-ETL integration with Amazon Redshift to replicate the Orders table. Create a materialized view in Amazon Redshift to join with the Campaigns table.
C. Use AWS Glue to replicate the Orders table to Amazon Redshift. Create a materialized view in Amazon Redshift to join with the Campaigns table.
D. Use federated queries to query the Orders table directly from Aurora. Create a materialized view in Amazon Redshift to join with the Campaigns table.

B. Use the Aurora zero-ETL integration with Amazon Redshift to replicate the Orders table. Create a materialized view in Amazon Redshift to join with the Campaigns table.
Explanation
Aurora's zero-ETL integration with Redshift automatically and continuously streams data changes from the Orders table into a Redshift table with virtually no setup or maintenance. You can then define a materialized view in Redshift to join that replicated Orders table with Campaigns. This approach ensures minimal impact on the production Aurora workload and requires far less operational effort than building and managing your own ETL or replication pipelines.
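A sketch of option B under stated assumptions: an Aurora MySQL source cluster, a Redshift target that also holds the marketing team's Campaigns table, and hypothetical names for the databases, schemas, and columns. The first helper builds a request for the boto3 RDS create_integration call; the second returns Redshift SQL that exposes the replicated data as a database and defines the materialized view in the marketing database, because the zero-ETL destination database itself is read-only.

```python
def build_zero_etl_integration(aurora_cluster_arn: str, redshift_namespace_arn: str,
                               name: str = "orders-zero-etl") -> dict:
    """Request payload for RDS CreateIntegration (Aurora source, Redshift target)."""
    return {"SourceArn": aurora_cluster_arn, "TargetArn": redshift_namespace_arn,
            "IntegrationName": name}


def build_redshift_sql(integration_id: str, zero_etl_db: str = "aurora_orders") -> list[str]:
    """Redshift statements; schema and column names are hypothetical."""
    return [
        # Creates a Redshift database that receives the replicated Aurora tables
        f"CREATE DATABASE {zero_etl_db} FROM INTEGRATION '{integration_id}';",
        # Run in the marketing database: join replicated Orders (three-part name)
        # with the local Campaigns table
        f"""CREATE MATERIALIZED VIEW mv_orders_campaigns AS
SELECT o.order_id, o.order_total, c.campaign_name
FROM {zero_etl_db}.sales.orders AS o
JOIN public.campaigns AS c ON o.campaign_id = c.campaign_id;""",
    ]
```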
Question 304:

A company stores server logs in an Amazon S3 bucket. The company needs to keep the logs for 1 year.
The logs are not required after 1 year.
A data engineer needs a solution to automatically delete logs that are older than 1 year.
Which solution will meet these requirements with the LEAST operational overhead?
A. Define an S3 Lifecycle configuration to delete the logs after 1 year.
B. Create an AWS Lambda function to delete the logs after 1 year.
C. Schedule a cron job on an Amazon EC2 instance to delete the logs after 1 year.
D. Configure an AWS Step Functions state machine to delete the logs after 1 year.

A. Define an S3 Lifecycle configuration to delete the logs after 1 year.
Explanation
An S3 Lifecycle configuration is the best choice for this requirement because it allows the company to define a rule that will automatically delete objects in an S3 bucket after a specified period, in this case, 1 year. This solution provides the least operational overhead because it is a built-in feature of Amazon S3, requires no additional infrastructure or management, and is designed specifically for managing object lifecycles.
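A minimal sketch, assuming Python and boto3, of the lifecycle rule from option A. The logs/ prefix is a hypothetical location for the server logs; the dictionary follows the shape that the boto3 put_bucket_lifecycle_configuration call expects for its LifecycleConfiguration argument.

```python
def build_log_expiration_rule(prefix: str = "logs/", days: int = 365) -> dict:
    """LifecycleConfiguration that expires objects under `prefix` after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{days}-days",
                "Filter": {"Prefix": prefix},   # only objects under the log prefix
                "Status": "Enabled",
                "Expiration": {"Days": days},   # S3 deletes objects this many days after creation
            }
        ]
    }
```

It would be applied with boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=build_log_expiration_rule()). S3 processes lifecycle rules asynchronously, so objects can remain briefly after they become eligible, but no scheduler or compute has to be maintained.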
Question 305:

A company stores customer records in Amazon S3. The company must not delete or modify the customer record data for 7 years after each record is created. The root user also must not have the ability to delete or modify the data.
A data engineer wants to use S3 Object Lock to secure the data.
Which solution will meet these requirements?
A. Enable governance mode on the S3 bucket. Use a default retention period of 7 years.
B. Enable compliance mode on the S3 bucket. Use a default retention period of 7 years.
C. Place a legal hold on individual objects in the S3 bucket. Set the retention period to 7 years.
D. Set the retention period for individual objects in the S3 bucket to 7 years.

B. Enable compliance mode on the S3 bucket. Use a default retention period of 7 years.
Explanation
The company wants to ensure that no customer records are deleted or modified for 7 years, and even the root user should not have the ability to change the data. S3 Object Lock in compliance mode is the correct solution for this scenario.
Option B: Enable compliance mode on the S3 bucket. Use a default retention period of 7 years.
In compliance mode, no user, including the root user, can overwrite or delete a locked object version, change its retention mode, or shorten its retention period.
This ensures that the data is protected for the entire 7-year duration as required. Compliance mode is stricter than governance mode and prevents all forms of alteration, even by privileged users.
Option A (governance mode) still allows the root user and users with the s3:BypassGovernanceRetention permission to bypass the lock, which does not meet the company's requirement.
Option C (legal hold) does not work because a legal hold has no retention period and can be removed by any user with the s3:PutObjectLegalHold permission. Option D (setting retention per object) does not specify compliance mode, so it does not guarantee that the root user is blocked, and it must be applied to every object individually.
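A minimal sketch, assuming Python and boto3, of the default retention setting from option B. The dictionary follows the ObjectLockConfiguration shape accepted by put_object_lock_configuration; the bucket it would be applied to is hypothetical and must already have Object Lock enabled.

```python
def build_compliance_lock_config(years: int = 7) -> dict:
    """ObjectLockConfiguration with a default COMPLIANCE-mode retention period."""
    return {
        "ObjectLockEnabled": "Enabled",
        "Rule": {
            "DefaultRetention": {
                # GOVERNANCE would let privileged users bypass the lock
                "Mode": "COMPLIANCE",
                # DefaultRetention takes either Years or Days, not both
                "Years": years,
            }
        },
    }
```

A default retention period applies to objects placed in the bucket after it is set; it does not lock objects that were already there.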
Question 306:

A financial company wants to implement a data mesh. The data mesh must support centralized data governance, data analysis, and data access control. The company has decided to use AWS Glue for data catalogs and extract, transform, and load (ETL) operations.
Which combination of AWS services will implement a data mesh? (Choose two.)
A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned cluster for data analysis.
B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.
C. Use AWS Glue DataBrew for centralized data governance and access control.
D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.
E. Use AWS Lake Formation for centralized data governance and access control.

B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.
E. Use AWS Lake Formation for centralized data governance and access control.
Explanation
A data mesh is an architectural framework that organizes data into domains and treats data as products that are owned and offered for consumption by different teams. A data mesh requires a centralized layer for data governance and access control, as well as a distributed layer for data storage and analysis. AWS Glue can provide data catalogs and ETL operations for the data mesh, but it cannot provide data governance and access control by itself. Therefore, the company needs to use another AWS service for this purpose. AWS Lake Formation is a service that allows you to create, secure, and manage data lakes on AWS. It integrates with AWS Glue and other AWS services to provide centralized data governance and access control for the data mesh. Therefore, option E is correct.
For data storage and analysis, the company can choose from different AWS services depending on its needs and preferences. However, one of the benefits of a data mesh is that it enables data to be stored and processed in a decoupled and scalable way, so serverless or managed services that can handle large volumes and varieties of data are preferable. Amazon S3 is a highly scalable, durable, and secure object storage service that can store any type of data. Amazon Athena is a serverless interactive query service that can analyze data in Amazon S3 using standard SQL. Therefore, option B is a good choice for data storage and analysis in a data mesh. Options A and D are not optimal because they use relational databases (Aurora and RDS) that are not suited to storing diverse and unstructured data, and they require more provisioning and management than serverless services. Option C is incorrect because AWS Glue DataBrew is a visual data preparation tool; it does not provide centralized data governance or access control.
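A minimal sketch, assuming Python and boto3, of how options B and E fit together: Lake Formation grants a consumer role SELECT on a cataloged table, and Athena queries the S3 data through the Glue Data Catalog. The role ARN, database, table, and result bucket are hypothetical; the payloads match the boto3 calls lakeformation.grant_permissions and athena.start_query_execution.

```python
def build_lf_select_grant(consumer_role_arn: str, database: str, table: str) -> dict:
    """Lake Formation GrantPermissions request giving a consumer SELECT on one table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": consumer_role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


def build_athena_query(database: str, sql: str, output_location: str) -> dict:
    """Athena StartQueryExecution request; Lake Formation permissions apply at query time."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }
```

The design choice here is that access is granted once, centrally, in Lake Formation, while each domain team keeps its data in S3 and queries it with serverless Athena.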
Question 307:

A university is developing an educational application that analyzes student essays. The application provides personalized feedback with accurate citations to the university's textbooks. The application needs to process essays in multiple languages. Application responses must include direct references to specific sections in the course materials and must be in the student's selected language.
Which solution will meet these requirements with the LEAST operational overhead?
A. Build a custom vector database by using Amazon OpenSearch Serverless. Store textbook content as multilingual embeddings. Create an AWS Lambda function that queries the database when generating responses with Amazon Bedrock.
B. Create a knowledge base in Amazon Bedrock Knowledge Bases with the university's textbooks. Configure a multilingual model to generate responses with source citations.
C. Use Amazon Comprehend to detect the language and key topics in the essays. Use Amazon Kendra to search for relevant textbook passages. Create an AWS Lambda function that formats the textbook passages into feedback.
D. Use Amazon SageMaker to host a custom-trained large language model (LLM) that has been fine-tuned on the university's textbooks to generate personalized feedback with citations.

B. Create a knowledge base in Amazon Bedrock Knowledge Bases with the university's textbooks. Configure a multilingual model to generate responses with source citations.
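A minimal sketch of option B, assuming Python and the retrieve_and_generate call of boto3's bedrock-agent-runtime client. The knowledge base ID and model ARN are placeholders, and putting the target language in the prompt is one simple approach (an assumption, not the only option) to get feedback in the student's language. The second helper collects the cited source locations from a response so they can be shown next to the feedback.

```python
def build_feedback_request(essay: str, language: str, kb_id: str, model_arn: str) -> dict:
    """RetrieveAndGenerate request against a Bedrock knowledge base."""
    prompt = (f"Give feedback on the following essay in {language}, "
              f"citing the relevant textbook sections.\n\n{essay}")
    return {
        "input": {"text": prompt},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {"knowledgeBaseId": kb_id, "modelArn": model_arn},
        },
    }


def extract_citation_sources(response: dict) -> list[str]:
    """Return unique S3 URIs of the passages cited in a RetrieveAndGenerate response."""
    uris: list[str] = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return uris
```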
Question 308:

A data engineering team is using an Amazon Redshift data warehouse for operational reporting. The team wants to prevent performance issues that might result from long-running queries. A data engineer must choose a system table in Amazon Redshift to record anomalies when a query optimizer identifies conditions that might indicate performance issues.
Which table views should the data engineer use to meet this requirement?
A. STL_USAGE_CONTROL
B. STL_ALERT_EVENT_LOG
C. STL_QUERY_METRICS
D. STL_PLAN_INFO

B. STL_ALERT_EVENT_LOG
Explanation
The STL_ALERT_EVENT_LOG table view records anomalies when the query optimizer identifies conditions that might indicate performance issues. These conditions include skewed data distribution, missing statistics, nested loop joins, and large data broadcasts. The STL_ALERT_EVENT_LOG table view can help the data engineer identify and troubleshoot the root causes of performance issues and optimize the query execution plan. The other table views are not relevant for this requirement. STL_USAGE_CONTROL logs information when a usage limit is reached. STL_QUERY_METRICS records per-query resource metrics such as rows processed, CPU usage, and disk I/O. STL_PLAN_INFO shows the EXPLAIN output for a query as a set of rows.
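A minimal sketch, assuming Python and the Amazon Redshift Data API (the execute_statement call of boto3's redshift-data client), for pulling recent optimizer alerts. The cluster, database, and user names are hypothetical; event and solution are the STL_ALERT_EVENT_LOG columns that describe the alert and a suggested fix.

```python
# Most recent optimizer alerts, newest first
ALERT_SQL = """
SELECT query,
       TRIM(event)    AS event,
       TRIM(solution) AS solution,
       event_time
FROM stl_alert_event_log
ORDER BY event_time DESC
LIMIT 50;
"""


def build_alert_log_statement(cluster_id: str, database: str, db_user: str) -> dict:
    """Request payload for redshift-data execute_statement on a provisioned cluster."""
    return {"ClusterIdentifier": cluster_id, "Database": database,
            "DbUser": db_user, "Sql": ALERT_SQL}
```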
Question 309:

A company stores customer data in an Amazon S3 bucket. The company must permanently delete all customer data that is older than 7 years.
Which solution will meet this requirement?
A. Configure an S3 Lifecycle policy to permanently delete objects that are older than 7 years.
B. Use Amazon Athena to query the S3 bucket for objects that are older than 7 years. Configure Athena to delete the results.
C. Configure an S3 Lifecycle policy to move objects that are older than 7 years to S3 Glacier Deep Archive.
D. Configure an S3 Lifecycle policy to enable S3 Object Lock on all objects that are older than 7 years.

A. Configure an S3 Lifecycle policy to permanently delete objects that are older than 7 years.
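A sketch, assuming Python and boto3, of option A for a bucket that might have versioning enabled. In a versioned bucket, an Expiration action only adds a delete marker, so a NoncurrentVersionExpiration action is also needed to remove the data permanently. Converting 7 years to days conservatively (adding a day for each possible leap year) is an assumption of this sketch.

```python
def build_permanent_delete_rule(years: int = 7, noncurrent_days: int = 1) -> dict:
    """LifecycleConfiguration that permanently removes data older than `years`."""
    # 365 days per year plus one day per possible leap year, so nothing expires early
    days = years * 365 + (years + 3) // 4
    return {
        "Rules": [
            {
                "ID": f"permanently-delete-after-{years}-years",
                "Filter": {"Prefix": ""},  # empty prefix: apply to every object
                "Status": "Enabled",
                # Current versions expire; in a versioned bucket this adds a delete marker
                "Expiration": {"Days": days},
                # Noncurrent versions (including ones created by overwrites) are then
                # permanently removed this many days after they become noncurrent
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            }
        ]
    }
```

In an unversioned bucket the NoncurrentVersionExpiration action simply has no effect, so the same rule works in both cases.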
Question 310:

A company has five offices in different AWS Regions. Each office has its own human resources (HR) department that uses a unique IAM role. The company stores employee records in a data lake that is based on Amazon S3 storage.
A data engineering team needs to limit access to the records. Each HR department should be able to access records for only employees who are within the HR department's Region.
Which combination of steps should the data engineering team take to meet this requirement with the LEAST operational overhead? (Choose two.)
A. Use data filters for each Region to register the S3 paths as data locations.
B. Register the S3 path as an AWS Lake Formation location.
C. Modify the IAM roles of the HR departments to add a data filter for each department's Region.
D. Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.
E. Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region.

B. Register the S3 path as an AWS Lake Formation location.
D. Enable fine-grained access control in AWS Lake Formation. Add a data filter for each Region.
Explanation
AWS Lake Formation is a service that helps you build, secure, and manage data lakes on Amazon S3.
You can use AWS Lake Formation to register the S3 path as a data lake location and enable fine-grained access control to limit access to the records based on the HR department's Region. You can create a data filter for each Region that restricts the rows (and, if needed, the columns) of the cataloged employee table that can be read, and then grant each HR department's IAM role permissions through its Region's filter. This solution meets the requirement with the least operational overhead because it centralizes data lake security in one place and reuses the existing IAM roles of the HR departments.
The other options are not optimal for the following reasons:
Option A: Use data filters for each Region to register the S3 paths as data locations. This option is not possible, because data filters are not used to register S3 paths as data locations; they restrict access to specific rows and columns of a Data Catalog table whose data is stored in a registered location. Moreover, this option does not specify how to limit access to the records based on the HR department's Region.
Option C: Modify the IAM roles of the HR departments to add a data filter for each department's Region. This option is not possible, because data filters are not attached to IAM roles; they are granted to principals through AWS Lake Formation permissions. Moreover, this option does not specify how to register the S3 path as a data lake location or how to enable fine-grained access control in AWS Lake Formation.
Option E: Create a separate S3 bucket for each Region. Configure an IAM policy to allow S3 access. Restrict access based on Region. This option is not recommended, as it would require more operational overhead to create and manage multiple S3 buckets, and to configure and maintain IAM policies for each HR department. Moreover, this option does not leverage the benefits of AWS Lake Formation, such as data cataloging, data transformation, and data governance.
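A minimal sketch, assuming Python and boto3, of options B and D. The helpers build payloads for the Lake Formation calls register_resource, create_data_cells_filter, and grant_permissions. The bucket ARN, catalog ID, database (hr), table (employees), role ARNs, and the region column are all hypothetical; the row filter assumes each employee record carries its Region in that column.

```python
REGION_COLUMN = "region"  # hypothetical column holding each employee's Region


def build_register_request(s3_arn: str) -> dict:
    """RegisterResource payload: make the S3 path a Lake Formation data location."""
    return {"ResourceArn": s3_arn, "UseServiceLinkedRole": True}


def build_region_filter(catalog_id: str, database: str, table: str, region: str) -> dict:
    """CreateDataCellsFilter payload: all columns, only rows for one Region."""
    return {
        "TableData": {
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": f"hr-{region}",
            "RowFilter": {"FilterExpression": f"{REGION_COLUMN} = '{region}'"},
            "ColumnWildcard": {"ExcludedColumnNames": []},
        }
    }


def build_filter_grant(role_arn: str, catalog_id: str, database: str,
                       table: str, region: str) -> dict:
    """GrantPermissions payload: SELECT through the Region's data filter."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"DataCellsFilter": {
            "TableCatalogId": catalog_id, "DatabaseName": database,
            "TableName": table, "Name": f"hr-{region}",
        }},
        "Permissions": ["SELECT"],
    }
```

One filter and one grant per Region keeps the five HR roles unchanged; adding a sixth office means adding one filter and one grant rather than a new bucket and IAM policy.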
