DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code: DATA-ENGINEER-ASSOCIATE
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 403 Q&As
Last Updated: Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 181:

A company needs to store semi-structured transactional data for an application in a database. The database must be serverless. The application writes the data infrequently, but it reads the data frequently.
The application must retrieve the data within milliseconds.
Which solution will meet these requirements with the LEAST operational overhead?
A. Store the data in an Amazon S3 Standard bucket. Enable S3 Transfer Acceleration.
B. Store the data in an Amazon S3 Apache Iceberg table. Enable S3 Transfer Acceleration.
C. Store the data in an Amazon RDS for MySQL cluster. Configure RDS Optimized Reads for the cluster.
D. Store the data in an Amazon DynamoDB table. Configure a DynamoDB Accelerator cache.

D. Store the data in an Amazon DynamoDB table. Configure a DynamoDB Accelerator cache.
Question 182:

A financial company recently added more features to its mobile app. The new features required the company to create a new topic in an existing Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster.
A few days after the company added the new topic, Amazon CloudWatch raised an alarm on the RootDiskUsed metric for the MSK cluster.
How should the company address the CloudWatch alarm?
A. Expand the storage of the MSK broker. Configure the MSK cluster storage to expand automatically.
B. Expand the storage of the Apache ZooKeeper nodes.
C. Update the MSK broker instance to a larger instance type. Restart the MSK cluster.
D. Specify the Target-Volume-in-GiB parameter for the existing topic.

A. Expand the storage of the MSK broker. Configure the MSK cluster storage to expand automatically.
Explanation
The RootDiskUsed metric for the MSK cluster indicates that the storage on the broker is reaching its capacity. The best solution is to expand the storage of the MSK broker and enable automatic storage expansion to prevent future alarms.
Expand MSK Broker Storage:
Amazon Managed Streaming for Apache Kafka (Amazon MSK) lets you expand broker storage to accommodate growing data volumes. You can also configure automatic storage expansion so that storage grows as the data increases.
Question 183:

A company has a gaming application that stores data in Amazon DynamoDB tables. A data engineer needs to ingest the game data into an Amazon OpenSearch Service cluster. Data updates must occur in near real time.
Which solution will meet these requirements?
A. Use AWS Step Functions to periodically export data from the Amazon DynamoDB tables to an Amazon S3 bucket. Use an AWS Lambda function to load the data into Amazon OpenSearch Service.
B. Configure an AWS Glue job to have a source of Amazon DynamoDB and a destination of Amazon OpenSearch Service to transfer data in near real time.
C. Use Amazon DynamoDB Streams to capture table changes. Use an AWS Lambda function to process and update the data in Amazon OpenSearch Service.
D. Use a custom OpenSearch plugin to sync data from the Amazon DynamoDB tables.

C. Use Amazon DynamoDB Streams to capture table changes. Use an AWS Lambda function to process and update the data in Amazon OpenSearch Service.
Explanation
Problem Analysis:
The company uses DynamoDB for gaming data storage and needs to ingest data into Amazon OpenSearch Service in near real time.
Data updates must propagate quickly to OpenSearch for analytics or search purposes.
Key Considerations:
DynamoDB Streams provide near-real-time capture of table changes (inserts, updates, and deletes).
Integration with AWS Lambda allows seamless processing of these changes.
OpenSearch offers APIs for indexing and updating documents, which Lambda can invoke.
Solution Analysis:
Option A: Step Functions with Periodic Export
Not suitable for near-real-time updates; introduces significant latency.
Operationally complex to manage periodic exports and S3 data ingestion.
Option B: AWS Glue Job
AWS Glue is designed for ETL workloads but lacks real-time processing capabilities.
Option C: DynamoDB Streams + Lambda
DynamoDB Streams capture changes in near real time.
Lambda can process these streams and use the OpenSearch API to update the index.
This approach provides low latency and seamless integration with minimal operational overhead.
Option D: Custom OpenSearch Plugin
Writing a custom plugin adds complexity and is unnecessary with existing AWS integrations.
Implementation Steps:
Enable DynamoDB Streams for the relevant DynamoDB tables.
Create a Lambda function to process stream records:
Parse insert, update, and delete events.
Use OpenSearch APIs to index or update documents based on the event type.
Set up a trigger to invoke the Lambda function whenever there are changes in the DynamoDB Stream.
Monitor and log errors for debugging and operational health.
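The implementation steps above can be sketched as a Lambda handler. This is a minimal sketch, not a definitive implementation: the index name game-data, the helper names, and the decision to build an OpenSearch _bulk request body are assumptions made for illustration. It assumes the stream is configured so that records carry a NewImage, and it only builds the newline-delimited payload; sending it to the cluster (including request signing and retries) is left out.

```python
import json

INDEX = "game-data"  # hypothetical index name, not from the question


def from_ddb(attr):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, value), = attr.items()
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        try:
            return int(value)
        except ValueError:
            return float(value)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: from_ddb(v) for k, v in value.items()}
    if type_tag == "L":
        return [from_ddb(v) for v in value]
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def document_id(keys):
    """Derive a stable document ID from the item's primary key attributes."""
    return "|".join(str(from_ddb(keys[name])) for name in sorted(keys))


def to_bulk_lines(record):
    """Translate one stream record into _bulk action lines."""
    ddb = record["dynamodb"]
    doc_id = document_id(ddb["Keys"])
    if record["eventName"] == "REMOVE":
        return [json.dumps({"delete": {"_index": INDEX, "_id": doc_id}})]
    # INSERT and MODIFY both replace the whole document with the new image.
    doc = {k: from_ddb(v) for k, v in ddb["NewImage"].items()}
    return [
        json.dumps({"index": {"_index": INDEX, "_id": doc_id}}),
        json.dumps(doc),
    ]


def handler(event, context):
    """Lambda entry point: build the bulk body for a batch of stream records."""
    lines = []
    for record in event["Records"]:
        lines.extend(to_bulk_lines(record))
    # Every line of a _bulk body, including the last, ends with a newline.
    return "".join(line + "\n" for line in lines)
```

Using the primary key as the document ID keeps the index idempotent: a replayed stream record overwrites the same document instead of creating a duplicate.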
References:
Amazon DynamoDB Streams Documentation
AWS Lambda and DynamoDB Integration
Amazon OpenSearch Service APIs
Question 184:

A company stores customer data that contains personally identifiable information (PII) in an Amazon Redshift cluster. The company's marketing, claims, and analytics teams need to be able to access the customer data.
The marketing team should have access to obfuscated claim information but should have full access to customer contact information.
The claims team should have access to customer information for each claim that the team processes.
The analytics team should have access only to obfuscated PII data.
Which solution will enforce these data access requirements with the LEAST administrative overhead?
A. Create a separate Redshift cluster for each team. Load only the required data for each team. Restrict access to clusters based on the teams.
B. Create views that include required fields for each of the data requirements. Grant the teams access only to the view that each team requires.
C. Create a separate Amazon Redshift database role for each team. Define masking policies that apply for each team separately. Attach appropriate masking policies to each team role.
D. Move the customer data to an Amazon S3 bucket. Use AWS Lake Formation to create a data lake. Use fine-grained security capabilities to grant each team appropriate permissions to access the data.

C. Create a separate Amazon Redshift database role for each team. Define masking policies that apply for each team separately. Attach appropriate masking policies to each team role.
Question 185:

A company processes 500 GB of audience and advertising data daily, storing CSV files in Amazon S3 with schemas registered in AWS Glue Data Catalog. They need to convert these files to Apache Parquet format and store them in an S3 bucket.
The solution requires a long-running workflow with 15 GiB memory capacity to process the data concurrently, followed by a correlation process that begins only after the first two processes complete.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the workflow by using AWS Glue. Configure AWS Glue to begin the third process after the first two processes have finished.
B. Use Amazon EMR to run each process in the workflow. Create an Amazon Simple Queue Service (Amazon SQS) queue to handle messages that indicate the completion of the first two processes. Configure an AWS Lambda function to process the SQS queue by running the third process.
C. Use AWS Glue workflows to run the first two processes in parallel. Ensure that the third process starts after the first two processes have finished.
D. Use AWS Step Functions to orchestrate a workflow that uses multiple AWS Lambda functions. Ensure that the third process starts after the first two processes have finished.

C. Use AWS Glue workflows to run the first two processes in parallel. Ensure that the third process starts after the first two processes have finished.
Explanation
AWS Glue workflows natively orchestrate multiple Glue jobs in parallel and support dependency triggers so the third job starts only after the first two finish. Glue jobs provide the required ~15 GiB memory capacity and easily convert CSV to Parquet using the Glue Data Catalog, delivering a fully managed, low-operations solution.
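The dependency described above can be expressed as a conditional trigger inside the workflow. The following is a minimal sketch that assumes the boto3 Glue create_trigger call; the workflow name and job names are hypothetical. The function only builds the keyword arguments, so it runs without AWS credentials.

```python
def conditional_trigger_args(workflow, upstream_jobs, downstream_job):
    """Build create_trigger arguments that start downstream_job only after
    every job in upstream_jobs has finished in the SUCCEEDED state."""
    return {
        "Name": f"{workflow}-start-{downstream_job}",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            # AND means all listed conditions must hold before the action fires.
            "Logical": "AND",
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
                for job in upstream_jobs
            ],
        },
        "Actions": [{"JobName": downstream_job}],
    }


# Hypothetical names for the two conversion jobs and the correlation job.
args = conditional_trigger_args(
    "daily-ads", ["audience-to-parquet", "ads-to-parquet"], "correlate"
)
# With credentials configured, this could be passed on as:
#   boto3.client("glue").create_trigger(**args)
```

The two conversion jobs would be started in parallel by a separate on-demand or scheduled trigger in the same workflow.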
Question 186:

An analytics workload in Athena scans several terabytes of CSV files each day. Most queries read only a few columns and filter by event_date. The team wants to reduce scanned data and improve query performance.
Which data layout should the data engineer choose?
A. Convert the data to Apache Parquet and partition the S3 layout by event_date.
B. Combine all CSV files into one uncompressed file in a single S3 prefix.
C. Convert the data to JSON and remove all partition prefixes.
D. Store the data in plain text files with random object key prefixes only.

A. Convert the data to Apache Parquet and partition the S3 layout by event_date.
Explanation
Columnar formats such as Parquet let Athena read only needed columns, and partitioning by event_date lets queries prune irrelevant S3 prefixes. One large CSV file prevents column pruning and reduces parallelism. JSON is still row-oriented for this workload and removing partitions increases scanned data.
Random keys do not align with the query filter.
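The pruning effect can be illustrated with a small sketch. The events/ prefix, file names, and function names are hypothetical. The code only models how Hive-style event_date= prefixes let a date filter skip objects; it does not call Athena or Amazon S3.

```python
from datetime import date


def partition_key(event_date, file_name, root="events"):
    """Build a Hive-style object key with an event_date=YYYY-MM-DD prefix."""
    return f"{root}/event_date={event_date.isoformat()}/{file_name}"


def prune(keys, start, end):
    """Keep only keys whose event_date partition falls within [start, end]."""
    kept = []
    for key in keys:
        for segment in key.split("/"):
            if segment.startswith("event_date="):
                day = date.fromisoformat(segment.split("=", 1)[1])
                if start <= day <= end:
                    kept.append(key)
    return kept


keys = [partition_key(date(2024, 1, d), "part-0.parquet") for d in (14, 15, 16)]
# A filter on a single day touches one of the three partitions.
selected = prune(keys, date(2024, 1, 15), date(2024, 1, 15))
```

Within the objects that remain, the columnar Parquet layout then limits the read to the columns the query references.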
Question 187:

A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles.
More than 100 users access the data warehouse every day.
The company wants to control user access to the objects based on each user's job role, permissions, and how sensitive the data is.
Which solution will meet these requirements?
A. Use the role-based access control (RBAC) feature of Amazon Redshift.
B. Use the row-level security (RLS) feature of Amazon Redshift.
C. Use the column-level security (CLS) feature of Amazon Redshift.
D. Use dynamic data masking policies in Amazon Redshift.

A. Use the role-based access control (RBAC) feature of Amazon Redshift.
Explanation
Role-based access control (RBAC) in Amazon Redshift enables administrators to define roles based on job functions and then assign these roles to users. RBAC allows for granular permissions management based on job roles, aligning well with the need to control access according to each user's role, permissions, and the sensitivity of data. This approach simplifies managing access for a large number of users by using predefined roles instead of individual permissions.
Question 188:

A company aggregates high-frequency sensor telemetry into an Amazon S3 data lake. Each sensor stream emits structured records every hour. The records include metadata such as sensor category, unit ID, operational state, event timestamp, and site location. The data scales up to millions of records each day. The company runs complex queries each day to uncover performance insights specific to sensor categories.
Which solution will meet these requirements with the FASTEST query execution time?
A. Persist the data in Apache ORC format. Partition the data by date. Sort the data by sensor category.
B. Persist the data in CSV format. Partition the data by date. Sort the data by operational status.
C. Persist the data in Parquet format. Partition the data by sensor category. Sort the data by date.
D. Persist the data in CSV format. Partition the data by date. Sort the data by sensor category.

C. Persist the data in Parquet format. Partition the data by sensor category. Sort the data by date.
Question 189:

A company uses an Amazon Redshift cluster as a data warehouse that is shared across two departments.
To comply with a security policy, each department must have unique access permissions.
Department A must have access to tables and views for Department A.
A. Group tables and views for each department into dedicated schemas. Manage permissions at the schema level.
B. Group tables and views for each department into dedicated databases. Manage permissions at the database level.
C. Update the names of the tables and views to follow a naming convention that contains the department names. Manage permissions based on the new naming convention.
D. Create an IAM user group for each department. Use identity-based IAM policies to grant table and view permissions based on the IAM user group.

A. Group tables and views for each department into dedicated schemas. Manage permissions at the schema level.
Explanation
By organizing each department's tables and views into its own schema (for example, dept_a and dept_b), you can grant usage and object privileges at the schema level. Department A's role gets USAGE on schema dept_a and the necessary SELECT rights there (and no rights on dept_b), and vice versa for Department B. Because both schemas live in the same database, analysts can still run cross-department queries by schema-qualifying objects, and you avoid the extra complexity of multiple databases or intricate IAM policies.
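The schema-level grants described above can be sketched as generated SQL. This is a minimal sketch: the role names dept_a_role and dept_b_role are hypothetical, and the statements assume Amazon Redshift's role-based GRANT syntax. Review them before running them against a cluster.

```python
def schema_grants(schema, role):
    """Return SQL that gives one role read access to one schema only."""
    return [
        f"CREATE ROLE {role};",
        # USAGE lets the role resolve objects inside the schema.
        f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role};",
        # SELECT is granted per schema, so nothing leaks to the other department.
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO ROLE {role};",
    ]


statements = []
for schema, role in [("dept_a", "dept_a_role"), ("dept_b", "dept_b_role")]:
    statements.extend(schema_grants(schema, role))

print("\n".join(statements))
```

A user who needs cross-department queries could be granted both roles, which keeps the permission model at two schema-level grant sets instead of per-table rules.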
Question 190:

An insurance company stores transaction data that the company compressed with gzip.
The company needs to query the transaction data for occasional audits.
Which solution will meet this requirement in the MOST cost-effective way?
A. Store the data in Amazon Glacier Flexible Retrieval. Use Amazon S3 Glacier Select to query the data.
B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.
C. Store the data in Amazon S3. Use Amazon Athena to query the data.
D. Store the data in Amazon Glacier Instant Retrieval. Use Amazon Athena to query the data.

B. Store the data in Amazon S3. Use Amazon S3 Select to query the data.

Tips on How to Prepare for the Exams

Certification exams are increasingly important, and more and more employers require them of job applicants. But how can you prepare for an exam effectively, in a short time and with less effort? How can you get a good result, and where can you find the most reliable resources? On Vcedump.com, you will find the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATA-ENGINEER-ASSOCIATE exam preparation or Amazon certification application, visit Vcedump.com to find solutions.

Related Exams:

AIF-C01

AIP-C01

ANS-C00

ANS-C01

AXS-C01

BDS-C00

CLF-C02

DAS-C01

DATA-ENGINEER-ASSOCIATE

DBS-C01
