DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code: DATA-ENGINEER-ASSOCIATE
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 403 Q&As
Last Updated: Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 101:

A retail company needs to implement a solution to capture data updates from multiple Amazon Aurora MySQL databases. The company needs to make the updates available for analytics in near real time. The solution must be serverless and require minimal maintenance.
Which solution will meet these requirements with the LEAST operational overhead?
A. Set up AWS Database Migration Service (AWS DMS) tasks that perform schema conversions for each database. Load the changes into Amazon Redshift Serverless.
B. Use Amazon Managed Streaming for Apache Kafka (Amazon MSK) Connect with Debezium connectors to load data into Amazon Redshift Serverless.
C. Use AWS Database Migration Service (AWS DMS) to set up binary log replication to Amazon Kinesis Data Streams. Load the data into Amazon Redshift Serverless after schema conversion.
D. Use Aurora zero-ETL integrations with Amazon Redshift Serverless for each database to load Aurora MySQL changes in Amazon Redshift Serverless.

D. Use Aurora zero-ETL integrations with Amazon Redshift Serverless for each database to load Aurora MySQL changes in Amazon Redshift Serverless.
Question 102:

A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file.
The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table.
The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process.
Which solution will meet these requirements?
A. Use a provisioned Amazon EMR cluster to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.
B. Load all the data files in parallel into Amazon Aurora. Run an AWS Glue job to load the data into Amazon Redshift.
C. Use an AWS Glue job to copy all the data files into one folder. Use a COPY command to load the data into Amazon Redshift.
D. Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift.

D. Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift.
Explanation
The company is facing performance issues loading data into Amazon Redshift because it issues a separate COPY command for each data file location. The most efficient way to increase the speed of data ingestion into Redshift without increasing the cost is to use a manifest file.
Option D: Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift. A manifest file lists all the data files, which allows a single COPY command to load the files in parallel from different locations in Amazon S3. This significantly improves the loading speed without adding cost, because the whole load runs as one COPY operation.
The other options (A, B, C) involve additional steps that either increase the cost (provisioning clusters, running AWS Glue jobs, and so on) or fail to address the core issue, which is the need for a single, efficient COPY process.
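As a rough illustration of option D, the sketch below builds a manifest and the matching COPY statement. The bucket paths, table name, and IAM role ARN are hypothetical placeholders, and data format options are left out because the question does not state the file format.

```python
import json


def build_manifest(urls):
    """Return the JSON text of a COPY manifest that lists every S3 object to load."""
    # "mandatory": True makes COPY fail if a listed file is missing.
    entries = [{"url": url, "mandatory": True} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, iam_role_arn):
    """Return a COPY statement that reads the file list from the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Hypothetical file locations, one per data source.
manifest_text = build_manifest([
    "s3://example-data-lake/source-a/part-0001",
    "s3://example-data-lake/source-b/part-0001",
])
copy_sql = build_copy_statement(
    "sales_staging",
    "s3://example-data-lake/manifests/daily.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)
print(manifest_text)
print(copy_sql)
```

The manifest text would be uploaded to the S3 location that the COPY statement references before the statement runs.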
Question 103:

A company hosts its applications on Amazon EC2 instances. The company must use SSL/TLS connections that encrypt data in transit to communicate securely with AWS infrastructure that is managed by a customer.
A data engineer needs to implement a solution to simplify the generation, distribution, and rotation of digital certificates. The solution must automatically renew and deploy SSL/TLS certificates.
Which solution will meet these requirements with the LEAST operational overhead?
A. Store self-managed certificates on the EC2 instances.
B. Use AWS Certificate Manager (ACM).
C. Implement custom automation scripts in AWS Secrets Manager.
D. Use Amazon Elastic Container Service (Amazon ECS) Service Connect.

B. Use AWS Certificate Manager (ACM).
Explanation
The best solution for managing SSL/TLS certificates on EC2 instances with minimal operational overhead is to use AWS Certificate Manager (ACM). ACM simplifies certificate management by automating the provisioning, renewal, and deployment of certificates.
AWS Certificate Manager (ACM):
ACM manages SSL/TLS certificates for EC2 and other AWS resources, including automatic certificate renewal. This reduces the need for manual management and avoids operational complexity. ACM also integrates with other AWS services to simplify secure connections between AWS infrastructure and customer-managed environments.
Question 104:

A company stores analysis results for thousands of daily customer service call transcripts in an Amazon DynamoDB table. An application generates business reports by using only the previous 7 working days of data. To reduce costs, the company wants to implement a solution that automatically removes data that is older than 7 working days.
Which solution will meet this requirement with the LEAST operational overhead?
A. Enable TTL on the DynamoDB table. Add an expiration timestamp attribute to each item in the table.
B. Use Amazon EventBridge to invoke an AWS Lambda function on a schedule to delete old records every day.
C. Create a new DynamoDB table. Enable TTL on the new table. Migrate the data to the new table. Update the application to use the new table.
D. Use AWS Glue jobs to implement a daily batch job to scan for and delete expired items from the DynamoDB table.

A. Enable TTL on the DynamoDB table. Add an expiration timestamp attribute to each item in the table.
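DynamoDB TTL expects a Number attribute that holds the expiration time in Unix epoch seconds. A minimal sketch of how an application might compute that attribute for "7 working days" follows; the attribute names `call_id`, `created_at`, and `expires_at` are hypothetical, and public holidays are ignored for simplicity.

```python
from datetime import datetime, timedelta, timezone


def add_working_days(start, days):
    """Return the datetime that falls `days` working days (Mon-Fri) after `start`."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


def build_item(call_id, created_at, working_days=7):
    """Return an item dict that carries a TTL attribute in epoch seconds."""
    expires = add_working_days(created_at, working_days)
    return {
        "call_id": call_id,
        "created_at": created_at.isoformat(),
        "expires_at": int(expires.timestamp()),  # the attribute TTL would be enabled on
    }


item = build_item("call-0001", datetime(2024, 1, 1, tzinfo=timezone.utc))
print(item)
```

TTL would then be enabled once on the table with `expires_at` as the TTL attribute, after which no scheduled cleanup job is needed.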
Question 105:

A team stores daily Parquet files in Amazon S3 by path pattern s3://company-data/orders/year=YYYY/month=MM/day=DD/. Analysts query the files with Athena. New partitions are added every day, and the metadata must stay current in AWS Glue Data Catalog with minimal manual work.
Which solution should the data engineer use?
A. Configure an AWS Glue crawler for the S3 path and schedule it to update the Data Catalog.
B. Create a DynamoDB TTL attribute for each S3 partition.
C. Run Amazon Redshift VACUUM after each file arrives.
D. Use AWS DMS CDC to migrate each partition to the Data Catalog.

A. Configure an AWS Glue crawler for the S3 path and schedule it to update the Data Catalog.
Explanation
AWS Glue crawlers can discover schemas and partitions in S3 and update the Data Catalog on a schedule. DynamoDB TTL does not manage catalog metadata. Redshift VACUUM maintains Redshift table storage, not Athena external partitions. AWS DMS migrates database data and does not maintain Glue partition metadata for S3 files.
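A scheduled crawler of this kind could be defined roughly as follows. This is only a sketch: the crawler name, role ARN, database name, and the 01:00 UTC schedule are assumptions, and the actual boto3 call is left commented out so the snippet runs without AWS credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path, schedule):
    """Return keyword arguments for the AWS Glue create_crawler API."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Glue schedules use a six-field cron expression evaluated in UTC.
        "Schedule": schedule,
    }


request = build_crawler_request(
    name="orders-daily-crawler",                                 # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",   # hypothetical
    database="sales",                                            # hypothetical
    s3_path="s3://company-data/orders/",
    schedule="cron(0 1 * * ? *)",                                # daily at 01:00 UTC (assumed)
)
print(request)

# With credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("glue").create_crawler(**request)
```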
Question 106:

A company uses Amazon S3 buckets, AWS Glue tables, and Amazon Athena as components of a data lake. Recently, the company expanded its sales range to multiple new states. The company wants to introduce state names as a new partition to the existing S3 bucket, which is currently partitioned by date.
The company needs to ensure that additional partitions will not disrupt daily synchronization between the AWS Glue Data Catalog and the S3 buckets.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use the AWS Glue API to manually update the Data Catalog.
B. Run an MSCK REPAIR TABLE command in Athena.
C. Schedule an AWS Glue crawler to periodically update the Data Catalog.
D. Run a REFRESH TABLE command in Athena.

C. Schedule an AWS Glue crawler to periodically update the Data Catalog.
Explanation
Scheduling an AWS Glue crawler to periodically update the Data Catalog automates the process of detecting new partitions and updating the catalog, which minimizes manual maintenance and operational overhead.
Question 107:

A data engineer is building a data pipeline on AWS by using AWS Glue extract, transform, and load (ETL) jobs. The data engineer needs to process data from Amazon RDS and MongoDB, perform transformations, and load the transformed data into Amazon Redshift for analytics. The data updates must occur every hour.
Which combination of tasks will meet these requirements with the LEAST operational overhead? (Choose two.)
A. Configure AWS Glue triggers to run the ETL jobs every hour.
B. Use AWS Glue DataBrew to clean and prepare the data for analytics.
C. Use AWS Lambda functions to schedule and run the ETL jobs every hour.
D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
E. Use the Redshift Data API to load transformed data into Amazon Redshift.

A. Configure AWS Glue triggers to run the ETL jobs every hour.
D. Use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift.
Explanation
The correct answer is to configure AWS Glue triggers to run the ETL jobs every hour and to use AWS Glue connections to establish connectivity between the data sources and Amazon Redshift. AWS Glue triggers schedule and orchestrate ETL jobs with the least operational overhead. AWS Glue connections securely connect to data sources and targets by using JDBC or MongoDB drivers.
AWS Glue DataBrew is a visual data preparation tool that does not support MongoDB as a data source.
AWS Lambda functions are a serverless option to schedule and run ETL jobs, but they have a limit of 15 minutes for execution time, which may not be enough for complex transformations. The Redshift Data API is a way to run SQL commands on Amazon Redshift clusters without needing a persistent connection, but it does not support loading data from AWS Glue ETL jobs.
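For illustration, an hourly scheduled trigger might be described as below. The trigger name and job names are hypothetical, and the boto3 call is commented out so the snippet runs offline.

```python
def build_hourly_trigger(name, job_names):
    """Return keyword arguments for the AWS Glue create_trigger API."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": "cron(0 * * * ? *)",  # minute 0 of every hour, UTC
        "Actions": [{"JobName": job} for job in job_names],
        "StartOnCreation": True,
    }


trigger = build_hourly_trigger(
    "hourly-etl-trigger",                        # hypothetical trigger name
    ["rds-to-redshift", "mongodb-to-redshift"],  # hypothetical job names
)
print(trigger)

# With credentials configured, the trigger could be created like this:
# import boto3
# boto3.client("glue").create_trigger(**trigger)
```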
Question 108:

A data engineer needs to create an AWS Lambda function that converts the format of data from .csv to Apache Parquet. The Lambda function must run only if a user uploads a .csv file to an Amazon S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
B. Create an S3 event notification that has an event type of s3:ObjectTagging:* for objects that have a tag set to .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
C. Create an S3 event notification that has an event type of s3:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
D. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set an Amazon Simple Notification Service (Amazon SNS) topic as the destination for the event notification. Subscribe the Lambda function to the SNS topic.

A. Create an S3 event notification that has an event type of s3:ObjectCreated:*. Use a filter rule to generate notifications only when the suffix includes .csv. Set the Amazon Resource Name (ARN) of the Lambda function as the destination for the event notification.
Explanation
Option A is the correct answer because it meets the requirements with the least operational overhead.
Creating an S3 event notification that has an event type of s3:ObjectCreated:* will trigger the Lambda function whenever a new object is created in the S3 bucket. Using a filter rule to generate notifications only when the suffix includes .csv will ensure that the Lambda function only runs for .csv files. Setting the ARN of the Lambda function as the destination for the event notification will directly invoke the Lambda function without any additional steps.
Option B is incorrect because it requires the user to tag the objects with .csv, which adds an extra step and increases the operational overhead.
Option C is incorrect because it uses an event type of s3:*, which will trigger the Lambda function for any S3 event, not just object creation. This could result in unnecessary invocations and increased costs.
Option D is incorrect because it involves creating and subscribing to an SNS topic, which adds an extra layer of complexity and operational overhead.
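The configuration described in option A could look roughly like this. The bucket name and function ARN are hypothetical, and the boto3 call is commented out so the snippet runs without credentials.

```python
def build_notification_config(lambda_arn, suffix=".csv"):
    """Return an S3 notification configuration that invokes a Lambda function
    only for newly created objects whose key ends with `suffix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


config = build_notification_config(
    "arn:aws:lambda:us-east-1:123456789012:function:csv-to-parquet"  # hypothetical
)
print(config)

# With credentials configured, the configuration could be applied like this:
# import boto3
# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="example-upload-bucket",  # hypothetical bucket name
#     NotificationConfiguration=config,
# )
```

The Lambda function's resource-based policy must also allow Amazon S3 to invoke it; that permission is not shown here.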
Question 109:

A data engineer is designing a log table for an application that requires continuous ingestion. The application must provide dependable API-based access to specific records from other applications. The application must handle more than 4,000 concurrent write operations and 6,500 read operations every second.
Which solution will meet these requirements?
A. Create an Amazon Redshift table with the KEY distribution style. Use the Amazon Redshift Data API to perform all read and write operations.
B. Store the log files in an Amazon S3 Standard bucket. Register the schema in AWS Glue Data Catalog. Create an external Redshift table that points to the AWS Glue schema. Use the table to perform Amazon Redshift Spectrum read operations.
C. Create an Amazon Redshift table with the EVEN distribution style. Use the Amazon Redshift Java Database Connectivity (JDBC) connector to establish a database connection. Use the database connection to perform all read and write operations.
D. Create an Amazon DynamoDB table that has provisioned capacity to meet the application's capacity needs. Use the DynamoDB table to perform all read and write operations by using DynamoDB APIs.

D. Create an Amazon DynamoDB table that has provisioned capacity to meet the application's capacity needs. Use the DynamoDB table to perform all read and write operations by using DynamoDB APIs.
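To see what "provisioned capacity to meet the application's capacity needs" might mean in numbers, the sketch below applies the standard DynamoDB capacity unit rules: one write capacity unit covers one write per second for an item up to 1 KB, and one read capacity unit covers one strongly consistent read per second for an item up to 4 KB (or two eventually consistent reads). The item sizes are assumptions, since the question does not give them.

```python
import math


def write_capacity_units(writes_per_second, item_size_bytes):
    """WCUs needed: each write consumes one unit per 1 KB of item size, rounded up."""
    return writes_per_second * math.ceil(item_size_bytes / 1024)


def read_capacity_units(reads_per_second, item_size_bytes, strongly_consistent=True):
    """RCUs needed: each strongly consistent read consumes one unit per 4 KB,
    rounded up; eventually consistent reads consume half as much."""
    total = reads_per_second * math.ceil(item_size_bytes / 4096)
    return total if strongly_consistent else math.ceil(total / 2)


# Assumed item sizes: 512-byte writes and 2 KB reads.
print(write_capacity_units(4000, 512))                             # 4000
print(read_capacity_units(6500, 2048))                             # 6500
print(read_capacity_units(6500, 2048, strongly_consistent=False))  # 3250
```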
Question 110:

A marketing company uses Amazon S3 to store clickstream data. The company queries the data at the end of each day by using a SQL JOIN clause on S3 objects that are stored in separate buckets.
The company creates key performance indicators (KPIs) based on the objects. The company needs a serverless solution that will give users the ability to query data by partitioning the data. The solution must maintain the atomicity, consistency, isolation, and durability (ACID) properties of the data.
Which solution will meet these requirements MOST cost-effectively?
A. Amazon S3 Select
B. Amazon Redshift Spectrum
C. Amazon Athena
D. Amazon EMR

C. Amazon Athena

Tips on How to Prepare for the Exams

Certification exams are becoming increasingly important, and more and more employers require them of job applicants. But how do you prepare for an exam effectively? How do you prepare in a short time and with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATA-ENGINEER-ASSOCIATE exam preparation or your Amazon certification application, do not hesitate to visit Vcedump.com to find your solutions.

Related Exams:

AIF-C01

AIP-C01

ANS-C00

ANS-C01

AXS-C01

BDS-C00

CLF-C02

DAS-C01

DATA-ENGINEER-ASSOCIATE

DBS-C01
