DATA-ENGINEER-ASSOCIATE Practice Questions & Online Exam Preparation

DATA-ENGINEER-ASSOCIATE Exam Details

Exam Code: DATA-ENGINEER-ASSOCIATE
Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 403 Q&As
Last Updated: Jul 16, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

Question 191:

A data engineer uploads confidential documents to an Amazon S3 bucket every day. The data engineer requires a solution to independently verify the integrity of all uploaded data to confirm that there was no corruption during the transfer process.
Which solution will meet this requirement?
A. Download a subset of the data after the data is uploaded to the S3 bucket. Manually validate the objects for integrity.
B. Change the default encryption on the S3 bucket to server-side encryption with customer-provided keys (SSE-C). Turn on S3 bucket keys to validate data integrity.
C. Calculate the SHA-256 checksum for the objects before uploading the objects. Pass the calculated value to the AWS SDK in each upload request.
D. Download the complete data after the data is uploaded to the S3 bucket. Programmatically validate the objects for integrity.

C. Calculate the SHA-256 checksum for the objects before uploading the objects. Pass the calculated value to the AWS SDK in each upload request.
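A minimal Python sketch of option C, assuming a boto3 S3 client and its `put_object` call with the `ChecksumSHA256` parameter. The helper names, bucket, and key are hypothetical. When a checksum is supplied, S3 recomputes it on receipt and rejects the upload if the values differ.

```python
import base64
import hashlib


def sha256_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of data (the form S3 expects)."""
    return base64.b64encode(hashlib.sha256(data).digest()).decode("ascii")


def upload_with_checksum(s3_client, bucket: str, key: str, data: bytes) -> str:
    """Upload data and pass the precomputed checksum in the same request.

    s3_client is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    checksum = sha256_b64(data)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=data,
        ChecksumSHA256=checksum,  # S3 validates the received payload against this
    )
    return checksum
```

The checksum is computed on the client before the transfer, so a mismatch indicates corruption in transit rather than a problem with the source file.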
Question 192:

A retail company is expanding its operations globally. The company needs to use Amazon QuickSight to accurately calculate currency exchange rates for financial reports. The company has an existing dashboard that includes a visual that is based on an analysis of a dataset that contains global currency values and exchange rates.
A data engineer needs to ensure that exchange rates are calculated with a precision of four decimal places. The calculations must be precomputed. The data engineer must materialize the results in the QuickSight Super-fast, Parallel, In-memory Calculation Engine (SPICE).
Which solution will meet these requirements?
A. Define and create the calculated field in the dataset.
B. Define and create the calculated field in the analysis.
C. Define and create the calculated field in the visual.
D. Define and create the calculated field in the dashboard.

A. Define and create the calculated field in the dataset.
Question 193:

A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
B. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Redshift Spectrum to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
C. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use AWS Glue jobs to transform data that is in JSON format to Apache Parquet or .csv format. Store the transformed data in an S3 bucket. Use Amazon Athena to query the original and transformed data from the S3 bucket.
D. Use AWS Lake Formation to create a data lake. Use Lake Formation jobs to transform the data from all data sources to Apache Parquet format. Store the transformed data in an S3 bucket. Use Amazon Athena or Redshift Spectrum to query the data.

A. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
Explanation
The solution with the least operational overhead is to use AWS Glue to crawl the data sources, store the metadata in the AWS Glue Data Catalog, and use Amazon Athena to query the data, with SQL for structured data sources and PartiQL for data that is stored in JSON format.
AWS Glue is a serverless data integration service that makes it easy to prepare, clean, enrich, and move data between data stores. AWS Glue crawlers connect to a data store, progress through a prioritized list of classifiers to determine the schema for your data, and then create metadata tables in the Data Catalog. The Data Catalog is a persistent metadata store that contains table definitions, job definitions, and other control information to help you manage your AWS Glue components. You can use AWS Glue to crawl data sources such as Amazon S3, Amazon RDS for Microsoft SQL Server, and Amazon DynamoDB, and store the metadata in the Data Catalog.
Amazon Athena is a serverless, interactive query service that makes it easy to analyze data directly in Amazon S3 using standard SQL or Python. Amazon Athena also supports PartiQL, a SQL-compatible query language that lets you query, insert, update, and delete data from semi-structured and nested data, such as JSON. You can use Amazon Athena to query the data from the Data Catalog using SQL for structured data sources, such as .csv files and relational databases, and PartiQL for data that is stored in JSON format. You can also use Athena to query data from other data sources, such as Amazon Redshift, using federated queries.
This combination has the least operational overhead because you do not need to provision, manage, or scale any infrastructure, and you pay only for the resources you use. AWS Glue charges you based on the compute time and the data processed by your crawlers and ETL jobs. Amazon Athena charges you based on the amount of data scanned by your queries. You can also reduce the cost and improve the performance of your queries by using compression, partitioning, and columnar formats for your data in Amazon S3.
Option B is not the best solution because using Redshift Spectrum to query the data would incur more cost and complexity than using Amazon Athena. Redshift Spectrum is a feature of Amazon Redshift, a fully managed data warehouse service, that allows you to query and join data across your data warehouse and your data lake using standard SQL. While Redshift Spectrum is powerful and useful for many data warehousing scenarios, it is not necessary or cost-effective for querying all data sources by using syntax similar to SQL. Redshift Spectrum charges you based on the amount of data scanned by your queries, which is similar to Amazon Athena, but it also requires an Amazon Redshift cluster, which is billed based on the node type, the number of nodes, and the duration of the cluster. These costs can add up quickly, especially if you have large volumes of data and complex queries. Moreover, using Redshift Spectrum would introduce additional latency and complexity, as you would have to provision and manage the cluster and create an external schema and database for the data in the Data Catalog, instead of querying it directly from Amazon Athena.
Option C is not the best solution because adding AWS Glue jobs to transform the JSON data to Apache Parquet or .csv format, storing the transformed data in an S3 bucket, and querying both the original and transformed data with Amazon Athena would incur more cost and complexity than using Amazon Athena with PartiQL. AWS Glue jobs are ETL scripts that you can write in Python or Scala to transform your data and load it to your target data store. Apache Parquet is a columnar storage format that can improve the performance of analytical queries by reducing the amount of data that needs to be scanned and providing efficient compression and encoding schemes. While using AWS Glue jobs and Parquet can improve the performance and reduce the cost of your queries, they would also increase the complexity and the operational overhead of the data pipeline, as you would have to write, run, and monitor the ETL jobs and store the transformed data in a separate location in Amazon S3. Moreover, using AWS Glue jobs and Parquet would introduce additional latency, as you would have to wait for the ETL jobs to finish before querying the transformed data.
Option D is not the best solution because using AWS Lake Formation to create a data lake, running Lake Formation jobs to transform the data from all data sources to Apache Parquet format, storing the transformed data in an S3 bucket, and querying it with Amazon Athena or Redshift Spectrum would incur more cost and complexity than using Amazon Athena with PartiQL. AWS Lake Formation is a service that helps you centrally govern, secure, and globally share data for analytics and machine learning. Lake Formation jobs are ETL jobs that you can create and run using the Lake Formation console or API. While using Lake Formation and Parquet can improve the performance and reduce the cost of your queries, they would also increase the complexity and the operational overhead of the data pipeline, as you would have to create, run, and monitor the Lake Formation jobs and store the transformed data in a separate location in Amazon S3. Moreover, using Lake Formation and Parquet would introduce additional latency, as you would have to wait for the Lake Formation jobs to finish before querying the transformed data. Furthermore, using Redshift Spectrum to query the data would incur the same costs and complexity as described for option B.
Question 194:

A global ecommerce company processes customer transactions, inventory updates, and user activity logs across multiple AWS services. The company needs a scalable, fully managed, and event-driven orchestration solution to coordinate complex extract, transform, and load (ETL) workflows. The solution must use AWS Glue and Amazon EMR to process data. The data will be stored in Amazon Redshift and Amazon S3. The solution must support dependency management, automated retries, and data pipeline monitoring.
Which solution will meet these requirements?
A. Use AWS Step Functions to define an express workflow that invokes the data transformation and loading tasks across Amazon EMR and AWS Glue.
B. Create AWS Lambda functions for each step of the workflow. Configure Amazon EventBridge to invoke AWS Glue jobs. Configure the Lambda functions to process and move data through the pipeline.
C. Use Apache Airflow on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create Directed Acyclic Graphs (DAGs) to manage ETL workflows.
D. Create an AWS Lambda function that runs each step of the workflow. Create an Amazon EventBridge scheduled rule to invoke the function every day.

C. Use Apache Airflow on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create Directed Acyclic Graphs (DAGs) to manage ETL workflows.
Question 195:

A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII.
Which solution will meet this requirement with the LEAST operational effort?
A. Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
D. Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.

B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
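To make the detect-then-obfuscate idea concrete, here is a small Python sketch. It is not the AWS Glue Studio Detect PII transform. It is a hand-rolled, pattern-based stand-in, and the two patterns (email address and US Social Security number) and the function name are assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; a managed transform covers many more entity types.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def obfuscate_pii(text: str, mask: str = "****") -> str:
    """Replace every detected email address and SSN with a fixed mask."""
    text = EMAIL.sub(mask, text)
    return SSN.sub(mask, text)


print(obfuscate_pii("Contact jane@example.com, SSN 123-45-6789"))
```

Maintaining patterns like these by hand is the operational effort that the managed transform in option B avoids.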
Question 196:

A company maintains a central Amazon Redshift data warehouse that aggregates daily transactional data from Amazon RDS for PostgreSQL and Amazon Aurora MySQL. A data engineer notices that some complex transformation queries take hours to finish. The data engineer wants to optimize query performance to reduce query execution time as much as possible.
Which solution will meet this requirement?
A. Increase the concurrency scaling quota for the Redshift cluster.
B. Export the tables to an Amazon S3 bucket. Use Amazon Athena to query the data in the bucket.
C. Use Amazon Redshift Spectrum to create external tables based on the Redshift tables.
D. Use materialized views in Amazon Redshift for frequently queried data patterns.

D. Use materialized views in Amazon Redshift for frequently queried data patterns.
Question 197:

An ecommerce company uses AWS Glue ETL to process and analyze orders. The company wants to build an extract, transform, and load (ETL) pipeline that processes placed, shipped, delivered, and canceled orders differently.
The company integrates the order processing system with Amazon EventBridge. The company configures EventBridge Scheduler rules for each order status to invoke different AWS Glue workflows. When the company examines Amazon CloudWatch metrics for the workflow, the company notices that the FailedInvocations metric shows a high value for canceled orders.
The company must determine the cause of the failed invocations.
Which solution will meet this requirement?
A. Configure a dead-letter queue in EventBridge Scheduler to store failed events. Analyze the failed order events.
B. Use the archive and replay features in EventBridge Scheduler to investigate the issue.
C. Change the retry policy in EventBridge Scheduler to reduce the value for maximum retries.
D. Change the retry policy in EventBridge Scheduler to increase the value for maximum age of event.

A. Configure a dead-letter queue in EventBridge Scheduler to store failed events. Analyze the failed order events.
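A hedged Python sketch of how a dead-letter queue could be attached to a schedule target. It builds the `Target` structure that the boto3 `scheduler` client's `create_schedule` call accepts. The function name and all ARNs are hypothetical placeholders.

```python
def build_target(target_arn: str, role_arn: str, dlq_arn: str,
                 max_retries: int = 3) -> dict:
    """Build a Scheduler Target that sends failed invocations to an SQS queue."""
    return {
        "Arn": target_arn,    # resource the schedule invokes
        "RoleArn": role_arn,  # role EventBridge Scheduler assumes to invoke it
        "DeadLetterConfig": {"Arn": dlq_arn},  # SQS queue for failed events
        "RetryPolicy": {"MaximumRetryAttempts": max_retries},
    }


# Hypothetical ARNs, for illustration only.
target = build_target(
    "arn:aws:glue:us-east-1:111122223333:workflow/canceled-orders",
    "arn:aws:iam::111122223333:role/scheduler-invoke-role",
    "arn:aws:sqs:us-east-1:111122223333:scheduler-dlq",
)
```

The resulting dictionary would be passed as `Target=target` to `create_schedule` or `update_schedule`. Events that exhaust their retries then land in the queue, where the failed canceled-order events can be inspected.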
Question 198:

A data engineer develops an AWS Glue Apache Spark ETL job to perform transformations on a dataset.
When the data engineer runs the job, the job returns an error that reads, "No space left on device."
The data engineer needs to identify the source of the error and provide a solution.
Which combination of steps will meet this requirement MOST cost-effectively? (Choose two.)
A. Scale up the workers vertically to address data skewness.
B. Use the Spark UI and AWS Glue metrics to monitor data skew in the Spark executors.
C. Scale out the number of workers horizontally to address data skewness.
D. Enable the --write-shuffle-files-to-s3 job parameter. Use the salting technique.
E. Use error logs in Amazon CloudWatch to monitor data skew.

B. Use the Spark UI and AWS Glue metrics to monitor data skew in the Spark executors.
D. Enable the --write-shuffle-files-to-s3 job parameter. Use the salting technique.
Explanation
Use the Spark UI and AWS Glue-exposed Spark metrics to pinpoint where partitions are disproportionately large (data skew) and where spill files are filling executor disk.
Enable the --write-shuffle-files-to-s3 job parameter so shuffle spills go to S3 instead of running out of local disk, and apply a salting technique on skewed keys to spread data more evenly across partitions.
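The salting technique mentioned above can be sketched in plain Python as a two-stage aggregation. This is an illustration of the idea rather than Spark code. The function name is hypothetical, and the round-robin salt (row index modulo `num_salts`) is an assumption; a random salt is also common.

```python
from collections import defaultdict


def salted_sum(rows, num_salts):
    """Sum values per key in two stages so one hot key is split across groups."""
    # Stage 1: aggregate on (key, salt). A hot key becomes num_salts smaller
    # groups, which a distributed engine can place on different executors.
    partial = defaultdict(int)
    for i, (key, value) in enumerate(rows):
        partial[(key, i % num_salts)] += value

    # Stage 2: drop the salt and combine the much smaller partial results.
    total = defaultdict(int)
    for (key, _salt), value in partial.items():
        total[key] += value
    return dict(partial), dict(total)


rows = [("hot", 1)] * 8 + [("cold", 5)]
partial, total = salted_sum(rows, num_salts=4)
```

The final totals match an unsalted aggregation, but no single stage-1 group holds all the rows for the skewed key, which is what relieves disk pressure on an individual executor.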
Question 199:

A company needs to build a data pipeline to process a 1-TB file from an Amazon S3 bucket. The pipeline needs to create three DataFrames based on business logic. The pipeline must save all three DataFrames to a second S3 bucket in parallel. The company needs to set the pipeline to be the target of an Amazon EventBridge rule that matches file uploads to the source S3 bucket.
Which solution will meet these requirements with the LEAST maintenance overhead?
A. Configure an Apache Spark Streaming application on Amazon EMR to process data from the S3 source bucket in batches, create DataFrames, and save the output to the destination S3 bucket.
B. Configure three AWS Lambda functions to process the business logic and to save the DataFrames to the destination S3 bucket in parallel.
C. Configure an AWS Glue workflow to run three AWS Glue jobs in parallel to process the file.
D. Configure an AWS Step Functions state machine to initiate an AWS Glue workflow to run three AWS Glue jobs in parallel to process the file.

C. Configure an AWS Glue workflow to run three AWS Glue jobs in parallel to process the file.
Question 200:

A data engineer needs to create an empty copy of an existing table in Amazon Athena to perform data processing tasks. The existing table in Athena contains 1,000 rows.
Which query will meet this requirement?
A. CREATE TABLE new_table LIKE old_table;
B. CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;
C. CREATE TABLE new_table AS SELECT * FROM old_table;
D. CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=1;

B. CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;
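A short Python sketch of submitting the CTAS statement from option B, assuming a boto3 Athena client and its `start_query_execution` call. The function names, database name, and S3 output location are hypothetical.

```python
def empty_copy_query(new_table: str, old_table: str) -> str:
    """Build a CTAS statement that copies the schema but none of the rows."""
    return f"CREATE TABLE {new_table} AS SELECT * FROM {old_table} WITH NO DATA"


def create_empty_copy(athena_client, new_table: str, old_table: str,
                      database: str, output_location: str) -> str:
    """Submit the statement and return the query execution ID.

    athena_client is assumed to be a boto3 client, e.g. boto3.client("athena").
    """
    response = athena_client.start_query_execution(
        QueryString=empty_copy_query(new_table, old_table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

`WITH NO DATA` is what distinguishes this from option C, which would copy all 1,000 rows into the new table.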

Tips on How to Prepare for the Exams

Certification exams are becoming more important, and more employers now require them from job applicants. But how do you prepare for an exam effectively, in a short time and with less effort? How do you get an ideal result, and where do you find the most reliable resources? Vcedump.com provides not only Amazon exam questions, answers, and explanations but also assistance with your exam preparation and certification application. If you are unsure about your DATA-ENGINEER-ASSOCIATE exam preparation or your Amazon certification application, visit Vcedump.com.
