DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code: DATA-ENGINEER-ASSOCIATE
  • Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 403 Q&As
  • Last Updated: May 29, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 191:

    A data engineer uploads confidential documents to an Amazon S3 bucket every day. The data engineer requires a solution to independently verify the integrity of all uploaded data to confirm that there was no corruption during the transfer process.

    Which solution will meet this requirement?

    A. Download a subset of the data after the data is uploaded to the S3 bucket. Manually validate the objects for integrity.
    B. Change the default encryption on the S3 bucket to server-side encryption with customer-provided keys (SSE-C). Turn on S3 bucket keys to validate data integrity.
    C. Calculate the SHA-256 checksum for the objects before uploading the objects. Pass the calculated value to the AWS SDK in each upload request.
    D. Download the complete data after the data is uploaded to the S3 bucket. Programmatically validate the objects for integrity.
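
    As an illustration of the checksum approach that option C describes, the following minimal sketch computes a base64-encoded SHA-256 digest of an object body before upload. The payload, bucket name, and key are hypothetical, and the commented-out boto3 call assumes that boto3 is installed and credentials are configured.

```python
import base64
import hashlib


def sha256_checksum_b64(data: bytes) -> str:
    """Return the base64-encoded SHA-256 digest of the given bytes."""
    digest = hashlib.sha256(data).digest()
    return base64.b64encode(digest).decode("ascii")


# Hypothetical payload standing in for a confidential document.
payload = b"confidential document contents"
checksum = sha256_checksum_b64(payload)

# Assumption: with boto3 available, the precomputed value could be supplied
# on the upload request so that S3 can compare it with what it received:
#
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_object(
#       Bucket="example-bucket",      # hypothetical bucket name
#       Key="docs/report.txt",        # hypothetical object key
#       Body=payload,
#       ChecksumSHA256=checksum,
#   )
```

    If the checksum that S3 calculates for the received bytes does not match the supplied value, the upload request is rejected, so no download is needed to detect corruption in transit.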

  • Question 192:

    A retail company is expanding its operations globally. The company needs to use Amazon QuickSight to accurately calculate currency exchange rates for financial reports. The company has an existing dashboard that includes a visual that is based on an analysis of a dataset that contains global currency values and exchange rates.

    A data engineer needs to ensure that exchange rates are calculated with a precision of four decimal places. The calculations must be precomputed. The data engineer must materialize the results in the QuickSight Super-fast, Parallel, In-memory Calculation Engine (SPICE).

    Which solution will meet these requirements?

    A. Define and create the calculated field in the dataset.
    B. Define and create the calculated field in the analysis.
    C. Define and create the calculated field in the visual.
    D. Define and create the calculated field in the dashboard.

  • Question 193:

    A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
    B. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Redshift Spectrum to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.
    C. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use AWS Glue jobs to transform data that is in JSON format to Apache Parquet or .csv format. Store the transformed data in an S3 bucket. Use Amazon Athena to query the original and transformed data from the S3 bucket.
    D. Use AWS Lake Formation to create a data lake. Use Lake Formation jobs to transform the data from all data sources to Apache Parquet format. Store the transformed data in an S3 bucket. Use Amazon Athena or Redshift Spectrum to query the data.

  • Question 194:

    A global ecommerce company processes customer transactions, inventory updates, and user activity logs across multiple AWS services. The company needs a scalable, fully managed, and event-driven orchestration solution to coordinate complex extract, transform, and load (ETL) workflows. The solution must use AWS Glue and Amazon EMR to process data. The data will be stored in Amazon Redshift and Amazon S3. The solution must support dependency management, automated retries, and data pipeline monitoring.

    Which solution will meet these requirements?

    A. Use AWS Step Functions to define an express workflow that invokes the data transformation and loading tasks across Amazon EMR and AWS Glue.
    B. Create AWS Lambda functions for each step of the workflow. Configure Amazon EventBridge to invoke AWS Glue jobs. Configure the Lambda functions to process and move data through the pipeline.
    C. Use Apache Airflow on Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create Directed Acyclic Graphs (DAGs) to manage ETL workflows.
    D. Create an AWS Lambda function that runs each step of the workflow. Create an Amazon EventBridge scheduled rule to invoke the function every day.

  • Question 195:

    A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII.

    Which solution will meet this requirement with the LEAST operational effort?

    A. Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
    B. Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
    C. Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
    D. Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.

  • Question 196:

    A company maintains a central Amazon Redshift data warehouse that aggregates daily transactional data from Amazon RDS for PostgreSQL and Amazon Aurora MySQL. A data engineer notices that some complex transformation queries take hours to finish. The data engineer wants to optimize query performance to reduce query execution time as much as possible.

    Which solution will meet this requirement?

    A. Increase the concurrency scaling quota for the Redshift cluster.
    B. Export the tables to an Amazon S3 bucket. Use Amazon Athena to query the data in the bucket.
    C. Use Amazon Redshift Spectrum to create external tables based on the Redshift tables.
    D. Use materialized views in Amazon Redshift for frequently queried data patterns.

  • Question 197:

    An ecommerce company uses AWS Glue ETL to process and analyze orders. The company wants to build an extract, transform, and load (ETL) pipeline that processes placed, shipped, delivered, and canceled orders differently.

    The company integrates the order processing system with Amazon EventBridge. The company configures EventBridge Scheduler rules for each order status to invoke different AWS Glue workflows. When the company examines Amazon CloudWatch metrics for the workflow, the company notices that the FailedInvocations metric shows a high value for canceled orders.

    The company must determine the cause of the failed invocations.

    Which solution will meet this requirement?

    A. Configure a dead-letter queue in EventBridge Scheduler to store failed events. Analyze the failed order events.
    B. Use the archive and replay features in EventBridge Scheduler to investigate the issue.
    C. Change the retry policy in EventBridge Scheduler to reduce the value for maximum retries.
    D. Change the retry policy in EventBridge Scheduler to increase the value for maximum age of event.

  • Question 198:

    A data engineer develops an AWS Glue Apache Spark ETL job to perform transformations on a dataset.

    When the data engineer runs the job, the job returns an error that reads, "No space left on device."

    The data engineer needs to identify the source of the error and provide a solution.

    Which combination of steps will meet this requirement MOST cost-effectively? (Choose two.)

    A. Scale out the workers vertically to address data skewness.
    B. Use the Spark UI and AWS Glue metrics to monitor data skew in the Spark executors.
    C. Scale out the number of workers horizontally to address data skewness.
    D. Enable the --write-shuffle-files-to-s3 job parameter. Use the salting technique.
    E. Use error logs in Amazon CloudWatch to monitor data skew.
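
    The salting technique mentioned in option D can be sketched in plain Python, without Spark. This is only an illustration of the idea: a random suffix is appended to each key so that a heavily skewed ("hot") key is spread over several partitions, partial results are computed per salted key, and the salt is then stripped to combine them. The number of salts, the separator, and the sample keys are all assumptions made for the example.

```python
import random
from collections import Counter

NUM_SALTS = 4  # hypothetical number of salt buckets per key


def salt_key(key: str, rng: random.Random, num_salts: int = NUM_SALTS) -> str:
    """Append a random salt so that one hot key maps to several salted keys."""
    return f"{key}#{rng.randrange(num_salts)}"


def unsalt_key(salted: str) -> str:
    """Strip the salt suffix to recover the original key."""
    return salted.rsplit("#", 1)[0]


rng = random.Random(0)  # seeded only to make the sketch repeatable

# Skewed input: one key dominates, as in a skewed shuffle.
keys = ["hot"] * 1000 + ["cold"] * 10

# Stage 1: aggregate per salted key (each group is smaller than the hot key).
salted = [salt_key(k, rng) for k in keys]
partial_counts = Counter(salted)

# Stage 2: strip the salt and combine the partial aggregates.
final_counts = Counter()
for salted_key, count in partial_counts.items():
    final_counts[unsalt_key(salted_key)] += count

print("largest group before salting:", max(Counter(keys).values()))
print("largest group after salting: ", max(partial_counts.values()))
```

    The two-stage aggregation yields the same totals as aggregating on the original keys, while no single group in the first stage holds all of the hot key's records.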

  • Question 199:

    A company needs to build a data pipeline to process a 1-TB file from an Amazon S3 bucket. The pipeline needs to create three DataFrames based on business logic. The pipeline must save all three DataFrames to a second S3 bucket in parallel. The company needs to set the pipeline to be the target of an Amazon EventBridge rule that matches file uploads to the source S3 bucket.

    Which solution will meet these requirements with the LEAST maintenance overhead?

    A. Configure an Apache Spark Streaming application on Amazon EMR to process data from the S3 source bucket in batches, create DataFrames, and save the output to the destination S3 bucket.
    B. Configure three AWS Lambda functions to process the business logic and to save the DataFrames to the destination S3 bucket in parallel.
    C. Configure an AWS Glue workflow to run three AWS Glue jobs in parallel to process the file.
    D. Configure an AWS Step Functions state machine to initiate an AWS Glue workflow to run three AWS Glue jobs in parallel to process the file.

  • Question 200:

    A data engineer needs to create an empty copy of an existing table in Amazon Athena to perform data processing tasks. The existing table in Athena contains 1,000 rows.

    Which query will meet this requirement?

    A. CREATE TABLE new_table LIKE old_table;
    B. CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;
    C. CREATE TABLE new_table AS SELECT * FROM old_table;
    D. CREATE TABLE new_table AS SELECT * FROM old_table WHERE 1=1;

Tips on How to Prepare for the Exams

Certification exams are increasingly important, and more and more employers require them from job applicants. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? On Vcedump.com, you will find all the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATA-ENGINEER-ASSOCIATE exam preparation or Amazon certification application, do not hesitate to visit Vcedump.com to find your solutions.