DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code: DATA-ENGINEER-ASSOCIATE
  • Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 403 Q&As
  • Last Updated: May 29, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 141:

    A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Configure an AWS Lambda function to load data from the S3 bucket into a pandas dataframe. Write a SQL SELECT statement on the dataframe to query the required column.
    B. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.
    C. Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.
    D. Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.

  • Question 142:

    An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously.

    A data engineer needs to deduplicate the semi-structured data by removing records that are duplicates, including duplicates that differ only by common misspellings.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use the FindMatches feature of AWS Glue to remove duplicate records.
    B. Use non-window functions in Amazon Athena to remove duplicate records.
    C. Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
    D. Use the global tables feature of Amazon DynamoDB to prevent duplicate data.

  • Question 143:

    A company has a data processing pipeline that includes several dozen steps. The data processing pipeline needs to send alerts in real time when a step fails or succeeds. The data processing pipeline uses a combination of Amazon S3 buckets, AWS Lambda functions, and AWS Step Functions state machines.

    A data engineer needs to create a solution to monitor the entire pipeline.

    Which solution will meet these requirements?

    A. Configure the Step Functions state machines to store notifications in an Amazon S3 bucket when the state machines finish running. Enable S3 event notifications on the S3 bucket.
    B. Configure the AWS Lambda functions to store notifications in an Amazon S3 bucket when the state machines finish running. Enable S3 event notifications on the S3 bucket.
    C. Use AWS CloudTrail to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications when a state machine run fails or succeeds.
    D. Configure an Amazon EventBridge rule to react when the execution status of a state machine changes. Configure the rule to send a message to an Amazon Simple Notification Service (Amazon SNS) topic that sends notifications.
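
    As a rough illustration of the kind of rule that option D describes, the sketch below shows an EventBridge event pattern for Step Functions execution status changes. The pattern fields follow the event format that Step Functions publishes. The matches helper and the sample event (including its ARN) are hypothetical and cover only exact-value matching, so the pattern can be tried locally without an AWS account.

```python
import json

# Event pattern that selects Step Functions executions that failed or succeeded.
PATTERN = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["FAILED", "SUCCEEDED"]},
}


def matches(pattern, event):
    """Simplified local matcher (hypothetical helper).

    Handles only lists of exact values and nested dicts, which is a small
    subset of what EventBridge pattern matching supports.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical event, shaped like a Step Functions status change event.
sample_event = {
    "source": "aws.states",
    "detail-type": "Step Functions Execution Status Change",
    "detail": {
        "status": "FAILED",
        "executionArn": "arn:aws:states:us-east-1:123456789012:execution:example-machine:example-run",
    },
}

if __name__ == "__main__":
    print(json.dumps(PATTERN, indent=2))
    print(matches(PATTERN, sample_event))
```

    The JSON printed by the sketch is the form a rule's event pattern would take. Other terminal statuses such as TIMED_OUT and ABORTED could be added to the status list if they should also trigger a notification.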

  • Question 144:

    A company is uploading log files from on-premises servers to an Amazon S3 bucket. The company needs to validate that the logs from the on-premises server are the same as the logs that are stored in the S3 bucket.

    Which solution will meet this requirement?

    A. Use the AWS SDK to automatically compute CRC32 checksums during the upload. Store the checksums in S3 object metadata.
    B. Create an AWS Lambda function to calculate SHA-256 checksums. Store results in a separate metadata table. Validate the logs after the upload.
    C. Enable S3 Object Lock in compliance mode on the S3 bucket. Upload the objects to the bucket.
    D. After uploading the objects to the S3 bucket, enable S3 Object Lock in governance mode on the S3 objects.
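
    To make the checksum idea in options A and B concrete, the sketch below computes a CRC32 checksum locally with the Python standard library. It assumes the checksum is compared in base64 form over the 4-byte big-endian value, which is how S3 reports CRC32 checksums. The function names are hypothetical, and no upload is performed.

```python
import base64
import zlib


def crc32_b64(data: bytes) -> str:
    """Return the CRC32 of data as base64 over the 4-byte big-endian value."""
    checksum = zlib.crc32(data) & 0xFFFFFFFF
    return base64.b64encode(checksum.to_bytes(4, "big")).decode("ascii")


def crc32_b64_stream(chunks) -> str:
    """Same checksum, computed incrementally over an iterable of byte chunks."""
    checksum = 0
    for chunk in chunks:
        checksum = zlib.crc32(chunk, checksum)
    checksum &= 0xFFFFFFFF
    return base64.b64encode(checksum.to_bytes(4, "big")).decode("ascii")


def same_content(local_checksum: str, stored_checksum: str) -> bool:
    """Compare the locally computed checksum with the one stored for the object."""
    return local_checksum == stored_checksum


if __name__ == "__main__":
    log = b"2026-05-29 INFO service started\n"
    local = crc32_b64(log)
    # In practice the second value would come from the stored object.
    print(same_content(local, crc32_b64_stream([log[:10], log[10:]])))
```

    The streaming variant matters for large log files, because the file can be read in chunks instead of being loaded into memory at once.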

  • Question 145:

    A company wants to use machine learning (ML) to perform analytics on data that is in an Amazon S3 data lake. The company has two data transformation requirements that will give consumers within the company the ability to create reports.

    The company must perform daily transformations on 300 GB of data that is in a variety of formats and that must arrive in Amazon S3 at a scheduled time. The company must perform one-time transformations of terabytes of archived data that is in the S3 data lake. The company uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) Directed Acyclic Graphs (DAGs) to orchestrate processing.

    Which combination of tasks should the company schedule in the Amazon MWAA DAGs to meet these requirements MOST cost-effectively? (Choose two.)

    A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
    B. For daily incoming data, use Amazon Athena to scan and identify the schema.
    C. For daily incoming data, use Amazon Redshift to perform transformations.
    D. For daily and archived data, use Amazon EMR to perform data transformations.
    E. For archived data, use Amazon SageMaker to perform data transformations.

  • Question 146:

    A company uses Amazon SageMaker AI for its machine learning (ML) workflows. The company is organized into several project groups that use sensitive data. The company needs to give the project groups the ability to discover available datasets across different AWS accounts. The solution must maintain access controls and track all data access for compliance purposes.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use Amazon SageMaker Assets to publish, discover, and request access to datasets through the asset catalog with approval workflows that track data access.
    B. Set up Amazon SageMaker Feature Store with cross-account access policies to automatically share data between AWS accounts without requiring approval workflows.
    C. Set up IAM roles for each project group with permissions to access all datasets across all AWS accounts. Use AWS CloudTrail to record data access activity.
    D. Create separate Amazon SageMaker Studio domains for each project group with isolated environments and no ability to share data between domains.

  • Question 147:

    A data engineer needs to create a new empty table in Amazon Athena that has the same schema as an existing table named old_table.

    Which SQL statement should the data engineer use to meet this requirement?

    A. CREATE TABLE new_table AS SELECT * FROM old_table;
    B. INSERT INTO new_table SELECT * FROM old_table;
    C. CREATE TABLE new_table (LIKE old_table);
    D. CREATE TABLE new_table AS (SELECT * FROM old_table) WITH NO DATA;

  • Question 148:

    A company stores CSV files in an Amazon S3 bucket. A data engineer needs to process the data in the CSV files and store the processed data in a new S3 bucket.

    The process needs to rename a column, remove specific columns, ignore the second row of each file, create a new column based on the values of the first row of the data, and filter the results by a numeric value of a column.

    Which solution will meet these requirements with the LEAST development effort?

    A. Use AWS Glue Python jobs to read and transform the CSV files.
    B. Use an AWS Glue custom crawler to read and transform the CSV files.
    C. Use an AWS Glue workflow to build a set of jobs to crawl and transform the CSV files.
    D. Use AWS Glue DataBrew recipes to read and transform the CSV files.
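
    The five transformations in this question can be illustrated on a tiny in-memory file. The sketch below uses only the Python standard library and is not a Glue or DataBrew API. The column names, the sample data, and the reading of "second row" as the second data record after the header are all assumptions made for the example.

```python
import csv
import io

# Hypothetical sample file: header plus four data records.
SAMPLE = """id,amount,region,notes
1,10,eu,first
2,99,us,second
3,55,eu,third
4,5,us,fourth
"""


def transform(text, min_total):
    records = list(csv.DictReader(io.StringIO(text)))

    # Ignore the second row (assumed here to mean the second data record).
    records = [r for i, r in enumerate(records) if i != 1]

    # Derive a value from the first data row for the new column.
    first_region = records[0]["region"] if records else None

    out = []
    for r in records:
        total = int(r["amount"])
        # Filter by a numeric column value.
        if total < min_total:
            continue
        out.append({
            "id": r["id"],
            "total": total,                # "amount" renamed to "total"
            "region": r["region"],
            "first_region": first_region,  # new column from the first row
            # "notes" is dropped by not copying it
        })
    return out


if __name__ == "__main__":
    for row in transform(SAMPLE, 10):
        print(row)
```

    Each step in the function corresponds to one requirement in the question, which gives a sense of how much logic a hand-written job would need to express.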

  • Question 149:

    A data engineer is building a solution to detect sensitive information that is stored in a data lake across multiple Amazon S3 buckets. The solution must detect personally identifiable information (PII) that is in a proprietary data format.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use the AWS Glue Detect PII transform with specific patterns.
    B. Use Amazon Macie with managed data identifiers.
    C. Use an AWS Lambda function with custom regular expressions.
    D. Use Amazon Athena with a SQL query to match the custom formats.

  • Question 150:

    A company uses Amazon Redshift as a data warehouse solution. One of the datasets that the company stores in Amazon Redshift contains data for a vendor.

    Recently, the vendor asked the company to transfer the vendor's data into the vendor's Amazon S3 bucket once each week.

    Which solution will meet this requirement?

    A. Create an AWS Lambda function to connect to the Redshift data warehouse. Configure the Lambda function to use the Redshift COPY command to copy the required data to the vendor's S3 bucket on a schedule.
    B. Create an AWS Glue job to connect to the Redshift data warehouse. Configure the AWS Glue job to use the Redshift UNLOAD command to load the required data to the vendor's S3 bucket on a schedule.
    C. Use the Amazon Redshift data sharing feature. Set the vendor's S3 bucket as the destination. Configure the source to be a custom SQL query that selects the required data.
    D. Configure Amazon Redshift Spectrum to use the vendor's S3 bucket as the destination. Enable data querying in both directions.
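
    Option B refers to the Redshift UNLOAD command. As a hedged illustration of what such a statement looks like, the sketch below assembles one as a string. The table, bucket, and IAM role names are hypothetical, the helper does not connect to Redshift, and scheduling is left out.

```python
def build_unload_sql(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift UNLOAD statement (hypothetical helper).

    The inner query is wrapped in single quotes, so single quotes inside it
    are doubled.
    """
    escaped = select_sql.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"
    )


if __name__ == "__main__":
    print(build_unload_sql(
        "SELECT * FROM sales WHERE vendor = 'acme'",          # hypothetical table
        "s3://example-vendor-bucket/weekly/",                  # hypothetical bucket
        "arn:aws:iam::123456789012:role/ExampleUnloadRole",    # hypothetical role
    ))
```

    Note the direction of the two commands that the options mention: UNLOAD writes query results from Redshift to Amazon S3, whereas COPY loads data into Redshift tables.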
