DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code: DATA-ENGINEER-ASSOCIATE
  • Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 403 Q&As
  • Last Updated: May 29, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 201:

    A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.

    Which AWS Glue feature should the data engineer use to meet this requirement?

    A. Workflows
    B. Triggers
    C. Job bookmarks
    D. Classifiers

  • Question 202:

    A company creates a new non-production application that runs on an Amazon EC2 instance. The application needs to communicate with an Amazon RDS database instance using Java Database Connectivity (JDBC). The EC2 instances and the RDS database instance are in the same subnet.

    Which solution will meet this requirement?

    A. Modify the IAM role that is assigned to the database instance to allow connections from the EC2 instances.
    B. Modify the ec2_authorized_hosts parameter in the RDS parameter group to include the EC2 instances. Restart the database instance.
    C. Update the database security group to allow connections from the EC2 instances.
    D. Enable the Amazon RDS Data API and specify the Amazon Resource Name (ARN) of the database instance in the JDBC connection string.
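
As an illustration of what option C describes, the sketch below builds an inbound rule that lets one security group reach a database port from another. The security group IDs and the port (3306, which assumes a MySQL-compatible engine) are hypothetical; the question does not name them.

```python
def build_db_ingress_rule(ec2_security_group_id: str, port: int = 3306) -> dict:
    """Return one IpPermissions entry that admits traffic from another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        # Referencing the source group avoids hard-coding instance IP addresses.
        "UserIdGroupPairs": [
            {
                "GroupId": ec2_security_group_id,
                "Description": "JDBC from application instances",
            }
        ],
    }


# Hypothetical group ID, for illustration only.
rule = build_db_ingress_rule("sg-0123456789abcdef0")

# With boto3 installed and credentials configured, the rule could be applied as:
#   import boto3
#   boto3.client("ec2").authorize_security_group_ingress(
#       GroupId="sg-0fedcba9876543210",  # hypothetical database security group
#       IpPermissions=[rule],
#   )
```

The JDBC connection string itself would not change; only the network path to the database port is opened.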

  • Question 203:

    A company must retain specific data for 1 year. A data engineer observes that one of the company's Amazon S3 buckets contains millions of objects that are older than 3 years. Versioning is enabled on the bucket.

    To reduce costs, the data engineer implements an S3 Lifecycle rule to expire objects after 365 days. The new S3 Lifecycle rule causes the object count to double instead of decrease.

    Which additional step must the data engineer take to permanently delete the old objects?

    A. Disable versioning on the S3 bucket.
    B. Use an AWS Lambda function to run a Python job to identify and delete objects that are older than 365 days.
    C. Suspend versioning on the S3 bucket.
    D. Add an additional S3 Lifecycle rule to delete the current and expired versions of objects that are older than 365 days.
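
In a versioned bucket, expiring a current object does not remove data: S3 adds a delete marker and keeps the old object as a noncurrent version, which is consistent with the object count doubling. The sketch below shows, as an assumption about how the cleanup might be configured, a lifecycle configuration that also expires noncurrent versions. The rule ID and bucket name are hypothetical.

```python
def build_lifecycle_configuration(days: int = 365) -> dict:
    """Build a lifecycle configuration that expires current and noncurrent versions."""
    return {
        "Rules": [
            {
                "ID": "expire-current-and-noncurrent",  # hypothetical rule name
                "Filter": {"Prefix": ""},  # empty prefix applies to the whole bucket
                "Status": "Enabled",
                # Current versions: S3 adds a delete marker after `days` days.
                "Expiration": {"Days": days},
                # Noncurrent versions: permanently removed `days` days
                # after they stop being the current version.
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }


config = build_lifecycle_configuration(365)

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",  # hypothetical bucket name
#       LifecycleConfiguration=config,
#   )
```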

  • Question 204:

    A marketing company collects clickstream data. The company sends the clickstream data to Amazon Kinesis Data Firehose and stores the clickstream data in Amazon S3. The company wants to build a series of dashboards that hundreds of users from multiple departments will use.

    The company will use Amazon QuickSight to develop the dashboards. The company wants a solution that can scale and provide daily updates about clickstream activity.

    Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)

    A. Use Amazon Redshift to store and query the clickstream data.
    B. Use Amazon Athena to query the clickstream data.
    C. Use Amazon S3 analytics to query the clickstream data.
    D. Access the query data through a QuickSight direct SQL query.
    E. Access the query data through QuickSight SPICE (Super-fast, Parallel, In-memory Calculation Engine). Configure a daily refresh for the dataset.

  • Question 205:

    A company runs a scheduled AWS Glue ETL job that reads daily files from an Amazon S3 prefix and writes curated data to another S3 prefix. The job reprocesses previously handled files every morning, which creates duplicate records in the target dataset. A data engineer needs the job to process only new supported source files after each successful run.

    Which solution will meet this requirement with the LEAST operational overhead?

    A. Enable AWS Glue job bookmarks for the job and ensure the script commits the job state at the end of each successful run.
    B. Configure the target S3 bucket with S3 Versioning so that duplicate output files are retained as separate versions.
    C. Add an Amazon EventBridge schedule that starts the AWS Glue job only once each week.
    D. Store the list of processed object keys in an Amazon RDS table and update the table manually after each job run.
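
As an illustration of what option A describes, job bookmarks are controlled by the `--job-bookmark-option` job argument, and the job script persists its state by calling `job.commit()`. The sketch below only builds the argument map; the job name in the comments is hypothetical.

```python
def bookmark_arguments(enabled: bool = True) -> dict:
    """Return the Glue job argument that switches job bookmarks on or off."""
    value = "job-bookmark-enable" if enabled else "job-bookmark-disable"
    return {"--job-bookmark-option": value}


arguments = bookmark_arguments()

# With boto3 installed and credentials configured, a run could be started as:
#   import boto3
#   boto3.client("glue").start_job_run(
#       JobName="daily-curation-job",  # hypothetical job name
#       Arguments=arguments,
#   )
#
# Inside the Glue script, bookmark state is tracked per source through
# transformation_ctx and saved only when the script reaches job.commit():
#   job.init(args["JOB_NAME"], args)
#   ...read with transformation_ctx="source", transform, write...
#   job.commit()
```

If the script exits before `job.commit()`, the bookmark does not advance, so the next run picks up the same files again.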

  • Question 206:

    A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB.

    The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files.

    The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline.

    The company needs to improve the performance of the second pipeline.

    Which solution will meet this requirement MOST cost-effectively?

    A. Use a larger worker type.
    B. Increase the number of workers in the AWS Glue ETL jobs.
    C. Use the AWS Glue DynamicFrame grouping option.
    D. Enable AWS Glue auto scaling.
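
To make the small-file problem in this scenario concrete, the sketch below builds the S3 connection options that option C refers to (`groupFiles` and `groupSize`) and estimates how many read groups 44,000 files would collapse into. The 250 KB average file size (the midpoint of the stated range), the 128 MB group size, and the S3 path are assumptions for illustration.

```python
import math


def grouping_options(s3_path: str, group_size_bytes: int) -> dict:
    """Build S3 connection options that ask Glue to group small files when reading."""
    return {
        "paths": [s3_path],
        "groupFiles": "inPartition",
        # Glue expects the target group size as a string holding a byte count.
        "groupSize": str(group_size_bytes),
    }


def estimated_groups(file_count: int, avg_file_bytes: int, group_size_bytes: int) -> int:
    """Rough count of read groups if files are packed up to the group size."""
    return math.ceil(file_count * avg_file_bytes / group_size_bytes)


GROUP_SIZE = 128 * 1024 * 1024  # assumed 128 MB target per group
options = grouping_options("s3://example-sales-bucket/history/", GROUP_SIZE)
groups = estimated_groups(44_000, 250 * 1024, GROUP_SIZE)
print(groups)  # 84

# Inside a Glue job, the options would be passed to the reader, for example:
#   glueContext.create_dynamic_frame.from_options(
#       connection_type="s3",
#       connection_options=options,
#       format="json",  # the source file format is an assumption
#   )
```

Under these assumptions, roughly 11 GB of data is read as about 84 groups instead of 44,000 separate files, without changing the worker type or worker count.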

  • Question 207:

    A company has implemented a lake house architecture in Amazon Redshift. The company needs to give users the ability to authenticate into Redshift query editor by using a third-party identity provider (IdP).

    A data engineer must set up the authentication mechanism.

    What is the first step the data engineer should take to meet this requirement?

    A. Register the third-party IdP as an identity provider in the configuration settings of the Redshift cluster.
    B. Register the third-party IdP as an identity provider from within Amazon Redshift.
    C. Register the third-party IdP as an identity provider for AWS Secrets Manager. Configure Amazon Redshift to use Secrets Manager to manage user credentials.
    D. Register the third-party IdP as an identity provider for AWS Certificate Manager (ACM). Configure Amazon Redshift to use ACM to manage user credentials.

  • Question 208:

    A data engineer is building a serverless, multi-step extract, transform, and load (ETL) pipeline. The pipeline extracts data from an Amazon S3 data lake and transforms the data by using AWS Glue ETL jobs. The pipeline then loads the results into an Amazon Redshift database. The data engineer needs to orchestrate the serverless ETL workflow.

    Which solutions will meet these requirements? (Choose two.)

    A. Implement the workflow by using AWS Step Functions. Configure Step Functions to coordinate the AWS Glue ETL jobs and handle error conditions with automatic retries.
    B. Use AWS Glue workflows to create a graph of the ETL tasks that visually represents the dependencies between jobs and the job triggers.
    C. Provision an always-on Amazon EC2 instance. Create a cron job that invokes the AWS Glue ETL jobs in sequence based on a predefined schedule.
    D. Use Amazon EventBridge rules to invoke the AWS Glue ETL jobs based on S3 object creation events. Configure the rules to chain the AWS Glue ETL jobs in sequence and handle complex job dependencies.
    E. Build an orchestration solution by using AWS CodePipeline to coordinate the ETL pipeline and infrastructure changes based on the dependencies.
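
As an illustration of the pattern option A describes, the sketch below generates an Amazon States Language definition that runs AWS Glue jobs in sequence and retries each one on failure. The job names and the retry settings are hypothetical; the Redshift load step is left out for brevity.

```python
import json


def build_state_machine(job_names: list) -> dict:
    """Chain Glue jobs as sequential Task states, each with automatic retries."""
    states = {}
    for index, job_name in enumerate(job_names):
        state = {
            "Type": "Task",
            # The .sync suffix makes Step Functions wait for the job run to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": job_name},
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 30,  # assumed retry settings
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
        }
        if index + 1 < len(job_names):
            state["Next"] = job_names[index + 1]
        else:
            state["End"] = True
        states[job_name] = state
    return {"StartAt": job_names[0], "States": states}


# Hypothetical job names, for illustration only.
definition = build_state_machine(["extract-job", "transform-job"])
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of document a Step Functions state machine takes as its definition.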

  • Question 209:

    A company receives call logs as Amazon S3 objects that contain sensitive customer information. The company must protect the S3 objects by using encryption. The company must also use encryption keys that only specific employees can access.

    Which solution will meet these requirements with the LEAST effort?

    A. Use an AWS CloudHSM cluster to store the encryption keys. Configure the process that writes to Amazon S3 to make calls to CloudHSM to encrypt and decrypt the objects. Deploy an IAM policy that restricts access to the CloudHSM cluster.
    B. Use server-side encryption with customer-provided keys (SSE-C) to encrypt the objects that contain customer information. Restrict access to the keys that encrypt the objects.
    C. Use server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the KMS keys that encrypt the objects.
    D. Use server-side encryption with Amazon S3 managed keys (SSE-S3) to encrypt the objects that contain customer information. Configure an IAM policy that restricts access to the Amazon S3 managed keys that encrypt the objects.
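
As an illustration of what option C describes, the sketch below builds the upload parameters for SSE-KMS and an IAM policy statement that limits use of the key. The bucket name, object key, and KMS key ARN are hypothetical, and the set of KMS actions is an assumption about what readers and writers of the objects would need.

```python
def sse_kms_put_parameters(bucket: str, key: str, kms_key_arn: str) -> dict:
    """Return keyword arguments for an S3 upload encrypted with a specific KMS key."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_arn,
    }


def kms_key_policy_statement(kms_key_arn: str) -> dict:
    """Return an IAM statement granting use of one KMS key; attach it only to chosen employees."""
    return {
        "Effect": "Allow",
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": kms_key_arn,
    }


# Hypothetical identifiers, for illustration only.
KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
params = sse_kms_put_parameters("example-call-logs", "2026/05/call-0001.json", KEY_ARN)
statement = kms_key_policy_statement(KEY_ARN)

# With boto3 installed and credentials configured, the upload could be made as:
#   import boto3
#   boto3.client("s3").put_object(Body=b"...", **params)
```

Users without permission on the key cannot decrypt the objects even if they can read the bucket.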

  • Question 210:

    A gaming company uses AWS Glue to perform read and write operations on Apache Iceberg tables for real-time streaming data. The data in the Iceberg tables is in Apache Parquet format. The company is experiencing slow query performance.

    Which solutions will improve query performance? (Choose two.)

    A. Use AWS Glue Data Catalog to generate column-level statistics for the Iceberg tables on a schedule.
    B. Use AWS Glue Data Catalog to automatically compact the Iceberg tables.
    C. Use AWS Glue Data Catalog to automatically optimize indexes for the Iceberg tables.
    D. Use AWS Glue Data Catalog to enable copy-on-write for the Iceberg tables.
    E. Use AWS Glue Data Catalog to generate views for the Iceberg tables.
