DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code
    :DATA-ENGINEER-ASSOCIATE
  • Exam Name
    :AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification
    :Amazon Certifications
  • Vendor
    :Amazon
  • Total Questions
    :403 Q&As
  • Last Updated
    :May 29, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 291:

    A company has a data pipeline that uses an Amazon RDS instance, AWS Glue jobs, and an Amazon S3 bucket. The RDS instance and AWS Glue jobs run in a private subnet of a VPC and in the same security group. A user made a change to the security group that prevents the AWS Glue jobs from connecting to the RDS instance. After the change, the security group contains a single rule that allows inbound SSH traffic from a specific IP address.

    The company must resolve the connectivity issue.

    Which solution will meet this requirement?

    A. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the security group as the source.
    B. Add an inbound rule that allows all TCP traffic on all UDP ports. Set the private IP address of the RDS instance as the source.
    C. Add an inbound rule that allows all TCP traffic on all TCP ports. Set the DNS name of the RDS instance as the source.
    D. Replace the source of the existing SSH rule with the private IP address of the RDS instance. Create an outbound rule with the same source, destination, and protocol as the inbound SSH rule.

  • Question 292:

    A company receives .csv files that contain physical address data. The data is in columns that have the following names: Door_No, Street_Name, City, and Zip_Code. The company wants to create a single column to store these values in the following format:

    Which solution will meet this requirement with the LEAST coding effort?

    A. Use AWS Glue DataBrew to read the files. Use the NEST TO ARRAY transformation to create the new column.
    B. Use AWS Glue DataBrew to read the files. Use the NEST TO MAP transformation to create the new column.
    C. Use AWS Glue DataBrew to read the files. Use the PIVOT transformation to create the new column.
    D. Write a Lambda function in Python to read the files. Use the Python data dictionary type to create the new column.

  • Question 293:

    An auditor needs to investigate who changed encryption settings for data pipeline resources last month.

    The team wants a managed AWS service that records API activity for the account and supports audit review.

    Which service should the team use as the primary source of API activity?

    A. AWS CloudTrail
    B. Amazon CloudWatch metrics
    C. AWS Glue Data Catalog
    D. Amazon S3 Inventory

  • Question 294:

    A company stores raw clickstream data in an Amazon S3 bucket. The company needs a solution to process the data every day by using complex PySpark transformations that rely on custom internal libraries. After the data is transformed, the company must store the data in Amazon Redshift for analytics.

    The solution must be highly scalable to handle large data workloads.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use AWS Glue Studio to build and schedule PySpark jobs. Configure an AWS Glue data connection that includes the custom libraries.
    B. Use Amazon EC2 Auto Scaling groups with a custom AMI that contains the custom libraries to run a PySpark application.
    C. Use Amazon EMR to run PySpark jobs. Use bootstrap actions to install the custom libraries
    D. Use Amazon SageMaker Processing jobs to run PySpark code that uses native SageMaker libraries.

  • Question 295:

    A company is developing machine learning (ML) models. A data engineer needs to apply data quality rules to training data. The company stores the training data in an Amazon S3 bucket.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Create an AWS Lambda function to check data quality and to raise exceptions in the code. Run the function when data is added to the S3 bucket. Create an Amazon CloudWatch alarm for exceptions in the code.
    B. Create an AWS Glue DataBrew project for the data in the S3 bucket. Create a ruleset for the data quality rules. Create a profile job to run the data quality rules. Use Amazon EventBridge to run the profile job when data is added to the S3 bucket.
    C. Create an Amazon EMR provisioned cluster. Add a Python open source data quality package to the EMR cluster. Use the Python package to write code for data quality rules and to copy the data from the S3 bucket to the EMR cluster. Copy the data from the S3 bucket to the EMR cluster. Run the data quality rules.
    D. Create AWS Lambda functions to evaluate data quality rules. Use AWS Step Functions to orchestrate a workflow that publishes notifications when the data fails to meet data quality rules.

  • Question 296:

    A company uses Amazon EMR as an extract, transform, and load (ETL) pipeline to transform data that comes from multiple sources. A data engineer must orchestrate the pipeline to maximize performance.

    Which AWS service will meet this requirement MOST cost effectively?

    A. Amazon EventBridge
    B. Amazon Managed workflows for Apache Airflow (Amazon MWAA)
    C. AWS Step Functions
    D. AWS Glue workflows

  • Question 297:

    An online retail company has an application that runs on Amazon EC2 instances that are in a VPC. The company wants to collectflow logs for the VPC and analyze network traffic.

    Which solution will meet these requirements MOST cost-effectively?

    A. Publishflow logs to Amazon CloudWatch Logs. Use Amazon Athena for analytics.
    B. Publishflow logs to Amazon CloudWatch Logs. Use an Amazon OpenSearch Service cluster for analytics.
    C. Publishflow logs to Amazon S3 in text format. Use Amazon Athena for analytics.
    D. Publishflow logs to Amazon S3 in Apache Parquet format. Use Amazon Athena for analytics.

  • Question 298:

    A company maintains an Amazon Redshift provisioned cluster that the company uses for extract, transform, and load (ETL) operations to support critical analysis tasks. A sales team within the company maintains a Redshift cluster that the sales team uses for business intelligence (BI) tasks.

    The sales team recently requested access to the data that is in the ETL Redshift cluster so the team can perform weekly summary analysis tasks. The sales team needs to join data from the ETL cluster with data that is in the sales team's BI cluster.

    The company needs a solution that will share the ETL cluster data with the sales team without interrupting the critical analysis tasks. The solution must minimize usage of the computing resources of the ETL cluster.

    Which solution will meet these requirements?

    A. Set up the sales team Bl cluster as a consumer of the ETL cluster by using Redshift data sharing.
    B. Create materialized views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.
    C. Create database views based on the sales team's requirements. Grant the sales team direct access to the ETL cluster.
    D. Unload a copy of the data from the ETL cluster to an Amazon S3 bucket every week. Create an Amazon Redshift Spectrum table based on the content of the ETL cluster.

  • Question 299:

    A retail company stores point-of-sale transaction data in an Amazon RDS for MySQL database. The company maintains historical sales analytics in Amazon Redshift. The company needs to create daily reports that combine the current day's transactions with historical sales patterns for trend analysis. The company requires a solution that provides near real-time insights while minimizing data transfer costs and maintenance overhead.

    Which solution will meet these requirements?

    A. Configure AWS Database Migration Service (AWS DMS) to continuously replicate data from RDS for MySQL to Amazon Redshift. Use Redshift queries to create consolidated reports.
    B. Implement Amazon Redshift federated queries to directly access RDS for MySQL data and join it with existing Redshift tables in a single query.
    C. Use AWS Glue to create an extract, transform, an d load (ETL) pipeline that runs every hour to copy incremental data from RDS for MySQL to Amazon Redshift. Generate reports.
    D. Export RDS for MySQL datato an Amazon S3 bucket on a regular schedule. Use the COPY command to load the data into Amazon Redshift staging tables. Join the data with historical data.

  • Question 300:

    An application consumes messages from an Amazon Simple Queue Service (Amazon SQS) queue. The application experiences occasional downtime. As a result of the downtime, messages within the queue expire and are deleted after 1 day. The message deletions cause data loss for the application.

    Which solutions will minimize data loss for the application? (Choose two.)

    A. Increase the message retention period
    B. Increase the visibility timeout.
    C. Attach a dead-letter queue (DLQ) to the SQS queue.
    D. Use a delay queue to delay message delivery
    E. Reduce message processing time.

Tips on How to Prepare for the Exams

Nowadays, the certification exams become more and more important and required by more and more enterprises when applying for a job. But how to prepare for the exam effectively? How to prepare for the exam in a short time with less efforts? How to get a ideal result and how to find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provide not only Amazon exam questions, answers and explanations but also complete assistance on your exam preparation and certification application. If you are confused on your DATA-ENGINEER-ASSOCIATE exam preparations and Amazon certification application, do not hesitate to visit our Vcedump.com to find your solutions here.