DATA-ENGINEER-ASSOCIATE Exam Details

  • Exam Code: DATA-ENGINEER-ASSOCIATE
  • Exam Name: AWS Certified Data Engineer - Associate (DEA-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 403 Q&As
  • Last Updated: May 29, 2026

Amazon DATA-ENGINEER-ASSOCIATE Online Questions & Answers

  • Question 121:

    A data engineer needs to deploy a complex pipeline. The stages of the pipeline must be able to run a script. The data engineer must use only fully managed and serverless services in the pipeline.

    Which solution will meet these requirements?

    A. Deploy AWS Glue jobs and workflows. Use AWS Glue to run the jobs and workflows on a schedule.
    B. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to build and schedule the pipeline.
    C. Deploy the script to Amazon EC2 instances. Use Amazon EventBridge to run the script on a schedule.
    D. Use AWS Glue DataBrew to build the pipeline. Use Amazon EventBridge to run the pipeline on a schedule.

  • Question 122:

    A company needs to build a data lake in AWS. The company must provide row-level data access and column-level data access to specific teams. The teams will access the data by using Amazon Athena, Amazon Redshift Spectrum, and Apache Hive from Amazon EMR.

    Which solution will meet these requirements with the LEAST operational overhead?

    A. Use Amazon S3 for data lake storage. Use S3 access policies to restrict data access by rows and columns. Provide data access through Amazon S3.
    B. Use Amazon S3 for data lake storage. Use Apache Ranger through Amazon EMR to restrict data access by rows and columns. Provide data access by using Apache Pig.
    C. Use Amazon Redshift for data lake storage. Use Redshift security policies to restrict data access by rows and columns. Provide data access by using Apache Spark and Amazon Athena federated queries.
    D. Use Amazon S3 for data lake storage. Use AWS Lake Formation to restrict data access by rows and columns. Provide data access through AWS Lake Formation.

  • Question 123:

    A company stores a large volume of customer records in Amazon S3. To comply with regulations, the company must be able to access new customer records immediately for the first 30 days after the records are created. The company accesses records that are older than 30 days infrequently.

    The company needs to cost-optimize its Amazon S3 storage.

    Which solution will meet these requirements MOST cost-effectively?

    A. Apply a lifecycle policy to transition records to S3 Standard-Infrequent Access (S3 Standard-IA) storage after 30 days.
    B. Use S3 Intelligent-Tiering storage.
    C. Transition records to S3 Glacier Deep Archive storage after 30 days.
    D. Use S3 Standard-Infrequent Access (S3 Standard-IA) storage for all customer records.
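
    For readers unfamiliar with lifecycle policies, the following is a minimal sketch of what a 30-day transition rule could look like. The dictionary follows the shape that boto3's put_bucket_lifecycle_configuration accepts; the prefix, rule ID, and bucket name are hypothetical, and the helper build_lifecycle_configuration is an illustration only, not part of any AWS SDK.

```python
import json


def build_lifecycle_configuration(prefix: str, days: int, storage_class: str) -> dict:
    """Return a lifecycle configuration that contains one transition rule."""
    return {
        "Rules": [
            {
                # Rule ID is a free-form label; this naming scheme is an assumption.
                "ID": f"transition-{storage_class.lower()}-after-{days}-days",
                "Status": "Enabled",
                # Apply the rule only to objects under this (hypothetical) prefix.
                "Filter": {"Prefix": prefix},
                "Transitions": [{"Days": days, "StorageClass": storage_class}],
            }
        ]
    }


config = build_lifecycle_configuration("customer-records/", 30, "STANDARD_IA")
print(json.dumps(config, indent=2))

# Applying the rule would need boto3 and AWS credentials (bucket name is hypothetical):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-customer-records", LifecycleConfiguration=config
# )
```

    Changing the storage_class argument (for example to "DEEP_ARCHIVE") shows how the same rule shape covers the other transition-based options in the question.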

  • Question 124:

    A data engineer needs to provide analysts with SQL access to data in Amazon S3 without loading the data into a database. The team already has table metadata in AWS Glue Data Catalog.

    Which service should the engineer use for serverless querying?

    A. Amazon Athena
    B. AWS Backup
    C. Amazon MemoryDB for Redis
    D. AWS Application Migration Service

  • Question 125:

    A company stores logs in an Amazon S3 bucket. When a data engineer attempts to access several log files, the data engineer discovers that some files have been unintentionally deleted.

    The data engineer needs a solution that will prevent unintentional file deletion in the future.

    Which solution will meet this requirement with the LEAST operational overhead?

    A. Manually back up the S3 bucket on a regular basis.
    B. Enable S3 Versioning for the S3 bucket.
    C. Configure replication for the S3 bucket.
    D. Use an Amazon S3 Glacier storage class to archive the data that is in the S3 bucket.
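
    To make the versioning option concrete, here is a toy in-memory model of how a versioning-enabled bucket treats a simple delete: a delete marker is added and earlier versions stay recoverable. The VersionedBucket class is purely illustrative and is not the S3 API; real buckets are configured through the S3 console, CLI, or SDK.

```python
from typing import Dict, List, Optional


class VersionedBucket:
    """Toy model of a versioning-enabled bucket (illustration only)."""

    def __init__(self) -> None:
        # key -> versions, oldest first; None stands in for a delete marker.
        self._versions: Dict[str, List[Optional[bytes]]] = {}

    def put(self, key: str, body: bytes) -> None:
        # Every write appends a new version instead of overwriting.
        self._versions.setdefault(key, []).append(body)

    def delete(self, key: str) -> None:
        # A simple delete adds a delete marker; older versions are kept.
        self._versions.setdefault(key, []).append(None)

    def get(self, key: str) -> Optional[bytes]:
        # Return the current version, or None if absent or delete-marked.
        versions = self._versions.get(key, [])
        return versions[-1] if versions else None

    def restore(self, key: str) -> None:
        # Removing the delete marker makes the previous version current again.
        versions = self._versions.get(key, [])
        if versions and versions[-1] is None:
            versions.pop()


bucket = VersionedBucket()
bucket.put("logs/app.log", b"line 1")
bucket.delete("logs/app.log")      # object now appears deleted
bucket.restore("logs/app.log")     # drop the delete marker
print(bucket.get("logs/app.log"))
```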

  • Question 126:

    A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3.

    Which actions will provide the FASTEST queries? (Choose two.)

    A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
    B. Use a columnar storage file format.
    C. Partition the data based on the most common query predicates.
    D. Split the data into files that are less than 10 KB.
    E. Use file formats that are not splittable.
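
    The partitioning option can be pictured as a key layout in Amazon S3. The sketch below builds Hive-style partitioned object keys, assuming queries filter mostly by date; the table prefix, the year/month/day partition columns, and the file name are hypothetical choices for illustration.

```python
from datetime import date


def partitioned_key(table_prefix: str, event_date: date, file_name: str) -> str:
    """Build a Hive-style partitioned object key (year=/month=/day=)."""
    return (
        f"{table_prefix.rstrip('/')}/"
        f"year={event_date:%Y}/month={event_date:%m}/day={event_date:%d}/"
        f"{file_name}"
    )


key = partitioned_key("sales/", date(2024, 5, 29), "part-0000.snappy.parquet")
print(key)  # sales/year=2024/month=05/day=29/part-0000.snappy.parquet
```

    With a layout like this, a query that filters on the partition columns only needs to scan the matching prefixes rather than the whole dataset.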

  • Question 127:

    A company is creating near real-time dashboards to visualize time series data. The company ingests data into Amazon Managed Streaming for Apache Kafka (Amazon MSK). A customized data pipeline consumes the data. The pipeline then writes data to Amazon Keyspaces (for Apache Cassandra), Amazon OpenSearch Service, and Apache Avro objects in Amazon S3.

    Which solution will make the data available for the data visualizations with the LEAST latency?

    A. Create OpenSearch Dashboards by using the data from OpenSearch Service.
    B. Use Amazon Athena with an Apache Hive metastore to query the Avro objects in Amazon S3. Use Amazon Managed Grafana to connect to Athena and to create the dashboards.
    C. Use Amazon Athena to query the data from the Avro objects in Amazon S3. Configure Amazon Keyspaces as the data catalog. Connect Amazon QuickSight to Athena to create the dashboards.
    D. Use AWS Glue to catalog the data. Use S3 Select to query the Avro objects in Amazon S3. Connect Amazon QuickSight to the S3 bucket to create the dashboards.

  • Question 128:

    A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3-based data lake.

    The company uses Amazon Athena to query the data that is in the data lake.

    The company needs to identify matching records even when the records do not have a common unique identifier.

    Which solution will meet this requirement?

    A. Use Amazon Macie pattern matching as part of the ETL job.
    B. Train and use the AWS Glue PySpark Filter class in the ETL job.
    C. Partition tables and use the ETL job to partition the data on a unique identifier.
    D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

  • Question 129:

    A data engineer uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run data pipelines in an AWS account.

    A workflow recently failed to run. The data engineer needs to use Apache Airflow logs to diagnose the failure of the workflow.

    Which log type should the data engineer use to diagnose the cause of the failure?

    A. YourEnvironmentName-WebServer
    B. YourEnvironmentName-Scheduler
    C. YourEnvironmentName-DAGProcessing
    D. YourEnvironmentName-Task
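
    The option names above correspond to Amazon CloudWatch Logs log groups that Amazon MWAA creates per environment. As a hedged sketch, the helper below assembles such log group names, assuming the "airflow-" prefix that MWAA places in front of the environment name; the helper log_group_name and the environment name used here are hypothetical.

```python
# Log types that an MWAA environment can publish to CloudWatch Logs.
LOG_TYPES = ("DAGProcessing", "Scheduler", "Task", "WebServer", "Worker")


def log_group_name(environment_name: str, log_type: str) -> str:
    """Return the assumed CloudWatch log group name for one MWAA log type."""
    if log_type not in LOG_TYPES:
        raise ValueError(f"unknown log type: {log_type}")
    # Assumption: log groups are named airflow-<EnvironmentName>-<LogType>.
    return f"airflow-{environment_name}-{log_type}"


for log_type in LOG_TYPES:
    print(log_group_name("YourEnvironmentName", log_type))
```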

  • Question 130:

    A retail company stores data from a product lifecycle management (PLM) application in an on-premises MySQL database. The PLM application frequently updates the database when transactions occur.

    The company wants to gather insights from the PLM application in near real time. The company wants to integrate the insights with other business datasets and to analyze the combined dataset by using an Amazon Redshift data warehouse.

    The company has already established an AWS Direct Connect connection between the on-premises infrastructure and AWS.

    Which solution will meet these requirements with the LEAST development effort?

    A. Run a scheduled AWS Glue extract, transform, and load (ETL) job to get the MySQL database updates by using a Java Database Connectivity (JDBC) connection. Set Amazon Redshift as the destination for the ETL job.
    B. Run a full load plus CDC task in AWS Database Migration Service (AWS DMS) to continuously replicate the MySQL database changes. Set Amazon Redshift as the destination for the task.
    C. Use the Amazon AppFlow SDK to build a custom connector for the MySQL database to continuously replicate the database changes. Set Amazon Redshift as the destination for the connector.
    D. Run scheduled AWS DataSync tasks to synchronize data from the MySQL database. Set Amazon Redshift as the destination for the tasks.
