A company uses AWS Glue ETL pipelines to process data. The company uses Amazon Athena to analyze data in an Amazon S3 bucket.
To better understand shipping timelines, the company decides to collect and store shipping and delivery dates in addition to order data. The company adds a data quality check to ensure that shipping date is greater than order date and that delivery date is greater than shipping date. Orders that fail the quality check must be stored in a second S3 bucket.
Which solution will meet these requirements MOST cost-effectively?
A. Use the AWS Glue DataBrew DATEDIFF function to create two additional columns. Check the new columns.
B. Use Athena to query all three date columns, and compare the columns.
C. Use AWS Glue Data Quality to create a custom rule that uses the three date columns.
D. Use an AWS Glue crawler to populate an AWS Glue Data Catalog. Use the three date columns to create a filter.
C. Use AWS Glue Data Quality to create a custom rule that uses the three date columns.
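The check itself is a row-level predicate over the three date columns. The sketch below illustrates that logic in plain Python; it is not AWS Glue Data Quality code. The column names (`order_date`, `shipping_date`, `delivery_date`) and the function name are assumptions made for illustration. In the real pipeline, the rule would be defined in AWS Glue Data Quality and the failing rows written to the second S3 bucket.

```python
from datetime import date


def split_by_date_order(orders):
    """Partition orders into (passed, failed) using the rule
    order_date < shipping_date < delivery_date."""
    passed, failed = [], []
    for order in orders:
        in_order = (
            order["order_date"] < order["shipping_date"] < order["delivery_date"]
        )
        (passed if in_order else failed).append(order)
    return passed, failed


# Hypothetical sample rows: the second order ships before it is placed.
sample_orders = [
    {"order_id": 1, "order_date": date(2024, 1, 1),
     "shipping_date": date(2024, 1, 2), "delivery_date": date(2024, 1, 5)},
    {"order_id": 2, "order_date": date(2024, 1, 3),
     "shipping_date": date(2024, 1, 2), "delivery_date": date(2024, 1, 6)},
]
passed_orders, failed_orders = split_by_date_order(sample_orders)
```

Rows in `failed_orders` correspond to the records that must be stored in the second bucket.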
Question 232:
A media company wants to build a real-time analytics pipeline to process customer activity events across the company's website and mobile app. The company wants to build a solution to ingest millions of events with minimum latency. The solution must be scalable and durable enough so that no data is lost.
Which solution will meet these requirements in the MOST cost-effective way?
A. Set up an Amazon Kinesis Data Streams pipeline to ingest data, process the data by using AWS Lambda functions, and store the results in Amazon Redshift for analytics.
B. Schedule an AWS Glue job to fetch user interaction logs every 10 minutes from Amazon S3. Configure the AWS Glue job to transform and store the data in Amazon Redshift for analytics.
C. Configure Amazon S3 Event Notifications to invoke an AWS Lambda function to process every new interaction log file. Store the result in Amazon Redshift for analytics.
D. Deploy an Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster. Use self-managed consumers to process and distribute data in real-time. Integrate with Amazon Redshift for enhanced analytics.
A. Set up an Amazon Kinesis Data Streams pipeline to ingest data, process the data by using AWS Lambda functions, and store the results in Amazon Redshift for analytics.
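When Lambda consumes a Kinesis data stream, it receives batches of records whose payloads are base64-encoded. The sketch below shows one way a handler might decode such a batch. It assumes that producers send JSON payloads, and it leaves the load into Amazon Redshift as a comment, because that step depends on details the question does not give.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and return the parsed events."""
    parsed = []
    for record in event["Records"]:
        # Kinesis delivers the record payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers send JSON-encoded activity events.
        parsed.append(json.loads(payload))
    # A real function would transform the events and load them into
    # Amazon Redshift at this point.
    return {"processed": len(parsed), "events": parsed}
```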
Question 233:
A company must copy files from an on-premises NFS server to Amazon S3 every night. The solution must transfer only changed data after the first run and must avoid custom file-transfer scripts.
Which service should a data engineer use?
A. AWS DataSync
B. Amazon AppFlow
C. Amazon Kinesis Data Streams
D. Amazon Redshift Spectrum
A. AWS DataSync
Explanation
AWS DataSync is designed to transfer file and object data between on-premises storage and AWS storage services such as Amazon S3, and it can perform scheduled incremental transfers. AppFlow integrates SaaS and AWS application data. Kinesis Data Streams handles streaming records. Redshift Spectrum queries data in S3 but does not transfer files from on-premises servers.
Question 234:
A company is building a new application that ingests CSV files into Amazon Redshift. The company has developed the frontend for the application.
The files are stored in an Amazon S3 bucket. Files are no larger than 5 MB.
A data engineer is developing the extract, transform, and load (ETL) pipeline for the CSV files. The data engineer configured a Redshift cluster and an AWS Lambda function that copies the data out of the files into the Redshift cluster.
Which additional steps should the data engineer perform to meet these requirements?
A. Configure the bucket to send S3 event notifications to Amazon EventBridge. Configure an EventBridge rule that matches S3 new object created events. Set the Lambda function as the target.
B. Configure the S3 bucket to send S3 event notifications to an Amazon Simple Queue Service (Amazon SQS) queue. Configure the Lambda function to process the queue.
C. Configure AWS Database Migration Service (AWS DMS) to stream new S3 objects to a data stream in Amazon Kinesis Data Streams. Set the Lambda function as the target of the data stream.
D. Configure an Amazon EventBridge rule that matches S3 new object created events. Set an Amazon Simple Queue Service (Amazon SQS) queue as the target of the rule. Configure the Lambda function to process the queue.
A. Configure the bucket to send S3 event notifications to Amazon EventBridge. Configure an EventBridge rule that matches S3 new object created events. Set the Lambda function as the target.
Explanation
By sending S3 "Object Created" events to EventBridge and matching those events with a rule that invokes your Lambda function, you trigger your ETL whenever a new CSV lands in S3, without extra polling or queue management. This direct, event-driven pattern keeps operational overhead to a minimum.
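The pattern described above can be sketched as an EventBridge event pattern plus a small helper that the Lambda function might use to turn the event into a Redshift COPY statement. The bucket name, table name, and IAM role ARN are placeholders invented for illustration, and the CSV options (`IGNOREHEADER 1`) assume that the files have a header row.

```python
# Event pattern for the EventBridge rule. The bucket name is hypothetical.
EVENT_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["example-csv-bucket"]}},
}


def build_copy_statement(
    event,
    table="staging.orders",  # hypothetical target table
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",  # placeholder
):
    """Build a Redshift COPY statement for the object named in an
    S3 'Object Created' event delivered through EventBridge."""
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]
    return (
        f"COPY {table} FROM 's3://{bucket}/{key}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV IGNOREHEADER 1;"
    )
```

The Lambda function would then run the returned statement against the cluster, for example through the Redshift Data API.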
Question 235:
A company needs to collect logs for an Amazon RDS for MySQL database and make the logs available for audits. The logs must track each user that modifies data in the database or makes changes to the database instance.
Which solution will meet these requirements?
A. Enable Amazon CloudWatch Logs. Create metric filters to monitor database changes and instance-level changes. Configure automated notification systems to send near real-time alerts for suspicious database operations.
B. Configure an Amazon EventBridge rule to monitor database activity. Create an AWS Lambda function to process EventBridge events and store them in Amazon OpenSearch Service.
C. Configure AWS CloudTrail to log API calls. Use Amazon CloudWatch Logs for basic monitoring. Use IAM policies to control access to the logs. Set up scheduled reporting for log audits.
D. Enable and configure native Amazon RDS database audit logging. Enable Amazon CloudWatch Logs. Configure metric filters and alarms. Configure AWS CloudTrail audit logging.
D. Enable and configure native Amazon RDS database audit logging. Enable Amazon CloudWatch Logs. Configure metric filters and alarms. Configure AWS CloudTrail audit logging.
Question 236:
A company extracts approximately 1 TB of data every day from data sources such as SAP HANA, Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. Some of the data sources have undefined data schemas or data schemas that change.
A data engineer must implement a solution that can detect the schema for these data sources. The solution must extract, transform, and load the data to an Amazon S3 bucket. The company has a service level agreement (SLA) to load the data into the S3 bucket within 15 minutes of data creation.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon EMR to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
B. Use AWS Glue to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
C. Create a PySpark program in AWS Lambda to extract, transform, and load the data into the S3 bucket.
D. Create a stored procedure in Amazon Redshift to detect the schema and to extract, transform, and load the data into a Redshift Spectrum table. Access the table from Amazon S3.
B. Use AWS Glue to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
Explanation
AWS Glue is a fully managed service that provides a serverless data integration platform. It can automatically discover and categorize data from various sources, including SAP HANA, Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. It can also infer the schema of the data and store it in the AWS Glue Data Catalog, which is a central metadata repository. AWS Glue can then use the schema information to generate and run Apache Spark code to extract, transform, and load the data into an Amazon S3 bucket. AWS Glue can also monitor and optimize the performance and cost of the data pipeline, and handle any schema changes that may occur in the source data. AWS Glue can meet the SLA of loading the data into the S3 bucket within 15 minutes of data creation, as it can trigger the data pipeline based on events, schedules, or on-demand. AWS Glue has the least operational overhead among the options, as it does not require provisioning, configuring, or managing any servers or clusters. It also handles scaling, patching, and security automatically.
Question 237:
A company uses Apache Airflow DAGs for complex data pipeline orchestration. The company wants a managed AWS service for Airflow so the data engineering team does not operate Airflow servers, schedulers, and web servers.
Which service should the company use?
A. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
B. Amazon Managed Streaming for Apache Kafka (Amazon MSK)
C. Amazon Managed Grafana
D. AWS Database Migration Service
A. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Explanation
Amazon MWAA is the managed AWS service for Apache Airflow environments. Amazon MSK manages Apache Kafka clusters for streaming. Managed Grafana supports dashboards and visualization. AWS DMS performs database migration and replication tasks.
Question 238:
A company is planning to upgrade its Amazon Elastic Block Store (Amazon EBS) General Purpose SSD storage from gp2 to gp3. The company wants to prevent any interruptions in its Amazon EC2 instances that will cause data loss during the migration to the upgraded storage.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create snapshots of the gp2 volumes. Create new gp3 volumes from the snapshots. Attach the new gp3 volumes to the EC2 instances.
B. Create new gp3 volumes. Gradually transfer the data to the new gp3 volumes. When the transfer is complete, mount the new gp3 volumes to the EC2 instances to replace the gp2 volumes.
C. Change the volume type of the existing gp2 volumes to gp3. Enter new values for volume size, IOPS, and throughput.
D. Use AWS DataSync to create new gp3 volumes. Transfer the data from the original gp2 volumes to the new gp3 volumes.
C. Change the volume type of the existing gp2 volumes to gp3. Enter new values for volume size, IOPS, and throughput.
Explanation
Changing the volume type of the existing gp2 volumes to gp3 is the simplest way to migrate to the new storage type without downtime or data loss. You can use the AWS Management Console, the AWS CLI, or the Amazon EC2 API to modify the volume type, size, IOPS, and throughput of a volume while it stays attached and in use. The modification starts right away, and you can track its progress in the console or with the describe-volumes-modifications CLI command. The other options are more complex because they require additional steps, such as creating snapshots, transferring data, or attaching new volumes, which increase the operational overhead and the risk of errors.
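As a rough sketch of the in-place change, the helper below builds the arguments for the EC2 `ModifyVolume` API and passes them to a boto3-style client. The function names are hypothetical. The defaults of 3,000 IOPS and 125 MiB/s are the gp3 baseline values, and `ec2_client` is assumed to be something like `boto3.client("ec2")`.

```python
def gp3_modification_request(volume_id, iops=3000, throughput=125, size_gib=None):
    """Build ModifyVolume arguments that switch a volume to gp3.

    Defaults are the gp3 baseline: 3,000 IOPS and 125 MiB/s.
    """
    request = {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput,
    }
    if size_gib is not None:
        # Size is optional; omit it to keep the current volume size.
        request["Size"] = size_gib
    return request


def migrate_to_gp3(ec2_client, volume_id, **options):
    """Request the in-place change; the volume stays attached and in use."""
    return ec2_client.modify_volume(**gp3_modification_request(volume_id, **options))
```

Injecting the client keeps the helper easy to exercise with a stub before it is pointed at a real account.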
Question 239:
A team is implementing data quality checks in AWS Glue ETL. The team needs fixed checks for required columns and wants anomaly detection for row count trends after historical statistics are available.
Which capabilities should the team use? (Choose two.)
A. Define AWS Glue Data Quality rules with DQDL, such as completeness checks for required columns.
B. Enable AWS Glue Data Quality anomaly detection for supported metrics such as row count.
C. Use S3 Versioning as the primary way to detect missing column values.
D. Use Redshift VACUUM to detect row count anomalies in S3 files.
E. Use AWS Transfer Family user mappings as data quality rules.
A. Define AWS Glue Data Quality rules with DQDL, such as completeness checks for required columns.
B. Enable AWS Glue Data Quality anomaly detection for supported metrics such as row count.
Explanation
AWS Glue Data Quality rules express expectations such as completeness, and anomaly detection can evaluate supported metrics such as row count after enough observations exist. S3 Versioning preserves object versions but does not validate values. Redshift VACUUM maintains Redshift storage. Transfer Family user mappings are unrelated to data validation.
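For the fixed checks, a DQDL ruleset is a block of text that lists one rule per required column. The helper below is a small sketch that generates such a ruleset with the DQDL `IsComplete` rule. The function name and the column names in the usage line are assumptions, and the anomaly detection part is configured separately and is not shown here.

```python
def build_completeness_ruleset(required_columns):
    """Return a DQDL ruleset with one IsComplete rule per required column."""
    rules = ",\n".join(f'    IsComplete "{column}"' for column in required_columns)
    return f"Rules = [\n{rules}\n]"


# Hypothetical required columns.
ruleset = build_completeness_ruleset(["order_id", "customer_id"])
```

The resulting string can be pasted into the ruleset editor or supplied to a Glue Data Quality evaluation step in the job.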
Question 240:
A company has an on-premises PostgreSQL database that contains customer data. The company wants to migrate the customer data to an Amazon Redshift data warehouse. The company has established a VPN connection between the on-premises database and AWS.
The on-premises database is continuously updated. The company must ensure that the data in Amazon Redshift is updated as quickly as possible.
Which solution will meet these requirements?
A. Use the pg_dump utility to generate a backup of the PostgreSQL database. Use the AWS Schema Conversion Tool (AWS SCT) to upload the backup to Amazon Redshift. Set up a cron job to perform a backup. Upload the backup to Amazon Redshift every night.
B. Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to use the change data capture (CDC) feature.
C. Use the pg_dump utility to generate a backup of the PostgreSQL database. Upload the backup to an Amazon S3 bucket. Use the COPY command to import the data into Amazon Redshift.
D. Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to use a full load of the database to Amazon Redshift every night.
B. Create an AWS Database Migration Service (AWS DMS) full-load task. Set Amazon Redshift as the target. Configure the task to use the change data capture (CDC) feature.
Explanation
AWS DMS can perform an initial full load of your on-premises PostgreSQL data into Redshift and then continuously capture and apply changes (CDC) over the VPN. This approach keeps your Redshift tables up to date with minimal latency and operational overhead.
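In API terms, this corresponds to a replication task whose migration type is `full-load-and-cdc`. The sketch below builds the arguments for the DMS `CreateReplicationTask` call. The function name, the task identifier, the ARNs passed in, and the choice to include every table in the `public` schema are all assumptions for illustration.

```python
import json


def full_load_and_cdc_task(task_id, source_arn, target_arn, instance_arn, schema="public"):
    """Build CreateReplicationTask arguments for a full load followed by CDC."""
    table_mappings = {
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "include-schema",
                # Assumption: replicate every table in the given schema.
                "object-locator": {"schema-name": schema, "table-name": "%"},
                "rule-action": "include",
            }
        ]
    }
    return {
        "ReplicationTaskIdentifier": task_id,
        "SourceEndpointArn": source_arn,
        "TargetEndpointArn": target_arn,
        "ReplicationInstanceArn": instance_arn,
        "MigrationType": "full-load-and-cdc",
        "TableMappings": json.dumps(table_mappings),
    }
```

The returned dictionary could be passed to a boto3 DMS client as `create_replication_task(**kwargs)` once the endpoints and the replication instance exist.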