A company needs a solution to store and query product data that has variable attributes. The solution must support unpredictable and high-volume queries with single-digit millisecond latency, even during sudden traffic spikes. The solution must retrieve items by a primary identifier named Product ID. The solution must allow flexible queries by secondary attributes named Category and Brand.
Which solution will meet these requirements?
A. Use an Amazon DynamoDB table with on-demand capacity to store product data. Store products by primary key. Use global secondary indexes (GSIs) to store secondary attributes.
B. Use Amazon Aurora with a Multi-AZ deployment to store product data. Use read replicas. Create indexes for primary and secondary attributes.
C. Use an Amazon OpenSearch Serverless cluster with dynamic scaling to store product data. Index product data by primary and secondary attributes.
D. Use Amazon ElastiCache (Redis OSS) and Amazon S3 to store product data. Use Amazon Athena to run flexible secondary attribute queries.
A. Use an Amazon DynamoDB table with on-demand capacity to store product data. Store products by primary key. Use global secondary indexes (GSIs) to store secondary attributes.
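As a rough illustration of option A, the table definition could look like the sketch below. The table name `Products`, the attribute names, and the index names are assumptions made for illustration. The dictionary follows the shape of a DynamoDB CreateTable request (for example, as keyword arguments to boto3's `create_table`), but no AWS call is made here.

```python
def build_product_table_definition(table_name="Products"):
    """Return CreateTable-style parameters for an on-demand table with two GSIs.

    The table, attribute, and index names are hypothetical.
    """
    def gsi(index_name, key_attribute):
        # Each GSI is keyed on one secondary attribute and projects all attributes.
        return {
            "IndexName": index_name,
            "KeySchema": [{"AttributeName": key_attribute, "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
        }

    return {
        "TableName": table_name,
        # Only key attributes are declared; all other product attributes stay schemaless.
        "AttributeDefinitions": [
            {"AttributeName": "ProductID", "AttributeType": "S"},
            {"AttributeName": "Category", "AttributeType": "S"},
            {"AttributeName": "Brand", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "ProductID", "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
        "GlobalSecondaryIndexes": [
            gsi("CategoryIndex", "Category"),
            gsi("BrandIndex", "Brand"),
        ],
    }


table_definition = build_product_table_definition()
```

With on-demand billing, no provisioned throughput is specified for the table or for its indexes, which is what lets the table absorb unpredictable traffic without capacity planning.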
Question 322:
A media company uploads large video files to Amazon S3 for processing. After processing, the company needs to keep the original files for 90 days in case the files require reprocessing. After 90 days, the company can delete the files to reduce storage costs. The company stores the processed videos in a different S3 bucket.
Which S3 Lifecycle configuration will meet these requirements for the original files MOST cost-effectively?
A. Store the files in S3 Standard for 90 days. Transition the files to S3 Glacier Flexible Retrieval for long-term storage. Then expire the files.
B. Store the files in S3 Standard for 90 days. Enable versioning. Enable Object Lock on the files for 90 days. Then expire the files.
C. Store the files in S3 Standard for 90 days. Implement S3 Lifecycle management to expire the files.
D. Store the files in S3 Intelligent-Tiering for 90 days. Enable versioning. Add S3 Lifecycle management to expire the files.
C. Store the files in S3 Standard for 90 days. Implement S3 Lifecycle management to expire the files.
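For reference, a lifecycle configuration matching option C might be sketched as follows. The rule ID is an assumption. The dictionary follows the shape of an S3 lifecycle configuration (for example, the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`), and no AWS call is made here.

```python
def build_expiration_rule(days=90, rule_id="expire-original-videos"):
    """Return an S3 lifecycle configuration that only expires objects.

    The rule ID is hypothetical. There is no transition action, so objects
    stay in their original storage class until they expire.
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": ""},  # empty prefix: the rule applies to every object
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


lifecycle_configuration = build_expiration_rule()
```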
Question 323:
A company wants to combine data from multiple software as a service (SaaS) applications for analysis.
A data engineering team needs to use Amazon QuickSight to perform the analysis and build dashboards.
A data engineer needs to extract the data from the SaaS applications and make the data available for QuickSight queries.
Which solution will meet these requirements in the MOST operationally efficient way?
A. Create AWS Lambda functions that call the required APIs to extract the data from the applications. Store the data in an Amazon S3 bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight.
B. Use AWS Lambda functions as Amazon Athena data source connectors to run federated queries against the SaaS applications. Create an Athena data source and a dataset in QuickSight.
C. Use Amazon AppFlow to create a flow for each SaaS application. Set an Amazon S3 bucket as the destination. Schedule the flows to extract the data to the bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight.
D. Export the data from the SaaS applications as Microsoft Excel files. Create a data source and a dataset in QuickSight by uploading the Excel files.
C. Use Amazon AppFlow to create a flow for each SaaS application. Set an Amazon S3 bucket as the destination. Schedule the flows to extract the data to the bucket. Use AWS Glue to catalog the data in the S3 bucket. Create a data source and a dataset in QuickSight.
Explanation
Amazon AppFlow provides fully managed, no-code connectors to a broad range of SaaS applications. By scheduling flows that land data in S3, you avoid building and maintaining custom API-calling code. Glue crawlers can catalog the files and make them available to QuickSight, giving you an end-to-end pipeline with minimal operational effort.
Question 324:
A data engineer needs to optimize the performance of a data pipeline that handles retail orders. Data about the orders is ingested daily into an Amazon S3 bucket.
The data engineer runs queries once each week to extract metrics from the orders data based on the order date for multiple date ranges. The data engineer needs an optimization solution that ensures the query performance will not degrade when the volume of data increases.
Which solution will meet this requirement MOST cost-effectively?
A. Partition the data based on order date. Use Amazon Athena to query the data.
B. Partition the data based on order date. Use Amazon Redshift to query the data.
C. Partition the data based on load date. Use Amazon EMR to query the data.
D. Partition the data based on load date. Use Amazon Aurora to query the data.
A. Partition the data based on order date. Use Amazon Athena to query the data.
Explanation
Partitioning the S3 data by the order date lets Athena prune scans to only the relevant date folders, keeping query times stable as data grows. Because Athena is a serverless, pay-per-query service, you only pay for the data scanned, making it the most cost-effective way to run your weekly date-range metrics.
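To make the pruning idea concrete, the sketch below lists the Hive-style partition prefixes that a one-week date-range query would need to read. The bucket name, the base prefix, and the partition column name `order_date` are assumptions for illustration.

```python
from datetime import date, timedelta


def partition_prefixes(base_prefix, start, end):
    """List Hive-style order_date partition prefixes for an inclusive date range."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        f"{base_prefix}/order_date={(start + timedelta(days=offset)).isoformat()}/"
        for offset in range(days)
    ]


# Hypothetical bucket and prefix; only these seven prefixes are relevant to the query.
week = partition_prefixes(
    "s3://example-orders-bucket/orders", date(2024, 1, 1), date(2024, 1, 7)
)
```

However many days of data accumulate under the base prefix, a query that filters on the partition column for this range touches the same seven prefixes, which is why query time stays stable as the data volume grows.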
Question 325:
A company stores objects in an Amazon S3 bucket. The company crawls the objects so that Amazon Athena can query the data.
A data engineer manually moved all objects from the partition with a path prefix of status=01 to the prefix status=02. The status=01 partition location is now empty. However, the status=01 partition location still appears in the AWS Glue Data Catalog metadata.
Which Athena command should the data engineer run to resolve the metadata discrepancy?
A. MSCK REPAIR TABLE
B. ALTER TABLE DROP PARTITION
C. ALTER TABLE SET TBLPROPERTIES
D. ALTER TABLE CHANGE COLUMN
B. ALTER TABLE DROP PARTITION
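A small helper can show what such a statement might look like for the scenario in the question. The table name `orders` is an assumption, because the question does not name the table. The helper only builds the statement text and does not submit it to Athena.

```python
def drop_partition_statement(table, partition):
    """Build an ALTER TABLE ... DROP PARTITION statement for one partition.

    `partition` maps partition column names to string values. Values that
    contain a single quote are rejected rather than escaped.
    """
    if not partition:
        raise ValueError("at least one partition column is required")
    parts = []
    for column, value in partition.items():
        if "'" in str(value):
            raise ValueError("partition values must not contain single quotes")
        parts.append(f"{column} = '{value}'")
    return f"ALTER TABLE {table} DROP IF EXISTS PARTITION ({', '.join(parts)})"


# Hypothetical table name; the partition value comes from the question.
statement = drop_partition_statement("orders", {"status": "01"})
```

This removes only the catalog entry for the empty status=01 partition. The objects that were moved to status=02 are not affected.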
Question 326:
A retail company uses an Amazon Redshift data warehouse and an Amazon S3 bucket. The company ingests retail order data into the S3 bucket every day.
The company stores all order data at a single path within the S3 bucket. The data has more than 100 columns. The company ingests the order data from a third-party application that generates more than 30 files in CSV format every day. Each CSV file is between 50 and 70 MB in size. The company uses Amazon Redshift Spectrum to run queries that select sets of columns. Users aggregate metrics based on daily orders. Recently, users have reported that the performance of the queries has degraded. A data engineer must resolve the performance issues for the queries.
Which combination of steps will meet this requirement with LEAST developmental effort? (Choose Two.)
A. Configure the third-party application to create the files in a columnar format.
B. Develop an AWS Glue ETL job to convert the multiple daily CSV files to one file for each day.
C. Partition the order data in the S3 bucket based on order date.
D. Configure the third-party application to create the files in JSON format.
E. Load the JSON data into the Amazon Redshift table in a SUPER type column.
A. Configure the third-party application to create the files in a columnar format.
C. Partition the order data in the S3 bucket based on order date.
Explanation
The performance issue in Amazon Redshift Spectrum queries arises due to the nature of CSV files, which are row-based storage formats. Spectrum is more optimized for columnar formats, which significantly improve performance by reducing the amount of data scanned. Also, partitioning data based on relevant columns like order date can further reduce the amount of data scanned, as queries can focus only on the necessary partitions.
A. Configure the third-party application to create the files in a columnar format: Columnar formats (like Parquet or ORC) store data in a way that is optimized for analytical queries because they allow queries to scan only the columns required, rather than scanning all columns in a row-based format like CSV. Amazon Redshift Spectrum works much more efficiently with columnar formats, reducing the amount of data that needs to be scanned, which improves query performance.
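A back-of-the-envelope estimate shows how the two steps combine. This is a simplified model: it assumes that every column takes the same share of each file and it ignores compression. The one-year retention, the 60 MB file size (the midpoint of the stated range), the 30 files per day, and the 5 selected columns are assumptions chosen for illustration.

```python
def estimated_scan_mb(days_stored, files_per_day, mb_per_file, days_queried,
                      columns_total, columns_selected, columnar, partitioned):
    """Roughly estimate the data scanned by one query, in MB.

    Simplified model: equal-width columns and no compression.
    """
    # Without partitions, every stored day must be read to find the wanted orders.
    days_read = days_queried if partitioned else days_stored
    # Without a columnar format, every column of each row must be read.
    columns_read = columns_selected if columnar else columns_total
    total_mb = days_read * files_per_day * mb_per_file
    return total_mb * columns_read / columns_total


# Assumed workload: one year of data, 30 files/day at 60 MB, a query over
# one day that selects 5 of 100 columns.
workload = dict(days_stored=365, files_per_day=30, mb_per_file=60,
                days_queried=1, columns_total=100, columns_selected=5)

csv_single_path = estimated_scan_mb(columnar=False, partitioned=False, **workload)
columnar_partitioned = estimated_scan_mb(columnar=True, partitioned=True, **workload)
```

Under these assumptions the estimate drops from 657,000 MB to 90 MB per query. The exact figures matter less than the fact that both reductions multiply.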
Question 327:
A company has an application that is deployed on AWS. The application uses Amazon Simple Notification Service (Amazon SNS) with multiple topics. The company's security team needs to be able to audit all Publish and PublishBatch API actions for all the SNS topics. The company's application team and security team must also be able to query the audit data. The company has already established an event data store in AWS CloudTrail Lake to collect all events.
Which solution will meet these requirements with the LEAST operational overhead?
A. Enable management events for the SNS topics. Create a table in AWS Glue Data Catalog. Query the data by using Amazon Athena.
B. Enable management events for the SNS topics. Use CloudTrail Lake to query the audit data.
C. Enable data events for the SNS topics. Use CloudTrail Lake to query the audit data.
D. Enable data events for the SNS topics. Create a table in AWS Glue Data Catalog. Query the data by using Amazon Athena.
C. Enable data events for the SNS topics. Use CloudTrail Lake to query the audit data.
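The sketch below shows one way the configuration and the audit query for option C could be expressed. The selector name and the event data store ID passed to the query builder are placeholders. The selector structure follows the CloudTrail advanced event selector format, and nothing here calls AWS.

```python
def sns_data_event_selectors():
    """Return advanced event selectors that capture SNS topic data events."""
    return [
        {
            "Name": "SNS topic data events",  # hypothetical selector name
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::SNS::Topic"]},
            ],
        }
    ]


def audit_query(event_data_store_id):
    """Build a CloudTrail Lake SQL query for Publish and PublishBatch calls."""
    return (
        "SELECT eventTime, eventName, userIdentity.arn "
        f"FROM {event_data_store_id} "
        "WHERE eventSource = 'sns.amazonaws.com' "
        "AND eventName IN ('Publish', 'PublishBatch') "
        "ORDER BY eventTime DESC"
    )


selectors = sns_data_event_selectors()
query = audit_query("EXAMPLE-EVENT-DATA-STORE-ID")  # placeholder ID
```

Publish and PublishBatch are logged as data events rather than management events, which is why the management-event options do not capture them.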
Question 328:
A lab uses IoT sensors to monitor humidity, temperature, and pressure for a project. The sensors send 100 KB of data every 10 seconds. A downstream process will read the data from an Amazon S3 bucket every 30 seconds.
Which solution will deliver the data to the S3 bucket with the LEAST latency?
A. Use Amazon Kinesis Data Streams and Amazon Kinesis Data Firehose to deliver the data to the S3 bucket. Use the default buffer interval for Kinesis Data Firehose.
B. Use Amazon Kinesis Data Streams to deliver the data to the S3 bucket. Configure the stream to use 5 provisioned shards.
C. Use Amazon Kinesis Data Streams and call the Kinesis Client Library to deliver the data to the S3 bucket. Use a 5 second buffer interval from an application.
D. Use Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) and Amazon Kinesis Data Firehose to deliver the data to the S3 bucket. Use a 5 second buffer interval for Kinesis Data Firehose.
C. Use Amazon Kinesis Data Streams and call the Kinesis Client Library to deliver the data to the S3 bucket. Use a 5 second buffer interval from an application.
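The application-side buffering in option C can be sketched independently of any AWS library. The class below is not part of the Kinesis Client Library. It is a hypothetical helper that collects records and hands a batch to a caller-supplied function (which, in a real consumer, would write to S3) once the buffer interval has elapsed.

```python
import time


class IntervalBuffer:
    """Collect records and flush them as one batch once an interval has elapsed."""

    def __init__(self, flush, interval_seconds=5.0, clock=time.monotonic):
        self._flush = flush              # callable that receives a list of records
        self._interval = interval_seconds
        self._clock = clock              # injectable so the example runs without waiting
        self._records = []
        self._last_flush = clock()

    def add(self, record):
        self._records.append(record)
        if self._clock() - self._last_flush >= self._interval:
            self.flush()

    def flush(self):
        if self._records:
            self._flush(list(self._records))
            self._records.clear()
        self._last_flush = self._clock()


# Simulated clock instead of real sleeping.
fake_now = [0.0]
delivered = []
buffer = IntervalBuffer(delivered.append, interval_seconds=5.0, clock=lambda: fake_now[0])

buffer.add("reading-1")  # t=0: buffered
fake_now[0] = 3.0
buffer.add("reading-2")  # t=3: buffered
fake_now[0] = 5.0
buffer.add("reading-3")  # t=5: interval reached, all three records are flushed
```

This sketch only checks the interval when a record arrives. A production consumer would also need a timer so that a quiet stream still flushes on schedule.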
Question 329:
A data engineer manages Athena external tables over Hive-style partitions in Amazon S3. New partitions are added daily, and occasionally old partition locations are manually deleted.
Which statements are correct? (Choose two.)
A. MSCK REPAIR TABLE can add compatible new partitions that exist in S3 but are missing from table metadata.
B. ALTER TABLE DROP PARTITION can remove stale partition metadata after partition data is deleted from S3.
C. MSCK REPAIR TABLE automatically removes partition metadata for deleted S3 prefixes.
D. S3 Lifecycle rules update AWS Glue Data Catalog partition metadata automatically.
E. DynamoDB TTL is required before Athena can query partitioned S3 data.
A. MSCK REPAIR TABLE can add compatible new partitions that exist in S3 but are missing from table metadata.
B. ALTER TABLE DROP PARTITION can remove stale partition metadata after partition data is deleted from S3.
Explanation
MSCK REPAIR TABLE scans S3 for compatible new partitions and adds them to metadata. It does not remove stale partitions, so ALTER TABLE DROP PARTITION is the appropriate metadata operation when a partition location is deleted.
S3 Lifecycle controls object retention, not catalog updates. DynamoDB TTL is unrelated to Athena partition querying.
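The division of labor between the two commands can be pictured as a set difference between what exists in S3 and what the catalog lists. The function and the partition names below are hypothetical, and the partition lists are supplied by the caller rather than read from AWS.

```python
def plan_partition_changes(s3_partitions, catalog_partitions):
    """Compare partitions found in S3 with partitions registered in the catalog.

    "add" lists partitions that MSCK REPAIR TABLE could register.
    "drop" lists stale entries that need ALTER TABLE DROP PARTITION.
    """
    s3_set = set(s3_partitions)
    catalog_set = set(catalog_partitions)
    return {
        "add": sorted(s3_set - catalog_set),
        "drop": sorted(catalog_set - s3_set),
    }


# Hypothetical state: dt=2024-01-01 was deleted from S3, dt=2024-01-03 is new.
plan = plan_partition_changes(
    s3_partitions=["dt=2024-01-02", "dt=2024-01-03"],
    catalog_partitions=["dt=2024-01-01", "dt=2024-01-02"],
)
```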
Question 330:
An ecommerce company wants to use AWS to migrate data pipelines from an on-premises environment into the AWS Cloud. The company currently uses a third-party tool in the on-premises environment to orchestrate data ingestion processes.
The company wants a migration solution that does not require the company to manage servers. The solution must be able to orchestrate Python and Bash scripts. The solution must not require the company to refactor any code.
Which solution will meet these requirements with the LEAST operational overhead?
A. AWS Lambda
B. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
C. AWS Step Functions
D. AWS Glue
B. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Explanation
The ecommerce company wants to migrate its data pipelines into the AWS Cloud without managing servers, and the solution must orchestrate Python and Bash scripts without refactoring code. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is the most suitable solution for this scenario.
Option B: Amazon Managed Workflows for Apache Airflow (Amazon MWAA). MWAA is a managed orchestration service that runs Python and Bash scripts as tasks in directed acyclic graphs (DAGs). It is a fully managed version of Apache Airflow, which is commonly used to orchestrate complex data workflows, so existing Python and Bash scripts can be migrated without refactoring and the company does not need to manage the underlying infrastructure.
Other options:
AWS Lambda (Option A) is more suited for event-driven workflows but would require breaking down the pipeline into individual Lambda functions, which may require refactoring.
AWS Step Functions (Option C) is good for orchestration but lacks native support for Python and Bash without using Lambda functions, and it may require code changes.
AWS Glue (Option D) is an ETL service primarily for data transformation and not suitable for orchestrating general scripts without modification.