A company uses EventBridge Scheduler to start an AWS Glue workflow every hour. When the target invocation fails, the operations team needs failed invocation events to be retained for later troubleshooting without writing custom retry storage code.
Which configuration should the data engineer add?
A. Configure a dead-letter queue for the EventBridge Scheduler target.

An online retailer uses multiple delivery partners to deliver products to customers. The delivery partners send order summaries to the retailer. The retailer stores the order summaries in Amazon S3.
Some of the order summaries contain personally identifiable information (PII) about customers. A data engineer needs to detect PII in the order summaries so the company can redact the PII.
Which solution will meet these requirements with the LEAST operational overhead?
A. Amazon Textract

A company uses an Amazon S3 Standard bucket to maintain a self-managed transactional data lake that uses Apache Iceberg tables. The data lake ingests data both in real time and in batches.
Users report slow performance for real-time tables. A data engineer reviews the real-time tables and notices that the tables are made up of many small data files.
The data engineer must improve the performance of the real-time tables.
Which solution will meet this requirement?
A. Expire historic snapshots.

An ecommerce company processes millions of orders each day. The company uses AWS Glue ETL to collect data from multiple sources, clean the data, and store the data in an Amazon S3 bucket in CSV format by using the S3 Standard storage class. The company uses the stored data to conduct daily analysis.
The company wants to optimize costs for data storage and retrieval.
Which solution will meet this requirement?
A. Transition the data to Amazon S3 Glacier Flexible Retrieval.

A company has a data lake in Amazon S3. The company collects AWS CloudTrail logs for multiple applications. The company stores the logs in the data lake, catalogs the logs in AWS Glue, and partitions the logs based on the year. The company uses Amazon Athena to analyze the logs.
Recently, customers reported that a query on one of the Athena tables did not return any data. A data engineer must resolve the issue.
Which combination of troubleshooting steps should the data engineer take? (Choose Two.)
A. Confirm that Athena is pointing to the correct Amazon S3 location.

A data engineer is building a data orchestration workflow. The data engineer plans to use a hybrid model that includes some on-premises resources and some resources that are in the cloud. The data engineer wants to prioritize portability and open source resources.
Which service should the data engineer use in both the on-premises environment and the cloud-based environment?
A. AWS Data Exchange

A company adds new data to a large CSV file in an Amazon S3 bucket every day. The file contains company sales data from the previous 5 years. The file currently includes more than 5,000 rows. The CSV file structure is shown below with sample data:

The company needs to use Amazon Athena to run queries on the CSV file to fetch data from a specific time period.
Which solution will meet this requirement MOST cost-effectively?
A. Write an Apache Spark script to convert the CSV data to JSON format. Create an AWS Glue job to run the script every day. Catalog the JSON data in AWS Glue. Run the Athena queries on the JSON data.

A data engineer must ingest a source of structured data that is in .csv format into an Amazon S3 data lake.
The .csv files contain 15 columns. Data analysts need to run Amazon Athena queries on one or two columns of the dataset. The data analysts rarely query the entire file.
Which solution will meet these requirements MOST cost-effectively?
A. Use an AWS Glue PySpark job to ingest the source data into the data lake in .csv format.

A company wants to build a dimension table in an Amazon S3 bucket. The bucket contains historical data that includes 10 million records. The historical data is 1 TB in size.
A data engineer needs a solution to update changes for up to 10,000 records in the base table every day.
Which solution will meet this requirement with the LOWEST runtime?
A. Develop an Apache Spark job in Amazon EMR to read the historical data and the new changes into two Spark DataFrames. Use the Spark update method to update the base table.

A data engineer needs to maintain a central metadata repository that users access through Amazon EMR and Amazon Athena queries. The repository needs to provide the schema and properties of many tables.
Some of the metadata is stored in Apache Hive. The data engineer needs to import the metadata from Hive into the central metadata repository.
Which solution will meet these requirements with the LEAST development effort?
A. Use Amazon EMR and Apache Ranger.