A ride-sharing company stores records for all rides in an Amazon DynamoDB table. The table includes the following columns and types of values:

The table currently contains billions of items. The table is partitioned by RideID and uses TripStartTime as the sort key. The company wants to use the data to build a personal interface to give drivers the ability to view the rides that each driver has completed, based on RideStatus. The solution must access the necessary data without scanning the entire table.
Which solution will meet these requirements?
A. Create a local secondary index (LSI) on DriverID.

A company has a data warehouse that contains a table that is named Sales. The company stores the table in Amazon Redshift. The table includes a column that is named city_name.
The company wants to query the table to find all rows that have a city_name that starts with "San" or "El".
Which SQL query will meet this requirement?
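
As a rough illustration of the matching rule in this question (a value that starts with "San" or "El"), the sketch below uses Python's `re` module as a stand-in for the database's pattern matching. It is not a Redshift query; the sample city names and the helper `starts_with_san_or_el` are hypothetical and exist only for this example. The key idea is that `^` anchors the pattern to the start of the value, whereas `$` would anchor it to the end.

```python
import re

# "^" anchors the match to the start of the string; "(San|El)" accepts either prefix.
STARTS_WITH_SAN_OR_EL = re.compile(r"^(San|El)")


def starts_with_san_or_el(city_name: str) -> bool:
    """Return True when city_name begins with "San" or "El" (case-sensitive)."""
    return STARTS_WITH_SAN_OR_EL.search(city_name) is not None


# Hypothetical sample values, not taken from the Sales table.
sample_cities = [
    "San Diego",
    "El Paso",
    "Santa Fe",
    "Elko",
    "Los Angeles",
    "Chelsea",
    "Marsan",
]

matches = [city for city in sample_cities if starts_with_san_or_el(city)]
print(matches)
```

Note that a prefix rule also accepts longer names such as "Santa Fe" and "Elko", while names that merely contain or end with the text ("Marsan") are rejected.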
A. Select * from Sales where city_name ~ '$(San|El)';

A company stores a large dataset in an Amazon S3 bucket. A data engineer frequently runs complex queries on the dataset by using Amazon Athena. The data engineer needs to optimize query performance and optimize costs for queries that are run multiple times with the same parameters.
Which solution will meet these requirements?
A. Convert the dataset to JSON format before running Athena queries.

A company uses an Amazon S3 bucket to integrate multiple data sources into a central data lake. The company needs to perform multiple transformations and data cleaning processes on the data to make the data accessible to business partners.
The company needs a solution that will give multiple business partners the ability to run SQL queries on the central data lake during normal business hours.
Which solution will meet these requirements MOST cost-effectively?
A. Use a provisioned Amazon EMR cluster after normal business hours to process the previous day's data, apply all necessary transformations, and load the prepared data into Amazon Redshift Serverless.

A data pipeline has three stages. The second stage must run only after the first stage succeeds. If the second stage fails, the pipeline must retry twice and then send a notification before stopping.
Which service should a data engineer use to coordinate this workflow with built-in state transitions and error handling?
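
To make the workflow in this question concrete, here is a minimal sketch of how the described behavior could be written as an Amazon States Language definition, built as a Python dictionary. It assumes Lambda functions for the three stages and an Amazon SNS topic for the notification; every ARN, state name, and retry interval below is a hypothetical placeholder, not something given in the question.

```python
import json

# Hypothetical resource ARNs, used only for illustration.
STAGE_ONE_ARN = "arn:aws:lambda:us-east-1:111122223333:function:stage-one"
STAGE_TWO_ARN = "arn:aws:lambda:us-east-1:111122223333:function:stage-two"
STAGE_THREE_ARN = "arn:aws:lambda:us-east-1:111122223333:function:stage-three"
ALERT_TOPIC_ARN = "arn:aws:sns:us-east-1:111122223333:pipeline-alerts"

definition = {
    "Comment": "Three-stage pipeline: retry stage two twice, then notify and stop.",
    "StartAt": "StageOne",
    "States": {
        # No Catch here: if stage one fails, the execution fails and stage two never runs.
        "StageOne": {"Type": "Task", "Resource": STAGE_ONE_ARN, "Next": "StageTwo"},
        "StageTwo": {
            "Type": "Task",
            "Resource": STAGE_TWO_ARN,
            # MaxAttempts counts retries, so 2 means up to two retries after the first failure.
            "Retry": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "IntervalSeconds": 30,
                    "MaxAttempts": 2,
                    "BackoffRate": 2.0,
                }
            ],
            # Once retries are exhausted, route the error to the notification state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "StageThree",
        },
        "StageThree": {"Type": "Task", "Resource": STAGE_THREE_ARN, "End": True},
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sns:publish",
            "Parameters": {
                "TopicArn": ALERT_TOPIC_ARN,
                "Message": "Stage two failed after two retries.",
            },
            "Next": "PipelineFailed",
        },
        # Fail state marks the execution as failed after the notification is sent.
        "PipelineFailed": {"Type": "Fail", "Error": "StageTwoFailed"},
    },
}

definition_json = json.dumps(definition, indent=2)
print(definition_json)
```

The resulting JSON string is the kind of document that would be supplied as a state machine definition; the retry interval and backoff rate are arbitrary choices for the sketch.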
A. AWS Step Functions

Which AWS service most cost-effectively orchestrates an AWS Glue ETL pipeline that crawls Microsoft SQL Server and loads data into Amazon S3?
A. AWS Step Functions

A company is building a governed data lake on AWS. The solution must store raw and curated datasets in object storage, support SQL queries without provisioning database servers, and enforce centralized fine-grained access policies.
Which combination of services should the data engineer choose? (Choose three.)
A. Amazon S3

A company has developed several AWS Glue extract, transform, and load (ETL) jobs to validate and transform data from Amazon S3. The ETL jobs load the data into Amazon RDS for MySQL in batches once every day. The ETL jobs use a DynamicFrame to read the S3 data.
The ETL jobs currently process all the data that is in the S3 bucket. However, the company wants the jobs to process only the daily incremental data.
Which solution will meet this requirement with the LEAST coding effort?
A. Create an ETL job that reads the S3 file status and logs the status in Amazon DynamoDB.

A manufacturing company is setting up an IoT monitoring system that generates large, complex data streams. The company wants to store the data in an Amazon S3 data lake for real-time and historical analysis. The company needs a solution that can process data quickly, provide short query times, and use resources efficiently without slowing down data ingestion.
The solution must use a Spark streaming extract, transform, and load (ETL) job on Amazon EMR that is configured to write data to an Iceberg table.
Which solution will meet these requirements?
A. Use Amazon Kinesis Data Streams to ingest the data. Configure the Iceberg table with copy on write (CoW) mode. Enable the AWS Glue Data Catalog compaction optimizer.

A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically.