A company has an electronic healthcare system that contains patient data. The data is consolidated from multiple systems and is stored in an Amazon S3 bucket in CSV format. The company has created an AWS Glue Data Catalog. The dataset contains duplicate records, and no unique key exists to identify a patient. Fields do not match exactly across the systems.
A data analytics specialist must design a solution to identify and remove duplicates. The solution must minimize the amount of human intervention and code that are required.
The data analytics specialist starts by using labeled data to teach the FindMatches machine learning (ML) transform.
What must the data analytics specialist do next to meet these requirements?
A. Identify matches in the dataset by using an AWS Glue ETL job with Spark distinct(). Review the output by using Amazon Redshift Spectrum.
B. Identify matches in the dataset by using an AWS Glue ETL job with Spark distinct(). Create a Data Catalog of transformed results. Review the output by using Amazon Athena.
C. Identify matches in the dataset by using an AWS Glue ETL job that has a transform type of "find matching records." Create a Data Catalog of transformed results. Review the output by using Amazon Athena.
D. Identify matches in the dataset by using an AWS Glue ETL job that has a transform type of "find matching records." Review the output by using Amazon Redshift Spectrum.
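For reference, the "find matching records" transform type named in options C and D corresponds to the FindMatches ML transform from the question stem. A minimal sketch of a Glue ETL script that applies a trained FindMatches transform and writes the results to S3 for review in Athena is shown below; the transform ID, catalog names, and S3 paths are hypothetical placeholders.

```python
# Minimal sketch: apply a trained FindMatches ML transform in a Glue ETL job
# and write the labeled results to S3 so they can be cataloged and reviewed
# in Athena. Transform ID, database, table, and bucket names are placeholders.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the consolidated patient records from the Data Catalog.
patients = glue_context.create_dynamic_frame.from_catalog(
    database="healthcare_db", table_name="patients_csv"
)

# Apply the FindMatches transform that was previously taught with labeled data.
matched = FindMatches.apply(frame=patients, transformId="tfm-0123456789abcdef")

# Write the results (including the generated match_id column) to S3; a catalog
# table over this location lets analysts review the matches with Athena.
glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/patients_matched/"},
    format="parquet",
)
job.commit()
```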
A company ingests a large set of sensor data in nested JSON format from different sources and stores it in an Amazon S3 bucket. The sensor data must be joined with performance data currently stored in an Amazon Redshift cluster.
A business analyst with basic SQL skills must build dashboards and analyze this data in Amazon QuickSight. A data engineer needs to build a solution to prepare the data for use by the business analyst. The data engineer does not know the structure of the JSON file. The company requires a solution with the least possible implementation effort.
Which combination of steps will create a solution that meets these requirements? (Choose three.)
A. Use an AWS Glue ETL job to convert the data into Apache Parquet format and write to Amazon S3.
B. Use an AWS Glue crawler to catalog the data.
C. Use an AWS Glue ETL job with the ApplyMapping class to un-nest the data and write to Amazon Redshift tables.
D. Use an AWS Glue ETL job with the Relationalize class to un-nest the data and write to Amazon Redshift tables.
E. Use QuickSight to create an Amazon Athena data source to read the Apache Parquet files in Amazon S3.
F. Use QuickSight to create an Amazon Redshift data source to read the native Amazon Redshift tables.
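For reference, un-nesting JSON of unknown structure in a Glue ETL job and loading it into Redshift (as options C and D describe) can look roughly like the sketch below. This version uses the Relationalize transform; the database, table, connection, and S3 paths are hypothetical placeholders.

```python
# Minimal sketch: flatten crawled, nested sensor JSON with Relationalize and
# load the result into an Amazon Redshift table through a Glue connection.
# Catalog names, connection name, and S3 paths are placeholders.
from awsglue.context import GlueContext
from awsglue.transforms import Relationalize
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Read the crawled sensor data from the Data Catalog.
sensors = glue_context.create_dynamic_frame.from_catalog(
    database="sensor_db", table_name="raw_sensor_json"
)

# Relationalize un-nests the JSON into a collection of flat DynamicFrames.
flattened = Relationalize.apply(
    frame=sensors, staging_path="s3://example-bucket/tmp/", name="root"
)

# Write the root (flattened) frame to a Redshift table.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=flattened.select("root"),
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "sensor_flat", "database": "analytics"},
    redshift_tmp_dir="s3://example-bucket/redshift-tmp/",
)
```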
A mining company is using Amazon S3 as its data lake. The company wants to analyze the data collected by the sensors in its mines. A data pipeline is being built to capture data from the sensors, ingest the data into an S3 bucket, and convert the data to Apache Parquet format. The data pipeline must be processed in near-real time. The data will be used for on-demand queries with Amazon Athena.
Which solution will meet these requirements?
A. Use Amazon Kinesis Data Firehose to invoke an AWS Lambda function that converts the data to Parquet format and stores the data in Amazon S3.
B. Use Amazon Kinesis Data Streams to invoke an AWS Lambda function that converts the data to Parquet format and stores the data in Amazon S3.
C. Use AWS DataSync to invoke an AWS Lambda function that converts the data to Parquet format and stores the data in Amazon S3.
D. Use Amazon Simple Queue Service (Amazon SQS) to stream data directly to an AWS Glue job that converts the data to Parquet format and stores the data in Amazon S3.
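As background for the Kinesis Data Firehose options above, Firehose can also convert incoming JSON records to Apache Parquet natively through its record format conversion feature before delivery to S3. A minimal boto3 sketch follows; the bucket, role, and Glue schema table are hypothetical placeholders.

```python
# Minimal boto3 sketch: a Firehose delivery stream that converts incoming JSON
# to Apache Parquet before delivery to S3, using a Glue table for the schema.
# Stream, bucket, role, and catalog names are placeholders.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="sensor-parquet-stream",
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::example-mining-data-lake",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 64},
        # Record format conversion reads the target schema from a Glue table.
        "DataFormatConversionConfiguration": {
            "Enabled": True,
            "InputFormatConfiguration": {"Deserializer": {"OpenXJsonSerDe": {}}},
            "OutputFormatConfiguration": {"Serializer": {"ParquetSerDe": {}}},
            "SchemaConfiguration": {
                "DatabaseName": "sensor_db",
                "TableName": "sensor_schema",
                "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            },
        },
    },
)
```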
An online gaming company wants to read customer data from Amazon Kinesis Data Streams and deliver the data to an Amazon S3 data lake for analytics. The data contains customer_id as one of the attributes. The data consumers need the data to be partitioned by customer_id in the S3 data lake.
Which solution will meet this requirement with the LEAST effort?
A. Create an Amazon Kinesis Data Firehose delivery stream. Use dynamic partitioning to partition the data by customer_id before delivering the data to Amazon S3.
B. Create an AWS Glue streaming job. Use the built-in map transform to partition the data by customer_id before delivering the data to Amazon S3.
C. Create an AWS Lambda function. Use Lambda layers to partition the data by customer_id before delivering the data to Amazon S3.
D. Create an Amazon EMR cluster. Run an Apache Spark job to automatically partition the data by customer_id before delivering the data to Amazon S3.
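For reference, the dynamic partitioning approach in option A can be configured roughly as in the boto3 sketch below, which extracts customer_id from each JSON record and uses it in the S3 prefix. The stream, bucket, and role names are hypothetical placeholders.

```python
# Minimal boto3 sketch: a Firehose delivery stream that reads from a Kinesis
# data stream and uses dynamic partitioning to write records under
# customer_id= prefixes in S3. All names and ARNs are placeholders.
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="game-events-partitioned",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/game-events",
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-source-role",
    },
    ExtendedS3DestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
        "BucketARN": "arn:aws:s3:::example-gaming-data-lake",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 64},
        "DynamicPartitioningConfiguration": {"Enabled": True},
        # Extract customer_id from each JSON record with the built-in jq engine.
        "ProcessingConfiguration": {
            "Enabled": True,
            "Processors": [
                {
                    "Type": "MetadataExtraction",
                    "Parameters": [
                        {
                            "ParameterName": "MetadataExtractionQuery",
                            "ParameterValue": "{customer_id: .customer_id}",
                        },
                        {
                            "ParameterName": "JsonParsingEngine",
                            "ParameterValue": "JQ-1.6",
                        },
                    ],
                }
            ],
        },
        "Prefix": "events/customer_id=!{partitionKeyFromQuery:customer_id}/",
        "ErrorOutputPrefix": "errors/!{firehose:error-output-type}/",
    },
)
```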
A retail company uses Amazon Aurora MySQL for its operational data store and Amazon Redshift for its data warehouse. The MySQL database resides in a different VPC than the Redshift cluster. Data analysts need to query data in both MySQL and Amazon Redshift to provide business insights. The company wants the lowest network latency between the two VPCs.
Which combination of solutions meets these requirements? (Choose two.)
A. Set up VPC peering between the MySQL VPC and the Redshift VPC.
B. Set up a transit gateway to connect the MySQL VPC with the Redshift VPC.
C. Use a Redshift federated query to retrieve live data from the MySQL database. Create a late-binding view to combine the MySQL data with the Redshift data.
D. Use Amazon Redshift Spectrum to retrieve live data from the MySQL database. Create a late-binding view to combine the MySQL data with the Redshift data.
E. Use the Redshift COPY command to constantly copy live data from MySQL to the Redshift cluster. Create a late-binding view to combine the MySQL data with the Redshift data.
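For reference, the federated query and late-binding view mentioned in option C boil down to two SQL statements, sketched below as issued through the Redshift Data API. The schema names, endpoint, ARNs, and columns are hypothetical placeholders.

```python
# Minimal sketch: create an external schema over the Aurora MySQL database
# (Redshift federated query) and a late-binding view that joins it with local
# Redshift data. All names, endpoints, and ARNs are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

statements = [
    # Federated query: expose the MySQL operational database as an external schema.
    """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS mysql_ops
    FROM MYSQL
    DATABASE 'retail'
    URI 'aurora-mysql.cluster-abc123.us-east-1.rds.amazonaws.com'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-federated-role'
    SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123456789012:secret:mysql-creds';
    """,
    # Late-binding view combining live MySQL data with Redshift warehouse data.
    """
    CREATE OR REPLACE VIEW analytics.orders_enriched AS
    SELECT o.order_id, o.customer_id, o.order_total, d.segment
    FROM mysql_ops.orders o
    JOIN warehouse.customer_dim d ON d.customer_id = o.customer_id
    WITH NO SCHEMA BINDING;
    """,
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",
        Database="dev",
        DbUser="admin",
        Sql=sql,
    )
```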