A data analyst notices the following error message while loading data into an Amazon Redshift cluster:
“The bucket you are attempting to access must be addressed using the specified endpoint.”
What should the data analyst do to resolve this issue?
A. Specify the correct AWS Region for the Amazon S3 bucket by using the REGION option with the COPY command.
B. Change the Amazon S3 object's ACL to grant the S3 bucket owner full control of the object.
C. Launch the Redshift cluster in a VPC.
D. Configure the timeout settings according to the operating system used to connect to the Redshift cluster.
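For context, option A maps to the COPY command's REGION parameter, which tells Amazon Redshift which AWS Region the source bucket lives in when it differs from the cluster's Region. Below is a minimal sketch using the Redshift Data API; the cluster identifier, database, table, bucket, and IAM role ARN are hypothetical placeholders.

```python
import boto3

# Sketch: run a COPY that names the S3 bucket's Region explicitly.
# Cluster, database, table, bucket, and role values are placeholders.
redshift_data = boto3.client("redshift-data")

copy_sql = """
    COPY sales_staging
    FROM 's3://partner-data-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    REGION 'us-west-2';  -- Region where the S3 bucket actually resides
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
print(response["Id"])  # statement ID; poll with describe_statement if needed
```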
A company receives datasets from partners at various frequencies. The datasets include baseline data and incremental data. The company needs to merge and store all the datasets without reprocessing the data. Which solution will meet these requirements with the LEAST development effort?
A. Use an AWS Glue job with a temporary table to process the datasets. Store the data in an Amazon RDS table.
B. Use an Apache Spark job in an Amazon EMR cluster to process the datasets. Store the data in EMR File System (EMRFS).
C. Use an AWS Glue job with job bookmarks enabled to process the datasets. Store the data in Amazon S3.
D. Use an AWS Lambda function to process the datasets. Store the data in Amazon S3.
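For context, option C's job bookmarks are what let an AWS Glue job pick up only data it has not yet processed on each run, so incremental files are merged without reprocessing the baseline. A minimal boto3 sketch of defining such a job; the script location, role, and job name are assumptions.

```python
import boto3

glue = boto3.client("glue")

# Sketch: define a Glue ETL job with job bookmarks enabled so each run
# processes only new (incremental) input files. Names and paths are placeholders.
glue.create_job(
    Name="merge-partner-datasets",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-glue-scripts/merge_partner_datasets.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        # Bookmarks track already-processed data between runs.
        "--job-bookmark-option": "job-bookmark-enable",
    },
    GlueVersion="4.0",
)
```

Inside the job script itself, Job.init() and job.commit() are what persist the bookmark state between runs.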
A company's data analytics specialist must build a solution to implement quality checks on a dataset before the company uses the data in a sales report. The dataset is stored in an Amazon S3 bucket and is in CSV format.
The data quality checks must include identification of duplicate rows, removal of duplicate rows, and validation of date formats. The solution must run daily and must produce output data in Apache Parquet format in Amazon S3.
Which solution will meet these requirements with the LEAST development effort?
A. Create an AWS Glue ETL job that includes transformation steps to implement data quality checks. Configure the job to write to Amazon S3. Create a schedule-based job within an AWS Glue workflow to run the job daily.
B. Create an AWS Glue DataBrew job that includes data quality recipe steps to implement data quality checks. Configure the job to write to Amazon S3. Create a schedule within the DataBrew job to run the job daily.
C. Create an Amazon EMR cluster. Use an Apache Spark ETL job that includes data processing steps to implement data quality checks. Configure the job to write to Amazon S3. Create an Apache Oozie workflow to run the job daily.
D. Create an AWS Lambda function. Use custom code to implement data quality checks and to write to Amazon S3. Create an Amazon EventBridge rule to run the Lambda function daily.
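For context, option B typically comes down to a DataBrew recipe job whose recipe steps handle deduplication and date validation, with Parquet output and a daily schedule. A rough boto3 sketch, assuming the recipe already exists; the dataset, recipe, role, and bucket names are placeholders.

```python
import boto3

databrew = boto3.client("databrew")

# Sketch: a recipe job that applies an existing DataBrew recipe (dedupe rows,
# validate date formats) and writes Parquet output. All names are placeholders.
databrew.create_recipe_job(
    Name="sales-quality-checks",
    DatasetName="sales-csv-dataset",
    RecipeReference={"Name": "sales-quality-recipe", "RecipeVersion": "1.0"},
    RoleArn="arn:aws:iam::123456789012:role/DataBrewJobRole",
    Outputs=[
        {
            "Location": {"Bucket": "clean-sales-output", "Key": "parquet/"},
            "Format": "PARQUET",
        }
    ],
)

# Run the job every day at 06:00 UTC.
databrew.create_schedule(
    Name="sales-quality-daily",
    CronExpression="cron(0 6 * * ? *)",
    JobNames=["sales-quality-checks"],
)
```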
A large digital advertising company has built business intelligence (BI) dashboards in Amazon QuickSight Enterprise edition to understand customer buying behavior. The dashboards use the Super-fast, Parallel, In-memory Calculation Engine (SPICE) as the in-memory engine to store the data. The company's Amazon S3 data lake provides the data for these dashboards, which are queried using Amazon Athena. The data files used by the dashboards consist of millions of records partitioned by year, month, and hour, and new data is continuously added. Every data file in the data lake has a timestamp column named CREATE_TS, which indicates when the data was added or updated.
Until now, the dashboards have been scheduled to refresh every night through a full reload. A data analyst must recommend an approach so that the dashboards refresh every hour and include the incremental data from the last hour.
How can the data analyst meet these requirements with the LEAST amount of operational effort?
A. Create new data partitions every hour in Athena by using the CREATE_TS column and schedule the QuickSight dataset to refresh every hour.
B. Use direct querying in QuickSight by using Athena to make refreshed data always available.
C. Use the CREATE_TS column to look back for incremental data in the last hour and schedule the QuickSight dataset to incrementally refresh every hour.
D. Create new datasets in QuickSight to do a full reload every hour and add the datasets to SPICE.
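For context, option C corresponds to QuickSight's incremental refresh for SPICE datasets, which uses a lookback window on a timestamp column such as CREATE_TS. A hedged boto3 sketch follows; the account ID, dataset ID, and schedule ID are placeholders, and the exact request shapes should be verified against the current SDK.

```python
import boto3

quicksight = boto3.client("quicksight")
account_id = "123456789012"             # placeholder
dataset_id = "sales-dashboard-dataset"  # placeholder

# Sketch: configure a 1-hour lookback window on CREATE_TS for incremental refresh.
quicksight.put_data_set_refresh_properties(
    AwsAccountId=account_id,
    DataSetId=dataset_id,
    DataSetRefreshProperties={
        "RefreshConfiguration": {
            "IncrementalRefresh": {
                "LookbackWindow": {
                    "ColumnName": "CREATE_TS",
                    "Size": 1,
                    "SizeUnit": "HOUR",
                }
            }
        }
    },
)

# Schedule the incremental refresh to run every hour.
quicksight.create_refresh_schedule(
    AwsAccountId=account_id,
    DataSetId=dataset_id,
    Schedule={
        "ScheduleId": "hourly-incremental",
        "RefreshType": "INCREMENTAL_REFRESH",
        "ScheduleFrequency": {"Interval": "HOURLY"},
    },
)
```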
An ecommerce company extracts a large volume of data from Amazon S3, relational databases, non-relational databases, and custom data stores. A data analytics team wants to analyze the data to discover patterns by using SQL-like queries with a high degree of complexity and store the results in Amazon S3 for further analysis.
How can the data analytics team meet these requirements with the LEAST operational overhead?
A. Query the datasets with Presto running on Amazon EMR.
B. Query the datasets by using Apache Spark SQL running on Amazon EMR.
C. Use AWS Glue jobs to ETL data from various data sources to Amazon S3 and query the data with Amazon Athena.
D. Use federated query functionality in Amazon Athena.
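For context, option D relies on Athena data source connectors, which let a single SQL statement join S3 tables with relational, non-relational, and custom sources and write the results back to S3 (for example with CTAS). A minimal boto3 sketch; the catalog, database, table, and bucket names are hypothetical.

```python
import boto3

athena = boto3.client("athena")

# Sketch: a CTAS query joining a Glue Data Catalog table with a table exposed
# through a federated connector catalog, writing Parquet results to S3.
# Catalog/database/table/bucket names are placeholders.
query = """
    CREATE TABLE analytics.order_patterns
    WITH (format = 'PARQUET',
          external_location = 's3://analytics-results/order_patterns/') AS
    SELECT o.customer_id, p.category, COUNT(*) AS orders
    FROM AwsDataCatalog.sales_lake.orders o
    JOIN rds_catalog.products.product_dim p
        ON o.product_id = p.product_id
    GROUP BY o.customer_id, p.category
"""

athena.start_query_execution(
    QueryString=query,
    WorkGroup="primary",
    ResultConfiguration={"OutputLocation": "s3://analytics-results/athena-output/"},
)
```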
An event ticketing website has a data lake on Amazon S3 and a data warehouse on Amazon Redshift. Two datasets exist: events data and sales data. Each dataset has millions of records.
The entire events dataset is frequently accessed and is stored in Amazon Redshift. However, only the last 6 months of sales data is frequently accessed and is stored in Amazon Redshift. The rest of the sales data is available only in Amazon S3.
A data analytics specialist must create a report that shows the total revenue that each event has generated in the last 12 months. The report will be accessed thousands of times each week.
Which solution will meet these requirements with the LEAST operational effort?
A. Create an AWS Glue job to access sales data that is older than 6 months from Amazon S3 and to access event and sales data from Amazon Redshift. Load the results into a new table in Amazon Redshift.
B. Create a stored procedure to copy sales data that is older than 6 months and newer than 12 months from Amazon S3 to Amazon Redshift. Create a materialized view with the autorefresh option.
C. Create an AWS Lambda function to copy sales data that is older than 6 months and newer than 12 months to an Amazon Kinesis Data Firehose delivery stream. Specify Amazon Redshift as the destination of the delivery stream. Create a materialized view with the autorefresh option.
D. Create a materialized view in Amazon Redshift with the autorefresh option. Use Amazon Redshift Spectrum to include sales data that is older than 6 months.
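For context, option D combines Redshift Spectrum (to reach the older sales data still in S3) with a materialized view that precomputes per-event revenue so the frequently accessed report does not re-aggregate on every read. A hedged SQL sketch submitted through the Redshift Data API; schema, table, and role names are placeholders, and auto refresh availability should be verified for views that reference external tables.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Sketch: expose the S3 sales data as an external (Spectrum) schema, then build a
# materialized view over the union of warm (Redshift) and cold (S3) sales rows.
# All identifiers and the role ARN are placeholders.
statements = [
    """
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_sales
    FROM DATA CATALOG DATABASE 'sales_lake'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftSpectrumRole';
    """,
    """
    CREATE MATERIALIZED VIEW mv_event_revenue
    AUTO REFRESH YES  -- may need a scheduled REFRESH if external tables block auto refresh
    AS
    SELECT e.event_id, SUM(s.amount) AS total_revenue
    FROM public.events e
    JOIN (
        SELECT event_id, amount, sale_date FROM public.sales
        UNION ALL
        SELECT event_id, amount, sale_date FROM spectrum_sales.sales_archive
    ) s ON e.event_id = s.event_id
    GROUP BY e.event_id;
    """,
]

for sql in statements:
    redshift_data.execute_statement(
        ClusterIdentifier="analytics-cluster",  # placeholder
        Database="dev",
        DbUser="awsuser",
        Sql=sql,
    )
```

The 12-month window can then be applied as a predicate when the report queries the view.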
A company has a process that writes two datasets in CSV format to an Amazon S3 bucket every 6 hours. The company needs to join the datasets, convert the data to Apache Parquet, and store the data within another bucket for users to query using Amazon Athena. The data also needs to be loaded to Amazon Redshift for advanced analytics. The company needs a solution that is resilient to the failure of any individual job component and can be restarted in case of an error.
Which solution meets these requirements with the LEAST amount of operational overhead?
A. Use AWS Step Functions to orchestrate an Amazon EMR cluster running Apache Spark. Use PySpark to generate data frames of the datasets in Amazon S3, transform the data, join the data, write the data back to Amazon S3, and load the data to Amazon Redshift.
B. Create an AWS Glue job using Python Shell that generates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift. Use an AWS Glue workflow to orchestrate the AWS Glue job at the desired frequency.
C. Use AWS Step Functions to orchestrate the AWS Glue job. Create an AWS Glue job using Python Shell that creates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift.
D. Create an AWS Glue job using PySpark that creates dynamic frames of the datasets in Amazon S3, transforms the data, joins the data, writes the data back to Amazon S3, and loads the data to Amazon Redshift. Use an AWS Glue workflow to orchestrate the AWS Glue job.
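For context, option D's Glue PySpark job would read both CSV datasets as DynamicFrames, join them, write Parquet to the curated bucket for Athena, and load a copy into Amazon Redshift. A trimmed sketch of such a script; bucket paths, the join key, the Redshift connection name, and table names are placeholders.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import Join
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read both CSV datasets from S3 as DynamicFrames (paths are placeholders).
orders = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://raw-bucket/orders/"]},
    format="csv",
    format_options={"withHeader": True},
)
customers = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://raw-bucket/customers/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Join the two datasets on a shared key.
joined = Join.apply(orders, customers, "customer_id", "customer_id")

# Write Parquet output for Athena users.
glue_context.write_dynamic_frame.from_options(
    frame=joined,
    connection_type="s3",
    connection_options={"path": "s3://curated-bucket/joined/"},
    format="parquet",
)

# Load the same data into Redshift through a Glue connection (names are placeholders).
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=joined,
    catalog_connection="redshift-connection",
    connection_options={"dbtable": "public.joined_orders", "database": "dev"},
    redshift_tmp_dir="s3://curated-bucket/redshift-temp/",
)

job.commit()
```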
A company has a partner that is supposed to put a data object into the company's Amazon S3 bucket each day. Occasionally, the partner fails to deliver the data. A data analytics specialist needs to implement a solution to automate notifications to an Amazon Simple Notification Service (Amazon SNS) topic when the partner data is missing.
Which solution will meet this requirement with the LEAST operational overhead?
A. Set up AWS CloudTrail to log bucket-level actions to Amazon CloudWatch. Use Amazon EventBridge to schedule an AWS Lambda function to run each day. Configure the function to publish a message to the SNS topic if no PutObject calls have been recorded in the last day.
B. Set up S3 Event Notifications to invoke an AWS Lambda function. Configure the function to write a custom Amazon CloudWatch metric. Configure a CloudWatch alarm to publish a message to the SNS topic when the metric is zero for 1 day.
C. Set up S3 Event Notifications to invoke an AWS Lambda function. Configure the function to write the count for the partner data to an Amazon DynamoDB table. Use Amazon EventBridge to schedule a second Lambda function to run each day. Configure the second function to verify the file counts and to publish a message to the SNS topic when data is missing.
D. Set up S3 Event Notifications to invoke an AWS Lambda function. Configure the function to import the S3 objects into an Amazon RDS for PostgreSQL database. Configure an Amazon CloudWatch alarm to publish a message to the SNS topic when the function finishes running with any errors.
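For context, the daily check in option C is a small Lambda function that reads the count recorded by the first function and publishes to SNS when nothing arrived. A minimal sketch, assuming a DynamoDB table keyed by delivery date and an existing SNS topic; all names are placeholders.

```python
import datetime
import os

import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")

# Placeholders: in practice, supply these as Lambda environment variables.
TABLE_NAME = os.environ.get("TABLE_NAME", "partner-deliveries")
TOPIC_ARN = os.environ.get(
    "TOPIC_ARN", "arn:aws:sns:us-east-1:123456789012:partner-data-alerts"
)


def handler(event, context):
    """Runs once a day (EventBridge schedule) and alerts when no partner file landed."""
    today = datetime.date.today().isoformat()
    table = dynamodb.Table(TABLE_NAME)

    item = table.get_item(Key={"delivery_date": today}).get("Item")
    count = int(item["object_count"]) if item else 0

    if count == 0:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="Partner data missing",
            Message=f"No partner objects were delivered to S3 on {today}.",
        )
    return {"delivery_date": today, "object_count": count}
```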
A company wants to use a data lake that is hosted on Amazon S3 to provide analytics services for historical data. The data lake consists of 800 tables but is expected to grow to thousands of tables. More than 50 departments use the tables, and each department has hundreds of users. Different departments need access to specific tables and columns.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create an IAM role for each department. Use AWS Lake Formation based access control to grant each IAM role access to specific tables and columns. Use Amazon Athena to analyze the data.
B. Create an Amazon Redshift cluster for each department. Use AWS Glue to ingest into the Redshift cluster only the tables and columns that are relevant to that department. Create Redshift database users. Grant the users access to the relevant department's Redshift cluster. Use Amazon Redshift to analyze the data.
C. Create an IAM role for each department. Use AWS Lake Formation tag-based access control to grant each IAM role access to only the relevant resources. Create LF-tags that are attached to tables and columns. Use Amazon Athena to analyze the data.
D. Create an Amazon EMR cluster for each department. Configure an IAM service role for each EMR cluster to access relevant S3 files. For each department's users, create an IAM role that provides access to the relevant EMR cluster. Use Amazon EMR to analyze the data.
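For context, option C's tag-based access control means defining LF-tags once, attaching them to tables or columns, and granting each department's role on a tag expression rather than on thousands of individual tables. A hedged boto3 sketch; tag keys, database/table names, and the role ARN are placeholders.

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Sketch: define an LF-tag, attach it to a table, and grant a department role
# SELECT on everything carrying that tag. All names are placeholders.
lakeformation.create_lf_tag(TagKey="department", TagValues=["marketing", "finance"])

lakeformation.add_lf_tags_to_resource(
    Resource={"Table": {"DatabaseName": "sales_lake", "Name": "campaign_results"}},
    LFTags=[{"TagKey": "department", "TagValues": ["marketing"]}],
)

lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/MarketingAnalysts"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "department", "TagValues": ["marketing"]}],
        }
    },
    Permissions=["SELECT"],
)
```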
A company wants to build a real-time data processing and delivery solution for streaming data. The data is being streamed through an Amazon Kinesis data stream. The company wants to use an Apache Flink application to process the data before writing the data to another Kinesis data stream. The data must be stored in an Amazon S3 data lake every 60 seconds for further analytics.
Which solution will meet these requirements with the LEAST operational overhead?
A. Host the Flink application on an Amazon EMR cluster. Use Amazon Kinesis Data Firehose to write the data to Amazon S3.
B. Host the Flink application on Amazon Kinesis Data Analytics. Use AWS Glue to write the data to Amazon S3.
C. Host the Flink application on an Amazon EMR cluster. Use AWS Glue to write the data to Amazon S3.
D. Host the Flink application on Amazon Kinesis Data Analytics. Use Amazon Kinesis Data Firehose to write the data to Amazon S3.
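For context, the 60-second landing interval in option D maps directly to Kinesis Data Firehose buffering hints, with the Flink application's output stream as the source. A minimal boto3 sketch; the stream name, role ARNs, and bucket are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Sketch: a delivery stream that reads from the Kinesis stream the Flink app
# writes to and flushes to the S3 data lake every 60 seconds. Names are placeholders.
firehose.create_delivery_stream(
    DeliveryStreamName="flink-output-to-s3",
    DeliveryStreamType="KinesisStreamAsSource",
    KinesisStreamSourceConfiguration={
        "KinesisStreamARN": "arn:aws:kinesis:us-east-1:123456789012:stream/processed-events",
        "RoleARN": "arn:aws:iam::123456789012:role/FirehoseReadKinesisRole",
    },
    ExtendedS3DestinationConfiguration={
        "BucketARN": "arn:aws:s3:::streaming-data-lake",
        "RoleARN": "arn:aws:iam::123456789012:role/FirehoseWriteS3Role",
        "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 64},
        "Prefix": "processed/",
    },
)
```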