A healthcare company stores patient records in an on-premises MySQL database. The company creates an application to access the MySQL database. The company must enforce security protocols to protect the patient records. The company currently rotates database credentials every 30 days to minimize the risk of unauthorized access.
The company wants a solution that does not require the company to modify the application code for each credential rotation.
Which solution will meet this requirement with the LEAST operational overhead?
A. Assign an IAM role access permissions to the database. Configure the application to obtain temporary credentials through the IAM role. B. Use AWS Key Management Service (AWS KMS) to generate encryption keys. Configure automatic key rotation. Store the encrypted credentials in an Amazon DynamoDB table. C. Use AWS Secrets Manager to automatically rotate credentials. Allow the application to retrieve the credentials by using API calls. D. Store credentials in an encrypted Amazon S3 bucket. Rotate the credentials every month by using an S3 Lifecycle policy. Use bucket policies to control access.
C. Use AWS Secrets Manager to automatically rotate credentials. Allow the application to retrieve the credentials by using API calls.
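As a rough illustration of why option C avoids code changes on rotation, the sketch below reads the credentials at run time instead of hard-coding them. It assumes a client that behaves like boto3.client("secretsmanager") and a secret stored as JSON with "username" and "password" keys; the secret name "prod/mysql/app" and the FakeSecretsClient class are hypothetical and exist only so the example runs without AWS access.

```python
import json


def get_db_credentials(secrets_client, secret_id):
    """Fetch the current database credentials from AWS Secrets Manager.

    `secrets_client` is expected to behave like boto3.client("secretsmanager").
    The JSON keys "username" and "password" are an assumption about how the
    secret was stored.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]


class FakeSecretsClient:
    """Hypothetical stand-in for the real client, used only for this sketch."""

    def __init__(self, secret_string):
        self.secret_string = secret_string

    def get_secret_value(self, SecretId):
        # Mimics the shape of the real response for the fields used above.
        return {"Name": SecretId, "SecretString": self.secret_string}


if __name__ == "__main__":
    client = FakeSecretsClient(
        json.dumps({"username": "app_user", "password": "example-password"})
    )
    # After a rotation, the same call returns the new password; the
    # application code itself does not change.
    print(get_db_credentials(client, "prod/mysql/app"))
```

Because the application asks for the secret each time it connects (or refreshes a short-lived cache), a rotation only changes what the API returns.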
Question 132:
A data engineer must operationalize monitoring for data pipelines. The team needs application logs for troubleshooting and API event history for audit investigations.
Which services should the engineer use? (Choose two.)
A. Amazon CloudWatch Logs B. AWS CloudTrail C. AWS Cost Explorer D. Amazon ECR E. Amazon S3 Glacier Deep Archive only
A. Amazon CloudWatch Logs B. AWS CloudTrail
Explanation
CloudWatch Logs stores and supports analysis of application and service logs. CloudTrail records AWS API activity for audit and traceability. Cost Explorer is for cost analysis. ECR stores container images. S3 Glacier Deep Archive is a storage class and does not provide pipeline log collection or API audit history by itself.
Question 133:
A company uses AWS Key Management Service (AWS KMS) to encrypt an Amazon Redshift cluster. The company wants to configure a cross-Region snapshot of the Redshift cluster as part of its disaster recovery (DR) strategy.
A data engineer needs to use the AWS CLI to create the cross-Region snapshot.
Which combination of steps will meet these requirements? (Choose two.)
A. Create a KMS key and configure a snapshot copy grant in the source AWS Region. B. In the source AWS Region, enable snapshot copying. Specify the name of the snapshot copy grant that is created in the destination AWS Region. C. In the source AWS Region, enable snapshot copying. Specify the name of the snapshot copy grant that is created in the source AWS Region. D. Create a KMS key and configure a snapshot copy grant in the destination AWS Region. E. Convert the cluster to a Multi-AZ deployment.
B. In the source AWS Region, enable snapshot copying. Specify the name of the snapshot copy grant that is created in the destination AWS Region. D. Create a KMS key and configure a snapshot copy grant in the destination AWS Region.
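The order of operations can be sketched in Python, following the Amazon Redshift documentation for KMS-encrypted clusters: the snapshot copy grant is created in the destination Region, and snapshot copying is then enabled on the cluster in the source Region with that grant's name. The question asks for the AWS CLI, where the equivalent commands are `aws redshift create-snapshot-copy-grant` and `aws redshift enable-snapshot-copy`. The sketch assumes two clients that behave like boto3.client("redshift") in each Region; RecordingClient and all identifiers in the usage lines are hypothetical placeholders.

```python
def configure_cross_region_snapshot_copy(source_redshift, destination_redshift,
                                         cluster_id, destination_region,
                                         grant_name, destination_kms_key_id,
                                         retention_days=7):
    """Enable cross-Region snapshot copy for a KMS-encrypted cluster."""
    # Step 1: in the DESTINATION Region, create a grant that lets Redshift
    # use a KMS key that lives in that Region.
    destination_redshift.create_snapshot_copy_grant(
        SnapshotCopyGrantName=grant_name,
        KmsKeyId=destination_kms_key_id,
    )
    # Step 2: in the SOURCE Region, enable snapshot copy on the cluster and
    # reference the grant created in step 1.
    source_redshift.enable_snapshot_copy(
        ClusterIdentifier=cluster_id,
        DestinationRegion=destination_region,
        RetentionPeriod=retention_days,
        SnapshotCopyGrantName=grant_name,
    )


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getattr__(self, operation):
        def call(**kwargs):
            self.log.append((self.name, operation, kwargs))
            return {}
        return call


if __name__ == "__main__":
    calls = []
    configure_cross_region_snapshot_copy(
        RecordingClient("source", calls),
        RecordingClient("destination", calls),
        cluster_id="example-cluster",
        destination_region="us-west-2",
        grant_name="example-copy-grant",
        destination_kms_key_id="example-key-id",
    )
    for region, operation, _ in calls:
        print(region, operation)
```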
Question 134:
A company has a data warehouse in Amazon Redshift. To comply with security regulations, the company needs to log and store all user activities and connection activities for the data warehouse.
Which solution will meet these requirements?
A. Create an Amazon S3 bucket. Enable logging for the Amazon Redshift cluster. Specify the S3 bucket in the logging configuration to store the logs. B. Create an Amazon Elastic File System (Amazon EFS) file system. Enable logging for the Amazon Redshift cluster. Write logs to the EFS file system. C. Create an Amazon Aurora MySQL database. Enable logging for the Amazon Redshift cluster. Write the logs to a t D. Create an Amazon Elastic Block Store (Amazon EBS) volume. Enable logging for the Amazon Redshift cluster. Write the logs to the EBS volume.
A. Create an Amazon S3 bucket. Enable logging for the Amazon Redshift cluster. Specify the S3 bucket in the logging configuration to store the logs.
Explanation
Problem Analysis:
The company must log all user activities and connection activities in Amazon Redshift for security compliance.
Key Considerations:
Redshift supports audit logging, which can be configured to write logs to an S3 bucket.
S3 provides durable, scalable, and cost-effective storage for logs.
Solution Analysis:
Option A: S3 for Logging
Standard approach for storing Redshift logs.
Easy to set up and manage with minimal cost.
Option B: Amazon EFS
EFS is unnecessary for this use case and less cost-efficient than S3.
Option C: Aurora MySQL
Using a database to store logs increases complexity and cost.
Option D: EBS Volume
EBS is not a scalable option for log storage compared to S3.
Final Recommendation:
Enable Redshift audit logging and specify an S3 bucket as the destination.
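A minimal sketch of that recommendation, assuming a client that behaves like boto3.client("redshift"). Enabling audit logging delivers the connection log and user log to the bucket; the user activity log additionally depends on the `enable_user_activity_logging` parameter in the cluster's parameter group, which the sketch also sets. RecordingClient and the names used in the usage lines are hypothetical, and the bucket is assumed to already grant Redshift permission to write to it.

```python
def enable_audit_logging(redshift_client, cluster_id, bucket_name,
                         parameter_group_name, key_prefix="redshift-audit/"):
    """Send Redshift audit logs to S3 and turn on user activity logging."""
    # Connection and user logs start flowing to the bucket once logging is on.
    redshift_client.enable_logging(
        ClusterIdentifier=cluster_id,
        BucketName=bucket_name,
        S3KeyPrefix=key_prefix,
    )
    # The user activity log (queries run by users) needs this parameter too.
    redshift_client.modify_cluster_parameter_group(
        ParameterGroupName=parameter_group_name,
        Parameters=[{
            "ParameterName": "enable_user_activity_logging",
            "ParameterValue": "true",
        }],
    )


class RecordingClient:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, operation):
        def call(**kwargs):
            self.calls.append((operation, kwargs))
            return {}
        return call


if __name__ == "__main__":
    client = RecordingClient()
    enable_audit_logging(client, "example-cluster", "example-log-bucket",
                         "example-parameter-group")
    for operation, kwargs in client.calls:
        print(operation, kwargs)
```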
Question 135:
A company maintains multiple extract, transform, and load (ETL) workflows that ingest data from the company's operational databases into an Amazon S3 based data lake. The ETL workflows use AWS Glue and Amazon EMR to process data.
The company wants to improve the existing architecture to provide automated orchestration and to require minimal manual effort.
Which solution will meet these requirements with the LEAST operational overhead?
A. AWS Glue workflows B. AWS Step Functions tasks C. AWS Lambda functions D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA) workflows
B. AWS Step Functions tasks
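To make the orchestration idea concrete, the sketch below builds a small Amazon States Language definition as a Python dictionary: one task runs an AWS Glue job and waits for it, and the next submits a step to an existing Amazon EMR cluster and waits for it. The service integration resource ARNs are the documented Step Functions ones; the state names, job name, cluster ID, and step arguments are hypothetical placeholders, and a real workflow would likely add retries and error handling.

```python
import json


def build_etl_state_machine(glue_job_name, emr_cluster_id, step_args):
    """Return an Amazon States Language definition chaining Glue and EMR."""
    return {
        "Comment": "Illustrative ETL orchestration: Glue job, then EMR step",
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # ".sync" makes Step Functions wait until the job finishes.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "RunEmrStep",
            },
            "RunEmrStep": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
                "Parameters": {
                    "ClusterId": emr_cluster_id,
                    "Step": {
                        "Name": "ProcessData",
                        "ActionOnFailure": "CONTINUE",
                        "HadoopJarStep": {
                            "Jar": "command-runner.jar",
                            "Args": step_args,
                        },
                    },
                },
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    definition = build_etl_state_machine(
        "example-glue-job",
        "j-EXAMPLECLUSTER",
        ["spark-submit", "s3://example-bucket/jobs/transform.py"],
    )
    print(json.dumps(definition, indent=2))
```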
Question 136:
A data engineer is using Amazon QuickSight to build a dashboard to report a company's revenue in multiple AWS Regions. The data engineer wants the dashboard to display the total revenue for a Region, regardless of the drill-down levels shown in the visual.
Which solution will meet these requirements?
A. Create a table calculation. B. Create a simple calculated field. C. Create a level-aware calculation - aggregate (LAC-A) function. D. Create a level-aware calculation - window (LAC-W) function.
C. Create a level-aware calculation - aggregate (LAC-A) function.
Question 137:
A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company's application uses the PutRecord action to send data to Kinesis Data Streams.
A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.
Which solution will meet this requirement?
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source. B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events. C. Design the data source so events are not ingested into Kinesis Data Streams multiple times. D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
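The idea behind option A can be sketched in plain Python: the producer attaches a unique ID to each record once, and the consumer skips any ID it has already handled, so a retry after a network outage does not cause double processing. The field name "event_id" and the in-memory set are assumptions made for illustration; a real pipeline would need a durable store for the seen IDs.

```python
import uuid


def make_record(payload):
    """Producer side: embed a unique ID in the record before sending it."""
    return {"event_id": str(uuid.uuid4()), "payload": payload}


class DeduplicatingProcessor:
    """Consumer side: process each event_id at most once."""

    def __init__(self):
        self.seen_ids = set()   # assumption: in-memory only, for illustration
        self.processed = []

    def process(self, record):
        """Return True if the record was processed, False if it was a duplicate."""
        if record["event_id"] in self.seen_ids:
            return False
        self.seen_ids.add(record["event_id"])
        self.processed.append(record["payload"])
        return True


if __name__ == "__main__":
    processor = DeduplicatingProcessor()
    record = make_record({"account": "example", "amount": 25})
    # The same record delivered twice (for example, after a producer retry).
    print(processor.process(record))
    print(processor.process(record))
    print(len(processor.processed))
```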
Question 138:
A data engineer uses Amazon Kinesis Data Streams to ingest and process records that contain user behavior data from an application every day.
The data engineer notices that the data stream is experiencing throttling because hot shards receive much more data than other shards in the data stream.
How should the data engineer resolve the throttling issue?
A. Use a random partition key to distribute the ingested records. B. Increase the number of shards in the data stream. Distribute the records across the shards. C. Limit the number of records that are sent each second by the producer to match the capacity of the stream. D. Decrease the size of the records that the producer sends to match the capacity of the stream.
B. Increase the number of shards in the data stream. Distribute the records across the shards.
Explanation
Amazon Kinesis Data Streams distributes data across multiple shards, with each shard having its own capacity for read and write operations. Throttling occurs when one or more shards, referred to as "hot shards," receive significantly more data than they can handle. To resolve this, increasing the number of shards in the data stream and redistributing the records across the shards is the appropriate solution. This approach ensures that the workload is spread more evenly, thereby preventing throttling on individual shards.
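How records end up on a shard can be simulated locally. Kinesis Data Streams hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The sketch below assumes the hash space is split evenly across shards, which is a simplification (real ranges can be uneven after splits and merges), and the key names are hypothetical.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming evenly split ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def shard_load(partition_keys, shard_count):
    """Count how many records each shard would receive."""
    return Counter(shard_for_key(key, shard_count) for key in partition_keys)


if __name__ == "__main__":
    # One dominant key: every record lands on the same shard.
    print(shard_load(["user-42"] * 1000, shard_count=4))
    # Many distinct keys: records spread across the available shards.
    print(shard_load([f"user-{i}" for i in range(1000)], shard_count=4))
```

In this model, records that share a single partition key stay on one shard no matter how many shards exist, so the "distribute the records" half of the answer depends on how the partition keys are chosen.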
Question 139:
A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers.
A data engineer notices that one of the fields in the source data includes values that are in JSON format.
How should the data engineer load the JSON data into the data warehouse with the LEAST effort?
A. Use the SUPER data type to store the data in the Amazon Redshift table. B. Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table. C. Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data. D. Use an AWS Lambda function to flatten the JSON data. Store the data in Amazon S3.
A. Use the SUPER data type to store the data in the Amazon Redshift table.
Explanation
In Amazon Redshift, the SUPER data type is designed specifically to handle semi-structured data such as JSON, including nested data loaded from formats such as Parquet and ORC. By using the SUPER data type, Redshift can ingest and query JSON data without requiring complex data flattening processes, thus reducing the amount of preprocessing required before loading the data. The SUPER data type also works with Redshift Spectrum, enabling queries that combine structured and semi-structured datasets, which aligns with the company's need to use combined datasets to identify potential new customers.
Using the SUPER data type also allows nested data structures to be parsed at load time and queried directly through Amazon Redshift's PartiQL support (dot and bracket navigation), which makes this option the most efficient approach with the least effort involved. This removes the overhead of using tools like AWS Glue or Lambda for data transformation.
By directly leveraging the capabilities of Redshift with the SUPER data type, the data engineer ensures streamlined JSON ingestion with minimal effort while maintaining query efficiency.
Question 140:
A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.
The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.
Which solution will meet these requirements?
A. Use multiple COPY commands to load the data into the Redshift cluster. B. Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster. C. Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node. D. Use a single COPY command to load the data into the Redshift cluster.
D. Use a single COPY command to load the data into the Redshift cluster.
Explanation
To achieve the highest throughput and efficiently use cluster resources while loading data into an Amazon Redshift cluster, the optimal approach is to use a single COPY command that ingests data in parallel.
Option D: Use a single COPY command to load the data into the Redshift cluster. The COPY command is designed to load data from multiple files in parallel into a Redshift table, using all the cluster nodes to optimize the load process. Redshift is optimized for parallel processing, and a single COPY command can load multiple files at once, maximizing throughput. Options A, B, and C either involve unnecessary complexity or inefficient approaches, such as using multiple COPY commands or INSERT statements, which are not optimized for bulk loading.
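As a small illustration, the helper below builds one COPY statement that points at a common Amazon S3 key prefix, so Redshift loads every file under that prefix in a single parallel operation. The table name, bucket, prefix, and IAM role ARN are hypothetical placeholders, the CSV format is an assumption, and the string formatting is only suitable for trusted inputs.

```python
def build_copy_statement(table, s3_prefix, iam_role_arn, file_format="CSV"):
    """Build a single COPY statement that loads every file under s3_prefix."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {file_format};"
    )


if __name__ == "__main__":
    statement = build_copy_statement(
        table="sales_fact",
        s3_prefix="s3://example-bucket/sales/2024/",
        iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    )
    # One statement covers all files that share the prefix, instead of one
    # COPY or INSERT per file.
    print(statement)
```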