During a security review, a company identified a vulnerability in an AWS Glue job. The company discovered that credentials to access an Amazon Redshift cluster were hard coded in the job script.
A data engineer must remediate the security vulnerability in the AWS Glue job. The solution must securely store the credentials.
Which combination of steps should the data engineer take to meet these requirements? (Choose two.)
A. Store the credentials in the AWS Glue job parameters. B. Store the credentials in a configuration file that is in an Amazon S3 bucket. C. Access the credentials from a configuration file that is in an Amazon S3 bucket by using the AWS Glue job. D. Store the credentials in AWS Secrets Manager. E. Grant the AWS Glue job IAM role access to the stored credentials.
D. Store the credentials in AWS Secrets Manager. E. Grant the AWS Glue job IAM role access to the stored credentials.
Explanation
AWS Secrets Manager is a service that securely stores and manages secrets such as database credentials, API keys, and passwords. You can use Secrets Manager to encrypt, rotate, and audit your secrets, and to control access to them with fine-grained policies. AWS Glue is a fully managed, serverless data integration service for data preparation, data cataloging, and data loading. AWS Glue jobs transform and load data from various sources into various targets, and you can author them either visually (AWS Glue Studio) or as job scripts.
Storing the credentials in AWS Secrets Manager and granting the AWS Glue job IAM role access to the stored credentials meets both requirements: it removes the hard-coded credentials from the job script and stores them securely. Hard coding credentials is a bad practice because anyone who can read the script can read the credentials. Instead, you store the credentials as a secret in Secrets Manager and reference only the secret name or ARN in the job script. Secrets Manager encrypts the secret with AWS Key Management Service (AWS KMS), can rotate the credentials automatically or on demand, and records access to the secret in AWS CloudTrail.
By granting only the AWS Glue job IAM role access to the stored credentials, you apply the principle of least privilege so that only the AWS Glue job can retrieve the credentials from Secrets Manager. You can also use resource-based or tag-based policies to further restrict access to the credentials.
The other options are less secure. Storing the credentials in the AWS Glue job parameters does not remediate the vulnerability, because job parameters are still visible in the AWS Glue console and API. Storing the credentials in a configuration file in an Amazon S3 bucket and reading that file from the AWS Glue job is less secure than using Secrets Manager, because the file might not be encrypted or rotated, and access to it might not be audited or controlled.
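As a minimal sketch of how a job script might read the credentials at run time instead of hard coding them, the function below takes a Secrets Manager client and a secret identifier. The JSON keys `username` and `password` are assumptions for illustration, and the client is passed in as a parameter so that the parsing logic can be exercised without AWS access.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a secret and return (username, password).

    Assumes the secret value is a JSON string with "username" and
    "password" keys; adjust the key names to match the real secret.
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return secret["username"], secret["password"]
```

In a Glue job, the client would typically be created with `boto3.client("secretsmanager")`. The call succeeds only if the job's IAM role is allowed `secretsmanager:GetSecretValue` on that secret.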
Question 242:
A company saves customer data to an Amazon S3 bucket. The company uses server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the bucket. The dataset includes personally identifiable information (PII) such as social security numbers and account details.
Data that is tagged as PII must be masked before the company uses customer data for analysis. Some users must have secure access to the PII data during the preprocessing phase. The company needs a low-maintenance solution to mask and secure the PII data throughout the entire engineering pipeline.
Which combination of solutions will meet these requirements? (Choose two.)
A. Use AWS Glue DataBrew to perform extract, transform, and load (ETL) tasks that mask the PII data before analysis. B. Use Amazon GuardDuty to monitor access patterns for the PII data that is used in the engineering pipeline. C. Configure an Amazon Macie discovery job for the S3 bucket. D. Use AWS Identity and Access Management (IAM) to manage permissions and to control access to the PII data. E. Write custom scripts in an application to mask the PII data and to control access.
A. Use AWS Glue DataBrew to perform extract, transform, and load (ETL) tasks that mask the PII data before analysis. D. Use AWS Identity and Access Management (IAM) to manage permissions and to control access to the PII data.
Explanation
To address the requirement of masking PII data and ensuring secure access throughout the data pipeline, the combination of AWS Glue DataBrew and IAM provides a low-maintenance solution.
A. AWS Glue DataBrew for Masking: AWS Glue DataBrew provides a visual tool to perform data transformations, including masking PII data. It lets you configure data transformation tasks without writing code, which makes it a good fit for this use case.
Question 243:
A company has a data pipeline that processes transaction data in real time. The company needs a notification system that alerts different teams based on the type of processing error without any delay. For security-related errors, the system must immediately notify the security team. For data validation errors, the system must notify the data quality team. For system errors, the system must notify the operations team.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create an Amazon Simple Notification Service (Amazon SNS) topic with an AWS Lambda function subscriber that evaluates the error type and forwards the error to the appropriate email addresses. B. Configure Amazon EventBridge rules with distinct event patterns for each error type. Route each error type to a dedicated Amazon Simple Notification Service (Amazon SNS) topic for team-specific alerts. C. Use Amazon Simple Queue Service (Amazon SQS) with message attributes to categorize errors. Allow each team to poll their respective SQS queue for relevant errors. D. Set up Amazon CloudWatch alarms with different metrics for each error type. Invoke a different Amazon Simple Notification Service (Amazon SNS) notification each time a metrics threshold is crossed.
B. Configure Amazon EventBridge rules with distinct event patterns for each error type. Route each error type to a dedicated Amazon Simple Notification Service (Amazon SNS) topic for team-specific alerts.
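The routing idea behind this answer can be sketched as one event pattern per error type, each paired with its own SNS topic. The event source name, the `errorType` field, and the topic names below are hypothetical, and `matches` is a simplified local stand-in for EventBridge's exact-value matching, not the service's full pattern language.

```python
# Hypothetical event shape: {"source": "...", "detail": {"errorType": "..."}}
SOURCE = "custom.transaction-pipeline"  # assumed custom event source

RULES = {
    "security-errors": {
        "pattern": {"source": [SOURCE], "detail": {"errorType": ["SECURITY"]}},
        "target_topic": "security-team-alerts",
    },
    "validation-errors": {
        "pattern": {"source": [SOURCE], "detail": {"errorType": ["VALIDATION"]}},
        "target_topic": "data-quality-team-alerts",
    },
    "system-errors": {
        "pattern": {"source": [SOURCE], "detail": {"errorType": ["SYSTEM"]}},
        "target_topic": "operations-team-alerts",
    },
}


def matches(pattern, event):
    """Simplified matcher: each leaf is a list of allowed values; dicts recurse."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


def route(event):
    """Return the topics of every rule whose pattern matches the event."""
    return [r["target_topic"] for r in RULES.values() if matches(r["pattern"], event)]
```

Each `pattern` value has the shape that would be serialized to JSON as a rule's event pattern. Because the filtering lives in the rules, no Lambda function or polling consumer has to be maintained.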
Question 244:
A company runs an Apache Spark application every night in an Amazon EMR cluster. The company uses Amazon EC2 instances to supply compute capacity for the EMR cluster. The company deployed the Spark application in cluster mode. An error occurs in the Spark application. A log for the error is stored in the application's Spark driver standard error logs. A data engineer needs to investigate the error.
Where can the data engineer find this error log?
A. The engineer can connect to the web UI on the live cluster to see the YARN ResourceManager logs. B. The engineer can connect to the persistent application UI to see the first YARN container log in the Spark UI. C. The engineer can connect to the Amazon EMR console to see the Amazon EMR step logs that are archived in Amazon S3. D. The engineer can connect to the primary node of the cluster by using SSH to see the Spark history server logs.
B. The engineer can connect to the persistent application UI to see the first YARN container log in the Spark UI.
Explanation
In cluster mode, the Spark driver runs inside the YARN application master, which is the first container that YARN starts for the application. The driver's standard error output is therefore written to that container's log, not to the Amazon EMR step logs; the step logs hold driver output only when the application runs in client mode. The persistent application UI is hosted off-cluster and exposes the Spark UI together with its container logs, so the engineer can open the first container's stderr log there.
Question 245:
A legal company is building a data pipeline to power an application that will handle peak traffic during business hours. The application will provide information about relevant laws and available lawyers. The legal document database will be updated one time each day.
The application must display up-to-date lawyer availability from a calendar database and provide complex full-text search of legal documents. The company wants to use AWS Glue for extract, transform, and load (ETL) processes. Lawyer availability information must be current within 5 minutes of any schedule changes.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS Step Functions to orchestrate AWS Glue jobs with time-based triggers and event-based triggers. Store processed data in Amazon S3. Use Amazon RDS for the search functionality. B. Use AWS Step Functions to orchestrate AWS Glue jobs with time-based triggers and event-based triggers. Store processed data in Amazon S3. Use Amazon OpenSearch Service for full-text search capabilities. C. Use AWS Glue workflows with time-based triggers and event-based triggers. Store processed data in Amazon DynamoDB. Create a custom search solution by using AWS Lambda functions. D. Use Amazon EventBridge to schedule all AWS Glue jobs. Store processed data in Amazon RDS. Use Amazon Kendra for full-text search capabilities.
B. Use AWS Step Functions to orchestrate AWS Glue jobs with time-based triggers and event-based triggers. Store processed data in Amazon S3. Use Amazon OpenSearch Service for full-text search capabilities.
Question 246:
An AWS Glue job connects to an Amazon Redshift cluster by using database credentials. Security policy requires periodic credential rotation and requires the ETL code to avoid hardcoding database passwords.
Which solution should the data engineer implement?
A. Store the database credentials in AWS Secrets Manager, enable automatic rotation, and grant the Glue job role permission to retrieve the secret. B. Store the database password in the Glue job arguments and update the argument value when the password changes. C. Store the password in the ETL script in an encrypted comment and decrypt it during job startup. D. Store the password in an S3 object that allows public read access and rotate the S3 object key each month.
A. Store the database credentials in AWS Secrets Manager, enable automatic rotation, and grant the Glue job role permission to retrieve the secret.
Explanation
Secrets Manager can store database credentials and rotate supported database secrets automatically. The Glue job role can retrieve the secret at runtime without embedding credentials in code or job parameters. Job arguments and scripts expose sensitive values to operators and logs more easily. Public S3 storage is not an acceptable credential store.
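The permission that the Glue job role needs can be sketched as an identity-based IAM policy scoped to a single secret. The ARN below is a placeholder, and the helper name is hypothetical; the sketch only builds the policy document and does not attach it to a role.

```python
import json


def build_secret_read_policy(secret_arn):
    """Return a least-privilege policy document for reading one secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
policy = build_secret_read_policy(
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:redshift/etl-user-AbCdEf"
)
policy_json = json.dumps(policy, indent=2)
```

If the secret is encrypted with a customer managed KMS key, the role would also need `kms:Decrypt` on that key.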
Question 247:
A company uses Amazon RDS for MySQL as the database for a critical application. The database workload is mostly writes, with a small number of reads.
A data engineer notices that the CPU utilization of the DB instance is very high. The high CPU utilization is slowing down the application. The data engineer must reduce the CPU utilization of the DB instance.
Which actions should the data engineer take to meet this requirement? (Choose two.)
A. Use the Performance Insights feature of Amazon RDS to identify queries that have high CPU utilization. Optimize the problematic queries. B. Modify the database schema to include additional tables and indexes. C. Reboot the RDS DB instance once each week. D. Upgrade to a larger instance size. E. Implement caching to reduce the database query load.
A. Use the Performance Insights feature of Amazon RDS to identify queries that have high CPU utilization. Optimize the problematic queries. D. Upgrade to a larger instance size.
Question 248:
A company has a data lake on AWS. The data lake ingests sources of data from business units. The company uses Amazon Athena for queries. The storage layer is Amazon S3 with an AWS Glue Data Catalog as a metadata repository.
The company wants to make the data available to data scientists and business analysts. However, the company first needs to manage fine-grained, column-level data access for Athena based on the user roles and responsibilities.
Which solution will meet these requirements?
A. Set up AWS Lake Formation. Define security policy-based rules for the users and applications by IAM role in Lake Formation. B. Define an IAM resource-based policy for AWS Glue tables. Attach the same policy to IAM user groups. C. Define an IAM identity-based policy for AWS Glue tables. Attach the same policy to IAM roles. Associate the IAM roles with IAM groups that contain the users. D. Create a resource share in AWS Resource Access Manager (AWS RAM) to grant access to IAM users.
A. Set up AWS Lake Formation. Define security policy-based rules for the users and applications by IAM role in Lake Formation.
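A column-level rule in Lake Formation can be sketched as a `SELECT` grant on a named subset of a table's columns. The helper below only assembles the request arguments; the role ARN, database, table, and column names in the usage line are hypothetical.

```python
def build_column_grant(role_arn, database, table, columns):
    """Build keyword arguments for the Lake Formation GrantPermissions API."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


# Hypothetical analyst role that may read only two non-sensitive columns.
grant = build_column_grant(
    "arn:aws:iam::111122223333:role/BusinessAnalyst",
    "sales_db",
    "orders",
    ["order_id", "order_total"],
)
```

With boto3, the grant would be applied with `boto3.client("lakeformation").grant_permissions(**grant)`, after which Athena queries run under that role see only the listed columns.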
Question 249:
A manufacturing company has many IoT devices in facilities around the world. The company uses Amazon Kinesis Data Streams to collect data from the devices. The data includes device ID, capture date, measurement type, measurement value, and facility ID. The company uses facility ID as the partition key.
The company's operations team recently observed many WriteThroughputExceeded exceptions. The operations team found that some shards were heavily used but other shards were generally idle.
How should the company resolve the issues that the operations team observed?
A. Change the partition key from facility ID to a randomly generated key. B. Increase the number of shards. C. Archive the data on the producer's side. D. Change the partition key from facility ID to capture date.
A. Change the partition key from facility ID to a randomly generated key.
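Kinesis Data Streams assigns each record to a shard by taking the MD5 hash of its partition key, so a handful of distinct keys can occupy only a handful of shards. The simulation below is a rough sketch of that effect; the shard count, the even split of the hash key range, and the traffic mix are assumptions for illustration.

```python
import hashlib
import random
from collections import Counter

NUM_SHARDS = 8  # assumed shard count, with the hash key range split evenly


def shard_for(partition_key, num_shards=NUM_SHARDS):
    """Map a partition key to a shard index via its 128-bit MD5 hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // 2**128


rng = random.Random(0)  # seeded so the simulation is repeatable

# Hypothetical skewed traffic: one facility produces most of the records.
facility_keys = ["facility-1"] * 7000 + ["facility-2"] * 2000 + ["facility-3"] * 1000
# One random 64-bit key per record instead of the facility ID.
random_keys = [f"{rng.getrandbits(64):016x}" for _ in facility_keys]

by_facility = Counter(shard_for(key) for key in facility_keys)
by_random = Counter(shard_for(key) for key in random_keys)

print("records per shard, facility ID key:", dict(by_facility))
print("records per shard, random key:", dict(by_random))
```

With facility ID as the key, at most three shards receive traffic and one of them carries at least 70 percent of it, so adding shards alone would not help. The trade-off of a random key is that records from the same facility are no longer guaranteed to land on the same shard.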
Question 250:
A retail company has a customer data hub in an Amazon S3 bucket. Employees from many countries use the data hub to support company-wide analytics. A governance team must ensure that the company's data analysts can access data only for customers who are within the same country as the analysts.
Which solution will meet these requirements with the LEAST operational effort?
A. Create a separate table for each country's customer data. Provide access to each analyst based on the country that the analyst serves. B. Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies. C. Move the data to AWS Regions that are close to the countries where the customers are. Provide access to each analyst based on the country that the analyst serves. D. Load the data into Amazon Redshift. Create a view for each country. Create separate IAM roles for each country to provide access to data from each country. Assign the appropriate roles to the analysts.
B. Register the S3 bucket as a data lake location in AWS Lake Formation. Use the Lake Formation row-level security features to enforce the company's access policies.
Explanation
AWS Lake Formation is a service for setting up, securing, and managing data lakes. Its row-level security feature uses data filters to restrict which rows, and optionally which columns, a principal can read, based on the principal's identity or role. This suits scenarios that require restricted access to sensitive or regulated data, such as customer data from different countries. After registering the S3 bucket as a data lake location in Lake Formation, you can use the Lake Formation console or APIs to define row-level filters on the cataloged tables and grant them to the analysts. You can also use Lake Formation blueprints to automate the ingestion and transformation of data from various sources into the data lake. This solution requires the least operational effort compared to the other options, because it does not involve copying or moving data, or managing a separate table or view for each country.
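A per-country row filter can be sketched as a Lake Formation data cells filter whose row expression compares a country column with the analyst's country. The helper below only builds the `TableData` argument; the account ID, database, table, and the `country` column name are hypothetical.

```python
def build_country_filter(catalog_id, database, table, country_code):
    """Build the TableData argument for the CreateDataCellsFilter API."""
    return {
        "TableCatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Name": f"customers-{country_code.lower()}",
        # Assumes the table has a column named "country" holding a country code.
        "RowFilter": {"FilterExpression": f"country = '{country_code}'"},
        # An empty wildcard keeps every column visible; only rows are filtered.
        "ColumnWildcard": {},
    }


table_data = build_country_filter("111122223333", "customer_hub", "customers", "DE")
```

With boto3, the filter would be created with `boto3.client("lakeformation").create_data_cells_filter(TableData=table_data)` and then granted to the analysts for that country with `SELECT` permission.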