A company analyzes data in a data lake every quarter to perform inventory assessments. A data engineer uses AWS Glue DataBrew to detect any personally identifiable information (PII) about customers within the data. The company's privacy policy considers some custom categories of information to be PII. However, the categories are not included in standard DataBrew data quality rules.
The data engineer needs to modify the current process to scan for the custom PII categories across multiple datasets within the data lake.
Which solution will meet these requirements with the LEAST operational overhead?
A. Manually review the data for custom PII categories.
B. Implement custom data quality rules in DataBrew. Apply the custom rules across datasets.
C. Develop custom Python scripts to detect the custom PII categories. Call the scripts from DataBrew.
D. Implement regex patterns to extract PII information from fields during extract, transform, and load (ETL) operations into the data lake.
B. Implement custom data quality rules in DataBrew. Apply the custom rules across datasets.
Explanation
The data engineer needs to detect custom categories of PII within the data lake by using AWS Glue DataBrew. DataBrew provides standard data quality rules, but the solution must also support the custom PII categories.
Option B: Implement custom data quality rules in DataBrew. Apply the custom rules across datasets. This option is the most efficient because DataBrew allows the creation of custom data quality rules that can be applied to detect specific data patterns, including custom PII categories. This approach minimizes operational overhead while ensuring that the specific privacy requirements are met. Options A, C, and D involve either manual review or custom scripts and patterns that must be developed and maintained, all of which increase operational effort compared to using DataBrew's built-in capabilities.
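To make "specific data patterns" concrete, the sketch below shows the kind of check a custom rule might encode. It is plain Python, not DataBrew rule syntax, and the loyalty-card ID format, function names, and threshold are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical custom PII category: an internal loyalty-card ID such as "LC-12345678".
# The format is an assumption made for illustration only.
LOYALTY_CARD_PATTERN = re.compile(r"^LC-\d{8}$")


def pii_match_ratio(values, pattern=LOYALTY_CARD_PATTERN):
    """Return the fraction of non-empty values that match the pattern."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return 0.0
    hits = sum(1 for v in non_empty if pattern.match(v))
    return hits / len(non_empty)


def column_contains_custom_pii(values, threshold=0.5):
    """Flag a column when at least `threshold` of its values match the custom category."""
    return pii_match_ratio(values) >= threshold
```

In DataBrew itself, the equivalent logic would live in a ruleset that is attached to each dataset, so no script of this kind has to be deployed or scheduled separately.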
Question 212:
A finance company receives data from third-party data providers and stores the data as objects in an Amazon S3 bucket.
The company ran an AWS Glue crawler on the objects to create a data catalog. The AWS Glue crawler created multiple tables. However, the company expected that the crawler would create only one table.
The company needs a solution that will ensure the AWS Glue crawler creates only one table.
Which combination of solutions will meet this requirement? (Choose two.)
A. Ensure that the object format, compression type, and schema are the same for each object.
B. Ensure that the object format and schema are the same for each object. Do not enforce consistency for the compression type of each object.
C. Ensure that the schema is the same for each object. Do not enforce consistency for the file format and compression type of each object.
D. Ensure that the structure of the prefix for each S3 object name is consistent.
E. Ensure that all S3 object names follow a similar pattern.
A. Ensure that the object format, compression type, and schema are the same for each object.
D. Ensure that the structure of the prefix for each S3 object name is consistent.
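A consistent prefix structure can be checked before the crawler runs. The following is a minimal sketch, assuming Hive-style `name=value` partition folders; the helper names and sample keys are hypothetical, and the check covers only folder depth and partition names, not file format, compression type, or schema.

```python
def prefix_shape(key):
    """Describe an S3 key's folder structure: one entry per folder,
    using the partition name for 'name=value' folders and '*' otherwise."""
    folders = key.split("/")[:-1]  # drop the object name itself
    return tuple(f.split("=")[0] if "=" in f else "*" for f in folders)


def inconsistent_keys(keys):
    """Return the keys whose prefix shape differs from that of the first key."""
    if not keys:
        return []
    expected = prefix_shape(keys[0])
    return [k for k in keys if prefix_shape(k) != expected]
```

For example, `sales/year=2024/month=01/a.parquet` and `sales/2024/03/c.parquet` have different shapes, so the second key would be reported.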
Question 213:
A company is developing an application that runs on Amazon EC2 instances. Currently, the data that the application generates is temporary. However, the company needs to persist the data, even if the EC2 instances are terminated.
A data engineer must launch new EC2 instances from an Amazon Machine Image (AMI) and configure the instances to preserve the data.
Which solution will meet this requirement?
A. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume that contains the application data. Apply the default settings to the EC2 instances.
B. Launch new EC2 instances by using an AMI that is backed by a root Amazon Elastic Block Store (Amazon EBS) volume that contains the application data. Apply the default settings to the EC2 instances.
C. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.
D. Launch new EC2 instances by using an AMI that is backed by an Amazon Elastic Block Store (Amazon EBS) volume. Attach an additional EC2 instance store volume to contain the application data. Apply the default settings to the EC2 instances.
C. Launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume. Attach an Amazon Elastic Block Store (Amazon EBS) volume to contain the application data. Apply the default settings to the EC2 instances.
Explanation
Amazon EC2 instances can use two types of storage volumes: instance store volumes and Amazon EBS volumes. Instance store volumes are ephemeral: the data exists only for the life of the instance and is lost if the instance is stopped, hibernated, or terminated, or if the underlying disk fails. Amazon EBS volumes are persistent: they can be detached from one instance and attached to another, and the data on the volume is preserved.
To persist the data even if the EC2 instances are terminated, the data engineer must store the application data on an Amazon EBS volume. The solution is to launch new EC2 instances by using an AMI that is backed by an EC2 instance store volume, attach an Amazon EBS volume to each instance, and configure the application to write the data to the EBS volume. With the default settings, an EBS volume that is attached to a running instance is not deleted when the instance is terminated, so the data remains available and the volume can be attached to another instance if needed. The data engineer can apply the default settings to the EC2 instances because this solution does not require changes to the instance type, security group, or IAM role.
The other options do not meet the requirement. Option A keeps the application data on an instance store volume, so the data is lost when the instance is terminated. Option B keeps the application data on the root EBS volume, which is deleted on termination under the default settings. Option D attaches an additional instance store volume for the application data, so the data is again lost when the instance is terminated.
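As an illustration, the sketch below builds the arguments for an EC2 `RunInstances` call that adds a separate EBS data volume. It is a sketch under stated assumptions, not the only way to attach a volume: the helper name, the device name `/dev/sdf`, and the `gp3` volume type are illustrative choices, and `DeleteOnTermination` is set explicitly here rather than relying on a default.

```python
def build_run_instances_args(ami_id, instance_type, data_volume_gib):
    """Build keyword arguments for EC2 RunInstances with a separate EBS data volume."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {
                "DeviceName": "/dev/sdf",  # assumed device name for the data volume
                "Ebs": {
                    "VolumeSize": data_volume_gib,
                    "VolumeType": "gp3",
                    # Stated explicitly so the volume outlives the instance.
                    "DeleteOnTermination": False,
                },
            }
        ],
    }
```

The resulting dictionary could be passed to boto3 as `ec2_client.run_instances(**args)`; the application would then be configured to write to the mounted data volume.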
Question 214:
A data engineer creates an AWS Lambda function that an Amazon EventBridge event will invoke. When the data engineer tries to invoke the Lambda function by using an EventBridge event, an AccessDeniedException message appears.
How should the data engineer resolve the exception?
A. Ensure that the trust policy of the Lambda function execution role allows EventBridge to assume the execution role.
B. Ensure that both the IAM role that EventBridge uses and the Lambda function's resource-based policy have the necessary permissions.
C. Ensure that the subnet where the Lambda function is deployed is configured to be a private subnet.
D. Ensure that EventBridge schemas are valid and that the event mapping configuration is correct.
B. Ensure that both the IAM role that EventBridge uses and the Lambda function's resource-based policy have the necessary permissions.
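For reference, a resource-based policy statement that lets an EventBridge rule invoke a Lambda function generally has the shape sketched below. The helper name, statement ID, and example ARNs are hypothetical; the principal, action, and condition key follow the usual pattern for this integration.

```python
import json


def eventbridge_invoke_statement(function_arn, rule_arn, sid="AllowEventBridgeInvoke"):
    """Build a Lambda resource-based policy statement that allows one EventBridge rule to invoke the function."""
    return {
        "Sid": sid,
        "Effect": "Allow",
        "Principal": {"Service": "events.amazonaws.com"},
        "Action": "lambda:InvokeFunction",
        "Resource": function_arn,
        # Restrict the permission to a single rule instead of all of EventBridge.
        "Condition": {"ArnLike": {"AWS:SourceArn": rule_arn}},
    }


if __name__ == "__main__":
    statement = eventbridge_invoke_statement(
        "arn:aws:lambda:us-east-1:111122223333:function:example-fn",  # placeholder ARN
        "arn:aws:events:us-east-1:111122223333:rule/example-rule",  # placeholder ARN
    )
    print(json.dumps(statement, indent=2))
```

If this statement is missing from the function's policy, the invocation is denied even when the rule and target are otherwise configured correctly.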
Question 215:
A company uses Amazon DataZone as a data governance and business catalog solution. The company stores data in an Amazon S3 data lake.
The company uses AWS Glue with an AWS Glue Data Catalog.
A data engineer needs to publish AWS Glue Data Quality scores to the Amazon DataZone portal.
Which solution will meet this requirement?
A. Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
B. Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
C. Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
D. Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform. Define a data quality ruleset inside the jobs. Configure the Amazon DataZone project to have an Amazon Redshift data source. Enable the data quality configuration for the data source.
C. Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table. Schedule the ruleset to run daily. Configure the Amazon DataZone project to have an AWS Glue data source. Enable the data quality configuration for the data source.
Explanation
Data Quality Ruleset: Creating a ruleset with Data Quality Definition Language (DQDL) rules allows for defining and evaluating data quality on specific AWS Glue tables, enabling automated checks on data quality.
Scheduled Execution: Running the ruleset daily ensures that data quality scores are regularly updated.
AWS Glue Data Source in Amazon DataZone: Configuring Amazon DataZone with an AWS Glue data source enables seamless integration, allowing data quality scores from AWS Glue Data Quality to be published to the Amazon DataZone portal.
Question 216:
A data engineer needs to design a data pipeline that invokes an AWS Glue job. After the AWS Glue job finishes successfully, the pipeline needs to invoke three AWS Lambda functions. The pipeline must be serverless. The data engineer wants to see the entire pipeline lineage in a single interface.
Which solution will meet these requirements?
A. Configure a workflow in AWS Step Functions that invokes the AWS Glue job and the Lambda functions. View the lineage in Step Functions Workflow Studio.
B. Deploy an Apache Airflow workflow that invokes the AWS Glue job and the Lambda functions. View the lineage in the Airflow UI.
C. Build the pipeline in the AWS Glue job. Invoke the Lambda functions after the AWS Glue job runs. Use Amazon CloudWatch Logs Insights to view the lineage.
D. Deploy a workflow in AWS Step Functions to invoke the AWS Glue job. In the job code, invoke the Lambda functions before the job finishes. View the lineage from the AWS Glue UI.
A. Configure a workflow in AWS Step Functions that invokes the AWS Glue job and the Lambda functions. View the lineage in Step Functions Workflow Studio.
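A state machine for this pipeline could look roughly like the Amazon States Language definition built below. The question does not say whether the three Lambda functions run in sequence or in parallel; this sketch assumes parallel branches, and the state names and helper function are hypothetical.

```python
import json


def build_pipeline_definition(glue_job_name, lambda_arns):
    """Build a state machine definition: run a Glue job, then invoke each Lambda function in parallel."""
    branches = [
        {
            "StartAt": f"InvokeLambda{i}",
            "States": {
                f"InvokeLambda{i}": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::lambda:invoke",
                    "Parameters": {"FunctionName": arn},
                    "End": True,
                }
            },
        }
        for i, arn in enumerate(lambda_arns, start=1)
    ]
    return {
        "StartAt": "RunGlueJob",
        "States": {
            "RunGlueJob": {
                "Type": "Task",
                # The .sync suffix makes the state wait for the job run to finish;
                # a failed run fails the state, so the Lambda states are not reached.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "InvokeLambdas",
            },
            "InvokeLambdas": {"Type": "Parallel", "Branches": branches, "End": True},
        },
    }


if __name__ == "__main__":
    definition = build_pipeline_definition("example-job", ["arn:fn1", "arn:fn2", "arn:fn3"])
    print(json.dumps(definition, indent=2))
```

Because the Glue job and all three Lambda invocations are states in one definition, the whole flow appears as a single graph in the Step Functions console.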
Question 217:
A company uses an AWS Lambda function to transfer files from a legacy SFTP environment to Amazon S3 buckets. The Lambda function is VPC enabled to ensure that all communications between the Lambda function and other AWS services that are in the same VPC environment will occur over a secure network.
The Lambda function is able to connect to the SFTP environment successfully. However, when the Lambda function attempts to upload files to the S3 buckets, the Lambda function returns timeout errors. A data engineer must resolve the timeout issues in a secure way.
Which solution will meet these requirements in the MOST cost-effective way?
A. Create a NAT gateway in the public subnet of the VPC. Route network traffic to the NAT gateway.
B. Create a VPC gateway endpoint for Amazon S3. Route network traffic to the VPC gateway endpoint.
C. Create a VPC interface endpoint for Amazon S3. Route network traffic to the VPC interface endpoint.
D. Use a VPC internet gateway to connect to the internet. Route network traffic to the VPC internet gateway.
B. Create a VPC gateway endpoint for Amazon S3. Route network traffic to the VPC gateway endpoint.
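A gateway endpoint for Amazon S3 is created per VPC and associated with the route tables of the subnets that need it. The sketch below builds the arguments for an EC2 `CreateVpcEndpoint` call; the helper name and the example IDs are hypothetical.

```python
def s3_gateway_endpoint_args(vpc_id, region, route_table_ids):
    """Build keyword arguments for EC2 CreateVpcEndpoint for an S3 gateway endpoint."""
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        # Route tables of the subnets where the Lambda function runs.
        "RouteTableIds": list(route_table_ids),
    }
```

With boto3, the dictionary could be passed as `ec2_client.create_vpc_endpoint(**args)`. Gateway endpoints for S3 carry no hourly or data processing charge, which is why this option is more cost-effective than a NAT gateway or an interface endpoint.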
Question 218:
A retail company uses AWS Glue for extract, transform, and load (ETL) operations on a dataset that contains information about customer orders. The company wants to implement specific validation rules to ensure data accuracy and consistency.
Which solution will meet these requirements?
A. Use AWS Glue job bookmarks to track the data for accuracy and consistency.
B. Create custom AWS Glue Data Quality rulesets to define specific data quality checks.
C. Use the built-in AWS Glue Data Quality transforms for standard data quality validations.
D. Use AWS Glue Data Catalog to maintain a centralized data schema and metadata repository.
B. Create custom AWS Glue Data Quality rulesets to define specific data quality checks.
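A custom ruleset is written in Data Quality Definition Language (DQDL). The sketch below assembles a small ruleset for an orders table; the column names and allowed status values are hypothetical, and the helper function is only a convenience for formatting the rule list.

```python
def build_dqdl(rules):
    """Format a list of DQDL rule strings as a 'Rules = [ ... ]' ruleset."""
    body = ",\n".join(f"    {rule}" for rule in rules)
    return f"Rules = [\n{body}\n]"


# Hypothetical checks for a customer orders dataset.
ORDER_RULES = [
    'IsComplete "order_id"',
    'IsUnique "order_id"',
    'ColumnValues "quantity" > 0',
    'ColumnValues "status" in ["PENDING", "SHIPPED", "DELIVERED"]',
]

if __name__ == "__main__":
    print(build_dqdl(ORDER_RULES))
```

The resulting text could be supplied as the ruleset body when a ruleset is created for a Data Catalog table or inside an ETL job.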
Question 219:
A data engineer has implemented data quality rules in 1,000 AWS Glue Data Catalog tables. Because of a recent change in business requirements, the data engineer must edit the data quality rules.
How should the data engineer meet this requirement with the LEAST operational overhead?
A. Create a pipeline in AWS Glue ETL to edit the rules for each of the 1,000 Data Catalog tables. Use an AWS Lambda function to call the corresponding AWS Glue job for each Data Catalog table.
B. Create an AWS Lambda function that makes an API call to AWS Glue Data Quality to make the edits.
C. Create an Amazon EMR cluster. Run a pipeline on Amazon EMR that edits the rules for each Data Catalog table. Use an AWS Lambda function to run the EMR pipeline.
D. Use the AWS Management Console to edit the rules within the Data Catalog.
B. Create an AWS Lambda function that makes an API call to AWS Glue Data Quality to make the edits.
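The core of such a Lambda function could be a loop like the one sketched below. It assumes `glue` is a boto3 Glue client (or any object with the same three methods) and that the required edit can be expressed as a text transformation of each ruleset; the function name and the `transform` callback are hypothetical.

```python
def update_all_rulesets(glue, transform):
    """Apply `transform` (old DQDL text -> new DQDL text) to every ruleset.

    Returns the names of the rulesets that were changed.
    """
    updated = []
    token = None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = glue.list_data_quality_rulesets(**kwargs)
        for summary in page.get("Rulesets", []):
            name = summary["Name"]
            current = glue.get_data_quality_ruleset(Name=name)["Ruleset"]
            revised = transform(current)
            if revised != current:  # skip rulesets the edit does not affect
                glue.update_data_quality_ruleset(Name=name, Ruleset=revised)
                updated.append(name)
        token = page.get("NextToken")
        if not token:
            return updated
```

A call such as `update_all_rulesets(boto3.client("glue"), lambda text: text.replace("> 0", ">= 1"))` would apply one edit across all rulesets without touching each table by hand.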
Question 220:
A hotel management company receives daily data files from each of its hotels. The company wants to upload its data to AWS. The company plans to use Amazon Athena to access the files. The company needs to protect the files from accidental deletion. The company will develop an application on its on-premises servers to automatically forward the files to a fully managed AWS ingestion service.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use AWS DataSync to replicate data from the on-premises servers to Amazon Elastic File System (Amazon EFS). Configure automatic backups in AWS Backup.
B. Use the Amazon Kinesis Agent on the on-premises servers to send data to Amazon Data Firehose. Store the data in an Amazon S3 bucket that has versioning enabled.
C. Use AWS Glue jobs to ingest data from the on-premises servers into Amazon RDS. Enable automated backups for data protection.
D. Use a self-managed Apache Kafka agent on the on-premises servers to stream data to Amazon Managed Streaming for Apache Kafka (Amazon MSK). Store the data in an Amazon S3 bucket with versioning enabled.
B. Use the Amazon Kinesis Agent on the on-premises servers to send data to Amazon Data Firehose. Store the data in an Amazon S3 bucket that has versioning enabled.
Explanation
The Kinesis Agent on the on-premises servers sends the files to Amazon Data Firehose, a fully managed service that delivers them to Amazon S3. Enabling S3 versioning protects against accidental deletion, and Athena can query the data directly in S3, which meets the requirements with minimal operational overhead.
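The two pieces of configuration involved can be sketched as follows: an agent configuration that forwards files to a Firehose stream, and the versioning setting for the destination bucket. The file pattern, stream name, and helper name are hypothetical placeholders.

```python
import json


def build_agent_config(file_pattern, delivery_stream, region):
    """Build a Kinesis Agent configuration that tails files into one Firehose stream."""
    return {
        "firehose.endpoint": f"firehose.{region}.amazonaws.com",
        "flows": [
            {"filePattern": file_pattern, "deliveryStream": delivery_stream}
        ],
    }


# Value for the VersioningConfiguration parameter of the S3 PutBucketVersioning API.
VERSIONING_CONFIG = {"Status": "Enabled"}

if __name__ == "__main__":
    config = build_agent_config("/data/hotels/*.csv", "example-hotel-stream", "us-east-1")
    print(json.dumps(config, indent=2))
```

The printed JSON corresponds to the agent's configuration file on the on-premises server, while the versioning setting is applied once to the bucket that Firehose delivers to.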