A company will use Amazon SageMaker to train and host a machine learning (ML) model for a marketing campaign. The majority of data is sensitive customer data. The data must be encrypted at rest. The company wants AWS to maintain the root of trust for the master keys and wants encryption key usage to be logged.
Which implementation will meet these requirements?
A. Use encryption keys that are stored in AWS CloudHSM to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3. B. Use SageMaker built-in transient keys to encrypt the ML data volumes. Enable default encryption for new Amazon Elastic Block Store (Amazon EBS) volumes. C. Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3. D. Use AWS Security Token Service (AWS STS) to create temporary tokens to encrypt the ML storage volumes, and to encrypt the model artifacts and data in Amazon S3.
C. Use customer managed keys in AWS Key Management Service (AWS KMS) to encrypt the ML data volumes, and to encrypt the model artifacts and data in Amazon S3.
Explanation
AWS KMS keeps key material inside hardware security modules that AWS operates, so AWS maintains the root of trust for the keys, and every use of a KMS key is recorded in AWS CloudTrail. Customer managed keys additionally give the company control over key policies and rotation, and the same key can protect the ML data volumes as well as the model artifacts and data in Amazon S3. By contrast, AWS CloudHSM places control of the HSM and its keys with the company, and AWS STS issues temporary security credentials rather than encryption keys.
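As a rough illustration, a SageMaker CreateTrainingJob request can reference the KMS key in two places: ResourceConfig.VolumeKmsKeyId for the ML storage volume and OutputDataConfig.KmsKeyId for the model artifacts written to Amazon S3. The sketch below only builds that part of the request as a plain dictionary. The key ARN, bucket name, and instance settings are placeholder assumptions, and the other required fields (role, algorithm, input channels) are left out.

```python
def encryption_settings(kms_key_arn, output_s3_uri):
    """Return the encryption-related parts of a CreateTrainingJob request."""
    return {
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",  # placeholder instance type
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,            # placeholder volume size
            "VolumeKmsKeyId": kms_key_arn,   # encrypts the attached ML storage volume
        },
        "OutputDataConfig": {
            "S3OutputPath": output_s3_uri,
            "KmsKeyId": kms_key_arn,         # encrypts model artifacts in S3
        },
    }


# Hypothetical key ARN and bucket, for illustration only.
settings = encryption_settings(
    "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
    "s3://example-bucket/model-artifacts/",
)
```

These two dictionaries would be merged into the full request passed to the CreateTrainingJob API.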
Question 172:
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords.
Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
A. Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data. B. Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data. C. Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords. D. Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.
B. Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
Explanation
Amazon SageMaker script mode enables training a machine learning model using a script that you provide. By using the unchanged train.py script and putting the TFRecord data into an Amazon S3 bucket, you can point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
Option B avoids rewriting the train.py script or preparing the data in a different format. It also leverages the scalability and cost-effectiveness of Amazon S3 for storing large amounts of data, which is important for training machine learning models.
Question 173:
A global company receives and processes hundreds of documents daily. The documents are in printed .pdf format or .jpg format.
A machine learning (ML) specialist wants to build an automated document processing workflow to extract text from specific fields from the documents and to classify the documents. The ML specialist wants a solution that requires low maintenance.
Which solution will meet these requirements with the LEAST operational effort?
A. Use a PaddleOCR model in Amazon SageMaker to detect and extract the required text and fields. Use a SageMaker text classification model to classify the document. B. Use a PaddleOCR model in Amazon SageMaker to detect and extract the required text and fields. Use Amazon Comprehend to classify the document. C. Use Amazon Textract to detect and extract the required text and fields. Use Amazon Rekognition to classify the document. D. Use Amazon Textract to detect and extract the required text and fields. Use Amazon Comprehend to classify the document.
D. Use Amazon Textract to detect and extract the required text and fields. Use Amazon Comprehend to classify the document.
Explanation
Question 174:
A machine learning specialist is developing a proof of concept for government users whose primary concern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model for a photo classifier application. The specialist wants to protect the data so that it cannot be accessed and transferred to a remote host by malicious code accidentally installed on the training container.
Which action will provide the MOST secure protection?
A. Remove Amazon S3 access permissions from the SageMaker execution role. B. Encrypt the weights of the CNN model. C. Encrypt the training and validation dataset. D. Enable network isolation for training jobs.
D. Enable network isolation for training jobs.
Explanation
Enable network isolation: set this to true when creating training, hyperparameter tuning, and inference jobs to prevent situations like malicious code being accidentally installed and transferring data to a remote host. https://aws.amazon.com/blogs/security/secure-deployment-of-amazon-sagemaker-resources/
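In the CreateTrainingJob API this setting is the top-level boolean EnableNetworkIsolation. A minimal sketch of switching it on for an existing request dictionary follows. The job name is a hypothetical placeholder and the request is deliberately incomplete.

```python
def with_network_isolation(request):
    """Return a copy of a CreateTrainingJob request with network isolation enabled."""
    hardened = dict(request)                    # shallow copy; the input is left untouched
    hardened["EnableNetworkIsolation"] = True   # container gets no outbound network access
    return hardened


# Hypothetical, incomplete request used only to show the flag.
base_request = {"TrainingJobName": "photo-classifier-poc"}
secure_request = with_network_isolation(base_request)
```

With the flag set, the training container cannot make outbound network calls, so SageMaker itself handles downloading the input data and uploading the model artifacts.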
Question 175:
A Data Scientist wants to gain real-time insights into a data stream of GZIP files. Which solution would allow the use of SQL to query the stream with the LEAST latency?
A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data. B. AWS Glue with a custom ETL script to transform the data. C. An Amazon Kinesis Client Library to transform the data and save it to an Amazon ES cluster. D. Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket.
A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.
Explanation
Kinesis Data Analytics can call an AWS Lambda function to preprocess incoming records (for example, to decompress GZIP data) and can then run SQL on the converted data. https://aws.amazon.com/about-aws/whats-new/2017/10/amazon-kinesis-analytics-can-now-pre-process-data-prior-to-running-sql-queries/
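A preprocessing function of this kind might look roughly like the sketch below. It assumes the record contract used for Kinesis Data Analytics preprocessing: each incoming record carries a recordId and base64-encoded data, and each returned record carries the same recordId, a result of "Ok", "Dropped", or "ProcessingFailed", and base64-encoded data. The assumption that every record is an independently GZIP-compressed payload is made for illustration only.

```python
import base64
import gzip
import zlib


def lambda_handler(event, context):
    """Decompress each GZIP record so the SQL application receives plain text."""
    output = []
    for record in event["records"]:
        try:
            compressed = base64.b64decode(record["data"])
            plain = gzip.decompress(compressed)
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(plain).decode("utf-8"),
            })
        except (OSError, EOFError, ValueError, zlib.error):
            # Not valid GZIP (or not valid base64): report failure, pass data through.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Because the transformation runs inside the stream before the SQL is evaluated, no intermediate store is needed between ingestion and querying.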
Question 176:
A data scientist is working on a forecast problem by using a dataset that consists of .csv files that are stored in Amazon S3. The files contain a timestamp variable in the following format: March 1st, 2020, 08:14pm. There is a hypothesis about seasonal differences in the dependent variable. This number could be higher or lower for weekdays because some days and hours present varying values, so the day of the week, month, or hour could be an important factor. As a result, the data scientist needs to transform the timestamp into weekdays, month, and day as three separate variables to conduct an analysis.
Which solution requires the LEAST operational overhead to create a new dataset with the added features?
A. Create an Amazon EMR cluster. Develop PySpark code that can read the timestamp variable as a string, transform and create the new variables, and save the dataset as a new file in Amazon S3. B. Create a processing job in Amazon SageMaker. Develop Python code that can read the timestamp variable as a string, transform and create the new variables, and save the dataset as a new file in Amazon S3. C. Create a new flow in Amazon SageMaker Data Wrangler. Import the S3 file, use the Featurize date/time transform to generate the new variables, and save the dataset as a new file in Amazon S3. D. Create an AWS Glue job. Develop code that can read the timestamp variable as a string, transform and create the new variables, and save the dataset as a new file in Amazon S3.
C. Create a new flow in Amazon SageMaker Data Wrangler. Import the S3 file, use the Featurize date/time transform to generate the new variables, and save the dataset as a new file in Amazon S3.
Explanation
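Data Wrangler's Featurize date/time transform produces these features without code. Purely to show what the derived variables look like, here is a minimal Python sketch that parses the timestamp format from the question. The function name and the English weekday list are illustrative assumptions, and month-name parsing assumes an English locale.

```python
import re
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def timestamp_features(raw):
    """Split a timestamp like 'March 1st, 2020, 08:14pm' into weekday, month, day."""
    # Drop the ordinal suffix ("1st" -> "1") so strptime can parse the day.
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", raw)
    parsed = datetime.strptime(cleaned, "%B %d, %Y, %I:%M%p")
    return {
        "weekday": WEEKDAYS[parsed.weekday()],  # weekday() returns 0 for Monday
        "month": parsed.month,
        "day": parsed.day,
    }


features = timestamp_features("March 1st, 2020, 08:14pm")
# features == {"weekday": "Sunday", "month": 3, "day": 1}
```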
Question 177:
A media company is building a computer vision model to analyze images that are on social media. The model consists of CNNs that the company trained by using images that the company stores in Amazon S3. The company used an Amazon SageMaker training job in File mode with a single Amazon EC2 On-Demand Instance.
Every day, the company updates the model by using about 10,000 images that the company has collected in the last 24 hours. The company configures training with only one epoch. The company wants to speed up training and lower costs without the need to make any code changes.
Which solution will meet these requirements?
A. Instead of File mode, configure the SageMaker training job to use Pipe mode. Ingest the data from a pipe. B. Instead of File mode, configure the SageMaker training job to use FastFile mode with no other changes. C. Instead of On-Demand Instances, configure the SageMaker training job to use Spot Instances. Make no other changes. D. Instead of On-Demand Instances, configure the SageMaker training job to use Spot Instances. Implement model checkpoints.
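B. Instead of File mode, configure the SageMaker training job to use FastFile mode with no other changes.
Explanation
FastFile mode streams objects from Amazon S3 on demand while exposing them to the container as if they were local files, so training can start without first downloading the full dataset and the existing File mode code keeps working. Pipe mode also streams the data, but the training code has to be changed to read from the pipe, which the company wants to avoid.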
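For reference, the input mode is a setting on each input channel of the CreateTrainingJob request, with File, Pipe, and FastFile as the accepted values. File mode copies the dataset to the instance before training starts, Pipe mode streams it through a named pipe that the training code must read, and FastFile mode streams from S3 while still presenting the objects as files. The helper below is a sketch that builds one channel definition as a plain dictionary. The channel name and S3 URI are hypothetical.

```python
VALID_INPUT_MODES = {"File", "Pipe", "FastFile"}


def s3_channel(name, s3_uri, input_mode="File"):
    """Build one InputDataConfig entry for a CreateTrainingJob request."""
    if input_mode not in VALID_INPUT_MODES:
        raise ValueError(f"unsupported input mode: {input_mode}")
    return {
        "ChannelName": name,
        "InputMode": input_mode,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }


# Hypothetical bucket and prefix; only the InputMode value differs from File mode.
channel = s3_channel("training", "s3://example-bucket/daily-images/", "FastFile")
```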
Question 178:
A company hosts a machine learning (ML) dataset repository on Amazon S3. A data scientist is preparing the repository to train a model. The data scientist needs to redact personally identifiable information (PII) from the dataset.
Which solution will meet these requirements with the LEAST development effort?
A. Use Amazon SageMaker Data Wrangler with a custom transformation to identify and redact the PII. B. Create a custom AWS Lambda function to read the files, identify the PII, and redact the PII. C. Use AWS Glue DataBrew to identify and redact the PII. D. Use an AWS Glue development endpoint to implement the PII redaction from within a notebook.
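C. Use AWS Glue DataBrew to identify and redact the PII.
Explanation
AWS Glue DataBrew includes built-in PII detection and data masking transformations, so the redaction can be configured without writing code. A custom transformation in Data Wrangler, a custom Lambda function, or a notebook on a Glue development endpoint would each require the data scientist to write and maintain the detection and redaction logic.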
Question 179:
A large mobile network operating company is building a machine learning model to predict customers who are likely to unsubscribe from the service. The company plans to offer an incentive for these customers as the cost of churn is far greater than the cost of the incentive.
The model produces the following confusion matrix after evaluating on a test dataset of 100 customers:
Based on the model evaluation results, why is this a viable model for production?
A. The model is 86% accurate and the cost incurred by the company as a result of false negatives is less than the false positives. B. The precision of the model is 86%, which is less than the accuracy of the model. C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives. D. The precision of the model is 86%, which is greater than the accuracy of the model.
C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives.
Explanation
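The confusion matrix itself is not reproduced in this text, so the counts in the sketch below are hypothetical. They were chosen only so that they sum to 100 customers and give the 86% accuracy mentioned in the options. The sketch shows how accuracy and precision are computed from the four cells of a confusion matrix.

```python
def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Share of predicted churners who actually churned."""
    return tp / (tp + fp)


# HYPOTHETICAL counts, not taken from the original question.
tp, fp, fn, tn = 10, 10, 4, 76

model_accuracy = accuracy(tp, fp, fn, tn)   # (10 + 76) / 100
model_precision = precision(tp, fp)         # 10 / (10 + 10)
```

A false positive here means an incentive is offered to a customer who would have stayed, while a false negative means a churning customer is missed. Since the question states that churn costs far more than the incentive, false positives are the cheaper kind of error.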
Question 180:
A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users.
What should the Specialist do to meet this objective?
A. Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR. B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR. C. Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR. D. Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR.
B. Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.
Explanation
Many developers want to implement the famous Amazon model that was used to power the "People who bought this also bought these items" feature on Amazon.com. This model is based on a method called collaborative filtering. It takes items such as movies, books, and products that were rated highly by a set of users and recommends them to other users whose ratings are similar. This method works well in domains where explicit ratings or implicit user actions can be gathered and analyzed.
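To make the idea concrete, here is a toy user-based collaborative filter in plain Python. It is not the Spark ML implementation that the answer refers to. It is a small sketch under the assumption that ratings fit in a dictionary, that unrated items count as zero, and that cosine similarity measures how alike two users are. The users, items, and ratings are invented for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    dot = sum(a[item] * b[item] for item in a if item in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented toy data: bob rates like alice, carol does not.
ratings = {
    "alice": {"book": 5, "film": 4},
    "bob": {"book": 5, "film": 4, "game": 5},
    "carol": {"book": 1, "game": 2, "album": 5},
}
suggestions = recommend(ratings, "alice")
```

Because bob's ratings closely match alice's, the item he rated highly ("game") ranks ahead of carol's favorite ("album").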