MLS-C01 Practice Questions & Online Exam Preparation

MLS-C01 Exam Details

Exam Code: MLS-C01
Exam Name: AWS Certified Machine Learning - Specialty (MLS-C01)
Certification: Amazon Certifications
Vendor: Amazon
Total Questions: 396 Q&As
Last Updated: May 26, 2026

Amazon MLS-C01 Online Questions & Answers

Question 221:

A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users.
The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features, including user age, device, location, and play patterns.
Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory.
Which of the following approaches should the Data Science team take to mitigate this issue? (Select TWO.)
A. Add more deep trees to the random forest to enable the model to learn more features.
B. Include a copy of the samples in the test dataset in the training dataset.
C. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.
D. Change the cost function so that false negatives have a higher impact on the cost value than false positives.
E. Change the cost function so that false positives have a higher impact on the cost value than false negatives.

C. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.
D. Change the cost function so that false negatives have a higher impact on the cost value than false positives.
Explanation
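Options C and D both target the class imbalance (1,000 positives against 999,000 negatives). The following is a minimal sketch of each idea in plain Python, not the team's actual pipeline. The helper names, the noise level, and the 10x false-negative weight are illustrative assumptions.

```python
import math
import random

def oversample_with_noise(samples, copies, noise_std=0.01, seed=0):
    """Return the originals plus `copies` jittered duplicates of each sample."""
    rng = random.Random(seed)
    out = [list(s) for s in samples]
    for s in samples:
        for _ in range(copies):
            # Small Gaussian noise keeps the copies from being exact duplicates.
            out.append([x + rng.gauss(0.0, noise_std) for x in s])
    return out

def weighted_log_loss(y_true, p_pred, fn_weight=10.0, fp_weight=1.0):
    """Binary cross-entropy in which errors on positives (false negatives) cost more."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -fn_weight * math.log(p)
        else:
            total += -fp_weight * math.log(1.0 - p)
    return total / len(y_true)

# Hypothetical two-feature positive samples (assumed values).
positives = [[25.0, 3.5], [31.0, 7.2]]
augmented = oversample_with_noise(positives, copies=4)
print(len(augmented))  # 2 originals + 8 noisy copies

# An equally confident mistake costs more when the true label is positive.
missed_positive = weighted_log_loss([1], [0.1])
false_alarm = weighted_log_loss([0], [0.9])
print(missed_positive > false_alarm)
```

With a 99.9% negative class, a model that always predicts "not paid" already exceeds 99% accuracy, which is why accuracy alone hides the problem here.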
Question 222:

A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy.
Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team's needs? (Choose two.)
A. Add L1 regularization to the classifier
B. Add features to the dataset
C. Perform recursive feature elimination
D. Perform t-distributed stochastic neighbor embedding (t-SNE)
E. Perform linear discriminant analysis

A. Add L1 regularization to the classifier
C. Perform recursive feature elimination
Explanation
Key Words:
1. 100 continuous numerical features → too many features
2. No feature selection has been done
3. Easy interpretation: a direct relationship between X and Y is preferred
4. Gap between the training and validation set accuracy → overfitting
A: L1 regularization reduces overfitting by driving the coefficients of irrelevant features to zero. The remaining coefficients show the direct relationship between X and Y, so interpretation is easy.
B: Adding more features will make the overfitting worse.
C: Recursive feature elimination reduces overfitting by removing the weakest features. The model still uses the original features, so interpretation is easy.
D: t-distributed stochastic neighbor embedding (t-SNE) is a dimensionality reduction technique that shows up frequently in the questions. However, like PCA, it is less interpretable: you won't be able to see the direct impact of relevant features on the model outcome.
E: Linear discriminant analysis is the preferred linear classification technique if you have more than two classes. Churn prediction is a binary problem.
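Recursive feature elimination (option C) can be sketched as a loop that refits a model and drops the feature with the smallest absolute weight. This is a simplified illustration, assuming features on a comparable scale and a plain least-squares model fitted by gradient descent rather than the logistic regression in the question. The function names and the toy data are assumptions.

```python
from itertools import product

def fit_linear(X, y, steps=500, lr=0.1):
    """Fit least-squares weights by batch gradient descent (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(steps):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(X, y)]
        for j in range(d):
            grad = 2.0 / n * sum(r * row[j] for r, row in zip(residuals, X))
            w[j] -= lr * grad
    return w

def recursive_feature_elimination(X, y, n_keep):
    """Repeatedly refit and drop the feature with the smallest |weight|."""
    kept = list(range(len(X[0])))
    while len(kept) > n_keep:
        sub = [[row[j] for j in kept] for row in X]
        w = fit_linear(sub, y)
        weakest = min(range(len(kept)), key=lambda k: abs(w[k]))
        kept.pop(weakest)
    return kept

# Toy data (assumed): four features, but the target is driven mainly by
# features 0 and 2; feature 1 has a small effect and feature 3 has none.
X = [[a, b, c, b * c] for a, b, c in product([1.0, -1.0], repeat=3)]
y = [2.0 * r[0] + 0.1 * r[1] - 3.0 * r[2] for r in X]

kept = recursive_feature_elimination(X, y, n_keep=2)
print(kept)
```

The surviving indices refer to the original columns, which is what keeps the final model directly interpretable.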
Question 223:

A company is building an application that can predict spam email messages based on email text. The company can generate a few thousand human-labeled datasets that contain a list of email messages and a label of "spam" or "not spam" for each email message. A machine learning (ML) specialist wants to use transfer learning with a Bidirectional Encoder Representations from Transformers (BERT) model that is trained on English Wikipedia text data.
What should the ML specialist do to initialize the model to fine-tune the model with the custom data?
A. Initialize the model with pretrained weights in all layers except the last fully connected layer.
B. Initialize the model with pretrained weights in all layers. Stack a classifier on top of the first output position. Train the classifier with the labeled data.
C. Initialize the model with random weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
D. Initialize the model with pretrained weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.

D. Initialize the model with pretrained weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
Explanation
Question 224:

An online advertising company is developing a linear model to predict the bid price of advertisements in real time with low-latency predictions. A data scientist has trained the linear model by using many features, but the model is overfitting the training dataset. The data scientist needs to prevent overfitting and must reduce the number of features.
Which solution will meet these requirements?
A. Retrain the model with L1 regularization applied.
B. Retrain the model with L2 regularization applied.
C. Retrain the model with dropout regularization applied.
D. Retrain the model by using more data.

A. Retrain the model with L1 regularization applied.
Explanation
https://docs.aws.amazon.com/machine-learning/latest/dg/model-fit-underfitting-vs-overfitting.html
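Why L1 removes features while L2 does not can be seen on a single weight. As a sketch under a simplifying assumption (one weight, unit-scaled feature), the L1-penalized optimum is a soft threshold that snaps small weights to exactly zero, while the L2-penalized optimum only scales the weight down. The function names and the sample weights are illustrative.

```python
def l1_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*abs(v): soft thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + 0.5*lam*v**2: proportional shrinkage."""
    return w / (1.0 + lam)

# Hypothetical unregularized weights for four features.
weights = [3.0, -0.2, 0.05, -1.5]
l1 = [l1_shrink(w, 0.5) for w in weights]
l2 = [l2_shrink(w, 0.5) for w in weights]

print(l1)  # small weights become exactly 0.0
print(l2)  # every weight shrinks but stays nonzero
```

Features whose weight is exactly zero can be dropped from the model, which also helps with the low-latency requirement.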
Question 225:

A data engineer needs to provide a team of data scientists with the appropriate dataset to run machine learning training jobs. The data will be stored in Amazon S3. The data engineer is obtaining the data from an Amazon Redshift database and is using join queries to extract a single tabular dataset. A portion of the schema is as follows:
1. TransactionTimestamp (Timestamp)
2. CardName (Varchar)
3. CardNo (Varchar)
The data engineer must provide the data so that any row with a CardNo value of NULL is removed. Also, the TransactionTimestamp column must be separated into a TransactionDate column and a TransactionTime column. Finally, the CardName column must be renamed to NameOnCard. The data will be extracted on a monthly basis and will be loaded into an S3 bucket. The solution must minimize the effort that is needed to set up infrastructure for the ingestion and transformation. The solution also must be automated and must minimize the load on the Amazon Redshift cluster.
Which solution meets these requirements?
A. Set up an Amazon EMR cluster. Create an Apache Spark job to read the data from the Amazon Redshift cluster and transform the data. Load the data into the S3 bucket. Schedule the job to run monthly.
B. Set up an Amazon EC2 instance with a SQL client tool, such as SQL Workbench/J, to query the data from the Amazon Redshift cluster directly. Export the resulting dataset into a file. Upload the file into the S3 bucket. Perform these tasks monthly.
C. Set up an AWS Glue job that has the Amazon Redshift cluster as the source and the S3 bucket as the destination. Use the built-in transforms Filter, Map, and RenameField to perform the required transformations. Schedule the job to run monthly.
D. Use Amazon Redshift Spectrum to run a query that writes the data directly to the S3 bucket. Create an AWS Lambda function to run the query monthly.

C. Set up an AWS Glue job that has the Amazon Redshift cluster as the source and the S3 bucket as the destination. Use the built-in transforms Filter, Map, and RenameField to perform the required transformations. Schedule the job to run monthly.
Explanation
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-transforms.html
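The three required transformations map onto the roles of Filter (drop rows), Map (derive columns), and RenameField (rename a column). The sketch below is a plain-Python analog of that row logic on a list of dicts, not the AWS Glue API itself. The `transform` function and the sample rows are assumptions for illustration.

```python
from datetime import datetime

def transform(rows):
    """Apply the three required transformations to a list of row dicts."""
    out = []
    for row in rows:
        # Filter: remove any row whose CardNo is NULL.
        if row["CardNo"] is None:
            continue
        ts = row["TransactionTimestamp"]
        out.append({
            # Map: split the timestamp into separate date and time columns.
            "TransactionDate": ts.date().isoformat(),
            "TransactionTime": ts.time().isoformat(),
            # RenameField: CardName becomes NameOnCard.
            "NameOnCard": row["CardName"],
            "CardNo": row["CardNo"],
        })
    return out

# Made-up sample rows.
rows = [
    {"TransactionTimestamp": datetime(2024, 1, 15, 9, 30, 0),
     "CardName": "A. Example", "CardNo": "1111"},
    {"TransactionTimestamp": datetime(2024, 1, 16, 14, 5, 0),
     "CardName": "B. Example", "CardNo": None},
]
result = transform(rows)
print(result)
```

In the Glue job the same steps run inside the managed job, so the only work sent to the Redshift cluster is the extraction query.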
Question 226:

A machine learning (ML) specialist collected daily product usage data for a group of customers. The ML specialist appended customer metadata such as age and gender from an external data source.
The ML specialist wants to understand product usage patterns for each day of the week for customers in specific age groups. The ML specialist creates two categorical features named day_of_week and binned_age, respectively.
Which approach should the ML specialist use to discover the relationship between the two new categorical features?
A. Create a scatterplot for day_of_week and binned_age.
B. Create crosstabs for day_of_week and binned_age.
C. Create word clouds for day_of_week and binned_age.
D. Create a boxplot for day_of_week and binned_age.

B. Create crosstabs for day_of_week and binned_age.
Explanation
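A crosstab counts how often each pair of category values occurs together, which suits two categorical features. Here is a minimal sketch using only the standard library. The `crosstab` helper and the sample records are assumptions for illustration.

```python
from collections import Counter

def crosstab(rows, row_key, col_key):
    """Count co-occurrences of two categorical fields as a nested dict."""
    counts = Counter((r[row_key], r[col_key]) for r in rows)
    row_labels = sorted({r for r, _ in counts})
    col_labels = sorted({c for _, c in counts})
    # Counter returns 0 for pairs that never occur together.
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}

# Made-up usage records.
records = [
    {"day_of_week": "Mon", "binned_age": "18-29"},
    {"day_of_week": "Mon", "binned_age": "18-29"},
    {"day_of_week": "Mon", "binned_age": "30-44"},
    {"day_of_week": "Sat", "binned_age": "30-44"},
]
table = crosstab(records, "day_of_week", "binned_age")
for day, row in table.items():
    print(day, row)
```

Scatterplots and boxplots need at least one numeric axis, which is why they do not fit two categorical features.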
Question 227:

A retail chain has been ingesting purchasing records from its network of 20,000 stores to Amazon S3 using Amazon Kinesis Data Firehose. To support training an improved machine learning model, training records will require new but simple transformations, and some attributes will be combined. The model needs to be retrained daily.
Given the large number of stores and the legacy data ingestion, which change will require the LEAST amount of development effort?
A. Require that the stores switch to capturing their data locally on AWS Storage Gateway for loading into Amazon S3, then use AWS Glue to do the transformation.
B. Deploy an Amazon EMR cluster running Apache Spark with the transformation logic, and have the cluster run each day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3.
C. Spin up a fleet of Amazon EC2 instances with the transformation logic, have them transform the data records accumulating on Amazon S3, and output the transformed records to Amazon S3.
D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.

D. Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that transforms raw record attributes into simple transformed values using SQL.
Explanation
https://docs.aws.amazon.com/kinesisanalytics/latest/java/examples-s3.html
Question 228:

A machine learning (ML) engineer is integrating a production model with a customer metadata repository for real-time inference. The repository is hosted in Amazon SageMaker Feature Store. The engineer wants to retrieve only the latest version of the customer metadata record for a single customer at a time.
Which solution will meet these requirements?
A. Use the SageMaker Feature Store BatchGetRecord API with the record identifier. Filter to find the latest record.
B. Create an Amazon Athena query to retrieve the data from the feature table.
C. Create an Amazon Athena query to retrieve the data from the feature table. Use the write_time value to find the latest record.
D. Use the SageMaker Feature Store GetRecord API with the record identifier.

D. Use the SageMaker Feature Store GetRecord API with the record identifier.
Explanation
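GetRecord reads from the online store, which holds only the latest record for each record identifier, so no filtering is needed. The sketch below wraps the call in a hypothetical helper, `get_latest_record`. In real use the client would come from `boto3.client("sagemaker-featurestore-runtime")`. The stub client, the feature group name, and the feature names are assumptions so that the example runs without AWS credentials.

```python
def get_latest_record(client, feature_group_name, customer_id):
    """Fetch the latest record for one identifier and return it as a dict."""
    response = client.get_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=str(customer_id),
    )
    # Treat a missing "Record" key as "no record for this identifier".
    return {f["FeatureName"]: f["ValueAsString"]
            for f in response.get("Record", [])}

class StubFeatureStoreClient:
    """Hypothetical stand-in for the boto3 Feature Store runtime client."""
    def get_record(self, FeatureGroupName, RecordIdentifierValueAsString):
        return {"Record": [
            {"FeatureName": "customer_id",
             "ValueAsString": RecordIdentifierValueAsString},
            {"FeatureName": "tier", "ValueAsString": "gold"},
        ]}

record = get_latest_record(StubFeatureStoreClient(), "customer-metadata", 42)
print(record)
```

The Athena options query the offline store, which keeps historical versions and is not meant for single-record, real-time lookups.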
Question 229:

A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only.
How should the Machine Learning Specialist transform the dataset to minimize query runtime?
A. Convert the records to Apache Parquet format
B. Convert the records to JSON format
C. Convert the records to GZIP CSV format
D. Convert the records to XML format

A. Convert the records to Apache Parquet format
Explanation
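Apache Parquet is a columnar format, so Athena reads only the 5 to 10 columns that a query references instead of all 200. Compression further reduces the amount of data scanned by Amazon Athena and the storage used in your S3 bucket, which lowers your AWS bill on both counts. Supported compression formats: GZIP, LZO, SNAPPY (Parquet), and ZLIB.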
https://www.cloudforecast.io/blog/using-parquet-on-athena-to-save-money-on-aws/
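A rough back-of-the-envelope estimate shows the scale of the saving, using the figures from the question. It assumes that all 200 columns are the same size and ignores compression, so the real numbers will differ. The `scanned_mb` helper is illustrative.

```python
def scanned_mb(records, mb_per_record, total_columns, queried_columns, columnar):
    """Estimate the data scanned by one query, in MB."""
    total = records * mb_per_record
    if not columnar:
        # Row-oriented CSV: every column of every record is read.
        return total
    # Columnar format: only the queried columns are read.
    return total * queried_columns / total_columns

csv_scan = scanned_mb(800_000, 1.5, 200, 10, columnar=False)
parquet_scan = scanned_mb(800_000, 1.5, 200, 10, columnar=True)

print(csv_scan)      # 1,200,000 MB, about 1.2 TB
print(parquet_scan)  # 60,000 MB, about 60 GB
```

GZIP-compressed CSV shrinks the files but still forces Athena to read every column, which is why option C does less for query runtime.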
Question 230:

A real estate company wants to create a machine learning model for predicting housing prices based on a historical dataset. The dataset contains 32 features. Which model will meet the business requirement?
A. Logistic regression
B. Linear regression
C. K-means
D. Principal component analysis (PCA)

B. Linear regression
Explanation

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important, and more and more enterprises require them of job applicants. But how can you prepare for the exam effectively? How can you prepare in a short time and with less effort? How can you get an ideal result, and where can you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your MLS-C01 exam preparation or your Amazon certification application, do not hesitate to visit Vcedump.com to find your solutions.

Related Exams:

AIF-C01

AIP-C01

ANS-C00

ANS-C01

AXS-C01

BDS-C00

CLF-C02

DAS-C01

DATA-ENGINEER-ASSOCIATE

DBS-C01
