MLS-C01 Exam Details

  • Exam Code: MLS-C01
  • Exam Name: AWS Certified Machine Learning - Specialty (MLS-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 396 Q&As
  • Last Updated: May 26, 2026

Amazon MLS-C01 Online Questions & Answers

  • Question 91:

    A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.

    Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.)

    A. Amazon SageMaker seq2seq algorithm
    B. Amazon SageMaker BlazingText algorithm in Skip-gram mode
    C. Amazon SageMaker Object2Vec algorithm
    D. Amazon SageMaker BlazingText algorithm in continuous bag-of-words (CBOW) mode
    E. Combination of the Amazon SageMaker BlazingText algorithm in Batch Skip-gram mode with a custom recurrent neural network (RNN)

  • Question 92:

    A Machine Learning Specialist is configuring automatic model tuning in Amazon SageMaker. When using the hyperparameter optimization feature, which of the following guidelines should be followed to improve optimization?

    A. Choose the maximum number of hyperparameters supported by Amazon SageMaker to search the largest number of combinations possible
    B. Specify a very large hyperparameter range to allow Amazon SageMaker to cover every possible value.
    C. Use log-scaled hyperparameters to allow the hyperparameter space to be searched as quickly as possible
    D. Execute only one hyperparameter tuning job at a time and improve tuning through successive rounds of experiments
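
    To make the idea of a log-scaled hyperparameter in option C concrete, here is a minimal sketch in plain Python. It compares uniform sampling with log-uniform sampling over a hypothetical learning-rate range of 1e-5 to 1e-1. The range, the seed, and the helper names are assumptions for illustration, and the code does not call SageMaker.

```python
import math
import random

def sample_linear(low, high, rng):
    """Draw a value uniformly between low and high."""
    return rng.uniform(low, high)

def sample_log(low, high, rng):
    """Draw a value uniformly in log space, then map it back."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

def share_below(values, threshold):
    """Fraction of values that fall below the threshold."""
    return sum(v < threshold for v in values) / len(values)

rng = random.Random(0)      # fixed seed so the run is repeatable
low, high = 1e-5, 1e-1      # hypothetical learning-rate range
n = 10_000

linear = [sample_linear(low, high, rng) for _ in range(n)]
logged = [sample_log(low, high, rng) for _ in range(n)]

# 1e-3 is the geometric midpoint of the range.
print(f"linear draws below 1e-3: {share_below(linear, 1e-3):.1%}")
print(f"log draws below 1e-3:    {share_below(logged, 1e-3):.1%}")
```

    Under linear sampling, only about 1% of the draws land below 1e-3, so the small orders of magnitude are barely explored. Under log sampling, roughly half of the draws do.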

  • Question 93:

    A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance.

    What type of machine learning model should be used?

    A. Classification month-to-month using supervised learning of the 200 categories based on claim contents.
    B. Reinforcement learning using claim IDs and timestamps where the agent will identify how many claims in each category to expect from month to month.
    C. Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month.
    D. Classification with supervised learning of the categories for which partial information on claim contents is provided, and forecasting using claim IDs and timestamps for all other categories.
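
    Whichever model is chosen, predicting monthly claim volumes starts from a per-category monthly count series. The following is a minimal sketch of that aggregation step, assuming records shaped as (record ID, category, outcome date). The sample records and the category names are invented for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (record ID, outcome category, outcome date)
records = [
    ("r1", "cat_017", date(2023, 1, 5)),
    ("r2", "cat_017", date(2023, 1, 28)),
    ("r3", "cat_102", date(2023, 1, 9)),
    ("r4", "cat_017", date(2023, 2, 14)),
]

def monthly_counts(rows):
    """Count records per (category, year, month)."""
    return Counter((category, d.year, d.month) for _, category, d in rows)

counts = monthly_counts(records)
for key in sorted(counts):
    print(key, counts[key])
```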

  • Question 94:

    A manufacturing company needs to identify returned smartphones that have been damaged by moisture. The company has an automated process that produces 2,000 diagnostic values for each phone. The database contains more than five million phone evaluations. The evaluation process is consistent, and there are no missing values in the data. A machine learning (ML) specialist has trained an Amazon SageMaker linear learner ML model to classify phones as moisture damaged or not moisture damaged by using all available features. The model's F1 score is 0.6. Which changes in model training would MOST likely improve the model's F1 score? (Choose two.)

    A. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the SageMaker principal component analysis (PCA) algorithm.
    B. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the scikit-learn multi-dimensional scaling (MDS) algorithm.
    C. Continue to use the SageMaker linear learner algorithm. Set the predictor type to regressor.
    D. Use the SageMaker k-means algorithm with k of less than 1,000 to train the model.
    E. Use the SageMaker k-nearest neighbors (k-NN) algorithm. Set a dimension reduction target of less than 1,000 to train the model.

  • Question 95:

    A law firm handles thousands of contracts every day. Every contract must be signed. Currently, a lawyer manually checks all contracts for signatures.

    The law firm is developing a machine learning (ML) solution to automate signature detection for each contract. The ML solution must also provide a confidence score for each contract page.

    Which Amazon Textract API action can the law firm use to generate a confidence score for each page of each contract?

    A. Use the AnalyzeDocument API action. Set the FeatureTypes parameter to SIGNATURES. Return the confidence scores for each page.
    B. Use the Prediction API call on the documents. Return the signatures and confidence scores for each page.
    C. Use the StartDocumentAnalysis API action to detect the signatures. Return the confidence scores for each page.
    D. Use the GetDocumentAnalysis API action to detect the signatures. Return the confidence scores for each page.
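
    As a rough illustration of what per-page signature confidence could look like, the sketch below post-processes a response shaped like Amazon Textract output, in which each detected block carries a BlockType, a Page number, and a Confidence value. The sample response is invented and heavily trimmed, and taking the highest confidence on each page is an assumption about how a per-page score might be derived. With boto3, a synchronous request of this kind would pass FeatureTypes=["SIGNATURES"] to analyze_document. The sketch makes no AWS call.

```python
from collections import defaultdict

# Hypothetical, trimmed response; the values are invented for illustration.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "SIGNATURE", "Page": 1, "Confidence": 97.2},
        {"BlockType": "PAGE", "Page": 2},
        {"BlockType": "SIGNATURE", "Page": 2, "Confidence": 41.5},
        {"BlockType": "SIGNATURE", "Page": 2, "Confidence": 88.0},
    ]
}

def signature_confidence_by_page(response):
    """Return {page number: highest signature confidence on that page}."""
    best = defaultdict(float)
    for block in response.get("Blocks", []):
        if block.get("BlockType") == "SIGNATURE":
            page = block.get("Page", 1)
            best[page] = max(best[page], block.get("Confidence", 0.0))
    return dict(best)

print(signature_confidence_by_page(sample_response))
```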

  • Question 96:

    A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values.

    A data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamples the data daily and exports the data for further modeling.

    Which solution will meet these requirements with the LEAST implementation effort?

    A. Use Amazon EMR Serverless with PySpark.
    B. Use AWS Glue DataBrew.
    C. Use Amazon SageMaker Studio Data Wrangler.
    D. Use Amazon SageMaker Studio Notebook with Pandas.
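
    Independent of which tool is used, daily resampling with gap filling amounts to the transformation sketched below. This is a minimal example in plain Python with invented observations. Averaging the values within a day and forward-filling empty days are assumptions, since the question does not say how the missing values should be replaced.

```python
from datetime import datetime, timedelta

# Hypothetical irregular observations: (timestamp, price)
observations = [
    (datetime(2024, 1, 1, 9, 30), 10.0),
    (datetime(2024, 1, 1, 17, 0), 12.0),
    (datetime(2024, 1, 4, 8, 15), 15.0),
]

def resample_daily(rows):
    """Average values per calendar day, then forward-fill empty days."""
    by_day = {}
    for ts, value in rows:
        by_day.setdefault(ts.date(), []).append(value)

    first, last = min(by_day), max(by_day)
    result, previous = [], None
    day = first
    while day <= last:
        if day in by_day:
            previous = sum(by_day[day]) / len(by_day[day])
        result.append((day, previous))  # days with no data reuse the last value
        day += timedelta(days=1)
    return result

for day, price in resample_daily(observations):
    print(day.isoformat(), price)
```

    The two readings on January 1 collapse into one daily value, and January 2 and 3, which have no readings, inherit it.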

  • Question 97:

    A bank's Machine Learning team is developing an approach for credit card fraud detection. The company has a large dataset of historical data labeled as fraudulent. The goal is to build a model to take the information from new transactions and predict whether each transaction is fraudulent or not.

    Which built-in Amazon SageMaker machine learning algorithm should be used for modeling this problem?

    A. Seq2seq
    B. XGBoost
    C. K-means
    D. Random Cut Forest (RCF)

  • Question 98:

    An automotive company uses computer vision in its autonomous cars. The company trained its object detection models successfully by using transfer learning from a convolutional neural network (CNN). The company trained the models by using PyTorch through the Amazon SageMaker SDK.

    The vehicles have limited hardware and compute power. The company wants to optimize the model to reduce memory, battery, and hardware consumption without a significant sacrifice in accuracy.

    Which solution will improve the computational efficiency of the models?

    A. Use Amazon CloudWatch metrics to gain visibility into the SageMaker training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set new weights based on the pruned set of filters. Run a new training job with the pruned model.
    B. Use Amazon SageMaker Ground Truth to build and run data labeling workflows. Collect a larger labeled dataset with the labeling workflows. Run a new training job that uses the new labeled data with previous training data.
    C. Use Amazon SageMaker Debugger to gain visibility into the training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set the new weights based on the pruned set of filters. Run a new training job with the pruned model.
    D. Use Amazon SageMaker Model Monitor to gain visibility into the ModelLatency metric and OverheadLatency metric of the model after the company deploys the model. Increase the model learning rate. Run a new training job.

  • Question 99:

    An ecommerce company wants to use machine learning (ML) to monitor fraudulent transactions on its website. The company is using Amazon SageMaker to research, train, deploy, and monitor the ML models.

    The historical transactions data is in a .csv file that is stored in Amazon S3. The data contains features such as the user's IP address, navigation time, average time on each page, and the number of clicks for each session. There is no label in the data to indicate if a transaction is anomalous.

    Which models should the company use in combination to detect anomalous transactions? (Choose two.)

    A. IP Insights
    B. K-nearest neighbors (k-NN)
    C. Linear learner with a logistic function
    D. Random Cut Forest (RCF)
    E. XGBoost

  • Question 100:

    A company wants to segment a large group of customers into subgroups based on shared characteristics. The company's data scientist is planning to use the Amazon SageMaker built-in k-means clustering algorithm for this task. The data scientist needs to determine the optimal number of subgroups (k) to use.

    Which data visualization approach will MOST accurately determine the optimal value of k?

    A. Calculate the principal component analysis (PCA) components. Run the k-means clustering algorithm for a range of k by using only the first two PCA components. For each value of k, create a scatter plot with a different color for each cluster. The optimal value of k is the value where the clusters start to look reasonably separated.
    B. Calculate the principal component analysis (PCA) components. Create a line plot of the number of components against the explained variance. The optimal value of k is the number of PCA components after which the curve starts decreasing in a linear fashion.
    C. Create a t-distributed stochastic neighbor embedding (t-SNE) plot for a range of perplexity values. The optimal value of k is the value of perplexity, where the clusters start to look reasonably separated.
    D. Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion.
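
    The procedure described in option D (run k-means for a range of k and record the SSE for each) can be sketched in plain Python as follows. The toy dataset with three well-separated groups, the deterministic farthest-point seeding, and the function names are assumptions chosen for repeatability. This is a simple Lloyd's-algorithm loop, not the SageMaker built-in implementation.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly
    add the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers

def kmeans_sse(points, k, iterations=20):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    centers = farthest_point_init(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its members; keep it if empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else center
            for members, center in zip(clusters, centers)
        ]
    return sum(min(squared_distance(p, c) for c in centers) for p in points)

# Hypothetical toy data: three tight groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

    Plotting these values as a line chart shows the SSE dropping steeply up to k = 3 and only slightly afterward, which is the bend that the elbow approach looks for.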

Tips on How to Prepare for the Exams

Certification exams are becoming more important, and more employers now require them of job applicants. How can you prepare for the exam effectively, in a short time and with less effort? How can you get an ideal result, and where can you find the most reliable resources? Here on Vcedump.com, you will find the answers. Vcedump.com provides not only Amazon exam questions, answers, and explanations but also assistance with your exam preparation and certification application. If you are unsure about your MLS-C01 exam preparation or your Amazon certification application, visit Vcedump.com to find a solution.