Exam Details

  • Exam Code: MLS-C01
  • Exam Name: AWS Certified Machine Learning - Specialty (MLS-C01)
  • Certification: Amazon Certifications
  • Vendor: Amazon
  • Total Questions: 394 Q&As
  • Last Updated: May 04, 2025

Amazon Certifications MLS-C01 Questions & Answers

  • Question 41:

    A data engineer wants to perform exploratory data analysis (EDA) on a petabyte of data. The data engineer does not want to manage compute resources and wants to pay only for queries that are run. The data engineer must write the analysis by using Python from a Jupyter notebook.

    Which solution will meet these requirements?

    A. Use Apache Spark from within Amazon Athena.

    B. Use Apache Spark from within Amazon SageMaker.

    C. Use Apache Spark from within an Amazon EMR cluster.

    D. Use Apache Spark through an integration with Amazon Redshift.

  • Question 42:

    A data scientist receives a new dataset in .csv format and stores the dataset in Amazon S3. The data scientist will use the dataset to train a machine learning (ML) model.

    The data scientist first needs to identify any potential data quality issues in the dataset. The data scientist must identify values that are missing or values that are not valid. The data scientist must also identify the number of outliers in the dataset.

    Which solution will meet these requirements with the LEAST operational effort?

    A. Create an AWS Glue job to transform the data from .csv format to Apache Parquet format. Use an AWS Glue crawler and Amazon Athena with appropriate SQL queries to retrieve the required information.

    B. Leave the dataset in .csv format. Use an AWS Glue crawler and Amazon Athena with appropriate SQL queries to retrieve the required information.

    C. Create an AWS Glue job to transform the data from .csv format to Apache Parquet format. Import the data into Amazon SageMaker Data Wrangler. Use the Data Quality and Insights Report to retrieve the required information.

    D. Leave the dataset in .csv format. Import the data into Amazon SageMaker Data Wrangler. Use the Data Quality and Insights Report to retrieve the required information.

  • Question 43:

    An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced. Only 5% of customers return items.

    A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute.

    How should the data scientist meet these requirements MOST cost-effectively?

    A. Tune all possible hyperparameters by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:accuracy", "Type": "Maximize"}}.

    B. Tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.

    C. Tune all possible hyperparameters by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.

    D. Tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Minimize"}}.
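
    The class imbalance described in this question can be made concrete with a little arithmetic. The sketch below assumes a hypothetical dataset of 1,000 customers (950 who keep the item, 50 who return it) and shows what a model that never predicts a return would score. It also computes the common XGBoost starting point for scale_pos_weight, the ratio of negative to positive examples. The counts and function names are illustrative assumptions, not part of the question.

```python
def scale_pos_weight_heuristic(n_negative, n_positive):
    """Common XGBoost starting point: ratio of negative to positive examples."""
    return n_negative / n_positive


def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, recall, f1


# Hypothetical dataset: 1,000 customers, 5% of whom return an item.
n_positive, n_negative = 50, 950

# A model that always predicts "no return" finds none of the returned items.
accuracy, recall, f1 = binary_metrics(tp=0, fp=0, fn=n_positive, tn=n_negative)
weight = scale_pos_weight_heuristic(n_negative, n_positive)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"scale_pos_weight starting point: {weight:.1f}")
```

    Under these assumed counts, accuracy looks high even though no returned item is captured, which is why the choice of objective metric matters on imbalanced data.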

  • Question 44:

    A data scientist is trying to improve the accuracy of a neural network classification model. The data scientist wants to run a large hyperparameter tuning job in Amazon SageMaker. However, previous smaller tuning jobs on the same model often ran for several weeks. The data scientist wants to reduce the computation time required to run the tuning job.

    Which actions will MOST reduce the computation time for the hyperparameter tuning job? (Choose two.)

    A. Use the Hyperband tuning strategy.

    B. Increase the number of hyperparameters.

    C. Set a lower value for the MaxNumberOfTrainingJobs parameter.

    D. Use the grid search tuning strategy.

    E. Set a lower value for the MaxParallelTrainingJobs parameter.
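
    For orientation, the settings named in these options live in the HyperParameterTuningJobConfig structure of a SageMaker tuning job. The sketch below shows a partial configuration with placeholder numbers, plus a deliberately crude, hypothetical helper (rough_wall_clock_hours) that assumes every training job takes the same time and that jobs run in full waves. It ignores early stopping, so treat it only as a way to reason about how the two resource limits interact.

```python
import math

# Partial tuning-job configuration; all numeric values are placeholders.
tuning_job_config = {
    "Strategy": "Hyperband",
    "StrategyConfig": {
        "HyperbandStrategyConfig": {"MinResource": 1, "MaxResource": 20}
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 50,
        "MaxParallelTrainingJobs": 5,
    },
}


def rough_wall_clock_hours(max_jobs, max_parallel, hours_per_job):
    """Hypothetical estimate: jobs run in equal-length waves of max_parallel."""
    waves = math.ceil(max_jobs / max_parallel)
    return waves * hours_per_job


limits = tuning_job_config["ResourceLimits"]
baseline = rough_wall_clock_hours(
    limits["MaxNumberOfTrainingJobs"], limits["MaxParallelTrainingJobs"], 2.0
)
# Halve the total number of jobs, keep parallelism the same.
fewer_jobs = rough_wall_clock_hours(25, limits["MaxParallelTrainingJobs"], 2.0)
# Keep the total number of jobs, reduce parallelism.
less_parallel = rough_wall_clock_hours(limits["MaxNumberOfTrainingJobs"], 2, 2.0)

print(baseline, fewer_jobs, less_parallel)
```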

  • Question 45:

    A machine learning (ML) specialist needs to solve a binary classification problem for a marketing dataset. The ML specialist must maximize the Area Under the ROC Curve (AUC) of the algorithm by training an XGBoost algorithm. The ML specialist must find values for the eta, alpha, min_child_weight, and max_depth hyperparameters that will generate the most accurate model.

    Which approach will meet these requirements with the LEAST operational overhead?

    A. Use a bootstrap script to install scikit-learn on an Amazon EMR cluster. Deploy the EMR cluster. Apply k-fold cross-validation methods to the algorithm.

    B. Deploy Amazon SageMaker prebuilt Docker images that have scikit-learn installed. Apply k-fold cross-validation methods to the algorithm.

    C. Use Amazon SageMaker automatic model tuning (AMT). Specify a range of values for each hyperparameter.

    D. Subscribe to an AUC algorithm that is on AWS Marketplace. Specify a range of values for each hyperparameter.
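
    To illustrate what "specify a range of values for each hyperparameter" looks like in practice, here is a minimal sketch of a tuning configuration for the four hyperparameters named in the question. The field names follow the SageMaker CreateHyperParameterTuningJob request shape. The range bounds, job limits, and the two helper functions are assumptions chosen for illustration, and nothing here calls AWS.

```python
def continuous_range(name, low, high):
    """Build a continuous range entry; the API expects bounds as strings."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high),
            "ScalingType": "Auto"}


def integer_range(name, low, high):
    """Build an integer range entry; the API expects bounds as strings."""
    return {"Name": name, "MinValue": str(low), "MaxValue": str(high),
            "ScalingType": "Auto"}


# Placeholder bounds; suitable ranges depend on the dataset.
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "MetricName": "validation:auc",
        "Type": "Maximize",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 20,
        "MaxParallelTrainingJobs": 2,
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            continuous_range("eta", 0.01, 0.5),
            continuous_range("alpha", 0, 10),
            continuous_range("min_child_weight", 1, 10),
        ],
        "IntegerParameterRanges": [
            integer_range("max_depth", 3, 10),
        ],
    },
}

ranges = tuning_job_config["ParameterRanges"]
tuned_names = sorted(
    entry["Name"]
    for group in ranges.values()
    for entry in group
)
print(tuned_names)
```

    A dictionary of this shape would typically be passed as the HyperParameterTuningJobConfig argument of the boto3 SageMaker client's create_hyper_parameter_tuning_job call, alongside a training job definition that is omitted here.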

  • Question 46:

    A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data engineer will use the customer data to create a new model to predict customer behavior.

    The engineer needs to increase the model performance by checking for multicollinearity in the dataset.

    Which steps can the data engineer take to accomplish this with the LEAST operational effort? (Choose two.)

    A. Use SageMaker Data Wrangler to refit and transform the dataset by applying one-hot encoding to category-based variables.

    B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.

    C. Use the SageMaker Data Wrangler Quick Model visualization to quickly evaluate the dataset and to produce importance scores for each feature.

    D. Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.

    E. Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset.
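
    As background on what a multicollinearity check measures, the sketch below computes the Pearson correlation between pairs of features and the variance inflation factor (VIF) for the simple two-feature case, where VIF = 1 / (1 - r^2). This is a generic, plain-Python illustration of the concept, not how Data Wrangler implements its diagnostics. The feature names and values are invented for the example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def vif_two_features(xs, ys):
    """VIF when one feature is regressed on a single other feature."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r * r)


# Hypothetical customer features.
monthly_spend = [10, 20, 30, 40, 50]
annual_spend = [121, 239, 362, 478, 601]   # roughly 12 x monthly_spend
store_visits = [3, 1, 4, 1, 5]

vif_collinear = vif_two_features(monthly_spend, annual_spend)
vif_unrelated = vif_two_features(monthly_spend, store_visits)

print(f"monthly vs annual spend VIF: {vif_collinear:.1f}")
print(f"monthly spend vs visits VIF: {vif_unrelated:.2f}")
```

    A common rule of thumb treats VIF values well above 5 to 10 as a sign that a feature is largely redundant with another.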

  • Question 47:

    An ecommerce company has used Amazon SageMaker to deploy a factorization machines (FM) model to suggest products for customers. The company's data science team has developed two new models by using the TensorFlow and PyTorch deep learning frameworks. The company needs to use A/B testing to evaluate the new models against the deployed model.

    The required A/B testing setup is as follows:

    Send 70% of traffic to the FM model, 15% of traffic to the TensorFlow model, and 15% of traffic to the PyTorch model. For customers who are from Europe, send all traffic to the TensorFlow model.

    Which architecture can the company use to implement the required A/B testing setup?

    A. Create two new SageMaker endpoints for the TensorFlow and PyTorch models in addition to the existing SageMaker endpoint. Create an Application Load Balancer. Create a target group for each endpoint. Configure listener rules and add weight to the target groups. To send traffic to the TensorFlow model for customers who are from Europe, create an additional listener rule to forward traffic to the TensorFlow target group.

    B. Create two production variants for the TensorFlow and PyTorch models. Create an auto scaling policy and configure the desired A/B weights to direct traffic to each production variant. Update the existing SageMaker endpoint with the auto scaling policy. To send traffic to the TensorFlow model for customers who are from Europe, set the TargetVariant header in the request to point to the variant name of the TensorFlow model.

    C. Create two new SageMaker endpoints for the TensorFlow and PyTorch models in addition to the existing SageMaker endpoint. Create a Network Load Balancer. Create a target group for each endpoint. Configure listener rules and add weight to the target groups. To send traffic to the TensorFlow model for customers who are from Europe, create an additional listener rule to forward traffic to the TensorFlow target group.

    D. Create two production variants for the TensorFlow and PyTorch models. Specify the weight for each production variant in the SageMaker endpoint configuration. Update the existing SageMaker endpoint with the new configuration. To send traffic to the TensorFlow model for customers who are from Europe, set the TargetVariant header in the request to point to the variant name of the TensorFlow model.
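
    The traffic split and the per-request override described in the scenario can be sketched with plain dictionaries. The keys below (ProductionVariants, InitialVariantWeight, TargetVariant) are the names used by the SageMaker endpoint configuration and runtime invocation APIs. The variant names, model names, instance settings, and the build_invoke_kwargs helper are hypothetical, and the code only builds request arguments without calling AWS.

```python
# Hypothetical model and variant names; instance settings are placeholders.
production_variants = [
    {"VariantName": "fm-variant", "ModelName": "fm-model",
     "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
     "InitialVariantWeight": 0.70},
    {"VariantName": "tensorflow-variant", "ModelName": "tensorflow-model",
     "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
     "InitialVariantWeight": 0.15},
    {"VariantName": "pytorch-variant", "ModelName": "pytorch-model",
     "InstanceType": "ml.m5.large", "InitialInstanceCount": 1,
     "InitialVariantWeight": 0.15},
]


def build_invoke_kwargs(endpoint_name, payload, customer_region):
    """Build invocation arguments, pinning European traffic to one variant."""
    kwargs = {
        "EndpointName": endpoint_name,
        "Body": payload,
        "ContentType": "application/json",
    }
    if customer_region == "Europe":
        # Overrides the weighted split for this request only.
        kwargs["TargetVariant"] = "tensorflow-variant"
    return kwargs


total_weight = sum(v["InitialVariantWeight"] for v in production_variants)
europe_request = build_invoke_kwargs("recs-endpoint", b"{}", "Europe")
other_request = build_invoke_kwargs("recs-endpoint", b"{}", "North America")

print(total_weight)
print(europe_request.get("TargetVariant"), other_request.get("TargetVariant"))
```

    In a real deployment, the variant list would be supplied when creating the endpoint configuration and the keyword arguments would be passed to the runtime client's invoke_endpoint call.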

  • Question 48:

    A university wants to develop a targeted recruitment strategy to increase new student enrollment. A data scientist gathers information about the academic performance history of students. The data scientist wants to use the data to build student profiles. The university will use the profiles to direct resources to recruit students who are likely to enroll in the university.

    Which combination of steps should the data scientist take to predict whether a particular student applicant is likely to enroll in the university? (Choose two.)

    A. Use Amazon SageMaker Ground Truth to sort the data into two groups named "enrolled" or "not enrolled."

    B. Use a forecasting algorithm to run predictions.

    C. Use a regression algorithm to run predictions.

    D. Use a classification algorithm to run predictions.

    E. Use the built-in Amazon SageMaker k-means algorithm to cluster the data into two groups named "enrolled" or "not enrolled."

  • Question 49:

    A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well to the validation dataset. The engineer realizes that the original dataset was imbalanced.

    What should the engineer do to improve the validation accuracy of the model?

    A. Perform stratified sampling on the original dataset.

    B. Acquire additional data about the majority classes in the original dataset.

    C. Use a smaller, randomly sampled version of the training dataset.

    D. Perform systematic sampling on the original dataset.
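
    For readers unfamiliar with the sampling terms in these options, the following is a minimal sketch of a stratified train/validation split written with only the Python standard library. It splits each class separately so that both subsets keep the original class proportions. The bird labels, the 20% validation fraction, and the stratified_split helper are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, validation_fraction, seed=0):
    """Return (train_indices, validation_indices) preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validation = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one validation example per class.
        n_val = max(1, round(len(indices) * validation_fraction))
        validation.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return train, validation


# Hypothetical imbalanced bird dataset: 90 sparrows, 10 owls.
labels = ["sparrow"] * 90 + ["owl"] * 10
train_idx, val_idx = stratified_split(labels, validation_fraction=0.2)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print(train_counts)
print(val_counts)
```

    With a purely random split, a rare class can end up with very few or no examples in one of the subsets. Splitting per class avoids that.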

  • Question 50:

    A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10,000 unlabeled images.

    Which actions should the ML specialist take to address this problem? (Choose two.)

    A. Use Amazon SageMaker Ground Truth to label the unlabeled images.

    B. Use image preprocessing to transform the images into grayscale images.

    C. Use data augmentation to rotate and translate the labeled images.

    D. Replace the activation of the last layer with a sigmoid.

    E. Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.
