Which of the following is a benefit of using vectorized pandas UDFs instead of standard PySpark UDFs?
A. The vectorized pandas UDFs allow for the use of type hints
B. The vectorized pandas UDFs process data in batches rather than one row at a time
C. The vectorized pandas UDFs allow for pandas API use inside of the function
D. The vectorized pandas UDFs work on distributed DataFrames
E. The vectorized pandas UDFs process data in memory rather than spilling to disk
A machine learning engineer is using the following code block to scale the inference of a single-node model on a Spark DataFrame with one million records:
Assuming the default Spark configuration is in place, which of the following is a benefit of using an Iterator?
A. The data will be limited to a single executor preventing the model from being loaded multiple times
B. The model will be limited to a single executor preventing the data from being distributed
C. The model only needs to be loaded once per executor rather than once per batch during the inference process
D. The data will be distributed across multiple executors during the inference process
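The load-once idea behind an iterator-style function can be sketched in plain Python. This is only an illustration, not Spark code: lists stand in for the pandas Series batches, and `load_model`, `predict_batches`, and the doubling "model" are hypothetical names made up for the example.

```python
from typing import Callable, Iterator, List

load_count = 0  # counts how many times the (hypothetical) model is loaded


def load_model() -> Callable[[float], float]:
    """Stand-in for an expensive model load; this 'model' just doubles its input."""
    global load_count
    load_count += 1
    return lambda x: 2.0 * x


def predict_batches(batches: Iterator[List[float]]) -> Iterator[List[float]]:
    # Setup happens once, before the loop, and is reused for every batch.
    model = load_model()
    for batch in batches:
        yield [model(x) for x in batch]


batches = iter([[1.0, 2.0], [3.0], [4.0, 5.0]])
predictions = list(predict_batches(batches))
print(predictions, load_count)
```

An iterator-based pandas UDF body would have the same shape: any expensive setup sits before the loop, and one result is yielded per incoming batch.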
Which statement describes a Spark ML transformer?
A. A transformer is an algorithm which can transform one DataFrame into another DataFrame
B. A transformer is a hyperparameter grid that can be used to train a model
C. A transformer chains multiple algorithms together to transform an ML workflow
D. A transformer is a learning algorithm that can use a DataFrame to train a model
A machine learning engineer wants to parallelize the inference of group-specific models using the Pandas Function API. They have developed the apply_model function that will look up and load the correct model for each group, and they want to apply it to each group of DataFrame df.
They have written the following incomplete code block:
Which piece of code can be used to fill in the above blank to complete the task?
A. applyInPandas
B. groupedApplyInPandas
C. mapInPandas
D. predict
Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?
A. MLflow Experiment Tracking
B. Spark ML
C. Autoscaling clusters
D. Autoscaling clusters
E. Delta Lake
A data scientist has created a linear regression model that uses log(price) as a label variable. Using this model, they have performed inference and the predictions and actual label values are in Spark DataFrame preds_df.
They are using the following code block to evaluate the model:
regression_evaluator.setMetricName("rmse").evaluate(preds_df)
Which of the following changes should the data scientist make to evaluate the RMSE in a way that is comparable with price?
A. They should exponentiate the computed RMSE value
B. They should take the log of the predictions before computing the RMSE
C. They should evaluate the MSE of the log predictions to compute the RMSE
D. They should exponentiate the predictions before computing the RMSE
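The effect of the label scale on RMSE can be checked with a small plain-Python sketch. The three label and prediction values are hypothetical, chosen only for illustration, and the `rmse` helper is written here rather than taken from Spark.

```python
import math

# Hypothetical log-scale labels and predictions (not from the question).
log_labels = [math.log(100.0), math.log(200.0), math.log(400.0)]
log_preds = [math.log(110.0), math.log(180.0), math.log(420.0)]


def rmse(actual, predicted):
    """Root-mean-squared error of two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# RMSE on the log scale: the units are log(price), not price.
rmse_log = rmse(log_labels, log_preds)

# Exponentiate both columns first, then compute RMSE in price units.
prices = [math.exp(v) for v in log_labels]
price_preds = [math.exp(v) for v in log_preds]
rmse_price = rmse(prices, price_preds)

print(rmse_log, rmse_price, math.exp(rmse_log))
```

With these numbers, exponentiating the log-scale RMSE gives a value close to 1, while the RMSE computed after exponentiating the columns is in price units, so the two are not interchangeable.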
A data scientist wants to explore summary statistics for Spark DataFrame spark_df. The data scientist wants to see the count, mean, standard deviation, minimum, maximum, and interquartile range (IQR) for each numerical feature.
Which of the following lines of code can the data scientist run to accomplish the task?
A. spark_df.summary()
B. spark_df.stats()
C. spark_df.describe().head()
D. spark_df.printSchema()
E. spark_df.toPandas()
A data scientist uses 3-fold cross-validation when optimizing model hyperparameters for a regression problem. The following root-mean-squared-error values are calculated on each of the validation folds:
1. 10.0
2. 12.0
3. 17.0
Which of the following values represents the overall cross-validation root-mean-squared error?
A. 13.0
B. 17.0
C. 12.0
D. 39.0
E. 10.0
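The arithmetic can be written out as a short sketch, assuming the overall cross-validation score is taken as the mean of the per-fold metrics (this is how Spark ML's CrossValidator reports its averaged metrics). The variable names are illustrative.

```python
from statistics import fmean

# RMSE measured on each of the three validation folds.
fold_rmse = [10.0, 12.0, 17.0]

# Overall cross-validation RMSE as the mean across folds: (10 + 12 + 17) / 3.
cv_rmse = fmean(fold_rmse)
print(cv_rmse)
```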
A machine learning engineer has identified the best run from an MLflow Experiment. They have stored the run ID in the run_id variable and identified the logged model name as "model". They now want to register that model in the MLflow Model Registry with the name "best_model".
Which lines of code can they use to register the model associated with run_id to the MLflow Model Registry?
A. mlflow.register_model(run_id, "best_model")
B. mlflow.register_model(f"runs:/{run_id}/model", "best_model")
C. mlflow.register_model(f"runs:/{run_id}/model")
D. mlflow.register_model(f"runs:/{run_id}/best_model", "model")
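The pieces of the registration call can be separated in a small sketch. `mlflow.register_model` takes a model URI followed by the registry name; the helper names `model_uri_for_run` and `register_best`, and the run ID `abc123`, are hypothetical and exist only for this example.

```python
def model_uri_for_run(run_id: str, artifact_path: str = "model") -> str:
    """Build a runs:/ URI pointing at a model logged under artifact_path."""
    return f"runs:/{run_id}/{artifact_path}"


def register_best(run_id: str, registry_name: str = "best_model"):
    # Imported lazily so the URI helper can be used without MLflow installed.
    import mlflow

    # First argument: where the logged model lives; second: its registry name.
    return mlflow.register_model(model_uri_for_run(run_id), registry_name)


example_uri = model_uri_for_run("abc123")  # "abc123" is a made-up run ID
print(example_uri)
```

The URI carries the run ID and the name the model was logged under, while the registry name is passed separately as the second argument.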
Which of the following evaluation metrics is not suitable to evaluate runs in AutoML experiments for regression problems?
A. F1
B. R-squared
C. MAE
D. MSE