Exam Details

  • Exam Code: DS-200
  • Exam Name: Data Science Essentials
  • Certification: CCDH
  • Vendor: Cloudera
  • Total Questions: 60 Q&As
  • Last Updated: Apr 22, 2024

Cloudera CCDH DS-200 Questions & Answers

  • Question 1:

    Which three metrics are useful in measuring the accuracy and quality of a recommender system?

    A. Mutual Information

    B. RMSE

    C. Tanimoto coefficient

    D. Pearson correlation

    E. Precision

    F. Recall
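    As a reference point for some of the metrics named above, here is a minimal Python sketch of RMSE (for predicted ratings) and of precision and recall (for a top-N recommendation list). The function names and the sample ratings and item IDs are hypothetical. A real evaluation would usually average these over many users and a held-out test set.

```python
import math

def rmse(predicted, actual):
    # Root mean squared error between predicted and observed ratings
    # for the same user/item pairs, listed in the same order.
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def precision_recall(recommended, relevant):
    # Precision: fraction of recommended items that are relevant.
    # Recall: fraction of relevant items that were recommended.
    hits = len(set(recommended) & set(relevant))
    return hits / len(recommended), hits / len(relevant)

# Hypothetical held-out ratings for one user
print(rmse([4.0, 2.0, 3.5], [5.0, 2.0, 3.5]))
# Hypothetical top-4 list vs. the items the user actually liked
print(precision_recall(["a", "b", "c", "d"], ["a", "c", "e"]))
```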

  • Question 2:

    Why is the naive Bayes classifier "naive"?

    A. It generally performs worse than more complex methods

    B. It is an unbiased estimator

    C. It assumes independence between all features

    D. It makes no assumptions on the underlying distributions (i.e., it is non-parametric)
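    The independence assumption is easiest to see in how a prediction is scored: the class prior is multiplied by one likelihood term per feature, as if the features were independent of each other given the class. Below is a minimal sketch for binary features with add-one (Laplace) smoothing; the toy features, labels, and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    # Count class frequencies and, per class, how often each feature takes each value.
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
    return class_counts, feature_counts

def log_score(row, label, class_counts, feature_counts, n_values=2):
    # log P(label) + sum_i log P(x_i | label). Summing per-feature terms is the
    # conditional-independence ("naive") assumption. n_values=2 assumes binary features.
    total = sum(class_counts.values())
    score = math.log(class_counts[label] / total)
    for i, value in enumerate(row):
        count = feature_counts[label][i][value]
        score += math.log((count + 1) / (class_counts[label] + n_values))
    return score

def predict(row, class_counts, feature_counts):
    return max(class_counts, key=lambda c: log_score(row, c, class_counts, feature_counts))

# Hypothetical binary features: (contains_link, all_caps)
rows = [(1, 1), (1, 0), (0, 0), (0, 1)]
labels = ["spam", "spam", "ham", "ham"]
model = train(rows, labels)
print(predict((1, 1), *model))
```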

  • Question 3:

    What is one limitation encountered by all systems that employ collaborative filtering and use preferences as input in order to output product recommendations to consumers?

    A. Consumers do not have stable ratings for the same product over time

    B. There are too many consumers and too few products

    C. Not every product has been rated by every consumer

    D. There are too few consumers and too many products
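    Collaborative filtering works from a user-by-item preference matrix, and in practice many cells of that matrix are empty. The short sketch below, using a hypothetical ratings dictionary and hypothetical helper names, measures how full the matrix is and lists the users who rated both items in a pair, which is the overlap an item-to-item similarity would be computed from.

```python
# Hypothetical user -> {item: rating} preferences
ratings = {
    "alice": {"book": 5, "lamp": 3},
    "bob": {"book": 4},
    "carol": {"mug": 2},
}

def matrix_density(ratings):
    # Fraction of user-item cells that actually hold a rating.
    items = {item for prefs in ratings.values() for item in prefs}
    observed = sum(len(prefs) for prefs in ratings.values())
    return observed / (len(ratings) * len(items))

def co_rated_users(ratings, item_a, item_b):
    # Users who rated both items; an item-item similarity can only use these.
    return [user for user, prefs in ratings.items() if item_a in prefs and item_b in prefs]

print(matrix_density(ratings))
print(co_rated_users(ratings, "book", "lamp"))
print(co_rated_users(ratings, "book", "mug"))
```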

  • Question 4:

    You've built a model that has ten different variables with complicated independence relationships between them, and both continuous and discrete variables that have complicated, multi-parameter distributions. Computing the joint probability distribution is complex, but it turns out that computing the conditional probabilities for the variables is easy. What is the most computationally efficient method for computing the expected value?

    A. Method of moments

    B. Markov Chain Monte Carlo

    C. Gibbs sampling

    D. Numerical quadrature
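    For reference, Gibbs sampling is the MCMC variant that draws each variable in turn from its conditional distribution given the current values of the others, so it needs only those conditionals, not the joint distribution. The sketch below applies it to a hypothetical two-variable case, a bivariate normal with unit variances and correlation `rho`, whose conditionals are known in closed form, and averages the draws to estimate the expected values. A larger model would cycle through all of its conditionals in the same way.

```python
import random

def gibbs_bivariate_normal(mu_x, mu_y, rho, n_samples, burn_in=1000, seed=0):
    # Assumes unit variances, so each conditional is
    # N(mu + rho * (other - mu_other), 1 - rho**2).
    rng = random.Random(seed)
    sd = (1 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(mu_x + rho * (y - mu_y), sd)  # draw x | current y
        y = rng.gauss(mu_y + rho * (x - mu_x), sd)  # draw y | new x
        if step >= burn_in:  # discard early draws taken before the chain settles
            samples.append((x, y))
    return samples

draws = gibbs_bivariate_normal(mu_x=1.0, mu_y=2.0, rho=0.5, n_samples=20000)
mean_x = sum(x for x, _ in draws) / len(draws)
mean_y = sum(y for _, y in draws) / len(draws)
print(mean_x, mean_y)  # Monte Carlo estimates of E[X] and E[Y]
```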

  • Question 5:

    Which two machine learning algorithms should you consider as likely to benefit from discretizing continuous features?

    A. Support vector machine

    B. Naïve Bayes

    C. Decision trees

    D. Logistic regression

    E. Singular value decomposition
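    Discretizing a continuous feature means mapping each value to a bin index, after which a model can treat the feature as categorical (for example, a count-based classifier can then estimate per-bin frequencies). A minimal equal-width binning sketch follows; the function name is hypothetical, and equal-width bins are only one choice, with equal-frequency (quantile) bins being a common alternative.

```python
def equal_width_bins(values, num_bins):
    # Split [min, max] into num_bins intervals of equal width and
    # return the bin index (0 .. num_bins - 1) for each value.
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # min(...) keeps the maximum value in the last bin instead of index num_bins
    return [min(int((v - lo) / width), num_bins - 1) for v in values]

print(equal_width_bins([0, 2.5, 5, 7.5, 10], 4))
```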

  • Question 6:

    What is the best way to determine the learning rate parameters for stochastic gradient descent when the distribution of the input data shifts over time?

    A. The learning rate should be adjusted periodically based on the setting that optimizes the objective function over a sample of recent observations

    B. The learning rate should be a fixed number that decays as the number of observations in the data set increases

    C. The learning rate should be the value that optimizes the value of the objective function over the first N samples in the dataset

    D. The learning rate should be a fixed number with a constant decay factor

    E. The learning rate should be continuously adjusted based on the value that optimizes the objective function for the most recent observation from the input data
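    To make the options concrete: in SGD the learning rate `eta` scales each update step. The sketch below uses a hypothetical one-weight linear model with squared loss and shows one way candidate learning rates could be compared on a window of recent observations, by replaying the window from the current weight and scoring the resulting loss on that same window. The function names, the candidate grid, and the in-sample scoring are simplifying assumptions for illustration.

```python
def sgd_step(w, x, y, eta):
    # One SGD update for squared loss 0.5 * (w*x - y)**2,
    # whose gradient with respect to w is (w*x - y) * x.
    return w - eta * (w * x - y) * x

def window_loss(w, window):
    # Mean squared loss of weight w over the (x, y) pairs in the window.
    return sum(0.5 * (w * x - y) ** 2 for x, y in window) / len(window)

def pick_learning_rate(w, window, candidates):
    # Replay the recent window from the current weight with each candidate
    # rate and keep the rate whose final weight has the lowest window loss.
    def loss_after(eta):
        w_try = w
        for x, y in window:
            w_try = sgd_step(w_try, x, y, eta)
        return window_loss(w_try, window)
    return min(candidates, key=loss_after)

# Hypothetical recent observations drawn from y = 3x
recent = [(1, 3), (2, 6), (3, 9), (4, 12), (5, 15)]
print(pick_learning_rate(0.0, recent, [0.0, 0.01, 0.05]))
```

    Because the score is computed on the same window used for the replay, it is optimistic; holding out the newest few observations would be a stricter check.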

  • Question 7:

    Given the following sample of numbers from a distribution:

    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

    How do high-level languages like Apache Hive and Apache Pig efficiently calculate approximate percentiles for a distribution?

    A. They sort all of the input samples and then look up the samples for each percentile

    B. They maintain an index of the input data as it is loaded into HDFS and load the index into memory

    C. They use pivots to assign each observation to the reducer that calculates each percentile

    D. They assign sample observations to buckets and then aggregate the buckets to compute the approximations
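    The bucket idea can be sketched in a few lines. In the simplified version below (an illustration with hypothetical function names, not how Hive or Pig actually implement it), each partition of the data produces per-bucket counts over a known range, the partial counts are merged, and a percentile is read off the cumulative counts with linear interpolation inside the bucket.

```python
def bucket_counts(values, lo, hi, num_buckets):
    # Count how many values fall into each of num_buckets equal-width buckets on [lo, hi].
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        idx = min(max(int((v - lo) / width), 0), num_buckets - 1)
        counts[idx] += 1
    return counts

def merge_counts(a, b):
    # Partial counts from different partitions (e.g. mappers) simply add up.
    return [x + y for x, y in zip(a, b)]

def approx_percentile(counts, lo, hi, p):
    # Walk the cumulative counts to the bucket holding the p-th fraction of
    # the data and interpolate linearly within that bucket.
    width = (hi - lo) / len(counts)
    target = p * sum(counts)
    cumulative = 0
    for i, c in enumerate(counts):
        if c and cumulative + c >= target:
            return lo + (i + (target - cumulative) / c) * width
        cumulative += c
    return hi

# Two hypothetical partitions of the same data set
part1, part2 = range(0, 500), range(500, 1000)
counts = merge_counts(bucket_counts(part1, 0, 1000, 100),
                      bucket_counts(part2, 0, 1000, 100))
print(approx_percentile(counts, 0, 1000, 0.5))
```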

  • Question 8:

    Given the following sample of numbers from a distribution:

    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

    What are the five numbers that summarize this distribution (the five-number summary of sample percentiles)?

    A. 1, 3, 8, 34, 89

    B. 1, 4, 13, 34, 89

    C. 1, 1.5, 5, 24.5, 89

    D. 1, 2.5, 8, 27.5, 89
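    Quartile conventions differ between tools, so the middle three numbers depend on the method used. The sketch below uses Tukey's hinges, one common definition for the five-number summary: the lower and upper quartiles are the medians of the lower and upper halves of the sorted data, with the median included in both halves when the sample size is odd. The function name is hypothetical.

```python
import statistics

def five_number_summary(sample):
    # Tukey's five-number summary: min, lower hinge, median, upper hinge, max.
    data = sorted(sample)
    n = len(data)
    half = (n + 1) // 2  # for odd n, both halves include the median
    lower, upper = data[:half], data[n - half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
print(five_number_summary(sample))
```

    For comparison, `statistics.quantiles(sample, n=4)` uses the "exclusive" method by default and returns [2.0, 8.0, 34.0] for this sample, while `method="inclusive"` returns [2.5, 8.0, 27.5].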

  • Question 9:

    Given the following sample of numbers from a distribution:

    1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89

    What are two benefits of using the five-number summary of sample percentiles to summarize a data set?

    A. You can calculate unbiased estimators for the parameters of the distribution

    B. It's robust to outliers

    C. It's well-defined for any probability distribution

    D. You can calculate it quickly using a relational database like MySQL, even when you have a large sample
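    One quick way to see what "robust" means here: replace the largest value with an extreme outlier (a hypothetical corruption of the sample) and compare. In the sketch below the quartiles and median do not move, the maximum does, and the mean shifts by a large amount.

```python
import statistics

sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
# Hypothetical corruption: the largest value replaced by an extreme outlier
corrupted = sample[:-1] + [8900]

def five_numbers(data):
    # Quartiles via the "inclusive" method; min and max taken directly.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

print(five_numbers(sample), statistics.mean(sample))
print(five_numbers(corrupted), statistics.mean(corrupted))
```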

  • Question 10:

    You want to build a classification model to identify spam comments on a blog. You decide to use the words in the comment text as inputs to your model. Which criterion should you use when deciding which words to use as features so that they contribute to making the correct classification decision?

    A. Choose words for your sample that are most correlated with the spam label

    B. Choose words for your sample that occur most frequently in the text

    C. Choose words for your sample that have the largest mutual information with the spam label

    D. Choose words for your sample that are least correlated with the spam label
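    Mutual information measures how much knowing whether a word is present reduces uncertainty about the label. The sketch below computes it in bits for word presence versus a binary spam label and ranks the vocabulary by it; the four comments, their labels, and the helper names are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # I(X; Y) in bits for two equal-length sequences of discrete values.
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

comments = [
    ("buy cheap pills now", 1),
    ("cheap watches click here", 1),
    ("great post click through", 0),
    ("thanks for the post", 0),
]
labels = [label for _, label in comments]

def word_mi(word):
    # 1 if the word occurs in the comment, else 0, compared with the label.
    present = [int(word in text.split()) for text, _ in comments]
    return mutual_information(present, labels)

vocab = sorted({w for text, _ in comments for w in text.split()})
ranked = sorted(vocab, key=word_mi, reverse=True)
print(ranked[:2])
```

    Note that "post", which appears only in non-spam comments, scores as high as "cheap": mutual information rewards a strong association with either class, while "click", which appears once in each class, scores zero.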

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important, and more and more enterprises require them when hiring. But how can you prepare for the exam effectively? How can you prepare in a short time with less effort? How can you get an ideal result, and how can you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Cloudera exam questions, answers and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your DS-200 exam preparation or your Cloudera certification application, do not hesitate to visit Vcedump.com to find your solutions.