RMSE is a good measure of accuracy, but only to compare forecasting errors of different models for a______, as it is scale-dependent.
A. Between Variables
B. Particular Variable
C. Among all the variables
D. All of the above are correct
Correct Answer: B
Explanation: The RMSE aggregates the magnitudes of the prediction errors for various times into a single measure of predictive power. RMSE is a good measure of accuracy, but only for comparing the forecasting errors of different models for a particular variable, not between variables, because it is scale-dependent.
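To make "scale-dependent" concrete, here is a minimal sketch of RMSE in plain Python. The temperature series and the two "models" are made-up numbers for illustration only. Ranking two models on the same variable works, but simply changing units multiplies the RMSE by the same factor, so RMSE values for variables measured on different scales are not comparable.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical forecasts of ONE variable (temperature in degrees C) by two models.
actual = [20.0, 22.0, 21.0, 23.0]
model_a = [21.0, 21.0, 22.0, 22.0]  # every forecast is off by 1 degree
model_b = [22.0, 20.0, 23.0, 21.0]  # every forecast is off by 2 degrees
print(rmse(actual, model_a), rmse(actual, model_b))  # 1.0 2.0 -> model A is better

# The same model A forecasts expressed in tenths of a degree:
# identical forecast quality, but ten times the RMSE.
actual_tenths = [a * 10 for a in actual]
model_a_tenths = [p * 10 for p in model_a]
print(rmse(actual_tenths, model_a_tenths))  # 10.0
```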
Question 62:
Select the statements that apply correctly to Naive Bayes:
A. Works with a small amount of data
B. Sensitive to how the input data is prepared
C. Works with nominal values
Correct Answer: ABC
Question 63:
Which classifier assumes independence among all of its features?
A. Neural networks
B. Linear Regression
C. Naive Bayes
D. Random forests
Correct Answer: C
Explanation: A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem (from Bayesian statistics) with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model". In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or on the existence of the other features, a naive Bayes classifier considers all of these properties to contribute independently to the probability that this fruit is an apple.
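The fruit example can be sketched as a tiny naive Bayes classifier over nominal (categorical) features, which also illustrates the "works with nominal values" point from the previous question. This is a from-scratch illustration, not any library's API: the training examples are hypothetical, and the add-one (Laplace) smoothing is an assumption added here so that an unseen feature value does not force a class probability to zero.

```python
from collections import Counter, defaultdict

def train(examples):
    """Count class frequencies and, per class, how often each feature takes each value."""
    class_counts = Counter(label for _, label in examples)
    value_counts = defaultdict(Counter)  # (label, feature) -> Counter of observed values
    feature_values = defaultdict(set)    # feature -> set of all values seen in training
    for features, label in examples:
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values

def posterior(model, features):
    """Normalized P(class | features) under the naive independence assumption."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for name, value in features.items():
            # Each feature contributes its own factor P(value | class), independently of the others.
            # Add-one (Laplace) smoothing keeps unseen values from zeroing out the product.
            score *= (value_counts[(label, name)][value] + 1) / (count + len(feature_values[name]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training data with nominal features.
examples = [
    ({"color": "red", "shape": "round", "size": "4in"}, "apple"),
    ({"color": "green", "shape": "round", "size": "4in"}, "apple"),
    ({"color": "red", "shape": "round", "size": "4in"}, "apple"),
    ({"color": "yellow", "shape": "long", "size": "7in"}, "banana"),
    ({"color": "yellow", "shape": "long", "size": "8in"}, "banana"),
    ({"color": "red", "shape": "round", "size": "1in"}, "cherry"),
]
model = train(examples)
print(posterior(model, {"color": "red", "shape": "round", "size": "4in"}))
```

The red, round, 4-inch fruit comes out as most likely an apple, even though the classifier never models how color, shape, and size relate to one another.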
Question 64:
Which of the following problems can you solve using the binomial distribution?
A. A manufacturer of metal pistons finds that, on average, 12% of his pistons are rejected because they are either oversize or undersize. What is the probability that a batch of 10 pistons will contain no more than 2 rejects?
B. A life insurance salesman sells on average 3 life insurance policies per week. Use Poisson's law to calculate the probability that in a given week he will sell some policies.
C. Vehicles pass through a junction on a busy road at an average rate of 300 per hour. Find the probability that none passes in a given minute.
D. It was found that the mean length of 100 parts produced by a lathe was 20.05 mm with a standard deviation of 0.02 mm. Find the probability that a part selected at random would have a length between 20.03 mm and 20.08 mm.
Correct Answer: A
Explanation: Each problem calls for a different distribution, and only A is a binomial problem:
A. Binomial: a fixed number of independent trials (10 pistons), each either a reject or not, with reject probability 0.12.
B. Poisson: a count of events (policies sold) in a fixed interval (a week), given an average rate of 3 per week.
C. Poisson: a count of events (vehicles) in a fixed interval (a minute), given an average rate of 300 per hour, i.e. 5 per minute.
D. Normal: a continuous measurement (part length) with mean 20.05 mm and standard deviation 0.02 mm; the probability of a length between 20.03 mm and 20.08 mm.
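As a worked check, the sketch below computes all four probabilities with the standard library only (`math.comb`, `math.erf`, Python 3.8+). One assumption: "sell some policies" in B is read as "sell at least one policy", i.e. 1 - P(0).

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ Normal(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# A (binomial): n = 10 pistons, reject probability p = 0.12, P(at most 2 rejects)
p_a = binomial_cdf(2, 10, 0.12)
# B (Poisson): mean 3 policies per week; "some policies" read here as "at least one"
p_b = 1 - poisson_pmf(0, 3)
# C (Poisson): 300 vehicles per hour = 5 per minute; P(no vehicle in a minute)
p_c = poisson_pmf(0, 300 / 60)
# D (normal): mean 20.05 mm, sd 0.02 mm; P(20.03 mm < length < 20.08 mm)
p_d = normal_cdf(20.08, 20.05, 0.02) - normal_cdf(20.03, 20.05, 0.02)

print(f"A={p_a:.4f} B={p_b:.4f} C={p_c:.4f} D={p_d:.4f}")
```

For A this gives roughly 0.891: the sum of P(0), P(1), and P(2) rejects out of 10.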
Question 65:
A researcher is interested in how variables such as GRE (Graduate Record Exam scores), GPA (grade point average), and the prestige of the undergraduate institution affect admission into graduate school. The response variable, admit/don't admit, is a binary variable.
Above is an example of:
A. Linear Regression
B. Logistic Regression
C. Recommendation system
D. Maximum likelihood estimation
E. Hierarchical linear models
Correct Answer: B
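Explanation: The response variable is binary (admit/don't admit), so logistic regression, which models the probability of the positive outcome, is the appropriate method. Logistic regression. Pros: computationally inexpensive, easy to implement, knowledge representation easy to interpret. Cons: prone to underfitting, may have low accuracy. Works with: numeric values, nominal values.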
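A minimal from-scratch sketch of the idea: the model passes a linear combination of the predictors through the sigmoid to get P(admit), and the weights are fit here by plain batch gradient descent on the log-loss. The applicants below are synthetic, already-standardized (GPA, GRE) pairs invented for illustration; institution prestige is omitted for brevity. In practice you would use a library such as scikit-learn or statsmodels rather than this loop.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Modeled probability of admission: sigmoid of a linear combination of the features."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss (negative log-likelihood)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the linear score
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Synthetic, standardized applicants: (GPA, GRE) -> admitted (1) or not (0).
X = [(-1.0, -1.0), (-1.0, 0.5), (0.5, -1.0), (0.5, 0.5),
     (1.0, -0.5), (-0.5, 1.0), (-0.5, -0.5), (1.0, 1.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(X, y)
print("weights:", w, "intercept:", b)
print("strong applicant:", predict_proba(w, b, (1.0, 1.0)))
print("weak applicant:", predict_proba(w, b, (-1.0, -1.0)))
```

The toy labels are deliberately not linearly separable, so the maximum-likelihood weights are finite and the gradient descent settles on them.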
Question 66:
What is the best way to evaluate the quality of the model found by an unsupervised algorithm like k-means clustering, given metrics for the cost of the clustering (how well it fits the data) and its stability (how similar the clusters are across multiple runs over the same data)?
A. The lowest cost clustering subject to a stability constraint
B. The lowest cost clustering
C. The most stable clustering subject to a minimal cost constraint
D. The most stable clustering
Correct Answer: A
Explanation: There is a tradeoff between cost and stability in unsupervised learning. The more tightly you fit the data, the less stable the model will be, and vice versa. The idea is to find a good balance, with more weight given to the cost. A good approach is typically to set a stability threshold and, among the models that meet it, select the one with the lowest cost.
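The selection rule can be sketched as follows, under several stated assumptions: cost is the within-cluster sum of squared distances (best over the runs), stability is the average pairwise Rand index between labelings from runs with different random starts, the data are 1-D for brevity, and the 0.9 threshold is arbitrary. Because cost keeps falling as k grows, choosing by cost alone would always favor more clusters; the stability constraint is what rules out the over-fitted choices.

```python
import random
from itertools import combinations

def kmeans_1d(points, k, seed, iters=50):
    """One run of Lloyd's k-means on 1-D data from a random start; returns (labels, cost)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (p - centers[j]) ** 2) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    labels = [min(range(k), key=lambda j: (p - centers[j]) ** 2) for p in points]
    cost = sum((p - centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, cost

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same cluster vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)

def evaluate(points, k, seeds):
    runs = [kmeans_1d(points, k, s) for s in seeds]
    cost = min(c for _, c in runs)  # best fit found across runs
    run_pairs = list(combinations(runs, 2))
    stability = sum(rand_index(r1[0], r2[0]) for r1, r2 in run_pairs) / len(run_pairs)
    return cost, stability

def select_k(points, ks, seeds, min_stability):
    """Lowest-cost clustering subject to a stability constraint."""
    results = {k: evaluate(points, k, seeds) for k in ks}
    eligible = [k for k in ks if results[k][1] >= min_stability]
    best = min(eligible, key=lambda k: results[k][0]) if eligible else None
    return best, results

points = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8, 9.1, 9.0, 9.4]  # made-up 1-D data
best, results = select_k(points, ks=[1, 2, 3, 4], seeds=range(5), min_stability=0.9)
print("chosen k:", best)
for k, (cost, stability) in results.items():
    print(k, round(cost, 3), round(stability, 3))
```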
Question 67:
Which of the following could be features?
A. Words in the document
B. Symptoms of a disease
C. Characteristics of an unidentified object
D. Only A and B
E. All of A, B, and C are possible
Correct Answer: E
Explanation: Features can come from any dataset that can be turned into lists of features. A feature is simply something that is either present or absent for a given item. In the case of documents, the features are the words in the document, but they could also be characteristics of an unidentified object, symptoms of a disease, or anything else that can be said to be present or absent.
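A minimal sketch of the "present or absent" idea for documents: each distinct lowercase word becomes a feature with value 1, and any word not in the dictionary is simply absent. The tokenizing regular expression and the symptom/object dictionaries are illustrative assumptions, not a prescribed format.

```python
import re

def get_features(doc):
    """Bag-of-words presence features: each distinct lowercase word in the document maps to 1."""
    return {word: 1 for word in re.findall(r"[a-z]+", doc.lower())}

# The same present/absent representation works for other kinds of items (hypothetical examples):
symptoms = {"fever": 1, "cough": 1}          # symptoms observed in a patient
object_traits = {"metallic": 1, "round": 1}  # characteristics of an unidentified object

print(get_features("The cat sat on the mat"))
# {'the': 1, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
```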
Question 68:
Suppose A, B, and C are events. The probability of A given B, relative to $P(\cdot \mid C)$, is the same as the probability of A given B and C (relative to $P$). That is,
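A. $\frac{P(A,B \mid C)}{P(B \mid C)} = P(A \mid B,C)$
B. $\frac{P(A,B \mid C)}{P(B \mid C)} = P(B \mid A,C)$
C. $\frac{P(A,B \mid C)}{P(B \mid C)} = P(C \mid B,C)$
D. $\frac{P(A,B \mid C)}{P(B \mid C)} = P(A \mid C,B)$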
Correct Answer: A
Explanation: From the definition, $\frac{P(A,B \mid C)}{P(B \mid C)} = \frac{P(A,B,C)/P(C)}{P(B,C)/P(C)} = \frac{P(A,B,C)}{P(B,C)} = P(A \mid B,C)$. This follows from the definition of conditional probability, $P(A,B) = P(A \mid B)\,P(B)$, applied twice.
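The identity can be checked numerically with exact rational arithmetic. The sketch below builds a hypothetical joint distribution over three binary events (the weights are arbitrary) and compares both sides; `fractions.Fraction` avoids any floating-point doubt about equality.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over binary events (A, B, C): arbitrary weights normalized to sum to 1.
weights = [3, 1, 4, 1, 5, 9, 2, 6]
total = sum(weights)
joint = {outcome: Fraction(wt, total)
         for outcome, wt in zip(product([True, False], repeat=3), weights)}

def prob(pred):
    """P(event) = sum of the joint probabilities of outcomes (a, b, c) where pred holds."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

p_abc = prob(lambda a, b, c: a and b and c)
p_bc = prob(lambda a, b, c: b and c)
p_c = prob(lambda a, b, c: c)

lhs = (p_abc / p_c) / (p_bc / p_c)  # P(A,B|C) / P(B|C)
rhs = p_abc / p_bc                  # P(A|B,C)
print(lhs == rhs, rhs)  # True 3/8
```

The $P(C)$ factors cancel exactly, which is the second step of the derivation above.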
Question 69:
Your customer provided you with 2,000 unlabeled records to be divided into three groups. What is the correct analytical method to use?
A. Semi Linear Regression
B. Logistic regression
C. Naive Bayesian classification
D. Linear regression
E. K-means clustering
Correct Answer: E
Explanation: k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
The algorithm has nothing to do with, and should not be confused with, k-nearest neighbor, another popular machine learning technique.
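The iterative refinement heuristic (Lloyd's algorithm) can be sketched in a few lines: assign every point to its nearest center, move each center to the mean of its points, and repeat until the assignments stop changing. The farthest-first initialization is an assumption chosen here to make the example deterministic; like any k-means heuristic, the result is only guaranteed to be a local optimum. The 2-D points are made up to form three obvious groups.

```python
import math

def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean updates until stable."""
    centers = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # move each center to the mean (prototype) of its cluster
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centers

# Hypothetical unlabeled records forming three groups.
points = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6),
          (10.0, 10.0), (9.5, 10.3), (10.2, 9.6),
          (0.3, 10.0), (-0.4, 9.7), (0.1, 10.4)]
labels, centers = kmeans(points, k=3)
print(labels)
print(centers)
```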
Question 70:
Consider the following confusion matrix for a data set with 600 out of 11,100 instances positive:
In this case, Precision = 50%, Recall = 83%, Specificity = 95%, and Accuracy = 95%.
Select the correct statement
A. Precision is low, which means the classifier is predicting positives best
B. Precision is low, which means the classifier is predicting positives poorly
C. The problem domain has a major impact on the measures that should be used to evaluate a classifier within it
D. A and C
E. B and C
Correct Answer: E
Explanation: In this case, Precision = 50%, Recall = 83%, Specificity = 95%, and Accuracy = 95%. Precision is low, which means the classifier is predicting positives poorly. However, the three other measures seem to suggest that this is a good classifier. This shows that the problem domain has a major impact on the measures that should be used to evaluate a classifier within it, and that looking at these four simple measures alone is not sufficient.
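The four measures follow directly from the confusion-matrix counts. The counts used below (TP = 500, FN = 100, FP = 500, TN = 10,000) are an assumption: one set of counts consistent with 600 positives out of 11,100 instances that reproduces the stated figures after rounding.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard confusion-matrix measures for a binary classifier."""
    return {
        "precision": tp / (tp + fp),    # of predicted positives, how many are truly positive
        "recall": tp / (tp + fn),       # of true positives, how many were found
        "specificity": tn / (tn + fp),  # of true negatives, how many were correctly rejected
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Assumed counts: 600 positives and 10,500 negatives (11,100 instances in total).
m = classification_metrics(tp=500, fn=100, fp=500, tn=10_000)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

This prints precision 50.0%, recall 83.3%, specificity 95.2%, and accuracy 94.6%. With only 600 positives, a classifier that always predicted "negative" would also reach 10,500 / 11,100 ≈ 94.6% accuracy, which is why accuracy alone says little in this domain.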