Which of the following question statements fall under the data science category?
A. What happened in the last six months?
B. How many products were sold in the last month?
C. Where is the problem with sales?
D. Which is the optimal scenario for selling this product?
E. What will happen if this scenario continues?
Correct Answer: DE
Explanation: This question checks your understanding of BI (business intelligence) versus data science. BI already exists and analytics teams already use it; they now need to learn data science techniques to solve additional problems. The options can be confusing, but they are easy to sort if you have worked in BI or as a data scientist. The first three options can be answered with a reporting solution: what sales happened in the last six months, how many products were sold, where the problem is. The last two options require data science: to find which scenarios are optimal for product sales, or what happens if a scenario continues, you need to collect data and apply techniques such as optimization, predictive modeling, and statistical analysis on structured and unstructured data. Hence, only the last two options call for data science techniques.
Question 32:
You are creating a model for recommending books at Amazon.com. Which of the following recommender systems would you use so that you don't have a cold start problem?
A. Naive Bayes classifier
B. Item-based collaborative filtering
C. User-based collaborative filtering
D. Content-based filtering
Correct Answer: D
Explanation: The cold start problem is most prevalent in recommender systems. Recommender systems are a specific type of information filtering (IF) technique that attempts to present information items (movies, music, books, news, images, web pages) that are likely to interest the user. Typically, a recommender system compares the user's profile to some reference characteristics. These characteristics may come from the information item (the content-based approach) or from the user's social environment (the collaborative filtering approach). In the content-based approach, the system must be able to match the characteristics of an item against relevant features in the user's profile. To do this, it must first construct a sufficiently detailed model of the user's tastes and preferences through preference elicitation, either explicitly (by querying the user) or implicitly (by observing the user's behaviour). In both cases, the cold start problem means the user has to spend some effort using the system in its 'dumb' state, contributing to the construction of their user profile, before the system can provide any intelligent recommendations. Content-based filtering, however, relies on information about items or users rather than on accumulated user preferences, so it performs well with little preference data. Item-based and user-based collaborative filtering make predictions from users' preferences for items, so they typically perform poorly with little preference data. A Naive Bayes classifier is not a recommender system technique.
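To make the contrast concrete, here is a minimal content-based sketch in Python. It assumes each book is described by hypothetical tags (`BOOKS`, `cosine`, and `recommend` are illustrative names, not an Amazon API) and ranks the other books by cosine similarity to one book the user liked. No other user's ratings are involved, which is why a newly added book can be recommended as soon as its tags are known.

```python
import math

# Hypothetical item catalogue: each book is described only by its own tags.
BOOKS = {
    "book_a": {"sci-fi": 1.0, "space": 1.0},
    "book_b": {"sci-fi": 1.0, "robots": 1.0},
    "book_c": {"cooking": 1.0, "baking": 1.0},
}


def cosine(u, v):
    """Cosine similarity between two sparse feature dictionaries."""
    dot = sum(weight * v.get(tag, 0.0) for tag, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(liked, items, top_n=2):
    """Rank the other items by similarity to the features of one liked item."""
    profile = items[liked]
    scored = [(cosine(profile, feats), name) for name, feats in items.items() if name != liked]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_n]]


print(recommend("book_a", BOOKS))  # ['book_b', 'book_c']
```

In practice the profile would aggregate the tags of several liked items, but the key design point is the same: similarity is computed from item content, not from a user-item rating matrix.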
Question 33:
A bio-scientist is analyzing cancer cells. To identify whether a cell is cancerous, hundreds of tests, each with small variations, have been run, and each test gives a yes/no result. Given the test results for a sample of healthy and cancerous cells, which of the following techniques would you use to determine whether a cell is healthy?
A. Linear regression
B. Collaborative filtering
C. Naive Bayes
D. Identification Test
Correct Answer: C
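Explanation: In this problem you are given high-dimensional independent variables (yes/no test results) and must predict one of two classes (healthy or cancerous). Any binary classification technique can be applied, for example support vector machines, Naive Bayes, logistic regression, or random decision forests. Of the options offered, only Naive Bayes is such a classifier: linear regression predicts a continuous value, collaborative filtering is a recommender technique, and "Identification Test" is not a standard classification technique.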
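As an illustration of option C, the following is a minimal Bernoulli Naive Bayes sketch in Python for yes/no test results (1 = positive, 0 = negative). The training rows, the four-test width, and the function names are hypothetical; a real study would have hundreds of tests, but the computation is the same: estimate, per class, how often each test is positive, then pick the class with the highest posterior.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit Bernoulli Naive Bayes on binary (0/1) feature vectors.

    Returns per-class log-priors and, per class, the Laplace-smoothed
    probability that each test comes back positive."""
    log_priors, p_positive = {}, {}
    n_features = len(X[0])
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        log_priors[c] = math.log(len(rows) / len(X))
        # alpha > 0 keeps every probability strictly between 0 and 1.
        p_positive[c] = [
            (sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_features)
        ]
    return log_priors, p_positive


def predict_bernoulli_nb(x, log_priors, p_positive):
    """Return the class with the highest posterior log-probability for x."""
    def log_posterior(c):
        # Naive assumption: tests are conditionally independent given the class.
        return log_priors[c] + sum(
            math.log(p if xj else 1.0 - p) for xj, p in zip(x, p_positive[c])
        )
    return max(log_priors, key=log_posterior)


# Hypothetical results of four yes/no tests per cell.
X = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 0, 1, 1],
     [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0]]
y = ["cancerous"] * 3 + ["healthy"] * 3
log_priors, p_positive = train_bernoulli_nb(X, y)
print(predict_bernoulli_nb([1, 1, 1, 1], log_priors, p_positive))  # cancerous
print(predict_bernoulli_nb([0, 0, 0, 0], log_priors, p_positive))  # healthy
```

Working in log-probabilities avoids numeric underflow when multiplying hundreds of per-test probabilities.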
Question 34:
In which phase of the analytic lifecycle would you expect to spend most of the project time?
A. Discovery
B. Data preparation
C. Communicate Results
D. Operationalize
Correct Answer: B
Explanation: Data preparation is usually the most time-consuming and iterative phase of the Data Analytics Lifecycle, because the team has to acquire, clean, transform, and explore the data before any modeling can begin. During this phase, the range and distribution of the data can be obtained. If the data is skewed, viewing the logarithm of the data (if all values are positive) can help detect structures that might otherwise be overlooked in a graph with a regular, nonlogarithmic scale. When preparing the data, look for signs of dirty data. Examining whether the data is unimodal or multimodal gives an idea of how many distinct populations with different behavior patterns might be mixed into the overall population. Many modeling techniques assume that the data follows a normal distribution, so it is important to know whether the available dataset matches that assumption before applying any of those techniques.
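The log-scale check described above can be sketched in Python with the standard library only. The `amounts` values are hypothetical, and `skewness` computes the plain Fisher-Pearson coefficient without a small-sample correction; the point is that taking logarithms of all-positive, right-skewed data pulls in the long tail and lowers the skewness.

```python
import math
import statistics


def skewness(values):
    """Fisher-Pearson skewness coefficient g1 (no small-sample correction)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed, strictly positive values (e.g. order amounts).
amounts = [10, 12, 15, 20, 30, 50, 100, 1000, 10000]
assert all(v > 0 for v in amounts), "log transform requires positive data"
log_amounts = [math.log10(v) for v in amounts]

print(f"raw skewness: {skewness(amounts):.2f}")
print(f"log10 skewness: {skewness(log_amounts):.2f}")
```

A skewness near zero after the transform would suggest the log-scale data is closer to the normal shape many models assume; here it drops substantially but stays positive.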
Question 35:
Refer to image below
A. Option A
B. Option B
C. Option C
D. Option D
Correct Answer: A
Question 36:
In which lifecycle stage are appropriate analytical techniques determined?
A. Model planning
B. Model building
C. Data preparation
D. Discovery
Correct Answer: A
Explanation: In Phase 3 (model planning), the data science team identifies candidate models to apply to the data for clustering, classifying, or finding relationships in the data, depending on the goal of the project. During this phase the team refers to the hypotheses developed in Phase 1 (discovery), when they first became acquainted with the data and came to understand the business problem or domain area. These hypotheses help the team frame the analytics to execute in Phase 4 and select the right methods to achieve its objectives. Activities to consider in this phase include the following. Assess the structure of the datasets: the structure of the datasets is one factor that dictates the tools and analytical techniques for the next phase; for example, analyzing textual data requires different tools and approaches than analyzing transactional data. Ensure that the analytical techniques enable the team to meet the business objectives and accept or reject the working hypotheses. Determine whether the situation warrants a single model or a series of techniques as part of a larger analytic workflow. A few example models include association rules and logistic regression. Tools such as Alpine Miner enable users to set up a series of steps and analyses and can serve as a front-end user interface (UI) for manipulating Big Data sources in PostgreSQL.
Question 37:
Which of the following are true with regard to the K-Means clustering algorithm?
A. Labels are not pre-assigned to the objects in the cluster.
B. Labels are pre-assigned to the objects in the cluster.
C. It classifies the data based on the labels.
D. It discovers the center of each cluster.
E. It finds which particular cluster each object falls in.
Correct Answer: ADE
Explanation: Clustering does not require any predefined labels on the objects; instead, it considers the objects' attributes. Hence, option B is out. Clustering is also different from classification, so you can discard option C as well. Because it does not use predefined labels, clustering is called unsupervised learning, and option A is correct. The main purpose of K-Means is to determine the center of each cluster and then measure each object's distance from those centers; an object falls in the cluster whose center is nearest. In the end you have the clusters and know which cluster each object falls in, so options D and E are also correct.
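The two steps described above (find the centers, then assign each object to its nearest center) are Lloyd's algorithm for K-Means. Below is a minimal Python sketch for 2-D points; the sample points, the seed, and the iteration cap are illustrative assumptions. Note that the input carries no labels: the cluster numbers in `labels` are produced by the algorithm itself.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's K-Means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its member points.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, labels


# Hypothetical unlabeled data with two obvious groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

K-Means can converge to a poor local optimum from unlucky starting centers, which is why library implementations typically run several initializations and keep the best result.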
Question 38:
In which of the following scenarios can you use a linear regression model?
A. Predicting home price based on the location and house area
B. Predicting demand for goods and services based on the weather
C. Predicting tumor size reduction based on the number of radiation treatments
D. Predicting sales of a textbook based on the number of students in the state
Correct Answer: ABCD
Explanation: You can use a linear regression model to predict a continuous output variable from input variables. In every option, the output is a continuous quantity that can be predicted from the inputs:
Option A: input: location, house area; output: house price
Option B: input: weather conditions; output: demand for goods and services
Option C: input: number of radiation sessions; output: tumor size reduction
Option D: input: number of students; output: textbook sales quantity
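A minimal ordinary-least-squares sketch in Python for the single-predictor case, shaped like option C. The session counts and reduction values are hypothetical, chosen to lie exactly on a line so the fitted coefficients are easy to check; option A would need multiple regression with more than one input, which this sketch does not cover.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: number of radiation sessions vs. tumor size reduction.
sessions = [1, 2, 3, 4, 5]
reduction = [3.5, 5.5, 7.5, 9.5, 11.5]
intercept, slope = fit_simple_linear_regression(sessions, reduction)
print(intercept, slope)  # 1.5 2.0
```

The fitted line can then predict the (continuous) reduction for an unseen number of sessions as `intercept + slope * sessions`.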
Question 39:
In which of the following scenarios can we use the naive Bayes theorem for classification?
A. Classify whether a given person is a male or a female based on the measured features. The features include height, weight and foot size.
B. To classify whether an email is spam or not spam
C. To identify whether a fruit is an orange or not based on features like diameter, color and shape
Correct Answer: ABC
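Explanation: All three scenarios are classification problems with a categorical target (male/female, spam/not spam, orange/not orange), so a naive Bayes classifier applies to each. Naive Bayes classifiers have worked quite well in many real-world situations, famously document classification and spam filtering. They require only a small amount of training data to estimate the necessary parameters.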
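For scenario A, where the features are continuous, a common variant is Gaussian Naive Bayes, which models each feature within each class as a normal distribution. The sketch below is a minimal Python version; the eight training rows and their units are hypothetical illustration values, and spam filtering (scenario B) would usually use a multinomial or Bernoulli variant over word occurrences instead.

```python
import math
import statistics


def fit_gaussian_nb(X, y):
    """Store, per class, the prior and the mean/variance of each feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        columns = list(zip(*rows))  # one tuple per feature
        model[c] = (
            len(rows) / len(X),
            [statistics.fmean(col) for col in columns],
            [statistics.variance(col) for col in columns],  # sample variance, needs >= 2 rows
        )
    return model


def gaussian_log_pdf(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_gaussian_nb(model, x):
    """Return the class maximizing log prior + sum of per-feature log-likelihoods."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            gaussian_log_pdf(xi, m, v) for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical training data: (height in ft, weight in lb, foot size in in).
X = [(6.0, 180, 12), (5.92, 190, 11), (5.58, 170, 12), (5.92, 165, 10),
     (5.0, 100, 6), (5.5, 150, 8), (5.42, 130, 7), (5.75, 150, 9)]
y = ["male"] * 4 + ["female"] * 4
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (6.0, 130, 8)))  # female
```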
Question 40:
You have data on 10,000 people who make purchases at a specific grocery store, including their income details. You have created 5 clusters from this data, but one of the clusters contains only 30 people, with incomes such as 2400, 2600, 2700, 2270, etc.
What would you do in this case?
A. Increase the number of clusters.
B. Decrease the number of clusters.
C. Remove those 30 people from the dataset.
D. Multiply the standard deviation by 100.
Correct Answer: B
Explanation: Decreasing the number of clusters lets this small outlier cluster be absorbed into one of the other clusters.