You are performing a market basket analysis using the Apriori algorithm. Which measure is a ratio describing how many more times two items are present together than would be expected if those two items were statistically independent?
A. Lift
B. Leverage
C. Support
D. Confidence
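As a rough illustration of how the four measures named in the options are commonly defined, the sketch below computes each one for a pair of items. The baskets are made-up toy data and the helper names `support` and `pair_measures` are hypothetical, chosen only for this example.

```python
from fractions import Fraction

# Hypothetical toy transactions, invented for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def pair_measures(x, y, transactions):
    """Common association-rule measures for the rule x -> y."""
    s_x = support({x}, transactions)
    s_y = support({y}, transactions)
    s_xy = support({x, y}, transactions)
    return {
        # How often x and y occur together.
        "support": s_xy,
        # How often y occurs in the transactions that contain x.
        "confidence": s_xy / s_x,
        # Observed co-occurrence divided by the co-occurrence expected
        # if x and y were statistically independent.
        "lift": s_xy / (s_x * s_y),
        # Observed co-occurrence minus the co-occurrence expected
        # under independence (a difference, not a ratio).
        "leverage": s_xy - s_x * s_y,
    }

measures = pair_measures("bread", "milk", baskets)
for name, value in measures.items():
    print(name, value)
```

Exact fractions are used here so the ratio and the difference can be compared without rounding noise; a real analysis would normally rely on a library implementation of Apriori instead.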
In which lifecycle stage are appropriate analytical techniques determined?
A. Model planning
B. Model building
C. Data preparation
D. Discovery
What is Hadoop?
A. Java classes for HDFS types and MapReduce job management and HDFS
B. Java classes for HDFS types and MapReduce job management and the MapReduce paradigm
C. MapReduce paradigm and HDFS
D. MapReduce paradigm and massive unstructured data storage on commodity hardware
You are using k-means clustering to classify heart patients for a hospital. You have chosen Patient Sex, Height, Weight, Age and Income as measures and have used 3 clusters. When you create a pair-wise plot of the clusters, you notice that there is significant overlap between the clusters. What should you do?
A. Identify additional measures to add to the analysis
B. Remove one of the measures
C. Decrease the number of clusters
D. Increase the number of clusters
How does Pig's use of a schema differ from that of a traditional RDBMS?
A. Pig's schema is optional
B. Pig's schema requires that the data is physically present when the schema is defined
C. Pig's schema is required for ETL
D. Pig's schema supports a single data type
You are provided four different datasets. Initial analysis of these datasets shows that they have identical mean, variance and correlation values. What should your next step in the analysis be?
A. Visualize the data to further explore the characteristics of each data set
B. Select one of the four datasets and begin planning and building a model
C. Combine the data from all four of the datasets and begin planning and building a model
D. Recalculate the descriptive statistics since they are unlikely to be identical for each dataset
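The situation described in this question can be reproduced with a well-known published example. The sketch below uses the first two sets of Anscombe's quartet, which are not part of the question itself and are brought in here only as an assumption about what such data might look like. It shows that summary statistics can agree closely while the underlying points behave quite differently.

```python
import statistics

# First two sets of Anscombe's quartet (they share the same x values).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y_a = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y_b = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

# Mean, sample variance, and correlation with x for each set.
summary = {
    name: (statistics.mean(ys), statistics.variance(ys), pearson(x, ys))
    for name, ys in (("A", y_a), ("B", y_b))
}
for name, (mean, var, r) in summary.items():
    print(f"set {name}: mean={mean:.2f} variance={var:.2f} r={r:.2f}")

# Listing the points in x order hints at the difference in shape that
# the summary statistics hide; a scatter plot makes it obvious.
for xi, ya, yb in sorted(zip(x, y_a, y_b)):
    print(f"x={xi:2d}  A={ya:5.2f}  B={yb:5.2f}")
```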
You have been assigned to run a linear regression model for each of 5,000 distinct districts, and all the data is currently stored in a PostgreSQL database. Which tool/library would you use to produce these models with the least effort?
A. MADlib
B. Mahout
C. R
D. HBase
Which characteristic applies only to Business Intelligence as opposed to Data Science?
A. Uses only structured data
B. Supports solving "what if" scenarios
C. Uses large data sets
D. Uses predictive modeling techniques
Which word or phrase completes the statement? Mahout is to Hadoop as MADlib is to ____________ .
A. PostgreSQL
B. R
C. Excel
D. SAS
In linear regression modeling, which action can be taken to improve the linearity of the relationship between the dependent and independent variables?
A. Apply a transformation to a variable
B. Use a different statistical package
C. Calculate the R-Squared value
D. Change the units of measurement on the independent variable
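To make the idea of linearity concrete, the sketch below builds a small synthetic dataset in which the dependent variable grows exponentially, then compares the linear correlation before and after taking the logarithm of that variable. The data, the growth rate of 0.5, and the helper name `pearson` are assumptions made only for this illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Synthetic data: y grows exponentially with x, so the raw
# relationship is curved rather than linear.
xs = list(range(1, 11))
ys = [math.exp(0.5 * x) for x in xs]

r_raw = pearson(xs, ys)
# After a log transformation, log(y) = 0.5 * x, which is exactly linear.
r_log = pearson(xs, [math.log(y) for y in ys])

print(f"correlation with raw y:  {r_raw:.3f}")
print(f"correlation with log(y): {r_log:.3f}")
```

Which transformation helps, if any, depends on the shape of the data; the logarithm fits here only because the synthetic data were generated as an exponential.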