E20-007 Exam Details

  • Exam Code: E20-007
  • Exam Name: Data Science and Big Data Analytics
  • Certification: EMC Certifications
  • Vendor: EMC
  • Total Questions: 198 Q&As
  • Last Updated: May 31, 2026

EMC E20-007 Online Questions & Answers

  • Question 151:

    In R, functions like plot() and hist() are known as what?

    A. generic functions
    B. virtual methods
    C. virtual functions
    D. generic methods

  • Question 152:

    Refer to the exhibit.

    After analyzing a dataset, you report findings to your team:

    1. Variables A and C are significantly and positively impacting the dependent variable.
    2. Variable B is significantly and negatively impacting the dependent variable.
    3. Variable D is not significantly impacting the dependent variable.

    After seeing your findings, the majority of your team agreed that variable B should be positively impacting the dependent variable.

    What is a possible reason the coefficient for variable B was negative and not positive?

    A. Variable B is interacting with another variable due to correlated inputs
    B. Variable B needs a quadratic transformation due to its relationship to the dependent variable
    C. The information gain from variable B is already provided by another variable
    D. Variable B needs a logarithmic transformation due to its relationship to the dependent variable
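
As a rough illustration of how correlated inputs can change the sign of a coefficient, the sketch below fits made-up data in which B closely tracks A. The data, the helper names (`simple_slope`, `two_predictor_slopes`), and the exact relationship `y = 3*A - B` are all assumptions chosen for the example; they are not taken from the exhibit.

```python
def mean(values):
    return sum(values) / len(values)

def simple_slope(x, y):
    """Slope of y regressed on x alone."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def two_predictor_slopes(x1, x2, y):
    """Slopes of y regressed on x1 and x2 together (2x2 normal equations)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data: B is strongly correlated with A, and y = 3*A - 1*B exactly.
A = [1, 2, 3, 4, 5, 6]
B = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
y = [3 * a - b for a, b in zip(A, B)]

slope_b_alone = simple_slope(B, y)                # B on its own looks positive
slope_a, slope_b = two_predictor_slopes(A, B, y)  # next to A, B turns negative
print(slope_b_alone, slope_a, slope_b)
```

Fitted alone, B appears to raise the dependent variable; fitted alongside the correlated variable A, its coefficient is negative. A mismatch of this kind between intuition and the fitted sign is one symptom of correlated inputs.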

  • Question 153:

    You have been assigned to perform a study of the daily revenue effect of a pricing model of online transactions. All data currently available to you has been loaded into your analytics database. This includes revenue data, pricing data, and online transaction data.

    You discover that all data comes in different levels of granularity. The transaction data has timestamps consisting of day, hour, minutes, and seconds. Pricing is stored at the daily level and revenue data is only reported monthly.

    What is the next step?

    A. Report back to the business owner that the current data model does not support the business question.
    B. Interpolate a daily model for revenue from the monthly revenue data.
    C. Aggregate all data to the monthly level in order to create a monthly revenue model.
    D. Disregard revenue as the key reason in the pricing model and create a daily model based on pricing and transactions only.
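
When sources differ in granularity, finer-grained data can be rolled up to the coarsest level, but coarse data cannot be split back out without extra assumptions. The sketch below rolls timestamped transactions and daily prices up to the monthly level. The records, field layout, and the choice of a simple average for price are hypothetical and only illustrate the mechanics.

```python
from collections import defaultdict
from datetime import date, datetime

# Hypothetical sample records; real tables and values will differ.
transactions = [
    (datetime(2024, 1, 3, 9, 15, 0), 20.0),
    (datetime(2024, 1, 17, 14, 2, 30), 35.0),
    (datetime(2024, 2, 5, 11, 45, 10), 50.0),
]
daily_prices = [
    (date(2024, 1, 3), 10.0),
    (date(2024, 1, 17), 12.0),
    (date(2024, 2, 5), 11.0),
]

def month_key(d):
    """Map a date or datetime to a (year, month) bucket."""
    return (d.year, d.month)

# Sum transaction amounts per month.
monthly_amount = defaultdict(float)
for ts, amount in transactions:
    monthly_amount[month_key(ts)] += amount

# Average the daily price per month (an assumed aggregation rule).
price_values = defaultdict(list)
for day, price in daily_prices:
    price_values[month_key(day)].append(price)
monthly_avg_price = {m: sum(v) / len(v) for m, v in price_values.items()}

print(dict(monthly_amount), monthly_avg_price)
```

Both series now share the `(year, month)` key that monthly revenue figures would use, so the three sources could be joined on it.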

  • Question 154:

    The web analytics team uses Hadoop to process access logs. They now want to correlate this data with structured user data residing in a production single-instance JDBC database. They collaborate with the production team to import the data into Hadoop. Which tool should they use?

    A. Sqoop
    B. Pig
    C. Chukwa
    D. Scribe

  • Question 155:

    In addition to less data movement and the ability to use larger datasets in calculations, what is a benefit of analytical calculations in a database?

    A. quicker time to insight
    B. more efficient handling of categorical values
    C. improved connections between disparate data sources
    D. full use of data aggregation functionality

  • Question 156:

    Refer to the exhibit.

    You ran a linear regression, and the final output is seen in the exhibit.

    Based only on the information in the exhibit and an acceptable confidence level of 95%, how would you interpret the interaction of variable D with the dependent variable?

    A. In this model, Variable D is not significantly interacting with the dependent variable
    B. For every 1 unit increase in variable D, holding all other variables constant, we can expect the dependent variable to increase by 10.23 units
    C. For every 1 unit increase in variable D, holding all other variables constant, we can expect the dependent variable to be multiplied by 10.23 units
    D. Variable D is more significant than variables A, B, and C.

  • Question 157:

    You are studying the behavior of a population and are provided with multi-dimensional data at the individual level. You have identified four specific individuals who are valuable to your study. You would like to find all users who are most similar to each individual.

    Which algorithm is most appropriate for this study?

    A. K-means clustering
    B. Linear regression
    C. Association rules
    D. Decision trees
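
Of the listed options, k-means is the one that groups points by distance to a center. The sketch below shows only its assignment step, treating the four individuals as fixed centers; that simplification, the two-dimensional made-up data, and the helper name `assign_to_nearest` are assumptions for illustration, and a full k-means run would also update the centers.

```python
import math

def assign_to_nearest(points, centers):
    """Group each point under the index of its nearest center (Euclidean)."""
    groups = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        groups[nearest].append(p)
    return groups

# Hypothetical data: four individuals of interest and a few other users.
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
users = [(1.0, 1.0), (9.0, 2.0), (2.0, 8.0), (8.0, 9.0), (0.5, 0.2)]

groups = assign_to_nearest(users, centers)
print(groups)
```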

  • Question 158:

    Which method is used to solve for the coefficients b0, b1, ..., bn in the linear regression model Y = b0 + b1*x1 + b2*x2 + ... + bn*xn?

    A. Ordinary least squares
    B. Apriori algorithm
    C. Ridge and Lasso
    D. Integer programming
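
For reference, ordinary least squares picks the coefficients that minimize the sum of squared residuals. A minimal sketch for the single-predictor case Y = b0 + b1*x1 follows, using the closed-form solution; the data and the function names `ols_fit` and `sse` are made up for the example.

```python
def ols_fit(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals of y = b0 + b1*x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    b0 = my - b1 * mx
    return b0, b1

def sse(b0, b1, x, y):
    """Sum of squared errors for a candidate line."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical observations.
x = [0, 1, 2, 3, 4]
y = [1.1, 2.9, 5.2, 6.8, 9.0]

b0, b1 = ols_fit(x, y)
print(b0, b1, sse(b0, b1, x, y))
```

Nudging either coefficient away from the fitted values increases `sse`, which is the property that defines the least-squares solution.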

  • Question 159:

    In a Student's t-test, what is the meaning of the p-value?

    A. it is the area under the appropriate tails of the Student's distribution
    B. it is the "power" of the Student's t-test
    C. it is the mean of the distribution for the null hypothesis
    D. it is the mean of the distribution for the alternate hypothesis
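
A p-value can be computed as a tail area under the Student's t density. The sketch below integrates the density numerically with Simpson's rule for a two-sided test; the function names, the step count, and the sample inputs (t = 2.228 with 10 degrees of freedom) are assumptions chosen for illustration, not values from the exam.

```python
import math

def t_pdf(t, df):
    """Density of the Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat|; steps must be even (Simpson's rule)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and +|t|
    return 1 - central_area

p = two_sided_p_value(2.228, 10)
print(p)
```

With these inputs the result lands close to 0.05, matching the familiar 95% critical value for 10 degrees of freedom.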

  • Question 160:

    Data visualization is used in the final presentation of an analytics project. For what else is this technique commonly used?

    A. Assessing data quality
    B. Descriptive statistics
    C. ETLT
    D. Model selection
