Which of the following statements about lazy evaluation is incorrect?
A. Predicate pushdown is a feature resulting from lazy evaluation.
B. Execution is triggered by transformations.
C. Spark will fail a job only during execution, but not during definition.
D. Accumulators do not change the lazy evaluation model of Spark.
E. Lineages allow Spark to coalesce transformations into stages.
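The key idea behind this question (transformations are lazy; only actions trigger execution) can be sketched in plain Python with generators. This is an illustrative analogy, not Spark code; `my_map` and `collect` are hypothetical stand-ins for a transformation and an action:

```python
# Plain-Python sketch of lazy evaluation: "transformations" build a lazy
# recipe; nothing runs until an "action" forces it.
calls = []  # records which elements were actually processed

def my_map(data, fn):
    # Transformation: returns a lazy generator; no work happens yet.
    return (calls.append(x) or fn(x) for x in data)

def collect(lazy):
    # Action: triggers the actual computation.
    return list(lazy)

lazy = my_map([1, 2, 3], lambda x: x * 2)
assert calls == []        # defining the transformation executed nothing
result = collect(lazy)    # the action triggers evaluation
assert calls == [1, 2, 3]
assert result == [2, 4, 6]
```

This mirrors why answer B is the incorrect statement: in Spark, execution is triggered by actions, not transformations.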
The code block displayed below contains an error. The code block should combine data from DataFrames itemsDf and transactionsDf, showing all rows of DataFrame itemsDf that have a matching value in column itemId with a value in column transactionId of DataFrame transactionsDf.
Find the error.
Code block:
itemsDf.join(itemsDf.itemId==transactionsDf.transactionId)
A. The join statement is incomplete.
B. The union method should be used instead of join.
C. The join method is inappropriate.
D. The merge method should be used instead of join.
E. The join expression is malformed.
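The join call above omits the other DataFrame; a correct form would be `itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId)`. The inner-join semantics the question describes can be sketched in plain Python (hypothetical sample data, no Spark required):

```python
# Plain-Python sketch of inner-join semantics: keep every pairing of rows
# where itemId matches transactionId. Sample data is hypothetical.
items = [{"itemId": 1, "name": "hammer"}, {"itemId": 2, "name": "nail"}]
transactions = [{"transactionId": 1, "value": 9.99}]

joined = [
    {**item, **txn}                 # merge the two matching rows
    for item in items
    for txn in transactions
    if item["itemId"] == txn["transactionId"]
]
assert joined == [
    {"itemId": 1, "name": "hammer", "transactionId": 1, "value": 9.99}
]
```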
Which of the following is a problem with using accumulators?
A. Only unnamed accumulators can be inspected in the Spark UI.
B. Only numeric values can be used in accumulators.
C. Accumulator values can only be read by the driver, but not by executors.
D. Accumulators do not obey lazy evaluation.
E. Accumulators are difficult to use for debugging because they will only be updated once, regardless of whether a task has to be re-run due to hardware failure.
Which of the following code blocks removes all rows in the 6-column DataFrame transactionsDf that have missing data in at least 3 columns?
A. transactionsDf.dropna("any")
B. transactionsDf.dropna(thresh=4)
C. transactionsDf.drop.na("",2)
D. transactionsDf.dropna(thresh=2)
E. transactionsDf.dropna("",4)
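The `thresh` parameter of `dropna` keeps a row only if it has at least that many non-null values. For a 6-column DataFrame, dropping rows with missing data in at least 3 columns therefore means requiring at least 4 non-null values, i.e. `dropna(thresh=4)`. A plain-Python sketch of these semantics (hypothetical rows, `None` standing in for null):

```python
# Plain-Python sketch of dropna(thresh=N): keep a row only if it has at
# least N non-null values.
rows = [
    (1, 2, 3, 4, 5, 6),           # 6 non-null -> kept
    (1, None, 3, None, 5, 6),     # 4 non-null (2 missing) -> kept
    (1, None, None, None, 5, 6),  # 3 non-null (3 missing) -> dropped
]

def dropna_thresh(rows, thresh):
    return [r for r in rows if sum(v is not None for v in r) >= thresh]

kept = dropna_thresh(rows, thresh=4)
assert kept == rows[:2]
```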
Which of the following describes tasks?
A. A task is a command sent from the driver to the executors in response to a transformation.
B. Tasks transform jobs into DAGs.
C. A task is a collection of slots.
D. A task is a collection of rows.
E. Tasks get assigned to the executors by the driver.
The code block shown below should convert up to 5 rows in DataFrame transactionsDf that have the value 25 in column storeId into a Python list. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__).__3__(__4__)
A. 1. filter 2. "storeId"==25 3. collect 4. 5
B. 1. filter 2. col("storeId")==25 3. toLocalIterator 4. 5
C. 1. select 2. storeId==25 3. head 4. 5
D. 1. filter 2. col("storeId")==25 3. take 4. 5
E. 1. filter 2. col("storeId")==25 3. collect 4. 5
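The distinction the question turns on is that `take(5)` returns at most 5 rows as a Python list, while `collect()` returns every matching row. A plain-Python sketch of that difference (hypothetical data standing in for the filtered DataFrame):

```python
# Plain-Python sketch: take(n) returns at most n elements as a list;
# collect() returns all of them. `filtered` stands in for the result of
# transactionsDf.filter(col("storeId") == 25).
filtered = [25] * 8  # pretend 8 rows matched storeId == 25

def take(data, n):
    return list(data[:n])  # at most n rows, as a Python list

def collect(data):
    return list(data)      # every matching row

assert take(filtered, 5) == [25, 25, 25, 25, 25]
assert len(collect(filtered)) == 8
```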
The code block displayed below contains an error. The code block should arrange the rows of DataFrame transactionsDf using information from two columns: first by column value in ascending order (smaller numbers at the top, greater numbers at the bottom), and then by column predError in the opposite, descending order. Find the error.
Code block:
transactionsDf.orderBy('value', asc_nulls_first(col('predError')))
A. Two orderBy statements with calls to the individual columns should be chained, instead of having both columns in one orderBy statement.
B. Column value should be wrapped by the col() operator.
C. Column predError should be sorted in a descending way, putting nulls last.
D. Column predError should be sorted by desc_nulls_first() instead.
E. Instead of orderBy, sort should be used.
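In Spark, one correct form of the intended sort would be `transactionsDf.orderBy(asc("value"), desc("predError"))`. The two-key sort itself (ascending on the first key, descending on the second) can be sketched in plain Python with hypothetical rows:

```python
# Plain-Python sketch of a two-key sort: ascending on "value", descending
# on "predError". Negating the numeric second key reverses only its order.
rows = [
    {"value": 2, "predError": 1},
    {"value": 1, "predError": 3},
    {"value": 1, "predError": 7},
]
ordered = sorted(rows, key=lambda r: (r["value"], -r["predError"]))
assert ordered == [
    {"value": 1, "predError": 7},  # ties on value break by larger predError
    {"value": 1, "predError": 3},
    {"value": 2, "predError": 1},
]
```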
Which of the following code blocks reads in the JSON file stored at filePath as a DataFrame?
A. spark.read.json(filePath)
B. spark.read.path(filePath, source="json")
C. spark.read().path(filePath)
D. spark.read().json(filePath)
E. spark.read.path(filePath)
Which of the following describes a difference between Spark's cluster and client execution modes?
A. In cluster mode, the cluster manager resides on a worker node, while it resides on an edge node in client mode.
B. In cluster mode, executor processes run on worker nodes, while they run on gateway nodes in client mode.
C. In cluster mode, the driver resides on a worker node, while it resides on an edge node in client mode.
D. In cluster mode, a gateway machine hosts the driver, while it is co-located with the executor in client mode.
E. In cluster mode, the Spark driver is not co-located with the cluster manager, while it is co-located in client mode.
Which of the following code blocks selects all rows from DataFrame transactionsDf in which column productId is zero or smaller, or equal to 3?
A. transactionsDf.filter(productId==3 or productId<1)
B. transactionsDf.filter((col("productId")==3) or (col("productId")<1))
C. transactionsDf.filter(col("productId")==3 | col("productId")<1)
D. transactionsDf.where("productId"=3).or("productId"<1))
E. transactionsDf.filter((col("productId")==3) | (col("productId")<1))
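The parentheses in answer E matter because in Python the `|` operator binds more tightly than `==` and `<`, so without them the expression is parsed as `col("productId") == (3 | col("productId")) < 1`. The same precedence pitfall is visible with ordinary integers:

```python
# Plain-Python demonstration of | vs == precedence (no Spark needed).
x = 3

# With parentheses: the intended comparison, (x == 3) OR (x < 1).
with_parens = (x == 3) | (x < 1)
assert with_parens is True

# Without parentheses: 3 | x (bitwise OR, = 3) is evaluated first, leaving
# the chained comparison x == 3 < 1, i.e. (x == 3) and (3 < 1) -> False.
without_parens = x == 3 | x < 1
assert without_parens is False
```

This is also why Spark column expressions combined with `|` or `&` must each be wrapped in parentheses.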