The code block displayed below contains an error. It is intended to add a column itemNameElements to DataFrame itemsDf that contains an array of all the words in column itemName. Find the error.
Sample of DataFrame itemsDf:
+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+
Code block:
itemsDf.withColumnRenamed("itemNameElements", split("itemName"))
A. All column names need to be wrapped in the col() operator.
B. Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument "," needs to be passed to the split method.
C. Operator withColumnRenamed needs to be replaced with operator withColumn and the split method needs to be replaced by the splitString method.
D. Operator withColumnRenamed needs to be replaced with operator withColumn and a second argument " " needs to be passed to the split method.
E. The expressions "itemNameElements" and split("itemName") need to be swapped.
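For reference, a minimal sketch of code that achieves the stated intent, assuming the itemsDf sample shown above:

from pyspark.sql.functions import split

# withColumn adds a new column; split needs both a column and a delimiter argument
itemsDf = itemsDf.withColumn("itemNameElements", split("itemName", " "))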
Which of the following statements about the differences between actions and transformations is correct?
A. Actions are evaluated lazily, while transformations are not evaluated lazily.
B. Actions generate RDDs, while transformations do not.
C. Actions do not send results to the driver, while transformations do.
D. Actions can be queued for delayed execution, while transformations can only be processed immediately.
E. Actions can trigger Adaptive Query Execution, while transformations cannot.
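As a quick illustration of the lazy/eager distinction, using a hypothetical DataFrame df:

# Transformation: evaluated lazily, only extends the query plan, no job runs
filtered = df.filter(df.value > 0)

# Action: triggers actual execution and returns a result to the driver
row_count = filtered.count()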
Which of the following code blocks shows the structure of a DataFrame in a tree-like way, containing both column names and types?
A. print(itemsDf.columns); print(itemsDf.types)
B. itemsDf.printSchema()
C. spark.schema(itemsDf)
D. itemsDf.rdd.printSchema()
E. itemsDf.print.schema()
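For reference, printSchema() renders column names and types as a tree. On the itemsDf sample above it would print something like the following (exact types depend on how the DataFrame was created):

itemsDf.printSchema()
# root
#  |-- itemId: long (nullable = true)
#  |-- itemName: string (nullable = true)
#  |-- supplier: string (nullable = true)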
The code block displayed below contains an error. The code block should return the average of the values in column value, grouped by unique storeId. Find the error.
Code block:
transactionsDf.agg("storeId").avg("value")
A. Instead of avg("value"), avg(col("value")) should be used.
B. The avg("value") should be specified as a second argument to agg() instead of being appended to it.
C. All column names should be wrapped in col() operators.
D. agg should be replaced by groupBy.
E. "storeId" and "value" should be swapped.
The code block displayed below contains an error. The code block is intended to join DataFrame itemsDf with the larger DataFrame transactionsDf on column itemId. Find the error.
Code block:
transactionsDf.join(itemsDf, "itemId", how="broadcast")
A. The syntax is wrong, how= should be removed from the code block.
B. The join method should be replaced by the broadcast method.
C. Spark will only perform the broadcast operation if this behavior has been enabled on the Spark cluster.
D. The larger DataFrame transactionsDf is being broadcasted, rather than the smaller DataFrame itemsDf.
E. broadcast is not a valid join type.
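A sketch of a valid broadcast join, assuming itemsDf is the smaller DataFrame:

from pyspark.sql.functions import broadcast

# broadcast() is a hint, not a join type; how= only accepts values like "inner" or "left"
transactionsDf.join(broadcast(itemsDf), "itemId")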
Which of the following code blocks stores a part of the data in DataFrame itemsDf on executors?
A. itemsDf.cache().count()
B. itemsDf.cache(eager=True)
C. cache(itemsDf)
D. itemsDf.cache().filter()
E. itemsDf.rdd.storeCopy()
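As a sketch of why an action is needed here: cache() alone is lazy and only marks the DataFrame for caching.

# count() is an action that forces evaluation, so the computed partitions
# are materialized and stored on the executors
itemsDf.cache().count()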
The code block displayed below contains one or more errors. The code block should load parquet files at location filePath into a DataFrame, only loading those files that have been modified before 2029-03-20
05:44:46. Spark should enforce a schema according to the schema shown below. Find the error.
Schema:
root
 |-- itemId: integer (nullable = true)
 |-- attributes: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- supplier: string (nullable = true)
Code block:
schema = StructType([
    StructType("itemId", IntegerType(), True),
    StructType("attributes", ArrayType(StringType(), True), True),
    StructType("supplier", StringType(), True)
])

spark.read.options("modifiedBefore", "2029-03-20T05:44:46").schema(schema).load(filePath)
A. The attributes array is specified incorrectly, Spark cannot identify the file format, and the syntax of the call to Spark's DataFrameReader is incorrect.
B. Columns in the schema definition use the wrong object type and the syntax of the call to Spark's DataFrameReader is incorrect.
C. The data type of the schema is incompatible with the schema() operator and the modification date threshold is specified incorrectly.
D. Columns in the schema definition use the wrong object type, the modification date threshold is specified incorrectly, and Spark cannot identify the file format.
E. Columns in the schema are unable to handle empty values and the modification date threshold is specified incorrectly.
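For reference, a corrected sketch: the columns inside a StructType must be StructField objects, option() takes a key/value pair, and the parquet format must be stated (here via the parquet() shortcut):

from pyspark.sql.types import StructType, StructField, IntegerType, ArrayType, StringType

schema = StructType([
    StructField("itemId", IntegerType(), True),
    StructField("attributes", ArrayType(StringType(), True), True),
    StructField("supplier", StringType(), True)
])

spark.read.schema(schema).option("modifiedBefore", "2029-03-20T05:44:46").parquet(filePath)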
Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?
A. transactionsDf.drop(["predError", "value"])
B. transactionsDf.drop("predError", "value")
C. transactionsDf.drop(col("predError"), col("value"))
D. transactionsDf.drop(predError, value)
E. transactionsDf.drop("predError and value")
The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.
Code block:
transactionsDf.filter(col('predError').in([3, 6])).count()
A. The number of rows cannot be determined with the count() operator.
B. Instead of filter, the select method should be used.
C. The method used on column predError is incorrect.
D. Instead of a list, the values need to be passed as single arguments to the in operator.
E. Numbers 3 and 6 need to be passed as string variables.
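A sketch of the corrected filter, using Column.isin(), the membership-test method intended here:

from pyspark.sql.functions import col

# isin() tests membership in a list; Column has no in() method
transactionsDf.filter(col("predError").isin([3, 6])).count()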
Which of the following statements about broadcast variables is correct?
A. Broadcast variables are serialized with every single task.
B. Broadcast variables are commonly used for tables that do not fit into memory.
C. Broadcast variables are immutable.
D. Broadcast variables are occasionally dynamically updated on a per-task basis.
E. Broadcast variables are local to the worker node and not shared across the cluster.
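As an illustration of broadcast-variable behavior, using a hypothetical lookup table and assuming a running SparkContext sc:

# The broadcast value is shipped to each executor once (not with every task)
# and is read-only (immutable) from the tasks' point of view
lookup = sc.broadcast({"DE": "Germany", "FR": "France"})

rdd = sc.parallelize(["DE", "FR", "DE"])
names = rdd.map(lambda code: lookup.value[code]).collect()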