Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?
A. transactionsDf.withColumnRenamed("productId", "productNumber")
B. transactionsDf.withColumn("productId", "productNumber")
C. transactionsDf.withColumnRenamed("productNumber", "productId")
D. transactionsDf.withColumnRenamed(col(productId), col(productNumber))
E. transactionsDf.withColumnRenamed(productId, productNumber)
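For context, a minimal sketch of how withColumnRenamed is typically used; the sample data below is made up purely for illustration:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame, used only to demonstrate the API
transactionsDf = spark.createDataFrame([(1, 10), (2, 20)], ["transactionId", "productId"])

# withColumnRenamed(existing, new) returns a copy with the column renamed;
# the original DataFrame is left unchanged
renamedDf = transactionsDf.withColumnRenamed("productId", "productNumber")
renamedDf.printSchema()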
The code block displayed below contains an error. The code block should return a DataFrame where all entries in column supplier contain the letter combination et in this order. Find the error.
Code block:
itemsDf.filter(Column('supplier').isin('et'))
A. The Column operator should be replaced by the col operator and instead of isin, contains should be used.
B. The expression inside the filter parenthesis is malformed and should be replaced by isin('et', 'supplier').
C. Instead of isin, it should be checked whether column supplier contains the letters et, so isin should be replaced with contains. In addition, the column should be accessed using col['supplier'].
D. The expression only returns a single column and filter should be replaced by select.
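As background for this question, a small sketch contrasting Column.contains (substring match) with Column.isin (exact membership); the data is hypothetical:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([("YetiX",), ("Sports Company Inc.",)], ["supplier"])

# contains keeps rows whose value includes the substring "et" anywhere
itemsDf.filter(col("supplier").contains("et")).show()

# isin keeps rows whose value exactly matches one of the listed values
itemsDf.filter(col("supplier").isin("YetiX", "Acme Corp.")).show()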
The code block shown below should return a single-column DataFrame with a column named
consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame
itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))
A. 1. length 2. regexp_extract 3. upper 4. col("itemName") 5. as
B. 1. size 2. regexp_replace 3. lower 4. "itemName" 5. alias
C. 1. lower 2. regexp_replace 3. length 4. "itemName" 5. alias
D. 1. length 2. regexp_replace 3. lower 4. col("itemName") 5. alias
E. 1. size 2. regexp_extract 3. lower 4. col("itemName") 5. alias
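As a reference for the functions appearing in the options, here is a sketch of one common way to count consonants in a string column using regexp_replace, lower, and length; the snippet is illustrative, not an answer key:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length, lower, regexp_replace

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([("Outdoors Backpack",)], ["itemName"])

# Lower-case the name, remove vowels and whitespace, then count the remaining characters
itemsDf.select(
    length(regexp_replace(lower(col("itemName")), r"a|e|i|o|u|\s", "")).alias("consonant_ct")
).show()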
In which order should the code blocks shown below be run in order to create a table of all values in column attributes next to the respective values in column supplier in DataFrame itemsDf?
1. itemsDf.createOrReplaceView("itemsDf")
2. spark.sql("FROM itemsDf SELECT 'supplier', explode('Attributes')")
3. spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")
4. itemsDf.createOrReplaceTempView("itemsDf")
A. 4, 3
B. 1, 3
C. 2
D. 4, 2
E. 1, 2
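For reference, a sketch of how a DataFrame is typically registered as a temporary view and queried with explode through the SQL API; the sample data is made up:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame(
    [("Sports Company Inc.", ["blue", "winter", "cozy"])],
    ["supplier", "attributes"],
)

# Register the DataFrame so it is visible to spark.sql
itemsDf.createOrReplaceTempView("itemsDf")

# explode emits one row per element of the attributes array
spark.sql("SELECT supplier, explode(attributes) FROM itemsDf").show()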
Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?
A. transactionsDf["storeId"].distinct()
B. transactionsDf.select("storeId").distinct()
C. transactionsDf.filter("storeId").distinct()
D. transactionsDf.select(col("storeId").distinct())
E. transactionsDf.distinct("storeId")
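As background, a quick sketch of deduplicating a single column by projecting it first and then calling the DataFrame-level distinct(); the data is hypothetical:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25), (2, 25), (3, 3)], ["transactionId", "storeId"])

# distinct() is a DataFrame method, so the column is selected first
transactionsDf.select("storeId").distinct().show()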
Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?
Sample of itemsDf:
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+
A. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", StringType()),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
B. itemsDfSchema = StructType([
       StructField("itemId", IntegerType),
       StructField("attributes", ArrayType(StringType)),
       StructField("supplier", StringType)])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
C. itemsDf = spark.read.schema('itemId integer, attributes
D. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType(StringType())),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
E. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType([StringType()])),
       StructField("supplier", StringType())])
   itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
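For reference, a sketch of declaring an explicit schema with an array column and using it to read parquet; filePath is a placeholder you would point at a real file:
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Type classes are instantiated (IntegerType(), not IntegerType), and an array
# column is declared as ArrayType(elementType)
itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType()),
])

filePath = "/FileStore/items.parquet"  # placeholder path, assumed for this sketch
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)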
Which of the following describes a narrow transformation?
A. A narrow transformation is an operation in which data is exchanged across partitions.
B. A narrow transformation is a process in which data from multiple RDDs is used.
C. A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.
D. A narrow transformation is an operation in which data is exchanged across the cluster.
E. A narrow transformation is an operation in which no data is exchanged across the cluster.
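To make the distinction concrete, a short sketch contrasting a narrow transformation (no shuffle) with a wide one; the data is hypothetical:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25), (2, 25), (3, 3)], ["transactionId", "storeId"])

# Narrow: each output partition depends on exactly one input partition, so no
# data is exchanged across the cluster
narrowDf = transactionsDf.filter("storeId = 25")

# Wide: rows sharing a key must be shuffled to the same partition
wideDf = transactionsDf.groupBy("storeId").count()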
Which of the following statements about Spark's execution hierarchy is correct?
A. In Spark's execution hierarchy, a job may reach over multiple stage boundaries.
B. In Spark's execution hierarchy, manifests are one layer above jobs.
C. In Spark's execution hierarchy, a stage comprises multiple jobs.
D. In Spark's execution hierarchy, executors are the smallest unit.
E. In Spark's execution hierarchy, tasks are one layer above slots.
The code block displayed below contains an error. The code block should return a DataFrame in which column predErrorAdded contains the results of Python function add_2_if_geq_3 as applied to numeric and nullable column predError in DataFrame transactionsDf.
Find the error.
Code block:
def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x+2
    return x

add_2_if_geq_3_udf = udf(add_2_if_geq_3)

transactionsDf.withColumnRenamed("predErrorAdded", add_2_if_geq_3_udf(col("predError")))
A. The operator used to add the column does not add column predErrorAdded to the DataFrame.
B. Instead of col("predError"), the actual DataFrame with the column needs to be passed, like so transactionsDf.predError.
C. The udf() method does not declare a return type.
D. UDFs are only available through the SQL API, but not in the Python API as shown in the code block.
E. The Python function is unable to handle null values, resulting in the code block crashing on execution.
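For reference, a sketch of how a Python UDF is usually registered and applied to add a new column with withColumn; the return type and sample data here are assumptions for illustration:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(3,), (None,), (1,)], "predError int")

def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x + 2
    return x

# udf() wraps the Python function; the return type argument is optional
add_2_if_geq_3_udf = udf(add_2_if_geq_3, IntegerType())

# withColumn adds a column, whereas withColumnRenamed only renames an existing one
transactionsDf.withColumn("predErrorAdded", add_2_if_geq_3_udf(col("predError"))).show()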
Which of the following code blocks reads JSON file imports.json into a DataFrame?
A. spark.read().mode("json").path("/FileStore/imports.json")
B. spark.read.format("json").path("/FileStore/imports.json")
C. spark.read("json", "/FileStore/imports.json")
D. spark.read.json("/FileStore/imports.json")
E. spark.read().json("/FileStore/imports.json")
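As background, a sketch of two equivalent ways the DataFrameReader is typically used for JSON; the path comes from the question and is assumed to exist:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Shorthand reader method
importsDf = spark.read.json("/FileStore/imports.json")

# Equivalent long form via format(...).load(...)
importsDf = spark.read.format("json").load("/FileStore/imports.json")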