The code block displayed below contains an error. The code block should return a new DataFrame that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5.
Find the error.
Code block:
transactionsDf.where("col(predError) >= 5")
A. The argument to the where method should be "predError >= 5".
B. Instead of where(), filter() should be used.
C. The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
D. The argument to the where method cannot be a string.
E. Instead of >=, the SQL operator GEQ should be used.
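For reference, a minimal sketch of the intended filter, assuming transactionsDf is defined as in the question, could pass either a SQL expression string or a Column expression:
from pyspark.sql.functions import col
transactionsDf.where("predError >= 5")        # SQL expression passed as a string
transactionsDf.where(col("predError") >= 5)   # equivalent Column expression
Both calls return a new DataFrame containing only the rows in which predError is at least 5.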
The code block shown below should return the number of columns in the CSV file stored at location filePath. Only lines that do not start with a # character should be read from the CSV file. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
__1__(__2__.__3__.csv(filePath, __4__).__5__)
A. 1. size, 2. spark, 3. read(), 4. escape='#', 5. columns
B. 1. DataFrame, 2. spark, 3. read(), 4. escape='#', 5. shape[0]
C. 1. len, 2. pyspark, 3. DataFrameReader, 4. comment='#', 5. columns
D. 1. size, 2. pyspark, 3. DataFrameReader, 4. comment='#', 5. columns
E. 1. len, 2. spark, 3. read, 4. comment='#', 5. columns
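For context, a runnable sketch of the completed expression, assuming spark is an active SparkSession and filePath points to the CSV file, might be:
# comment='#' skips lines beginning with #; .columns lists the column names
len(spark.read.csv(filePath, comment='#').columns)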
Which of the following describes how Spark achieves fault tolerance?
A. Spark helps fast recovery of data in case of a worker fault by providing the MEMORY_AND_DISK storage level option.
B. If an executor on a worker node fails while calculating an RDD, that RDD can be recomputed by another executor using the lineage.
C. Spark builds a fault-tolerant layer on top of the legacy RDD data system, which by itself is not fault tolerant.
D. Due to the mutability of DataFrames after transformations, Spark reproduces them using observed lineage in case of worker node failure.
E. Spark is only fault-tolerant if this feature is specifically enabled via the spark.fault_recovery.enabled property.
Which of the following code blocks returns a DataFrame that is an inner join of DataFrame itemsDf and DataFrame transactionsDf, on columns itemId and productId, respectively, and in which every itemId appears only once?
A. itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId").distinct("itemId")
B. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates(["itemId"])
C. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId).dropDuplicates("itemId")
D. itemsDf.join(transactionsDf, itemsDf.itemId==transactionsDf.productId, how="inner").distinct(["itemId"])
E. itemsDf.join(transactionsDf, "itemsDf.itemId==transactionsDf.productId", how="inner").dropDuplicates(["itemId"])
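As a reference point, a hedged sketch of an inner join that keeps every itemId only once, assuming both DataFrames carry the columns named in the question, could be:
# join() defaults to an inner join; dropDuplicates(["itemId"]) keeps one row per itemId
itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.productId).dropDuplicates(["itemId"])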
Which of the following code blocks reduces a DataFrame from 12 to 6 partitions and performs a full shuffle?
A. DataFrame.repartition(12)
B. DataFrame.coalesce(6).shuffle()
C. DataFrame.coalesce(6)
D. DataFrame.coalesce(6, shuffle=True)
E. DataFrame.repartition(6)
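For illustration, assuming df is a DataFrame with 12 partitions, a full-shuffle reduction to 6 partitions could be written as:
df = df.repartition(6)   # repartition() always performs a full shuffle; coalesce(6) would avoid it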
Which of the following is not a feature of Adaptive Query Execution?
A. Replace a sort merge join with a broadcast join, where appropriate.
B. Coalesce partitions to accelerate data processing.
C. Split skewed partitions into smaller partitions to avoid differences in partition processing time.
D. Reroute a query in case of an executor failure.
E. Collect runtime statistics during query execution.
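As background, the Adaptive Query Execution behaviours listed above are controlled through Spark SQL configuration keys; a minimal sketch of enabling them might look like:
spark.conf.set("spark.sql.adaptive.enabled", "true")                      # turn on AQE
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")   # coalesce small shuffle partitions
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")             # split skewed partitions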
Which of the following code blocks reads in the two-partition parquet file stored at filePath, making sure all columns are included exactly once even though each partition has a different schema?
Schema of first partition:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- productId: integer (nullable = true)
 |-- f: integer (nullable = true)
Schema of second partition:
root
 |-- transactionId: integer (nullable = true)
 |-- predError: integer (nullable = true)
 |-- value: integer (nullable = true)
 |-- storeId: integer (nullable = true)
 |-- rollId: integer (nullable = true)
 |-- f: integer (nullable = true)
 |-- tax_id: integer (nullable = false)
A. spark.read.parquet(filePath, mergeSchema='y')
B. spark.read.option("mergeSchema", "true").parquet(filePath)
C. spark.read.parquet(filePath)
D.
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.union(df_temp)
    nx = nx+1
df
E.
nx = 0
for file in dbutils.fs.ls(filePath):
    if not file.name.endswith(".parquet"):
        continue
    df_temp = spark.read.parquet(file.path)
    if nx == 0:
        df = df_temp
    else:
        df = df.join(df_temp, how="outer")
    nx = nx+1
df
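For reference, a hedged sketch of reading the file with schema merging, assuming spark is an active SparkSession, could be:
# mergeSchema reconciles the differing partition schemas into one schema with each column exactly once
df = spark.read.option("mergeSchema", "true").parquet(filePath)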
The code block displayed below contains an error. The code block should write DataFrame transactionsDf as a parquet file to location filePath after partitioning it on column storeId.
Find the error.
Code block:
transactionsDf.write.partitionOn("storeId").parquet(filePath)
A. The partitioning column as well as the file path should be passed to the write() method of DataFrame transactionsDf directly and not as appended commands as in the code block.
B. The partitionOn method should be called before the write method.
C. The operator should use the mode() option to configure the DataFrameWriter so that it replaces any existing files at location filePath.
D. Column storeId should be wrapped in a col() operator.
E. No method partitionOn() exists for the DataFrame class; partitionBy() should be used instead.
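A corrected sketch of the write, assuming transactionsDf and filePath are defined as in the question, might read:
# partitionBy() is the DataFrameWriter method for partitioning output by a column
transactionsDf.write.partitionBy("storeId").parquet(filePath)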
Which of the following code blocks returns a DataFrame with a single column in which all items in column attributes of DataFrame itemsDf are listed that contain the letter i?
Sample of DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
A. itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(attributes_exploded.contains("i"))
B. itemsDf.explode(attributes).alias("attributes_exploded").filter(col("attributes_exploded").contains("i"))
C. itemsDf.select(explode("attributes")).filter("attributes_exploded".contains("i"))
D. itemsDf.select(explode("attributes").alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))
E. itemsDf.select(col("attributes").explode().alias("attributes_exploded")).filter(col("attributes_exploded").contains("i"))
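For reference, a runnable sketch of the explode-and-filter pattern, assuming itemsDf matches the sample above, could be:
from pyspark.sql.functions import explode, col
# explode() creates one row per attribute; contains("i") keeps only items with the letter i
itemsDf.select(explode("attributes").alias("attributes_exploded")) \
    .filter(col("attributes_exploded").contains("i"))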
The code block displayed below contains an error. The code block should configure Spark to split data into 20 parts when exchanging data between executors for joins or aggregations.
Find the error.
Code block:
spark.conf.set(spark.sql.shuffle.partitions, 20)
A. The code block uses the wrong command for setting an option.
B. The code block sets the wrong option.
C. The code block expresses the option incorrectly.
D. The code block sets the incorrect number of parts.
E. The code block is missing a parameter.
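For context, a working sketch of the intended configuration, in which the option key is passed as a string, would be:
spark.conf.set("spark.sql.shuffle.partitions", 20)   # number of partitions used when shuffling data for joins or aggregations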