Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?
A. transactionsDf.withColumnRenamed("productId", "productNumber")
B. transactionsDf.withColumn("productId", "productNumber")
C. transactionsDf.withColumnRenamed("productNumber", "productId")
D. transactionsDf.withColumnRenamed(col(productId), col(productNumber))
E. transactionsDf.withColumnRenamed(productId, productNumber)
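For context, a minimal sketch of how withColumnRenamed is typically used; the sample data below is made up purely for illustration:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame, used only to demonstrate the API
transactionsDf = spark.createDataFrame([(1, 10), (2, 20)], ["transactionId", "productId"])

# withColumnRenamed(existing, new) returns a copy with the column renamed;
# the original DataFrame is left unchanged
renamedDf = transactionsDf.withColumnRenamed("productId", "productNumber")
renamedDf.printSchema()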
The code block displayed below contains an error. The code block should return a DataFrame where all entries in column supplier contain the letter combination et in this order. Find the error.
Code block:
itemsDf.filter(Column('supplier').isin('et'))
A. The Column operator should be replaced by the col operator and instead of isin, contains should be used.
B. The expression inside the filter parenthesis is malformed and should be replaced by isin('et', 'supplier').
C. Instead of isin, it should be checked whether column supplier contains the letters et, so isin should be replaced with contains. In addition, the column should be accessed using col['supplier'].
D. The expression only returns a single column and filter should be replaced by select.
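As background for this question, a small sketch contrasting Column.contains (substring match) with Column.isin (exact membership); the data is hypothetical:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([("YetiX",), ("Sports Company Inc.",)], ["supplier"])

# contains keeps rows whose value includes the substring "et" anywhere
itemsDf.filter(col("supplier").contains("et")).show()

# isin keeps rows whose value exactly matches one of the listed values
itemsDf.filter(col("supplier").isin("YetiX", "Acme Corp.")).show()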
The code block shown below should return a single-column DataFrame with a column named
consonant_ct that, for each row, shows the number of consonants in column itemName of DataFrame
itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
DataFrame itemsDf:
+------+----------------------------------+-----------------------------+-------------------+
|itemId|itemName                          |attributes                   |supplier           |
+------+----------------------------------+-----------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|[blue, winter, cozy]         |Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |[red, summer, fresh, cooling]|YetiX              |
|3     |Outdoors Backpack                 |[green, summer, travel]      |Sports Company Inc.|
+------+----------------------------------+-----------------------------+-------------------+
Code block:
itemsDf.select(__1__(__2__(__3__(__4__), "a|e|i|o|u|\s", "")).__5__("consonant_ct"))
A. 1. length 2. regexp_extract 3. upper 4. col("itemName") 5. as
B. 1. size 2. regexp_replace 3. lower 4. "itemName" 5. alias
C. 1. lower 2. regexp_replace 3. length 4. "itemName" 5. alias
D. 1. length 2. regexp_replace 3. lower 4. col("itemName") 5. alias
E. 1. size 2. regexp_extract 3. lower 4. col("itemName") 5. alias
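As a reference for the functions appearing in the options, here is a sketch of one common way to count consonants in a string column using regexp_replace, lower, and length; the snippet is illustrative, not an answer key:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, length, lower, regexp_replace

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame([("Outdoors Backpack",)], ["itemName"])

# Lower-case the name, remove vowels and whitespace, then count the remaining characters
itemsDf.select(
    length(regexp_replace(lower(col("itemName")), r"a|e|i|o|u|\s", "")).alias("consonant_ct")
).show()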
In which order should the code blocks shown below be run in order to create a table of all values in column attributes next to the respective values in column supplier in DataFrame itemsDf?
1. itemsDf.createOrReplaceView("itemsDf")
2. spark.sql("FROM itemsDf SELECT 'supplier', explode('Attributes')")
3. spark.sql("FROM itemsDf SELECT supplier, explode(attributes)")
4. itemsDf.createOrReplaceTempView("itemsDf")
A. 4, 3
B. 1, 3
C. 2
D. 4, 2
E. 1, 2
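For reference, a sketch of how a DataFrame is typically registered as a temporary view and queried with explode through the SQL API; the sample data is made up:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
itemsDf = spark.createDataFrame(
    [("Sports Company Inc.", ["blue", "winter", "cozy"])],
    ["supplier", "attributes"],
)

# Register the DataFrame so it is visible to spark.sql
itemsDf.createOrReplaceTempView("itemsDf")

# explode emits one row per element of the attributes array
spark.sql("SELECT supplier, explode(attributes) FROM itemsDf").show()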
Which of the following code blocks returns all unique values of column storeId in DataFrame transactionsDf?
A. transactionsDf["storeId"].distinct()
B. transactionsDf.select("storeId").distinct()
C. transactionsDf.filter("storeId").distinct()
D. transactionsDf.select(col("storeId").distinct())
E. transactionsDf.distinct("storeId")
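As background, a quick sketch of deduplicating a single column by projecting it first and then calling the DataFrame-level distinct(); the data is hypothetical:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25), (2, 25), (3, 3)], ["transactionId", "storeId"])

# distinct() is a DataFrame method, so the column is selected first
transactionsDf.select("storeId").distinct().show()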
Which of the following code blocks reads the parquet file stored at filePath into DataFrame itemsDf, using a valid schema for the sample of itemsDf shown below?
Sample of itemsDf:
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+
A. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", StringType()),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
B. itemsDfSchema = StructType([
       StructField("itemId", IntegerType),
       StructField("attributes", ArrayType(StringType)),
       StructField("supplier", StringType)])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
C. itemsDf = spark.read.schema('itemId integer, attributes
D. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType(StringType())),
       StructField("supplier", StringType())])
   itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)
E. itemsDfSchema = StructType([
       StructField("itemId", IntegerType()),
       StructField("attributes", ArrayType([StringType()])),
       StructField("supplier", StringType())])
   itemsDf = spark.read(schema=itemsDfSchema).parquet(filePath)
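For reference, a sketch of declaring an explicit schema with an array column and using it to read parquet; filePath is a placeholder you would point at a real file:
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Type classes are instantiated (IntegerType(), not IntegerType), and an array
# column is declared as ArrayType(elementType)
itemsDfSchema = StructType([
    StructField("itemId", IntegerType()),
    StructField("attributes", ArrayType(StringType())),
    StructField("supplier", StringType()),
])

filePath = "/FileStore/items.parquet"  # placeholder path, assumed for this sketch
itemsDf = spark.read.schema(itemsDfSchema).parquet(filePath)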
Which of the following describes a narrow transformation?
A. A narrow transformation is an operation in which data is exchanged across partitions.
B. A narrow transformation is a process in which data from multiple RDDs is used.
C. A narrow transformation is a process in which 32-bit float variables are cast to smaller float variables, like 16-bit or 8-bit float variables.
D. A narrow transformation is an operation in which data is exchanged across the cluster.
E. A narrow transformation is an operation in which no data is exchanged across the cluster.
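To make the distinction concrete, a short sketch contrasting a narrow transformation (no shuffle) with a wide one; the data is hypothetical:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25), (2, 25), (3, 3)], ["transactionId", "storeId"])

# Narrow: each output partition depends on exactly one input partition, so no
# data is exchanged across the cluster
narrowDf = transactionsDf.filter("storeId = 25")

# Wide: rows sharing a key must be shuffled to the same partition
wideDf = transactionsDf.groupBy("storeId").count()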
Which of the following statements about Spark's execution hierarchy is correct?
A. In Spark's execution hierarchy, a job may reach over multiple stage boundaries.
B. In Spark's execution hierarchy, manifests are one layer above jobs.
C. In Spark's execution hierarchy, a stage comprises multiple jobs.
D. In Spark's execution hierarchy, executors are the smallest unit.
E. In Spark's execution hierarchy, tasks are one layer above slots.
The code block displayed below contains an error. The code block should return a DataFrame in which column predErrorAdded contains the results of Python function add_2_if_geq_3 as applied to numeric and nullable column predError in DataFrame transactionsDf.
Find the error.
Code block:
def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x+2
    return x

add_2_if_geq_3_udf = udf(add_2_if_geq_3)

transactionsDf.withColumnRenamed("predErrorAdded", add_2_if_geq_3_udf(col("predError")))
A. The operator used to add the column does not add column predErrorAdded to the DataFrame.
B. Instead of col("predError"), the actual DataFrame with the column needs to be passed, like so transactionsDf.predError.
C. The udf() method does not declare a return type.
D. UDFs are only available through the SQL API, but not in the Python API as shown in the code block.
E. The Python function is unable to handle null values, resulting in the code block crashing on execution.
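For reference, a sketch of how a Python UDF is usually registered and applied to add a new column with withColumn; the return type and sample data here are assumptions for illustration:
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(3,), (None,), (1,)], "predError int")

def add_2_if_geq_3(x):
    if x is None:
        return x
    elif x >= 3:
        return x + 2
    return x

# udf() wraps the Python function; the return type argument is optional
add_2_if_geq_3_udf = udf(add_2_if_geq_3, IntegerType())

# withColumn adds a column, whereas withColumnRenamed only renames an existing one
transactionsDf.withColumn("predErrorAdded", add_2_if_geq_3_udf(col("predError"))).show()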
Which of the following code blocks reads JSON file imports.json into a DataFrame?
A. spark.read().mode("json").path("/FileStore/imports.json")
B. spark.read.format("json").path("/FileStore/imports.json")
C. spark.read("json", "/FileStore/imports.json")
D. spark.read.json("/FileStore/imports.json")
E. spark.read().json("/FileStore/imports.json")
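As background, a sketch of two equivalent ways the DataFrameReader is typically used for JSON; the path comes from the question and is assumed to exist:
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Shorthand reader method
importsDf = spark.read.json("/FileStore/imports.json")

# Equivalent long form via format(...).load(...)
importsDf = spark.read.format("json").load("/FileStore/imports.json")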