Exam Details

  • Exam Code: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK
  • Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0
  • Certification: Databricks Certifications
  • Vendor: Databricks
  • Total Questions: 180 Q&As
  • Last Updated: Jul 02, 2025

Databricks Certifications DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Questions & Answers

  • Question 81:

    Which of the following code blocks returns a DataFrame where columns predError and productId are removed from DataFrame transactionsDf?

    Sample of DataFrame transactionsDf:

    +-------------+---------+-----+-------+---------+----+
    |transactionId|predError|value|storeId|productId|   f|
    +-------------+---------+-----+-------+---------+----+
    |            1|        3|    4|     25|        1|null|
    |            2|        6|    7|      2|        2|null|
    |            3|        3| null|     25|        3|null|
    +-------------+---------+-----+-------+---------+----+

    A. transactionsDf.withColumnRemoved("predError", "productId")

    B. transactionsDf.drop(["predError", "productId", "associateId"])

    C. transactionsDf.drop("predError", "productId", "associateId")

    D. transactionsDf.dropColumns("predError", "productId", "associateId")

    E. transactionsDf.drop(col("predError", "productId"))
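
    For reference, DataFrame.drop() in PySpark takes the column names to remove as separate string arguments (or Column objects) and silently ignores names that do not exist. A minimal sketch with hypothetical sample data (the all-null column f is omitted for brevity):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in for transactionsDf
    transactionsDf = spark.createDataFrame(
        [(1, 3, 4, 25, 1), (2, 6, 7, 2, 2), (3, 3, None, 25, 3)],
        ["transactionId", "predError", "value", "storeId", "productId"],
    )

    # drop() takes each column name to remove as a separate argument
    transactionsDf.drop("predError", "productId").show()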

  • Question 82:

    Which of the following describes characteristics of the Spark driver?

    A. The Spark driver requests the transformation of operations into DAG computations from the worker nodes.

    B. If set in the Spark configuration, Spark scales the Spark driver horizontally to improve parallel processing performance.

    C. The Spark driver processes partitions in an optimized, distributed fashion.

    D. In a non-interactive Spark application, the Spark driver automatically creates the SparkSession object.

    E. The Spark driver's responsibility includes scheduling queries for execution on worker nodes.
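
    As context for the driver's role: in a non-interactive application the driver program builds its own SparkSession (it is not created automatically, unlike in the pyspark shell), and it plans and schedules the work that the executors carry out. A minimal sketch:

    from pyspark.sql import SparkSession

    # The driver program creates the SparkSession itself in a non-interactive job
    spark = SparkSession.builder.appName("batch-job").getOrCreate()

    # The driver turns transformations into a DAG and schedules tasks on the executors
    spark.range(10).selectExpr("sum(id)").show()

    spark.stop()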

  • Question 83:

    The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.

    Sample of DataFrame transactionsDf:

    +-------------+---------+-----+-------+---------+----+----------------+
    |transactionId|predError|value|storeId|productId|   f| transactionDate|
    +-------------+---------+-----+-------+---------+----+----------------+
    |            1|        3|    4|     25|        1|null|2020-04-26 15:35|
    |            2|        6|    7|      2|        2|null|2020-04-13 22:01|
    |            3|        3| null|     25|        3|null|2020-04-02 10:53|
    +-------------+---------+-----+-------+---------+----+----------------+

    Code block:

    transactionsDf = transactionsDf.drop("transactionDate")

    transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")

    A. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().

    B. Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.

    C. Column transactionDate should be wrapped in a col() operator.

    D. The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column transactionDate with the new column transactionTimestamp.

    E. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.
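
    For comparison, the working pattern in PySpark is to add the new column with withColumn() and unix_timestamp() (using a format string that matches the data) before dropping the old column. A minimal sketch with hypothetical sample rows:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import unix_timestamp

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in for transactionsDf
    transactionsDf = spark.createDataFrame(
        [(1, "2020-04-26 15:35"), (2, "2020-04-13 22:01")],
        ["transactionId", "transactionDate"],
    )

    transactionsDf = (
        transactionsDf
        .withColumn("transactionTimestamp",
                    unix_timestamp("transactionDate", "yyyy-MM-dd HH:mm"))
        .drop("transactionDate")
    )
    transactionsDf.show()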

  • Question 84:

    The code block displayed below contains an error. The code block should create DataFrame itemsAttributesDf which has columns itemId and attribute and lists every attribute from the attributes column in DataFrame itemsDf next to the itemId of the respective row in itemsDf. Find the error.

    A sample of DataFrame itemsDf is below.

    Code block:

    itemsAttributesDf = itemsDf.explode("attributes").alias("attribute").select("attribute", "itemId")

    A. Since itemId is the index, it does not need to be an argument to the select() method.

    B. The alias() method needs to be called after the select() method.

    C. The explode() method expects a Column object rather than a string.

    D. explode() is not a method of DataFrame. explode() should be used inside the select() method instead.

    E. The split() method should be used inside the select() method instead of the explode() method.
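
    As background, explode() is a function in pyspark.sql.functions that is used inside select(), not a method on the DataFrame itself. A minimal sketch with hypothetical data:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in for itemsDf with an array column
    itemsDf = spark.createDataFrame(
        [(1, ["blue", "winter", "cozy"]), (2, ["red", "summer", "fresh"])],
        ["itemId", "attributes"],
    )

    # explode() produces one output row per element of the array
    itemsAttributesDf = itemsDf.select("itemId", explode("attributes").alias("attribute"))
    itemsAttributesDf.show()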

  • Question 85:

    The code block displayed below contains an error. The code block should configure Spark so that DataFrames up to a size of 20 MB will be broadcast to all worker nodes when performing a join.

    Find the error.

    Code block:

    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20)

    A. Spark will only broadcast DataFrames that are much smaller than the default value.

    B. The correct option to write configurations is through spark.config and not spark.conf.

    C. Spark will only apply the limit to threshold joins and not to other joins.

    D. The passed limit has the wrong variable type.

    E. The command is evaluated lazily and needs to be followed by an action.
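
    For context, spark.sql.autoBroadcastJoinThreshold is interpreted in bytes, so a 20 MB limit needs to be expressed as 20 * 1024 * 1024 rather than the bare number 20. A minimal sketch:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # 20 MB expressed in bytes; passing 20 would limit broadcasting to 20-byte DataFrames
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 20 * 1024 * 1024)
    print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))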

  • Question 86:

    Which of the following statements about Spark's DataFrames is incorrect?

    A. Spark's DataFrames are immutable.

    B. Spark's DataFrames are equal to Python's DataFrames.

    C. Data in DataFrames is organized into named columns.

    D. RDDs are at the core of DataFrames.

    E. The data in DataFrames may be split into multiple chunks.
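
    As a side note, a Spark DataFrame is a distributed structure built on top of RDDs and is not the same object as a pandas DataFrame, although the two can be converted into each other. A minimal sketch (assumes pandas is installed):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    sparkDf = spark.range(5)        # distributed Spark DataFrame, backed by RDDs
    pandasDf = sparkDf.toPandas()   # collects the rows into a local pandas DataFrame

    print(type(sparkDf))            # pyspark.sql.dataframe.DataFrame
    print(type(pandasDf))           # pandas.core.frame.DataFrame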

  • Question 87:

    The code block shown below should return a two-column DataFrame with columns transactionId and supplier, with combined information from DataFrames itemsDf and transactionsDf. The code block should merge rows in which column productId of DataFrame transactionsDf matches the value of column itemId in DataFrame itemsDf, but only where column storeId of DataFrame transactionsDf does not match column itemId of DataFrame itemsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

    Code block:

    transactionsDf.__1__(itemsDf, __2__).__3__(__4__)

    A. 1. join  2. transactionsDf.productId==itemsDf.itemId, how="inner"  3. select  4. "transactionId", "supplier"

    B. 1. select  2. "transactionId", "supplier"  3. join  4. [transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId]

    C. 1. join  2. [transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId]  3. select  4. "transactionId", "supplier"

    D. 1. filter  2. "transactionId", "supplier"  3. join  4. "transactionsDf.storeId!=itemsDf.itemId, transactionsDf.productId==itemsDf.itemId"

    E. 1. join  2. transactionsDf.productId==itemsDf.itemId, transactionsDf.storeId!=itemsDf.itemId  3. filter  4. "transactionId", "supplier"

  • Question 88:

    Which of the following describes the conversion of a computational query into an execution plan in Spark?

    A. Spark uses the catalog to resolve the optimized logical plan.

    B. The catalog assigns specific resources to the optimized memory plan.

    C. The executed physical plan depends on a cost optimization from a previous stage.

    D. Depending on whether DataFrame API or SQL API are used, the physical plan may differ.

    E. The catalog assigns specific resources to the physical plan.
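
    As context for how Spark turns a query into an execution plan, explain() prints the parsed, analyzed and optimized logical plans plus the physical plan that the Catalyst optimizer selects based on cost. A minimal sketch:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(100).groupBy().sum("id")

    # Prints the logical plans and the chosen physical plan
    df.explain(mode="extended")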

  • Question 89:

    Which of the following statements about storage levels is incorrect?

    A. The cache operator on DataFrames is evaluated like a transformation.

    B. In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.

    C. Caching can be undone using the DataFrame.unpersist() operator.

    D. MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.

    E. DISK_ONLY will not use the worker node's memory.
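
    For background, caching a DataFrame is lazy like a transformation: persist()/cache() only take effect once an action runs, and unpersist() removes the cached data again. A minimal sketch using the DISK_ONLY storage level:

    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(1000000)

    df.persist(StorageLevel.DISK_ONLY)   # lazy; nothing is cached yet
    df.count()                           # the action materializes the cache on disk only
    df.unpersist()                       # frees the cached data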

  • Question 90:

    Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

    A. transactionsDf.sort(asc_nulls_last("predError"))

    B. transactionsDf.orderBy("predError").desc_nulls_last()

    C. transactionsDf.sort("predError", ascending=False)

    D. transactionsDf.desc_nulls_last("predError")

    E. transactionsDf.orderBy("predError").asc_nulls_last()
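
    For reference, descending order with nulls placed last can be expressed with desc_nulls_last() from pyspark.sql.functions, passed to sort() or orderBy(). A minimal sketch with hypothetical data:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import desc_nulls_last

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical stand-in for transactionsDf
    transactionsDf = spark.createDataFrame(
        [(1, 3), (2, 6), (3, None)], ["transactionId", "predError"]
    )

    # Sort by predError descending, showing missing values last
    transactionsDf.sort(desc_nulls_last("predError")).show()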

Tips on How to Prepare for the Exams

Nowadays, certification exams have become more and more important, and more and more enterprises require them when you apply for a job. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you achieve an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Databricks exam questions, answers and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam preparation or Databricks certification application, do not hesitate to visit Vcedump.com to find your solutions.