Exam Details

  • Exam Code: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK
  • Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0
  • Certification: Databricks Certifications
  • Vendor: Databricks
  • Total Questions: 180 Q&As
  • Last Updated: Jul 02, 2025

Databricks Certifications: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Questions & Answers

  • Question 21:

    Which of the following is the idea behind dynamic partition pruning in Spark?

    A. Dynamic partition pruning is intended to skip over the data you do not need in the results of a query.

    B. Dynamic partition pruning concatenates columns of similar data types to optimize join performance.

    C. Dynamic partition pruning performs wide transformations on disk instead of in memory.

    D. Dynamic partition pruning reoptimizes physical plans based on data types and broadcast variables.

    E. Dynamic partition pruning reoptimizes query plans based on runtime statistics collected during query execution.
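
    Dynamic partition pruning (the idea described in option A) skips reading fact-table partitions that a runtime filter on a joined dimension table rules out. A minimal sketch, assuming hypothetical parquet paths and column names ("date", "year") and a sales table partitioned by date:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Dynamic partition pruning is on by default in Spark 3.x; the config key is
    # set explicitly here only to make the feature visible.
    spark.conf.set("spark.sql.optimizer.dynamicPartitionPruning.enabled", "true")

    sales = spark.read.parquet("/data/sales")   # assumed partitioned by "date"
    dates = spark.read.parquet("/data/dates")   # small dimension table (assumption)

    # At runtime, the dates that satisfy year == 2021 are pushed down as a
    # partition filter on sales, so partitions that cannot appear in the result
    # are never read.
    result = sales.join(dates, "date").where(dates["year"] == 2021)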

  • Question 22:

    Which of the following code blocks adds a column predErrorSqrt to DataFrame transactionsDf that is the square root of column predError?

    A. transactionsDf.withColumn("predErrorSqrt", sqrt(predError))

    B. transactionsDf.select(sqrt(predError))

    C. transactionsDf.withColumn("predErrorSqrt", col("predError").sqrt())

    D. transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))

    E. transactionsDf.select(sqrt("predError"))
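
    The withColumn pattern in option D can be checked with a minimal sketch on a small, hypothetical transactionsDf. Passing a Column object (col("predError")) to sqrt avoids the NameError that the bare name predError would raise, and withColumn keeps all existing columns while adding the new one.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, sqrt

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical two-row DataFrame standing in for transactionsDf.
    transactionsDf = spark.createDataFrame([(1, 4.0), (2, 9.0)],
                                           ["transactionId", "predError"])

    # Adds predErrorSqrt alongside the existing columns.
    withSqrt = transactionsDf.withColumn("predErrorSqrt", sqrt(col("predError")))
    withSqrt.show()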

  • Question 23:

    Which of the following code blocks returns a single-row DataFrame that only has a column corr which shows the Pearson correlation coefficient between columns predError and value in DataFrame transactionsDf?

    A. transactionsDf.select(corr(["predError", "value"]).alias("corr")).first()

    B. transactionsDf.select(corr(col("predError"), col("value")).alias("corr")).first()

    C. transactionsDf.select(corr(predError, value).alias("corr"))

    D. transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))

    E. transactionsDf.select(corr("predError", "value"))
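
    The select/corr/alias pattern in option D can be sketched on a small, hypothetical transactionsDf. An aggregate expression inside select() collapses the DataFrame to a single row, and .alias("corr") names the output column; appending .first() (as in option B) would return a Row object rather than a DataFrame.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, corr

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical data standing in for transactionsDf.
    transactionsDf = spark.createDataFrame(
        [(3.0, 2.0), (1.0, 4.0), (6.0, 7.0)], ["predError", "value"]
    )

    corrDf = transactionsDf.select(corr(col("predError"), col("value")).alias("corr"))
    corrDf.show()   # one row, one column named "corr"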

  • Question 24:

    Which of the following code blocks returns a DataFrame with approximately 1,000 rows from the 10,000-row DataFrame itemsDf, without any duplicates, returning the same rows even if the code block is run twice?

    A. itemsDf.sampleBy("row", fractions={0: 0.1}, seed=82371)

    B. itemsDf.sample(fraction=0.1, seed=87238)

    C. itemsDf.sample(fraction=1000, seed=98263)

    D. itemsDf.sample(withReplacement=True, fraction=0.1, seed=23536)

    E. itemsDf.sample(fraction=0.1)
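
    The behaviour asked for here can be sketched with a hypothetical 10,000-row itemsDf. sample() without replacement (the default) never duplicates rows, the fraction must lie between 0 and 1, and a fixed seed makes the same rows come back on every run.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical 10,000-row DataFrame standing in for itemsDf.
    itemsDf = spark.range(10000).toDF("itemId")

    sampled = itemsDf.sample(fraction=0.1, seed=87238)
    print(sampled.count())   # approximately 1,000; identical rows on every rerun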

  • Question 25:

    The code block displayed below contains an error. The code block should merge the rows of DataFrames transactionsDfMonday and transactionsDfTuesday into a new DataFrame, matching column names and inserting null values where column names do not appear in both DataFrames. Find the error.

    Sample of DataFrame transactionsDfMonday:

    +-------------+---------+-----+-------+---------+----+
    |transactionId|predError|value|storeId|productId|   f|
    +-------------+---------+-----+-------+---------+----+
    |            5|     null| null|   null|        2|null|
    |            6|        3|    2|     25|        2|null|
    +-------------+---------+-----+-------+---------+----+

    Sample of DataFrame transactionsDfTuesday:

    +-------+-------------+---------+-----+
    |storeId|transactionId|productId|value|
    +-------+-------------+---------+-----+
    |     25|            1|        1|    4|
    |      2|            2|        2|    7|
    |      3|            4|        2| null|
    |   null|            5|        2| null|
    +-------+-------------+---------+-----+

    Code block:

    sc.union([transactionsDfMonday, transactionsDfTuesday])

    A. The DataFrames' RDDs need to be passed into the sc.union method instead of the DataFrame variable names.

    B. Instead of union, the concat method should be used, making sure to not use its default arguments.

    C. Instead of the Spark context, transactionsDfMonday should be called with the join method instead of the union method, making sure to use its default arguments.

    D. Instead of the Spark context, transactionsDfMonday should be called with the union method.

    E. Instead of the Spark context, transactionsDfMonday should be called with the unionByName method instead of the union method, making sure to not use its default arguments.
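
    The fix described in option E can be sketched on cut-down versions of the sample DataFrames: unionByName is called on the DataFrame itself rather than on the SparkContext, and the non-default argument allowMissingColumns=True (available from Spark 3.1) fills columns missing from one side with nulls.

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, LongType

    spark = SparkSession.builder.getOrCreate()

    # Explicit schema, because column f is null in every sample row and its type
    # could not be inferred otherwise.
    mondaySchema = StructType([
        StructField("transactionId", LongType()),
        StructField("predError", LongType()),
        StructField("value", LongType()),
        StructField("storeId", LongType()),
        StructField("productId", LongType()),
        StructField("f", LongType()),
    ])
    transactionsDfMonday = spark.createDataFrame(
        [(5, None, None, None, 2, None), (6, 3, 2, 25, 2, None)], mondaySchema
    )
    transactionsDfTuesday = spark.createDataFrame(
        [(25, 1, 1, 4), (2, 2, 2, 7)],
        ["storeId", "transactionId", "productId", "value"],
    )

    merged = transactionsDfMonday.unionByName(
        transactionsDfTuesday, allowMissingColumns=True
    )
    merged.show()   # Tuesday rows carry null in the missing column f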

  • Question 26:

    Which of the following statements about data skew is incorrect?

    A. Spark will not automatically optimize skew joins by default.

    B. Broadcast joins are a viable way to increase join performance for skewed data over sort-merge joins.

    C. In skewed DataFrames, the largest and the smallest partition consume very different amounts of memory.

    D. To mitigate skew, Spark automatically disregards null values in keys when joining.

    E. Salting can resolve data skew.
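
    Salting (option E) is worth a short illustration. A minimal sketch with hypothetical DataFrames and column names: a random salt spreads a heavily skewed join key over several partitions, and the smaller side is replicated once per salt value so every salted key still finds its match.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import array, explode, floor, lit, rand

    spark = SparkSession.builder.getOrCreate()

    SALT_BUCKETS = 8

    # Skewed large side: key 1 dominates. Small side: one row per key.
    largeDf = spark.createDataFrame([(1, "a")] * 100 + [(2, "b")], ["key", "payload"])
    smallDf = spark.createDataFrame([(1, "x"), (2, "y")], ["key", "info"])

    # Large side: append a random salt in [0, SALT_BUCKETS).
    saltedLarge = largeDf.withColumn("salt",
                                     floor(rand() * SALT_BUCKETS).cast("int"))

    # Small side: replicate each row once per salt value.
    saltedSmall = smallDf.withColumn(
        "salt", explode(array(*[lit(i) for i in range(SALT_BUCKETS)]))
    )

    # Joining on (key, salt) spreads the work for the hot key across partitions.
    joined = saltedLarge.join(saltedSmall, ["key", "salt"]).drop("salt")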

  • Question 27:

    The code block displayed below contains an error. The code block should display the schema of DataFrame transactionsDf. Find the error.

    Code block:

    transactionsDf.rdd.printSchema

    A. There is no way to print a schema directly in Spark, since the schema can easily be printed using print(transactionsDf.columns), so that should be used instead.

    B. The code block should be wrapped into a print() operation.

    C. printSchema is only accessible through the Spark session, so the code block should be rewritten as spark.printSchema(transactionsDf).

    D. printSchema is a method and should be written as printSchema(). It is also not callable through transactionsDf.rdd, but should be called directly from transactionsDf.

    E. printSchema is not a method of transactionsDf.rdd. Instead, the schema should be printed via transactionsDf.print_schema().
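
    The corrected call from option D is a one-liner; a minimal sketch on a hypothetical transactionsDf:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical single-row DataFrame standing in for transactionsDf.
    transactionsDf = spark.createDataFrame([(1, 0.5)], ["transactionId", "predError"])

    # printSchema() is a DataFrame method: call it with parentheses, directly on
    # the DataFrame rather than on its underlying RDD.
    transactionsDf.printSchema()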

  • Question 28:

    Which of the following options describes the responsibility of the executors in Spark?

    A. The executors accept jobs from the driver, analyze those jobs, and return results to the driver.

    B. The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.

    C. The executors accept tasks from the driver, execute those tasks, and return results to the driver.

    D. The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.

    E. The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.
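
    The division of labour can be seen in any trivial job, sketched below with hypothetical data: the driver builds the plan and schedules tasks, the executors run those tasks on their partitions, and the partial results come back to the driver.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    df = spark.range(1_000_000)   # the plan is only built here, on the driver

    # count() triggers a job: the driver splits it into tasks, the executors
    # execute the tasks, and the per-partition counts are returned to the driver,
    # which combines them into the final result.
    total = df.count()
    print(total)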

  • Question 29:

    The code block shown below should return only the average prediction error (column predError) of a random subset, without replacement, of approximately 15% of rows in DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.

    transactionsDf.__1__(__2__, __3__).__4__(avg('predError'))

    A. 1. sample 2. True 3. 0.15 4. filter

    B. 1. sample 2. False 3. 0.15 4. select

    C. 1. sample 2. 0.85 3. False 4. select

    D. 1. fraction 2. 0.15 3. True 4. where

    E. 1. fraction 2. False 3. 0.85 4. select
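
    The filled-in code block from option B can be sketched on a hypothetical transactionsDf: sample without replacement (False) at a 15% fraction, then aggregate with select(avg(...)).

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical 1,000-row DataFrame standing in for transactionsDf.
    transactionsDf = spark.range(1000).withColumnRenamed("id", "predError")

    result = transactionsDf.sample(False, 0.15).select(avg("predError"))
    result.show()   # single row holding the average prediction error of the sample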

  • Question 30:

    Which of the following code blocks returns a one-column DataFrame of all values in column supplier of DataFrame itemsDf that do not contain the letter X? In the DataFrame, every value should only be listed once.

    Sample of DataFrame itemsDf:

    +------+--------------------+--------------------+-------------------+
    |itemId|            itemName|          attributes|           supplier|
    +------+--------------------+--------------------+-------------------+
    |     1|Thick Coat for Wa...|[blue, winter, cozy]|Sports Company Inc.|
    |     2|Elegant Outdoors ...|[red, summer, fre...|              YetiX|
    |     3|   Outdoors Backpack|[green, summer, t...|Sports Company Inc.|
    +------+--------------------+--------------------+-------------------+

    A. itemsDf.filter(col(supplier).not_contains('X')).select(supplier).distinct()

    B. itemsDf.select(~col('supplier').contains('X')).distinct()

    C. itemsDf.filter(not(col('supplier').contains('X'))).select('supplier').unique()

    D. itemsDf.filter(~col('supplier').contains('X')).select('supplier').distinct()

    E. itemsDf.filter(!col('supplier').contains('X')).select(col('supplier')).unique()
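
    The filter/contains/distinct chain in option D can be sketched on a cut-down itemsDf: ~ negates the contains('X') condition, and distinct() keeps each remaining supplier only once.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Cut-down version of the sample itemsDf.
    itemsDf = spark.createDataFrame(
        [(1, "Sports Company Inc."), (2, "YetiX"), (3, "Sports Company Inc.")],
        ["itemId", "supplier"],
    )

    noX = itemsDf.filter(~col("supplier").contains("X")).select("supplier").distinct()
    noX.show()   # one row: Sports Company Inc.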

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important and are required by more and more enterprises when applying for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Databricks exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam preparation or your Databricks certification application, do not hesitate to visit Vcedump.com to find your solutions.