Exam Details

  • Exam Code: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK
  • Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0
  • Certification: Databricks Certifications
  • Vendor: Databricks
  • Total Questions: 180 Q&As
  • Last Updated: Jul 02, 2025

Databricks Certifications: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Questions & Answers

  • Question 131:

    Which of the following describes a valid concern about partitioning?

    A. A shuffle operation returns 200 partitions if not explicitly set.

    B. Decreasing the number of partitions reduces the overall runtime of narrow transformations if there are more executors available than partitions.

    C. No data is exchanged between executors when coalesce() is run.

    D. Short partition processing times are indicative of low skew.

    E. The coalesce() method should be used to increase the number of partitions.
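
    A quick way to sanity-check these claims in PySpark (a minimal sketch; it assumes an active SparkSession named spark and that adaptive query execution is disabled, as in Spark 3.0's defaults):

        from pyspark.sql.functions import col

        # A wide transformation such as groupBy() shuffles, and a shuffle
        # yields spark.sql.shuffle.partitions partitions unless overridden.
        print(spark.conf.get("spark.sql.shuffle.partitions"))  # '200'

        df = spark.range(1000)
        shuffled = df.groupBy((col("id") % 10).alias("bucket")).count()
        print(shuffled.rdd.getNumPartitions())                 # 200

        # coalesce() merges existing partitions without a full shuffle;
        # it cannot increase the partition count (use repartition() for that).
        print(df.coalesce(2).rdd.getNumPartitions())           # 2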

  • Question 132:

    The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.

    transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)

    A. 1. save 2. mode 3. "ignore" 4. "compression" 5. path

    B. 1. store 2. with 3. "replacement" 4. "compression" 5. path

    C. 1. write 2. mode 3. "overwrite" 4. "compression" 5. save (Correct)

    D. 1. save 2. mode 3. "replace" 4. "compression" 5. path

    E. 1. write 2. mode 3. "overwrite" 4. compression 5. parquet
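
    For reference, the completed code block from the correct option C, written out (a sketch; transactionsDf and storeDir are the names given in the question):

        # Write transactionsDf as brotli-compressed parquet, replacing any
        # previously existing file at storeDir.
        transactionsDf.write.format("parquet") \
            .mode("overwrite") \
            .option("compression", "brotli") \
            .save(storeDir)

    Note that mode("ignore"), as in option A, would silently skip the write if the path already exists, which fails the "replacing any previously existing file" requirement.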

  • Question 133:

    Which of the following statements about Spark's configuration properties is incorrect?

    A. The maximum number of tasks that an executor can process at the same time is controlled by the spark.task.cpus property.

    B. The maximum number of tasks that an executor can process at the same time is controlled by the spark.executor.cores property.

    C. The default value for spark.sql.autoBroadcastJoinThreshold is 10MB.

    D. The default number of partitions to use when shuffling data for joins or aggregations is 300.

    E. The default number of partitions returned from certain transformations can be controlled by the spark.default.parallelism property.
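
    The defaults behind these statements can be checked at runtime (a minimal sketch, assuming an active SparkSession named spark):

        # The default shuffle-partition count is 200, not 300.
        print(spark.conf.get("spark.sql.shuffle.partitions"))          # '200'

        # The broadcast-join threshold defaults to 10 MB.
        print(spark.conf.get("spark.sql.autoBroadcastJoinThreshold"))

        # Task slots per executor = spark.executor.cores / spark.task.cpus.
        conf = spark.sparkContext.getConf()
        print(conf.get("spark.executor.cores", "1"), conf.get("spark.task.cpus", "1"))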

  • Question 134:

    Which of the following describes the difference between client and cluster execution modes?

    A. In cluster mode, the driver runs on the worker nodes, while the client mode runs the driver on the client machine.

    B. In cluster mode, the driver runs on the edge node, while the client mode runs the driver in a worker node.

    C. In cluster mode, each node will launch its own executor, while in client mode, executors will exclusively run on the client machine.

    D. In client mode, the cluster manager runs on the same host as the driver, while in cluster mode, the cluster manager runs on a separate node.

    E. In cluster mode, the driver runs on the master node, while in client mode, the driver runs on a virtual machine in the cloud.
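
    Deploy mode is fixed when the application is submitted (for example via spark-submit's --deploy-mode flag); a running application can inspect it, as in this minimal sketch (assumes an active SparkSession named spark):

        # 'client': the driver runs on the submitting machine.
        # 'cluster': the driver runs on one of the worker nodes.
        print(spark.sparkContext.getConf().get("spark.submit.deployMode"))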

  • Question 135:

    Which of the following code blocks reads in parquet file /FileStore/imports.parquet as a DataFrame?

    A. spark.mode("parquet").read("/FileStore/imports.parquet")

    B. spark.read.path("/FileStore/imports.parquet", source="parquet")

    C. spark.read().parquet("/FileStore/imports.parquet")

    D. spark.read.parquet("/FileStore/imports.parquet")

    E. spark.read().format('parquet').open("/FileStore/imports.parquet")
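
    One detail worth remembering: spark.read is a property that returns a DataFrameReader, not a method, which is why calling it with parentheses fails. A minimal sketch of the working read:

        # Correct: no parentheses after spark.read.
        df = spark.read.parquet("/FileStore/imports.parquet")

        # Equivalent long form using the generic reader API.
        df = spark.read.format("parquet").load("/FileStore/imports.parquet")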

  • Question 136:

    Which of the following statements about stages is correct?

    A. Different stages in a job may be executed in parallel.

    B. Stages consist of one or more jobs.

    C. Stages ephemerally store transactions, before they are committed through actions.

    D. Tasks in a stage may be executed by multiple machines at the same time.

    E. Stages may contain multiple actions, narrow, and wide transformations.
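
    A stage boundary appears wherever a shuffle is required, and the tasks within one stage are distributed across executors on multiple machines. A minimal sketch (assuming an active SparkSession named spark):

        df = spark.range(100000)

        # The narrow selectExpr stays in the first stage; the wide groupBy
        # forces a shuffle, so the aggregation runs in a second stage that
        # starts only after the first stage finishes.
        out = df.selectExpr("id % 10 AS bucket").groupBy("bucket").count()
        out.collect()  # one job, two stages; tasks in each stage run in parallel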

  • Question 137:

    Which of the following code blocks reads all CSV files in directory filePath into a single DataFrame, with column names defined in the CSV file headers?

    Content of directory filePath:

    _SUCCESS
    _committed_2754546451699747124
    _started_2754546451699747124
    part-00000-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-298-1-c000.csv.gz
    part-00001-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-299-1-c000.csv.gz
    part-00002-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-300-1-c000.csv.gz
    part-00003-tid-2754546451699747124-10eb85bf-8d91-4dd0-b60b-2f3c02eeecaa-301-1-c000.csv.gz

    A. spark.option("header",True).csv(filePath)

    B. spark.read.format("csv").option("header",True).option("compression","zip").load(filePath)

    C. spark.read().option("header",True).load(filePath)

    D. spark.read.format("csv").option("header",True).load(filePath)

    E. spark.read.load(filePath)
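
    For reference, a sketch of a working read (Spark's file source skips metadata files such as _SUCCESS, whose names start with an underscore, and the gzip compression of the part files is detected automatically):

        # Read every part file under filePath into one DataFrame, taking
        # column names from the CSV headers.
        df = spark.read.format("csv").option("header", True).load(filePath)

        # Shorthand with the same effect.
        df = spark.read.csv(filePath, header=True)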

  • Question 138:

    Which of the following code blocks shuffles DataFrame transactionsDf, which has 8 partitions, so that it has 10 partitions?

    A. transactionsDf.repartition(transactionsDf.getNumPartitions()+2)

    B. transactionsDf.repartition(transactionsDf.rdd.getNumPartitions()+2)

    C. transactionsDf.coalesce(10)

    D. transactionsDf.coalesce(transactionsDf.getNumPartitions()+2)

    E. transactionsDf.repartition(transactionsDf._partitions+2)
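
    A DataFrame has no getNumPartitions() method of its own; the count is read through the underlying RDD, and only repartition() can raise it above the current count. A minimal sketch of the working approach:

        # repartition() performs a full shuffle and may increase the number
        # of partitions; coalesce() can only merge them downward.
        before = transactionsDf.rdd.getNumPartitions()       # 8
        transactionsDf = transactionsDf.repartition(before + 2)
        print(transactionsDf.rdd.getNumPartitions())         # 10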

  • Question 139:

    Which of the following code blocks returns a one-column DataFrame for which every row contains an array of all integer numbers from 0 up to and including the number given in column predError of DataFrame transactionsDf, and null if predError is null?

    Sample of DataFrame transactionsDf:

    +-------------+---------+-----+-------+---------+----+
    |transactionId|predError|value|storeId|productId|   f|
    +-------------+---------+-----+-------+---------+----+
    |            1|        3|    4|     25|        1|null|
    |            2|        6|    7|      2|        2|null|
    |            3|        3| null|     25|        3|null|
    |            4|     null| null|      3|        2|null|
    |            5|     null| null|   null|        2|null|
    |            6|        3|    2|     25|        2|null|
    +-------------+---------+-----+-------+---------+----+

    A. def count_to_target(target):
           if target is None:
               return

           result = [range(target)]
           return result

       count_to_target_udf = udf(count_to_target, ArrayType[IntegerType])

       transactionsDf.select(count_to_target_udf(col('predError')))

    B. def count_to_target(target):
           if target is None:
               return

           result = list(range(target))
           return result

       transactionsDf.select(count_to_target(col('predError')))

    C. def count_to_target(target):
           if target is None:
               return

           result = list(range(target))
           return result

       count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))

       transactionsDf.select(count_to_target_udf('predError'))

    (Correct)

    D. def count_to_target(target):
           result = list(range(target))
           return result

       count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))

       df = transactionsDf.select(count_to_target_udf('predError'))

    E. def count_to_target(target):
           if target is None:
               return

           result = list(range(target))
           return result

       count_to_target_udf = udf(count_to_target)

       transactionsDf.select(count_to_target_udf('predError'))
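
    The correct option leaves its imports implicit; a runnable version might look like this (a sketch, with the imports the snippet assumes):

        from pyspark.sql.functions import udf
        from pyspark.sql.types import ArrayType, IntegerType

        def count_to_target(target):
            if target is None:
                return  # Python's None becomes null in the result column

            return list(range(target))

        # The return type must be built from instances: ArrayType(IntegerType()).
        count_to_target_udf = udf(count_to_target, ArrayType(IntegerType()))

        transactionsDf.select(count_to_target_udf('predError'))

    Omitting the return type, as in option E, makes the UDF default to StringType, so no array column is produced.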

  • Question 140:

    The code block displayed below contains an error. The code block should return a copy of DataFrame transactionsDf where the name of column transactionId has been changed to transactionNumber. Find the error.

    Code block:

    transactionsDf.withColumn("transactionNumber", "transactionId")

    A. The arguments to the withColumn method need to be reordered.

    B. The arguments to the withColumn method need to be reordered and the copy() operator should be appended to the code block to ensure a copy is returned.

    C. The copy() operator should be appended to the code block to ensure a copy is returned.

    D. Each column name needs to be wrapped in the col() method and method withColumn should be replaced by method withColumnRenamed.

    E. The method withColumn should be replaced by method withColumnRenamed and the arguments to the method need to be reordered.
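
    If option E is the intended answer (replace withColumn with withColumnRenamed and reorder the arguments), the fixed code block is a sketch like:

        # withColumnRenamed(existingName, newName) returns a new DataFrame
        # with the column renamed; withColumn would instead require a Column
        # expression as its second argument.
        transactionsDf.withColumnRenamed("transactionId", "transactionNumber")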

Tips on How to Prepare for the Exams

Nowadays, certification exams have become increasingly important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Databricks exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam preparation or your Databricks certification application, do not hesitate to visit Vcedump.com to find your solutions.