Exam Details

  • Exam Code: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK
  • Exam Name: Databricks Certified Associate Developer for Apache Spark 3.0
  • Certification: Databricks Certifications
  • Vendor: Databricks
  • Total Questions: 180 Q&As
  • Last Updated: Jul 02, 2025

Databricks Certifications: DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Questions & Answers

  • Question 31:

    Which of the following code blocks prints out in how many rows the expression Inc. appears in the string-type column supplier of DataFrame itemsDf?

    A. counter = 0

       for index, row in itemsDf.iterrows():
           if 'Inc.' in row['supplier']:
               counter = counter + 1

       print(counter)

    B. counter = 0

       def count(x):
           if 'Inc.' in x['supplier']:
               counter = counter + 1

       itemsDf.foreach(count)
       print(counter)

    C. print(itemsDf.foreach(lambda x: 'Inc.' in x))

    D. print(itemsDf.foreach(lambda x: 'Inc.' in x).sum())

    E. accum = sc.accumulator(0)

       def check_if_inc_in_supplier(row):
           if 'Inc.' in row['supplier']:
               accum.add(1)

       itemsDf.foreach(check_if_inc_in_supplier)
       print(accum.value)
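
    For context, here is a minimal, runnable sketch of the accumulator pattern shown in option E above, using an illustrative stand-in for itemsDf (the sample rows are not from the exam). Note that a Spark DataFrame has no iterrows() method, and a plain Python counter incremented inside foreach() only changes executor-local copies, which is exactly the gap an accumulator fills.

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.getOrCreate()
       sc = spark.sparkContext

       # Illustrative stand-in for itemsDf with a string-type column "supplier"
       itemsDf = spark.createDataFrame(
           [(1, "Acme Inc."), (2, "Globex Ltd."), (3, "Initech Inc.")],
           ["itemId", "supplier"],
       )

       accum = sc.accumulator(0)  # driver-visible counter that executors can add to

       def check_if_inc_in_supplier(row):
           if 'Inc.' in row['supplier']:
               accum.add(1)

       itemsDf.foreach(check_if_inc_in_supplier)  # runs on the executors
       print(accum.value)                         # prints 2 for this sample data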

  • Question 32:

    Which of the following code blocks performs a join in which the small DataFrame transactionsDf is sent to all executors where it is joined with DataFrame itemsDf on columns storeId and itemId, respectively?

    A. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "right_outer")

    B. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.storeId, "broadcast")

    C. itemsDf.merge(transactionsDf, "itemsDf.itemId == transactionsDf.storeId", "broadcast")

    D. itemsDf.join(broadcast(transactionsDf), itemsDf.itemId == transactionsDf.storeId)

    E. itemsDf.join(transactionsDf, broadcast(itemsDf.itemId == transactionsDf.storeId))
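
    As background, broadcast() from pyspark.sql.functions wraps the DataFrame that should be replicated to all executors; it is not a join type string and it is not applied to the join condition. A minimal sketch with illustrative stand-in DataFrames (the sample rows are not from the exam):

       from pyspark.sql import SparkSession
       from pyspark.sql.functions import broadcast

       spark = SparkSession.builder.getOrCreate()

       # Illustrative stand-ins for the DataFrames named in the question
       itemsDf = spark.createDataFrame([(3, "ear plugs"), (5, "notebook")], ["itemId", "name"])
       transactionsDf = spark.createDataFrame([(3, 9.99), (7, 1.50)], ["storeId", "amount"])

       # transactionsDf is marked for broadcast, then joined to itemsDf on the given condition
       joined = itemsDf.join(broadcast(transactionsDf), itemsDf.itemId == transactionsDf.storeId)
       joined.show()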

  • Question 33:

    The code block displayed below contains an error. The code block should read the CSV file located at path data/transactions.csv into DataFrame transactionsDf, using the first row as the column header and casting the columns to the most appropriate types. Find the error.

    First 3 rows of transactions.csv:

       transactionId;storeId;productId;name
       1;23;12;green grass
       2;35;31;yellow sun
       3;23;12;green grass

    Code block:

    transactionsDf = spark.read.load("data/transactions.csv", sep=";", format="csv", header=True)

    A. The DataFrameReader is not accessed correctly.

    B. The transaction is evaluated lazily, so no file will be read.

    C. Spark is unable to understand the file type.

    D. The code block is unable to capture all columns.

    E. The resulting DataFrame will not have the appropriate schema.
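
    For reference, reading a CSV with header=True alone leaves every column as a string; adding inferSchema=True (or supplying an explicit schema) is what lets Spark cast columns to more appropriate types. A hedged sketch, assuming the semicolon-separated file from the question exists at data/transactions.csv:

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.getOrCreate()

       # Assumes data/transactions.csv exists with a header row and ";" as the separator
       transactionsDf = spark.read.load(
           "data/transactions.csv",
           format="csv",
           sep=";",
           header=True,
           inferSchema=True,  # without this, every column stays StringType
       )
       transactionsDf.printSchema()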

  • Question 34:

    The code block displayed below contains an error. The code block should return all rows of DataFrame transactionsDf, but including only columns storeId and predError. Find the error.

    Code block:

    spark.collect(transactionsDf.select("storeId", "predError"))

    A. Instead of select, DataFrame transactionsDf needs to be filtered using the filter operator.

    B. Columns storeId and predError need to be represented as a Python list, so they need to be wrapped in brackets ([]).

    C. The take method should be used instead of the collect method.

    D. Instead of collect, collectAsRows needs to be called.

    E. The collect method is not a method of the SparkSession object.
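
    For reference, collect() is a DataFrame method and not a method of the SparkSession, so the projection is built first and the action is then called on the resulting DataFrame. A minimal sketch with an illustrative stand-in for transactionsDf:

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.getOrCreate()

       # Illustrative stand-in for transactionsDf
       transactionsDf = spark.createDataFrame(
           [(25, 1.5, 3), (3, None, 4)], ["storeId", "predError", "productId"]
       )

       # select() is a transformation; collect() is the action that returns a list of Rows
       rows = transactionsDf.select("storeId", "predError").collect()
       print(rows)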

  • Question 35:

    Which of the following statements about reducing out-of-memory errors is incorrect?

    A. Concatenating multiple string columns into a single column may guard against out-of-memory errors.

    B. Reducing partition size can help against out-of-memory errors.

    C. Limiting the amount of data being automatically broadcast in joins can help against out-of-memory errors.

    D. Setting a limit on the maximum size of serialized data returned to the driver may help prevent out-of-memory errors.

    E. Decreasing the number of cores available to each executor can help against out-of-memory errors.
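
    As background, two of the mechanisms these options allude to are ordinary Spark configuration settings. The property names below are real Spark configuration keys; the specific values are only illustrative.

       from pyspark.sql import SparkSession

       # Illustrative values; spark.driver.maxResultSize must be set before the driver starts
       spark = (
           SparkSession.builder
           .config("spark.driver.maxResultSize", "1g")
           .getOrCreate()
       )

       # Cap the size of tables Spark will automatically broadcast in joins (illustrative 10 MB)
       spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10 * 1024 * 1024)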

  • Question 36:

    The code block shown below should return an exact copy of DataFrame transactionsDf that does not include rows in which values in column storeId have the value 25. Choose the code block that accomplishes this.

    A. transactionsDf.remove(transactionsDf.storeId==25)

    B. transactionsDf.where(transactionsDf.storeId!=25)

    C. transactionsDf.filter(transactionsDf.storeId==25)

    D. transactionsDf.drop(transactionsDf.storeId==25)

    E. transactionsDf.select(transactionsDf.storeId!=25)
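
    For reference, where() and filter() are aliases on DataFrame and both keep only the rows for which the condition is true, so excluding storeId 25 requires a != comparison. A minimal sketch with an illustrative stand-in DataFrame:

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.getOrCreate()

       # Illustrative stand-in for transactionsDf
       transactionsDf = spark.createDataFrame([(25, 1), (3, 2), (25, 3)], ["storeId", "transactionId"])

       # where() and filter() are interchangeable; both keep rows matching the condition
       withoutStore25 = transactionsDf.where(transactionsDf.storeId != 25)
       withoutStore25.show()  # only the storeId == 3 row remains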

  • Question 37:

    Which of the following DataFrame methods is classified as a transformation?

    A. DataFrame.count()

    B. DataFrame.show()

    C. DataFrame.select()

    D. DataFrame.foreach()

    E. DataFrame.first()
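
    As background, transformations such as select() are lazy and return a new DataFrame, while actions such as count(), show(), foreach(), and first() trigger a job and produce a result. A small illustrative sketch:

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.getOrCreate()
       df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

       projected = df.select("id")   # transformation: lazily builds a new DataFrame, nothing runs yet
       n = projected.count()         # action: triggers execution and returns a Python int
       print(n)                      # 2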

  • Question 38:

    Which of the following code blocks saves DataFrame transactionsDf in location /FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?

    A. transactionsDf.write.save("/FileStore/transactions.csv")

    B. transactionsDf.write.format("csv").mode("error").path("/FileStore/transactions.csv")

    C. transactionsDf.write.format("csv").mode("ignore").path("/FileStore/transactions.csv")

    D. transactionsDf.write("csv").mode("error").save("/FileStore/transactions.csv")

    E. transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")
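
    For reference, the DataFrameWriter save mode controls what happens when the target path already exists: "error" (also spelled "errorifexists") raises an error, while "ignore" silently skips the write. A hedged sketch, with the output path taken from the question and an illustrative stand-in DataFrame:

       from pyspark.sql import SparkSession

       spark = SparkSession.builder.getOrCreate()
       transactionsDf = spark.createDataFrame([(1, 25), (2, 3)], ["transactionId", "storeId"])

       # mode("error") / mode("errorifexists") throws if /FileStore/transactions.csv already exists
       transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")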

  • Question 39:

    The code block displayed below contains at least one error. The code block should return a DataFrame with only one column, result. That column should include all values in column value from DataFrame transactionsDf raised to the power of 5, and a null value for rows in which there is no value in column value. Find the error(s).

    Code block:

       from pyspark.sql.functions import udf
       from pyspark.sql import types as T

       transactionsDf.createOrReplaceTempView('transactions')

       def pow_5(x):
           return x**5

       spark.udf.register(pow_5, 'power_5_udf', T.LongType())
       spark.sql('SELECT power_5_udf(value) FROM transactions')

    A. The pow_5 method is unable to handle empty values in column value and the name of the column in the returned DataFrame is not result.

    B. The returned DataFrame includes multiple columns instead of just one column.

    C. The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and the SparkSession cannot access the transactionsDf DataFrame.

    D. The pow_5 method is unable to handle empty values in column value, the name of the column in the returned DataFrame is not result, and Spark driver does not call the UDF function appropriately.

    E. The pow_5 method is unable to handle empty values in column value, the UDF function is not registered properly with the Spark driver, and the name of the column in the returned DataFrame is not result.
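
    As background, spark.udf.register() takes the SQL function name first, then the Python function, then the return type. A version of this pattern that guards against null inputs and aliases the output column to result might look like the sketch below; the sample data is illustrative and not from the exam.

       from pyspark.sql import SparkSession
       from pyspark.sql import types as T

       spark = SparkSession.builder.getOrCreate()

       # Illustrative stand-in for transactionsDf with a null in column "value"
       transactionsDf = spark.createDataFrame([(1, 4), (2, None), (3, 2)], ["transactionId", "value"])
       transactionsDf.createOrReplaceTempView('transactions')

       def pow_5(x):
           # Null rows arrive as None; returning None keeps them null in the result
           if x is None:
               return None
           return x ** 5

       # The SQL function name comes first when registering a UDF
       spark.udf.register('power_5_udf', pow_5, T.LongType())

       spark.sql('SELECT power_5_udf(value) AS result FROM transactions').show()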

  • Question 40:

    The code block displayed below contains an error. The code block is intended to perform an outer join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively. Find the error.

    Code block:

    transactionsDf.join(itemsDf, [itemsDf.itemId, transactionsDf.productId], "outer")

    A. The "outer" argument should be eliminated, since "outer" is the default join type.

    B. The join type needs to be appended to the join() operator, like join().outer() instead of listing it as the last argument inside the join() call.

    C. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.itemId == transactionsDf.productId.

    D. The term [itemsDf.itemId, transactionsDf.productId] should be replaced by itemsDf.col("itemId") == transactionsDf.col("productId").

    E. The "outer" argument should be eliminated from the call and join should be replaced by joinOuter.

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important and are required by more and more enterprises when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Databricks exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam preparation or your Databricks certification application, do not hesitate to visit Vcedump.com to find your solutions.