Which of the following statements about DAGs is correct?
A. DAGs help direct how Spark executors process tasks, but are a limitation to the proper execution of a query when an executor fails.
B. DAG stands for "Directing Acyclic Graph".
C. Spark strategically hides DAGs from developers, since the high degree of automation in Spark means that developers never need to consider DAG layouts.
D. In contrast to transformations, DAGs are never lazily executed.
E. DAGs can be decomposed into tasks that are executed in parallel.
Which of the following is a characteristic of the cluster manager?
A. Each cluster manager works on a single partition of data.
B. The cluster manager receives input from the driver through the SparkContext.
C. The cluster manager does not exist in standalone mode.
D. The cluster manager transforms jobs into DAGs.
E. In client mode, the cluster manager runs on the edge node.
Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?
A. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
   dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date"))
B. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
   dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-ddHH:mm:ss"))
C. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
   dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
D. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
   dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-ddHH:mm:ss"))
E. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
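For reference, a minimal PySpark sketch of the two-step approach described in option C above, parsing string dates into a timestamp column with to_timestamp (assumes a running SparkSession bound to the name spark):

from pyspark.sql.functions import to_timestamp

# Two single-element tuples produce a one-column, two-row DataFrame of strings
dfDates = spark.createDataFrame(
    [("23/01/2022 11:28:12",), ("24/01/2022 10:58:34",)], ["date"]
)

# Overwrite the string column with a parsed timestamp column;
# the format pattern must match how the strings are written (dd/MM/yyyy HH:mm:ss)
dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
dfDates.printSchema()  # date: timestamp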
Which of the following describes the most efficient way to resize a DataFrame from 16 to 8 partitions?
A. Use operation DataFrame.repartition(8) to shuffle the DataFrame and reduce the number of partitions.
B. Use operation DataFrame.coalesce(8) to fully shuffle the DataFrame and reduce the number of partitions.
C. Use a narrow transformation to reduce the number of partitions.
D. Use a wide transformation to reduce the number of partitions.
E. Use operation DataFrame.coalesce(0.5) to halve the number of partitions in the DataFrame.
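For illustration, a minimal sketch (assuming an existing DataFrame df currently split into 16 partitions) contrasting coalesce, which merges existing partitions without a full shuffle, with repartition, which always shuffles:

df8 = df.coalesce(8)                # narrow operation: merges partitions, avoids a full shuffle
print(df8.rdd.getNumPartitions())   # 8
df8b = df.repartition(8)            # wide operation: full shuffle, typically more expensive when only reducing partitions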
The code block shown below should return a DataFrame with columns transactionId, predError, value, and f from DataFrame transactionsDf. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__)
A. 1. filter
2. "transactionId", "predError", "value", "f"
B. 1. select
2. "transactionId, predError, value, f"
C. 1. select
2. ["transactionId", "predError", "value", "f"]
D. 1. where
2. col("transactionId"), col("predError"), col("value"), col("f") E. 1. select
2. col(["transactionId", "predError", "value", "f"])
Which of the following code blocks returns a new DataFrame with only the columns predError and value of every second row of DataFrame transactionsDf?
Entire DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
A. transactionsDf.filter(col("transactionId").isin([3,4,6])).select([predError, value])
B. transactionsDf.select(col("transactionId").isin([3,4,6]), "predError", "value")
C. transactionsDf.filter("transactionId" % 2 == 0).select("predError", "value")
D. transactionsDf.filter(col("transactionId") % 2 == 0).select("predError", "value") (Correct)
E. transactionsDf.createOrReplaceTempView("transactionsDf")
   spark.sql("FROM transactionsDf SELECT predError, value WHERE transactionId % 2 = 2")
F. transactionsDf.filter(col(transactionId).isin([3,4,6]))
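For illustration, a minimal sketch of the filter-then-select approach from option D above (assuming transactionsDf matches the table shown for this question):

from pyspark.sql.functions import col

# Keep rows whose transactionId is even, then project only the two requested columns
everySecond = (
    transactionsDf
    .filter(col("transactionId") % 2 == 0)
    .select("predError", "value")
)
everySecond.show()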
Which is the highest level in Spark's execution hierarchy?
A. Task
B. Executor
C. Slot
D. Job
E. Stage
Which of the following code blocks reorders the values inside the arrays in column attributes of DataFrame itemsDf from last to first in the alphabet?
+------+-----------------------------+-------------------+
|itemId|attributes                   |supplier           |
+------+-----------------------------+-------------------+
|1     |[blue, winter, cozy]         |Sports Company Inc.|
|2     |[red, summer, fresh, cooling]|YetiX              |
|3     |[green, summer, travel]      |Sports Company Inc.|
+------+-----------------------------+-------------------+
A. itemsDf.withColumn('attributes', sort_array(col('attributes').desc()))
B. itemsDf.withColumn('attributes', sort_array(desc('attributes')))
C. itemsDf.withColumn('attributes', sort(col('attributes'), asc=False))
D. itemsDf.withColumn("attributes", sort_array("attributes", asc=False))
E. itemsDf.select(sort_array("attributes"))
Which of the following describes characteristics of the Dataset API?
A. The Dataset API does not support unstructured data.
B. In Python, the Dataset API mainly resembles Pandas' DataFrame API.
C. In Python, the Dataset API's schema is constructed via type hints.
D. The Dataset API is available in Scala, but it is not available in Python.
E. The Dataset API does not provide compile-time type safety.
Which of the following are valid execution modes?
A. Kubernetes, Local, Client
B. Client, Cluster, Local
C. Server, Standalone, Client
D. Cluster, Server, Local
E. Standalone, Client, Cluster