Which of the following is one of the big performance advantages that Spark has over Hadoop?
A. Spark achieves great performance by storing data in the DAG format, whereas Hadoop can only use parquet files.
B. Spark achieves higher resiliency for queries since, unlike Hadoop, it can be deployed on Kubernetes.
C. Spark achieves great performance by storing data and performing computation in memory, whereas large jobs in Hadoop require a large amount of relatively slow disk I/O operations.
D. Spark achieves great performance by storing data in the HDFS format, whereas Hadoop can only use parquet files.
E. Spark achieves performance gains for developers by extending Hadoop's DataFrames with a user-friendly API.
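The in-memory computation described in option C can be sketched with caching. A minimal PySpark sketch (assuming an existing SparkSession named spark; the path is illustrative):

# Sketch: caching keeps the DataFrame in executor memory, so repeated
# actions avoid re-reading from disk.
df = spark.read.parquet("/data/transactions")  # illustrative path
df.cache()    # mark the DataFrame for in-memory storage
df.count()    # first action materializes the cache
df.count()    # served from memory, no further disk I/O for the scan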
Which of the following statements about executors is correct?
A. Executors are launched by the driver.
B. Executors stop upon application completion by default.
C. Each node hosts a single executor.
D. Executors store data in memory only.
E. An executor can serve multiple applications.
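As background for these options: executor resources are configured per application, the executors serve only that application, and they stop when it completes. A hedged sketch using standard Spark properties (the values are illustrative):

from pyspark.sql import SparkSession

# Sketch: executors launched for this session serve only this
# application and stop when it finishes (absent dynamic allocation).
spark = (SparkSession.builder
         .appName("executor-demo")                # illustrative name
         .config("spark.executor.memory", "4g")   # memory per executor
         .config("spark.executor.cores", "2")     # cores per executor
         .getOrCreate())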
Which of the following describes the characteristics of accumulators?
A. Accumulators are used to pass around lookup tables across the cluster.
B. All accumulators used in a Spark application are listed in the Spark UI.
C. Accumulators can be instantiated directly via the accumulator(n) method of the pyspark.RDD module.
D. Accumulators are immutable.
E. If an action including an accumulator fails during execution and Spark manages to restart the action and complete it successfully, only the successful attempt will be counted in the accumulator.
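For reference, accumulators are created through the SparkContext, not the pyspark.RDD module. A minimal sketch (assuming an existing SparkSession named spark and the transactionsDf DataFrame used throughout):

# Sketch: accumulators are mutable, write-only from tasks, and
# readable only on the driver.
acc = spark.sparkContext.accumulator(0)

def tally(row):
    acc.add(1)   # each task adds its updates

transactionsDf.foreach(tally)  # within an action, Spark applies each
                               # task's update exactly once, even if a
                               # failed attempt is retried
print(acc.value)               # read the final value on the driver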
Which of the following statements about garbage collection in Spark is incorrect?
A. Garbage collection information can be accessed in the Spark UI's stage detail view.
B. Optimizing garbage collection performance in Spark may limit caching ability.
C. Manually persisting RDDs in Spark prevents them from being garbage collected.
D. In Spark, using the G1 garbage collector is an alternative to using the default Parallel garbage collector.
E. Serialized caching is a strategy to increase the performance of garbage collection.
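The garbage-collector choice in option D is a standard JVM setting. A hedged sketch of switching executors to G1 (the configuration key and flag are real; the session setup is illustrative):

from pyspark.sql import SparkSession

# Sketch: select the G1 garbage collector on executors via a JVM
# option; this must be set before the executors are launched.
spark = (SparkSession.builder
         .config("spark.executor.extraJavaOptions", "-XX:+UseG1GC")
         .getOrCreate())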
Which of the following code blocks returns a single row from DataFrame transactionsDf?
Full DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
A. transactionsDf.where(col("storeId").between(3,25))
B. transactionsDf.filter((col("storeId")!=25) | (col("productId")==2))
C. transactionsDf.filter(col("storeId")==25).select("predError","storeId").distinct()
D. transactionsDf.select("productId", "storeId").where("storeId == 2 OR storeId != 25")
E. transactionsDf.where(col("value").isNull()).select("productId", "storeId").distinct()
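These expressions can be checked against the table above. In option C, every storeId == 25 row has predError == 3, so distinct() collapses them to a single row; a sketch (assuming the usual import):

from pyspark.sql.functions import col

transactionsDf.filter(col("storeId") == 25) \
    .select("predError", "storeId") \
    .distinct() \
    .show()
# +---------+-------+
# |predError|storeId|
# +---------+-------+
# |        3|     25|
# +---------+-------+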
Which of the following code blocks generally causes a great amount of network traffic?
A. DataFrame.select()
B. DataFrame.coalesce()
C. DataFrame.collect()
D. DataFrame.rdd.map()
E. DataFrame.count()
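For context: collect() ships every row from the executors to the driver over the network, while an aggregation such as count() returns only one number per partition. A sketch:

# Sketch: collect() transfers the full DataFrame across the network
# to the driver; avoid it for large datasets.
rows = transactionsDf.collect()   # heavy network traffic

n = transactionsDf.count()        # only per-partition counts travel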
Which of the following statements about RDDs is incorrect?
A. An RDD consists of a single partition.
B. The high-level DataFrame API is built on top of the low-level RDD API.
C. RDDs are immutable.
D. RDD stands for Resilient Distributed Dataset.
E. RDDs are great for precisely instructing Spark on how to do a query.
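On the partitioning point: an RDD is split across multiple partitions by design. A minimal sketch (assuming an existing SparkSession named spark):

# Sketch: one RDD, several partitions.
rdd = spark.sparkContext.parallelize(range(100), numSlices=4)
print(rdd.getNumPartitions())  # 4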
Which of the following code blocks efficiently converts DataFrame transactionsDf from 12 into 24 partitions?
A. transactionsDf.repartition(24, boost=True)
B. transactionsDf.repartition()
C. transactionsDf.repartition("itemId", 24)
D. transactionsDf.coalesce(24)
E. transactionsDf.repartition(24)
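A sketch of the relevant behavior: repartition() can increase or decrease the partition count (at the cost of a full shuffle), while coalesce() can only decrease it:

# Sketch: growing 12 partitions to 24.
df24 = transactionsDf.repartition(24)
print(df24.rdd.getNumPartitions())   # 24

same = transactionsDf.coalesce(24)   # cannot increase partitions
print(same.rdd.getNumPartitions())   # still 12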
Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?
A. transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))
B. transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))
C. transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
D. transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))
E. transactionsDf.withColumn("predErrorSquared", "predError"**2)
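A sketch of the withColumn/pow pattern from option C (assuming the usual imports): pow() from pyspark.sql.functions accepts Column arguments, while a bare name such as predError is an undefined Python variable and "predError"**2 is not valid on a string:

from pyspark.sql.functions import col, lit, pow

# Sketch: add predErrorSquared as predError squared, keeping all
# existing columns.
result = transactionsDf.withColumn(
    "predErrorSquared", pow(col("predError"), lit(2)))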
Which of the following code blocks returns a copy of DataFrame transactionsDf that only includes columns transactionId, storeId, productId and f?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+
A. transactionsDf.drop(col("value"), col("predError"))
B. transactionsDf.drop("predError", "value")
C. transactionsDf.drop(value, predError)
D. transactionsDf.drop(["predError", "value"])
E. transactionsDf.drop([col("predError"), col("value")])
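A sketch of the drop() pattern: DataFrame.drop() takes column names as separate string arguments, and in PySpark passing a list raises an error:

# Sketch: drop two columns by name, keeping the rest.
slimDf = transactionsDf.drop("predError", "value")
slimDf.printSchema()   # transactionId, storeId, productId, f remain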