Which of the following is the deepest level in Spark's execution hierarchy?
A. Job
B. Task
C. Executor
D. Slot
E. Stage
Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?
A. Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
B. Decrease values for the properties spark.default.parallelism and spark.sql.partitions
C. Increase values for the properties spark.sql.parallelism and spark.sql.partitions
D. Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
E. Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
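Two of the property names that appear in the options, spark.default.parallelism and spark.sql.shuffle.partitions, are real Spark settings; spark.sql.parallelism and spark.sql.partitions do not exist. A minimal sketch of raising both at session start follows; the application name and the value 200 are illustrative only, not prescribed by the question:

from pyspark.sql import SparkSession

# Illustrative tuning: more shuffle partitions means more, smaller tasks,
# which can help a single large application make use of all available slots.
spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.default.parallelism", "200")     # RDD operations
         .config("spark.sql.shuffle.partitions", "200")  # DataFrame shuffles
         .getOrCreate())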
The code block shown below should write DataFrame transactionsDf as a parquet file to path storeDir, using brotli compression and replacing any previously existing file. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__.format("parquet").__2__(__3__).option(__4__, "brotli").__5__(storeDir)
A. 1. save 2. mode 3. "ignore" 4. "compression" 5. path
B. 1. store 2. with 3. "replacement" 4. "compression" 5. path
C. 1. write 2. mode 3. "overwrite" 4. "compression" 5. save
(Correct)
D. 1. save 2. mode 3. "replace" 4. "compression" 5. path
E. 1. write 2. mode 3. "overwrite" 4. compression 5. parquet
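A minimal sketch of the completed code block from option C, which is marked correct above; it assumes transactionsDf and storeDir already exist and that the cluster's Parquet build supports brotli compression:

(transactionsDf
 .write
 .format("parquet")
 .mode("overwrite")                   # replace any previously existing file
 .option("compression", "brotli")
 .save(storeDir))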
The code block shown below should return a DataFrame with only columns from DataFrame transactionsDf for which there is a corresponding transactionId in DataFrame itemsDf. DataFrame itemsDf is very small and much smaller than DataFrame transactionsDf. The query should be executed in an optimized way. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
__1__.__2__(__3__, __4__, __5__)
A. 1. transactionsDf 2. join 3. broadcast(itemsDf) 4. transactionsDf.transactionId==itemsDf.transactionId 5. "outer"
B. 1. transactionsDf 2. join 3. itemsDf 4. transactionsDf.transactionId==itemsDf.transactionId 5. "anti"
C. 1. transactionsDf 2. join 3. broadcast(itemsDf) 4. "transactionId" 5. "left_semi"
D. 1. itemsDf 2. broadcast 3. transactionsDf 4. "transactionId" 5. "left_semi"
E. 1. itemsDf 2. join 3. broadcast(transactionsDf) 4. "transactionId" 5. "left_semi"
Which of the following code blocks reads in the JSON file stored at filePath, enforcing the schema expressed in JSON format in variable json_schema, shown in the code block below?
Code block:
json_schema = """
{"type": "struct",
 "fields": [
  {
   "name": "itemId",
   "type": "integer",
   "nullable": true,
   "metadata": {}
  },
  {
   "name": "supplier",
   "type": "string",
   "nullable": true,
   "metadata": {}
  }
 ]
}
"""
A. spark.read.json(filePath, schema=json_schema)
B. spark.read.schema(json_schema).json(filePath)
C. schema = StructType.fromJson(json.loads(json_schema)); spark.read.json(filePath, schema=schema)
D. spark.read.json(filePath, schema=schema_of_json(json_schema))
E. spark.read.json(filePath, schema=spark.read.json(json_schema))
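For reference, a schema expressed as a JSON string has to be converted into a StructType before it can be passed to the schema argument; a minimal sketch assuming json_schema and filePath are defined as above:

import json
from pyspark.sql.types import StructType

# Parse the JSON string into a dict, build a StructType from it, and use it
# to enforce the schema while reading the JSON file.
schema = StructType.fromJson(json.loads(json_schema))
df = spark.read.json(filePath, schema=schema)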
Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?
A. from pyspark import StorageLevel; transactionsDf.cache(StorageLevel.MEMORY_ONLY)
B. transactionsDf.cache()
C. transactionsDf.storage_level('MEMORY_ONLY')
D. transactionsDf.persist()
E. transactionsDf.clear_persist()
F. from pyspark import StorageLevel; transactionsDf.persist(StorageLevel.MEMORY_ONLY)
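A minimal sketch of the MEMORY_ONLY behaviour the question asks about; with this storage level, partitions that do not fit in memory are not spilled to disk but recomputed from their lineage when accessed again:

from pyspark import StorageLevel

# Cache in memory only; oversized partitions are recalculated on demand.
transactionsDf.persist(StorageLevel.MEMORY_ONLY)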
Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?
A. transactionsDf.sort(asc(value)).show(10)
B. transactionsDf.sort(col("value")).show(10)
C. transactionsDf.sort(col("value").desc()).head()
D. transactionsDf.sort(col("value").asc()).print(10)
E. transactionsDf.orderBy("value").asc().show(10)
The code block shown below should add a column itemNameBetweenSeparators to DataFrame itemsDf. The column should contain arrays of at most 4 strings, composed of the values in column itemName split at - or whitespace characters. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Sample of DataFrame itemsDf:
+------+----------------------------------+-------------------+
|itemId|itemName                          |supplier           |
+------+----------------------------------+-------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|
|2     |Elegant Outdoors Summer Dress     |YetiX              |
|3     |Outdoors Backpack                 |Sports Company Inc.|
+------+----------------------------------+-------------------+
Code block:
itemsDf.__1__(__2__, __3__(__4__, "[\s\-]", __5__))
A. 1. withColumn 2. "itemNameBetweenSeparators" 3. split 4. "itemName" 5. 4
(Correct)
B. 1. withColumnRenamed 2. "itemNameBetweenSeparators" 3. split 4. "itemName" 5. 4
C. 1. withColumnRenamed 2. "itemName" 3. split 4. "itemNameBetweenSeparators" 5. 4
D. 1. withColumn 2. "itemNameBetweenSeparators" 3. split 4. "itemName" 5. 5
E. 1. withColumn 2. itemNameBetweenSeparators 3. str_split 4. "itemName" 5. 5
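A minimal sketch of the completed code block from option A, which is marked correct above; split's third argument is a limit, so each array holds at most 4 elements and the last element keeps any remaining, un-split text:

from pyspark.sql.functions import split

itemsDf = itemsDf.withColumn(
    "itemNameBetweenSeparators",
    split("itemName", r"[\s\-]", 4)   # split at "-" or whitespace, max 4 parts
)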
Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?
A. 1, 10
B. 1, 8
C. 10
D. 7, 9, 10
E. 1, 4, 6, 9
Which of the following code blocks silently writes DataFrame itemsDf in avro format to location fileLocation if a file does not yet exist at that location?
A. itemsDf.write.avro(fileLocation)
B. itemsDf.write.format("avro").mode("ignore").save(fileLocation)
C. itemsDf.write.format("avro").mode("errorifexists").save(fileLocation)
D. itemsDf.save.format("avro").mode("ignore").write(fileLocation)
E. spark.DataFrameWriter(itemsDf).format("avro").write(fileLocation)