Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?
A. itemsDf.withColumn(["supplier", "manufacturer"])
B. itemsDf.withColumn("supplier").alias("manufacturer")
C. itemsDf.withColumnRenamed("supplier", "manufacturer")
D. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
E. itemsDf.withColumnsRenamed("supplier", "manufacturer")
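For reference, a minimal sketch of the renaming pattern from option C, assuming itemsDf already exists; withColumnRenamed returns a new DataFrame and leaves the original unchanged:

renamedDf = itemsDf.withColumnRenamed("supplier", "manufacturer")  # copy with the column renamed
# Note: withColumn expects a name plus a Column expression, and withColumnsRenamed (Spark 3.4+)
# takes a dict of old-to-new names rather than two separate strings.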
The code block shown below should write DataFrame transactionsDf to disk at path csvPath as a single CSV file, using tabs (\t characters) as separators between columns, expressing missing values as the string n/a, and omitting a header row with column names. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
transactionsDf.__1__.write.__2__(__3__, "\t").__4__.__5__(csvPath)
A. 1. coalesce(1) 2. option 3. "sep" 4. option("header", True) 5. path
B. 1. coalesce(1) 2. option 3. "colsep" 4. option("nullValue", "n/a") 5. path
C. 1. repartition(1) 2. option 3. "sep" 4. option("nullValue", "n/a") 5. csv
D. 1. csv 2. option 3. "sep" 4. option("emptyValue", "n/a") 5. path
E. 1. repartition(1) 2. mode 3. "sep" 4. mode("nullValue", "n/a") 5. csv
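For reference, the filled-in chain from option C, as a sketch assuming transactionsDf and csvPath exist; repartition(1) yields a single output file, "sep" sets the tab delimiter, "nullValue" renders nulls as n/a, and the CSV writer omits the header row by default:

transactionsDf.repartition(1) \
    .write.option("sep", "\t") \
    .option("nullValue", "n/a") \
    .csv(csvPath)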
The code block shown below should set the number of partitions that Spark uses when shuffling data for joins or aggregations to 100. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
__1__.__2__.__3__(__4__, 100)
A. 1. spark 2. conf 3. set 4. "spark.sql.shuffle.partitions"
B. 1. pyspark 2. config 3. set 4. spark.shuffle.partitions
C. 1. spark 2. conf 3. get 4. "spark.sql.shuffle.partitions"
D. 1. pyspark 2. config 3. set 4. "spark.sql.shuffle.partitions"
E. 1. spark 2. conf 3. set 4. "spark.sql.aggregate.partitions"
Which of the following DataFrame operators is never classified as a wide transformation?
A. DataFrame.sort()
B. DataFrame.aggregate()
C. DataFrame.repartition()
D. DataFrame.select()
E. DataFrame.join()
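As a quick contrast, a sketch assuming transactionsDf exists; select only touches rows within their current partitions, while the other operators listed can require moving rows between partitions:

transactionsDf.select("storeId")  # narrow: each output partition depends on a single input partition
transactionsDf.sort("storeId")    # wide: rows must be redistributed (shuffled) across partitions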
Which of the following describes a shuffle?
A. A shuffle is a process that is executed during a broadcast hash join.
B. A shuffle is a process that compares data across executors.
C. A shuffle is a process that compares data across partitions.
D. A shuffle is a Spark operation that results from DataFrame.coalesce().
E. A shuffle is a process that allocates partitions to executors.
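To illustrate, a sketch assuming transactionsDf exists; a shuffle redistributes rows across partitions, whereas coalesce merges partitions without a full shuffle when reducing the partition count:

transactionsDf.repartition(10, "storeId")  # triggers a shuffle: rows move across partitions
transactionsDf.coalesce(2)                 # narrow: combines partitions without a full shuffle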
The code block shown below should return a copy of DataFrame transactionsDf without columns value and productId and with an additional column associateId that has the value 5. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__, __3__).__4__(__5__, 'value')
A. 1. withColumn 2. 'associateId' 3. 5 4. remove 5. 'productId'
B. 1. withNewColumn 2. associateId 3. lit(5) 4. drop 5. productId
C. 1. withColumn 2. 'associateId' 3. lit(5) 4. drop 5. 'productId'
D. 1. withColumnRenamed 2. 'associateId' 3. 5 4. drop 5. 'productId'
E. 1. withColumn 2. col(associateId) 3. lit(5) 4. drop 5. col(productId)
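For reference, the filled-in block from option C as a sketch; lit(5) wraps the constant in the Column expression that withColumn requires, and drop accepts multiple column names:

from pyspark.sql.functions import lit

resultDf = transactionsDf.withColumn('associateId', lit(5)).drop('productId', 'value')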
The code block shown below should return all rows of DataFrame itemsDf that have at least 3 items in column itemNameElements. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Example of DataFrame itemsDf:
+------+----------------------------------+-------------------+------------------------------------------+
|itemId|itemName                          |supplier           |itemNameElements                          |
+------+----------------------------------+-------------------+------------------------------------------+
|1     |Thick Coat for Walking in the Snow|Sports Company Inc.|[Thick, Coat, for, Walking, in, the, Snow]|
|2     |Elegant Outdoors Summer Dress     |YetiX              |[Elegant, Outdoors, Summer, Dress]        |
|3     |Outdoors Backpack                 |Sports Company Inc.|[Outdoors, Backpack]                      |
+------+----------------------------------+-------------------+------------------------------------------+
Code block:
itemsDf.__1__(__2__(__3__)__4__)
A. 1. select 2. count 3. col("itemNameElements") 4. >3
B. 1. filter 2. count 3. itemNameElements 4. >=3
C. 1. select 2. count 3. "itemNameElements" 4. >3
D. 1. filter 2. size 3. "itemNameElements" 4. >=3
E. 1. select 2. size 3. "itemNameElements" 4. >3
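For reference, the filled-in block from option D as a sketch; size() returns the number of elements in an array column, whereas count() is an aggregate function and would not work row by row here:

from pyspark.sql.functions import size

itemsDf.filter(size("itemNameElements") >= 3)  # keeps rows whose array holds at least 3 items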
In which order should the code blocks shown below be run to create a DataFrame that shows the mean of column predError of DataFrame transactionsDf per column storeId and productId, where productId is either 2 or 3, and the returned DataFrame is sorted in ascending order by column storeId, leaving out any nulls in that column?
DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
1. .mean("predError")
2. .groupBy("storeId")
3. .orderBy("storeId")
4. transactionsDf.filter(transactionsDf.storeId.isNotNull())
5. .pivot("productId", [2, 3])
A. 4, 5, 2, 3, 1
B. 4, 2, 1
C. 4, 1, 5, 2, 3
D. 4, 2, 5, 1, 3
E. 4, 3, 2, 5, 1
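Assembled in the order 4, 2, 5, 1, 3 (option D), the chain reads as in the sketch below; pivot must follow groupBy, the aggregation closes the grouping, and the sort comes last:

(transactionsDf.filter(transactionsDf.storeId.isNotNull())  # 4: drop rows with null storeId
    .groupBy("storeId")                                     # 2: group per store
    .pivot("productId", [2, 3])                             # 5: one column per productId value
    .mean("predError")                                      # 1: aggregate the prediction error
    .orderBy("storeId"))                                    # 3: ascending sort by storeId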
Which of the following describes slots?
A. Slots are dynamically created and destroyed in accordance with an executor's workload.
B. To optimize I/O performance, Spark stores data on disk in multiple slots.
C. A Java Virtual Machine (JVM) working as an executor can be considered as a pool of slots for task execution.
D. A slot is always limited to a single core. Slots are the communication interface for executors and are used for receiving commands and sending results to the driver.
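As context, the number of task slots an executor offers follows from its core allocation; a sketch with an illustrative setting (the value 4 is arbitrary and only meaningful on a cluster):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.executor.cores", "4")  # illustrative: roughly 4 task slots per executor
         .getOrCreate())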
Which of the following describes Spark's standalone deployment mode?
A. Standalone mode uses a single JVM to run Spark driver and executor processes.
B. Standalone mode means that the cluster does not contain the driver.
C. Standalone mode is how Spark runs on YARN and Mesos clusters.
D. Standalone mode uses only a single executor per worker per application.
E. Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.
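As context, connecting to a standalone cluster from PySpark comes down to the master URL; a sketch where spark://master-host:7077 is a hypothetical address:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://master-host:7077")  # hypothetical standalone master URL
         .getOrCreate())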