In which order should the code blocks shown below be run to assign to articlesDf a DataFrame that lists all items in column attributes, ordered by the number of times these items occur, from most to least often?
1. articlesDf = articlesDf.groupby("col")
2. articlesDf = articlesDf.select(explode(col("attributes")))
3. articlesDf = articlesDf.orderBy("count").select("col")
4. articlesDf = articlesDf.sort("count", ascending=False).select("col")
5. articlesDf = articlesDf.groupby("col").count()
A. 4, 5
B. 2, 5, 3
C. 5, 2
D. 2, 3, 4
E. 2, 5, 4
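For reference, here is a minimal sketch (assuming a hypothetical articlesDf with an array column attributes) of how the explode, groupby/count, and sort blocks above combine to produce the described result. Note that explode() names its output column "col" by default, which is why the blocks group and select on "col":

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, col

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: each row carries an array of attribute strings.
articlesDf = spark.createDataFrame([(["wool", "red"],), (["wool"],)], ["attributes"])

articlesDf = articlesDf.select(explode(col("attributes")))            # block 2
articlesDf = articlesDf.groupby("col").count()                        # block 5
articlesDf = articlesDf.sort("count", ascending=False).select("col")  # block 4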
Which of the following code blocks creates a new DataFrame with two columns season and wind_speed_ms where column season is of data type string and column wind_speed_ms is of data type double?
A. spark.DataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
B. spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
C. from pyspark.sql import types as T
   spark.createDataFrame((("summer", 4.5), ("winter", 7.5)),
   T.StructType([T.StructField("season", T.CharType()), T.StructField("season", T.DoubleType())]))
D. spark.newDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
E. spark.createDataFrame({"season": ["winter","summer"], "wind_speed_ms": [4.5, 7.5]})
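As a quick sanity check, createDataFrame infers the requested types from a list of tuples plus a list of column names; a minimal sketch, assuming an active SparkSession named spark:

df = spark.createDataFrame([("summer", 4.5), ("winter", 7.5)], ["season", "wind_speed_ms"])
df.printSchema()
# root
#  |-- season: string (nullable = true)
#  |-- wind_speed_ms: double (nullable = true)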
Which of the following code blocks creates a new DataFrame with 3 columns, productId, highest, and
lowest, that shows the biggest and smallest values of column value per value in column
productId from DataFrame transactionsDf?
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
|            4|     null| null|      3|        2|null|
|            5|     null| null|   null|        2|null|
|            6|        3|    2|     25|        2|null|
+-------------+---------+-----+-------+---------+----+
A. transactionsDf.max('value').min('value')
B. transactionsDf.agg(max('value').alias('highest'), min('value').alias('lowest'))
C. transactionsDf.groupby(col(productId)).agg(max(col(value)).alias("highest"), min(col(value)).alias("lowest"))
D. transactionsDf.groupby('productId').agg(max('value').alias('highest'), min('value').alias('lowest'))
E. transactionsDf.groupby("productId").agg({"highest": max("value"), "lowest": min("value")})
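For reference, a sketch of the groupby/agg pattern with aliased aggregates, assuming transactionsDf as sampled above. Note that agg()'s dictionary form maps a column name to an aggregate function name (e.g. {"value": "max"}) and cannot assign aliases, and that pyspark's max/min are imported under aliases here to avoid shadowing Python's built-ins:

from pyspark.sql.functions import max as max_, min as min_

transactionsDf.groupby("productId").agg(
    max_("value").alias("highest"),
    min_("value").alias("lowest"),
)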
Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?
A. itemsDf.persist(StorageLevel.MEMORY_ONLY)
B. itemsDf.cache(StorageLevel.MEMORY_AND_DISK)
C. itemsDf.store()
D. itemsDf.cache()
E. itemsDf.write.option('destination', 'memory').save()
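For context, DataFrame.cache() takes no arguments, while persist() accepts an explicit StorageLevel; a minimal sketch, assuming itemsDf exists:

from pyspark import StorageLevel

# cache() uses the default storage level, which keeps partitions in memory
# and spills them (serialized) to disk when executor memory runs short.
itemsDf.cache()

# The same behavior can be requested explicitly:
# itemsDf.persist(StorageLevel.MEMORY_AND_DISK)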
The code block displayed below contains an error. The code block should produce a DataFrame with color
as the only column and three rows with color values of red, blue, and green, respectively.
Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")
A. Instead of calling spark.createDataFrame, just DataFrame should be called.
B. The commas in the tuples with the colors should be eliminated.
C. The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.
D. Instead of color, a data type should be specified.
E. The "color" expression needs to be wrapped in brackets, so it reads ["color"].
Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?
A. transactionsDf.groupBy(col(storeId).avg())
B. transactionsDf.groupBy("storeId").avg(col("value"))
C. transactionsDf.groupBy("storeId").agg(avg("value"))
D. transactionsDf.groupBy("storeId").agg(average("value"))
E. transactionsDf.groupBy("value").average()
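For reference, a minimal sketch, assuming transactionsDf as before. pyspark.sql.functions provides avg() but no average(), and GroupedData's shortcut methods such as avg() accept column names as strings rather than Column objects:

from pyspark.sql.functions import avg

transactionsDf.groupBy("storeId").agg(avg("value"))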
The code block shown below should return a copy of DataFrame transactionsDf with an added column
cos. This column should have the values in column value converted to degrees and having the cosine of
those converted values taken, rounded to two decimals. Choose the answer that correctly fills the blanks in
the code block to accomplish this.
Code block:
transactionsDf.__1__(__2__, round(__3__(__4__(__5__)),2))
A. 1. withColumn 2. col("cos") 3. cos 4. degrees 5. transactionsDf.value
B. 1. withColumnRenamed 2. "cos" 3. cos 4. degrees 5. "transactionsDf.value"
C. 1. withColumn 2. "cos" 3. cos 4. degrees 5. transactionsDf.value
D. 1. withColumn 2. col("cos") 3. cos 4. degrees 5. col("value")
E. 1. withColumn 2. "cos" 3. degrees 4. cos 5. col("value")
The code block displayed below contains multiple errors. The code block should return a DataFrame that
contains only columns transactionId, predError, value and storeId of DataFrame
transactionsDf. Find the errors.
Code block:
transactionsDf.select([col(productId), col(f)])
Sample of transactionsDf:
+-------------+---------+-----+-------+---------+----+
|transactionId|predError|value|storeId|productId|   f|
+-------------+---------+-----+-------+---------+----+
|            1|        3|    4|     25|        1|null|
|            2|        6|    7|      2|        2|null|
|            3|        3| null|     25|        3|null|
+-------------+---------+-----+-------+---------+----+
A. The column names should be listed directly as arguments to the operator and not as a list.
B. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all column names should be expressed as strings without being wrapped in a col() operator.
C. The select operator should be replaced by a drop operator.
D. The column names should be listed directly as arguments to the operator and not as a list and following the pattern of how column names are expressed in the code block, columns productId and f should be replaced by transactionId, predError, value and storeId.
E. The select operator should be replaced by a drop operator, the column names should be listed directly as arguments to the operator and not as a list, and all col() operators should be removed.
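For reference, two working variants, assuming transactionsDf as sampled above; both select() and drop() accept plain column-name strings as direct arguments:

# Keep the wanted columns explicitly:
transactionsDf.select("transactionId", "predError", "value", "storeId")

# Or drop the unwanted ones:
transactionsDf.drop("productId", "f")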
Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?
A. transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')
B. transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})
C. transactionsDf.select('value', 'productId').distinct()
D. transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()
E. transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})
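For reference, a minimal sketch of the union/distinct pattern, assuming transactionsDf as before. union() stacks two single-column DataFrames by position, and distinct() then removes duplicates across both:

(transactionsDf.select("value")
    .union(transactionsDf.select("productId"))
    .distinct())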
The code block shown below should store DataFrame transactionsDf on two different executors, utilizing the executors' memory as much as possible, but not writing anything to disk. Choose the answer that correctly fills the blanks in the code block to accomplish this.
Code block:
from pyspark import StorageLevel
transactionsDf.__1__(StorageLevel.__2__).__3__
A. 1. cache 2. MEMORY_ONLY_2 3. count()
B. 1. persist 2. DISK_ONLY_2 3. count()
C. 1. persist 2. MEMORY_ONLY_2 3. select()
D. 1. cache 2. DISK_ONLY_2 3. count()
E. 1. persist 2. MEMORY_ONLY_2 3. count()
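For reference, a sketch of the filled-in code block. The _2 suffix on a storage level replicates each cached partition on two executors, MEMORY_ONLY never spills to disk, and because persist() is lazy an action such as count() is needed to actually materialize the cache:

from pyspark import StorageLevel

transactionsDf.persist(StorageLevel.MEMORY_ONLY_2).count()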