A developer is trying to join two tables, sales.purchases_fct and sales.customer_dim, using the following code:

fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'))
The developer has discovered that customers in the purchases_fct table that do not exist in the customer_dim table are being dropped from the joined table.
Which change should be made to the code to stop these customer records from being dropped?
A. fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'), 'left')
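
For context, a left outer join keeps every row from the left-hand DataFrame and fills the right-hand columns with NULL when no match exists. A minimal runnable sketch, with illustrative stand-ins for the two tables:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Illustrative stand-ins for purchases_fct and customer_dim
purch_df = spark.createDataFrame([(1, 100.0), (2, 50.0)], ['customer_id', 'amount'])
cust_df = spark.createDataFrame([(1, 'Alice')], ['custid', 'name'])

# 'left' keeps customer_id 2 even though it has no match in cust_df;
# its customer columns come back as NULL instead of the row being dropped
fact_df = purch_df.join(cust_df, F.col('customer_id') == F.col('custid'), 'left')
fact_df.show()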

Given the code:

df = spark.read.csv("large_dataset.csv")
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp")," ").getItem(0).alias("date"), lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")
reduced_df.count()
reduced_df.show()
At which point will Spark actually begin processing the data?
A. When the filter transformation is applied
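
As background, everything before an action is lazy: filter, select, and groupBy only build up a logical plan. A small sketch, using an in-memory DataFrame in place of the CSV file:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, split, lit

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("error: disk full", "2024-01-01 12:00:00")],
                           ["error_column", "timestamp"])

# Transformations: these only build up the plan, no job runs yet
filtered_df = df.filter(col("error_column").contains("error"))
mapped_df = filtered_df.select(split(col("timestamp"), " ").getItem(0).alias("date"),
                               lit(1).alias("count"))
reduced_df = mapped_df.groupBy("date").sum("count")

# Action: count() is the first call that actually triggers execution
print(reduced_df.count())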

A developer wants to refactor some older Spark code to leverage built-in functions introduced in Spark 3.5.0. The existing code performs array manipulations manually. Which of the following code snippets utilizes new built-in functions in Spark 3.5.0 for array operations?
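
For reference, Spark 3.5.0 added several array helpers to pyspark.sql.functions, array_prepend among them. The original answer options were not preserved in this dump, so the sketch below is illustrative only:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([2, 3, 4],)], ["numbers"])

# array_prepend (new in 3.5.0) adds an element at the front of an array,
# replacing manual concat/slice workarounds
df.select(F.array_prepend("numbers", 1).alias("numbers")).show()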

A Spark DataFrame df is cached using the MEMORY_AND_DISK storage level, but the DataFrame is too large to fit entirely in memory. What is the likely behavior when Spark runs out of memory to store the DataFrame?
A. Spark duplicates the DataFrame in both memory and disk. If it doesn't fit in memory, the DataFrame is stored and retrieved from the disk entirely.
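
For reference, MEMORY_AND_DISK keeps the partitions that fit in memory and spills the rest to local disk, so they are read back rather than recomputed. A minimal sketch of requesting that storage level:

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(1_000_000)

# Partitions that fit stay in memory; the remainder spill to disk
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()  # an action materializes the cache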

A Spark application suffers from too many small tasks due to excessive partitioning. How can this be fixed without a full shuffle?
A. Use the distinct() transformation to combine similar partitions
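
For context, coalesce() is the narrow transformation usually reached for here: it merges existing partitions in place, unlike repartition(), which redistributes every row. A minimal sketch:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1_000_000).repartition(400)  # illustrative over-partitioned input

# coalesce merges partitions without a full shuffle
df = df.coalesce(8)
print(df.rdd.getNumPartitions())  # 8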

Given the code fragment:

import pyspark.pandas as ps
psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
Which method is used to convert a Pandas API on Spark DataFrame (pyspark.pandas.DataFrame) into a standard PySpark DataFrame (pyspark.sql.DataFrame)?
A. psdf.to_spark()
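
For reference, to_spark() is the documented bridge from the Pandas API on Spark to the standard DataFrame API, and pandas_api() converts back the other way:

import pyspark.pandas as ps

psdf = ps.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# to_spark() returns a pyspark.sql.DataFrame over the same data
sdf = psdf.to_spark()
sdf.printSchema()

# pandas_api() converts a pyspark.sql.DataFrame back again
psdf_again = sdf.pandas_api()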

A developer notices that all the post-shuffle partitions in a dataset are smaller than the value set for spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold. Which type of join will Adaptive Query Execution (AQE) choose in this case?
A. A Cartesian join
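
For background, the Spark docs describe this threshold as an AQE setting: when every post-shuffle partition is smaller than it, join selection prefers a shuffled hash join over a sort-merge join. A sketch of configuring it (the 64MB value is arbitrary):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# With AQE enabled, partitions below this size let Spark build local
# hash maps and pick a shuffled hash join at runtime
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold", "64MB")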

A data engineer wants to create an external table from a JSON file located at /data/input.json with the following requirements:
- Create an external table named users
- Automatically infer schema
- Merge records with differing schemas
Which code snippet should the engineer use?
A. CREATE TABLE users USING json OPTIONS (path '/data/input.json')
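
For reference, the same statement can be run from PySpark; giving a path in OPTIONS makes the table external, and Spark infers the JSON schema when the files are scanned:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Path-backed table: external (unmanaged), schema inferred from the JSON
spark.sql("""
    CREATE TABLE users
    USING json
    OPTIONS (path '/data/input.json')
""")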

The following code fragment results in an error:

Which code fragment should be used instead?

A Data Analyst is working on the DataFrame sensor_df, which contains two columns:
Which code fragment returns a DataFrame that splits the record column into separate columns and has one array item per row?
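
The column definitions for sensor_df were not preserved in this dump, so as a generic illustration of the usual pattern: split() turns a delimited string into an array, and explode() emits one row per array element (the schema below is hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for sensor_df; 'record' holds delimited readings
sensor_df = spark.createDataFrame([(1, "20.1,20.4,19.9")], ["sensor_id", "record"])

# split() makes an array column; explode() yields one row per array item
sensor_df.select(
    "sensor_id",
    F.explode(F.split(F.col("record"), ",")).alias("reading"),
).show()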
