Exam Details

  • Exam Code: DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Databricks Certified Data Engineer Professional
  • Certification: Databricks Certifications
  • Vendor: Databricks
  • Total Questions: 120 Q&As
  • Last Updated: Jul 02, 2025

Databricks Certifications: DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 11:

    The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking their ability to use other geolocation processes.

    A junior engineer has written the following code to add CHECK constraints to the Delta Lake table:

    A senior engineer has confirmed that the above logic is correct and that the valid ranges for latitude and longitude are provided, but the code fails when executed (a comparable sketch follows the options). Which statement explains the cause of this failure?

    A. Because another team uses this table to support a frequently running application, two-phase locking is preventing the operation from committing.

    B. The activity_details table already exists; CHECK constraints can only be added during initial table creation.

    C. The activity_details table already contains records that violate the constraints; all existing data must pass CHECK constraints in order to add them to an existing table.

    D. The activity_details table already contains records; CHECK constraints can only be added prior to inserting values into a table.

    E. The current table schema does not contain the field valid_coordinates; schema evolution will need to be enabled before altering the table to add a constraint.
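
    The junior engineer's code is not reproduced in this dump. A minimal sketch of the kind of constraint-adding logic the question describes, assuming a Databricks notebook (where spark is predefined) and a Delta table named activity_details with latitude and longitude columns:

        # Hypothetical sketch, not the original code from the question.
        # Adding CHECK constraints to an existing Delta table.
        spark.sql("""
            ALTER TABLE activity_details
            ADD CONSTRAINT valid_latitude CHECK (latitude >= -90 AND latitude <= 90)
        """)
        spark.sql("""
            ALTER TABLE activity_details
            ADD CONSTRAINT valid_longitude CHECK (longitude >= -180 AND longitude <= 180)
        """)
        # Each ADD CONSTRAINT first scans the existing data; if any stored row
        # violates the predicate, the statement fails and the constraint is not added.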

  • Question 12:

    A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for setting up data access using ACLs. The user_ltv table has the following schema:

    An analyst who is not a member of the auditing group executes the following query (a comparable view is sketched after the options):

    Which result will be returned by this query?

    A. All columns will be displayed normally for those records that have an age greater than 18; records not meeting this condition will be omitted.

    B. All columns will be displayed normally for those records that have an age greater than 17; records not meeting this condition will be omitted.

    C. All age values less than 18 will be returned as null values; all other columns will be returned with the values in user_ltv.

    D. All records from all columns will be displayed with the values in user_ltv.
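
    The view definition and query are not reproduced in this dump. A minimal sketch of the kind of group-gated view such a question typically involves, assuming a Databricks notebook and using the built-in is_member() function; the view name and the age threshold are assumptions:

        # Hypothetical sketch, not the original DDL from the question.
        spark.sql("""
            CREATE OR REPLACE VIEW user_ltv_no_minors AS
            SELECT *
            FROM user_ltv
            WHERE CASE
                WHEN is_member('auditing') THEN TRUE  -- auditors see every row
                ELSE age >= 18                        -- other users only see adult records
            END
        """)
        # A user outside the auditing group querying this view only receives rows
        # where age >= 18; no columns are masked in this particular sketch.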

  • Question 13:

    A distributed team of data analysts share computing resources on an interactive cluster with autoscaling configured. In order to better manage costs and query throughput, the workspace administrator is hoping to evaluate whether cluster upscaling is caused by many concurrent users or resource-intensive queries.

    In which location can one review the timeline for cluster resizing events?

    A. Workspace audit logs

    B. Driver's log file

    C. Ganglia

    D. Cluster Event Log

    E. Executor's log file

  • Question 14:

    A CHECK constraint has been successfully added to the Delta table named activity_details using the following logic:

    A batch job is attempting to insert new records into the table, including a record where latitude = 45.50 and longitude = 212.67. Which statement describes the outcome of this batch insert? (A comparable insert is sketched after the options.)

    A. The write will fail when the violating record is reached; any records previously processed will be recorded to the target table.

    B. The write will fail completely because of the constraint violation and no records will be inserted into the target table.

    C. The write will insert all records except those that violate the table constraints; the violating records will be recorded to a quarantine table.

    D. The write will include all records in the target table; any violations will be indicated in the boolean column named valid_coordinates.

    E. The write will insert all records except those that violate the table constraints; the violating records will be reported in a warning log.
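
    A minimal sketch of the batch insert the question describes, assuming the valid longitude range is -180 to 180 and that the table holds only latitude and longitude columns:

        # Hypothetical sketch of the violating batch insert.
        new_rows = spark.createDataFrame(
            [(45.50, 212.67), (10.0, 20.0)],  # the first row violates the longitude constraint
            "latitude DOUBLE, longitude DOUBLE",
        )
        # Delta Lake enforces CHECK constraints at write time: the transaction fails
        # with a constraint-violation error and no rows from this batch are committed.
        new_rows.write.format("delta").mode("append").saveAsTable("activity_details")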

  • Question 15:

    When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low? (A job-settings sketch follows the options.)

    A. Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: Unlimited

    B. Cluster: New Job Cluster; Retries: None; Maximum Concurrent Runs: 1

    C. Cluster: Existing All-Purpose Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1

    D. Cluster: New Job Cluster; Retries: Unlimited; Maximum Concurrent Runs: 1

    E. Cluster: Existing All-Purpose Cluster; Retries: None; Maximum Concurrent Runs: 1
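
    A sketch of one such configuration expressed as a Jobs API-style payload; the job name, notebook path, and cluster sizing are assumptions, not values from the exam:

        # Hypothetical job settings: an ephemeral job cluster with retries and a
        # single concurrent run, a common pattern for production streaming jobs.
        job_settings = {
            "name": "churn_stream_ingest",
            "max_concurrent_runs": 1,              # avoid overlapping runs of the same stream
            "tasks": [{
                "task_key": "stream_ingest",
                "notebook_task": {"notebook_path": "/Repos/prod/stream_ingest"},
                "new_cluster": {                   # job cluster is cheaper than an all-purpose cluster
                    "spark_version": "13.3.x-scala2.12",
                    "node_type_id": "i3.xlarge",
                    "num_workers": 2,
                },
                "max_retries": -1,                 # -1 means retry indefinitely after a failure
            }],
        }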

  • Question 16:

    The data science team has created and logged a production model using MLflow. The model accepts a list of column names and returns a new column of type DOUBLE.

    The following code correctly imports the production model, loads the customer table containing the customer_id key column into a DataFrame, and defines the feature columns needed for the model (a comparable sketch follows the options).

    Which code block will output a DataFrame with the schema "customer_id LONG, predictions DOUBLE"?

    A. model.predict(df, columns)

    B. df.map(lambda x: model(x[columns])).select("customer_id, predictions")

    C. df.select("customer_id", model(*columns).alias("predictions"))

    D. df.apply(model, columns).select("customer_id, predictions")
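
    The import code referenced above is not reproduced in this dump. A minimal sketch of the usual pattern, assuming the model is registered as churn_model and using placeholder feature column names:

        # Hypothetical sketch of loading an MLflow model as a Spark UDF and applying it.
        import mlflow.pyfunc

        model = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn_model/Production")
        columns = ["recent_purchases", "total_spend", "days_since_last_visit"]  # assumed names
        df = spark.table("customers")  # assumed source of the customer_id key column

        # Produces a DataFrame with the schema "customer_id LONG, predictions DOUBLE".
        preds_df = df.select("customer_id", model(*columns).alias("predictions"))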

  • Question 17:

    The data governance team has instituted a requirement that all tables containing Personally Identifiable Information (PII) must be clearly annotated. This includes adding column comments, table comments, and setting the custom table property "contains_pii" = true. The following SQL DDL statement is executed to create a new table (a comparable statement is sketched after the options):

    Which command allows manual confirmation that these three requirements have been met?

    A. DESCRIBE EXTENDED dev.pii_test

    B. DESCRIBE DETAIL dev.pii_test

    C. SHOW TBLPROPERTIES dev.pii_test

    D. DESCRIBE HISTORY dev.pii_test

    E. SHOW TABLES dev
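
    The DDL statement referenced above is not reproduced in this dump. A minimal sketch of a statement meeting all three requirements, followed by a manual check; the table and column names are assumptions:

        # Hypothetical sketch, assuming the dev schema already exists.
        spark.sql("""
            CREATE TABLE dev.pii_test (
                id BIGINT,
                name STRING COMMENT 'PII: customer name'
            )
            COMMENT 'Contains PII'
            TBLPROPERTIES ('contains_pii' = 'true')
        """)
        # DESCRIBE EXTENDED returns the column comments, the table comment, and the
        # table properties in one result, so all three requirements can be confirmed manually.
        spark.sql("DESCRIBE EXTENDED dev.pii_test").show(truncate=False)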

  • Question 18:

    The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".

    The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day. Which code block accomplishes this task while minimizing potential compute costs? (The general pattern is sketched after the options.)

    A. Option A

    B. Option B

    C. Option C

    D. Option D

    E. Option E
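
    The answer options are images in the source exam and are not reproduced here. A minimal sketch of one low-cost pattern consistent with the requirements, assuming the target table is named churn_preds: because preds already carries a date column and predictions run at most once per day, appending each day's output preserves every prediction for later comparison without rewriting history.

        # Hypothetical sketch: append today's predictions to a Delta table.
        (preds.write
              .format("delta")
              .mode("append")
              .saveAsTable("churn_preds"))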

  • Question 19:

    A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic from an Apache Kafka source, but they will write data with the same schema. To keep the directory structure simple, a data engineer has decided to nest a checkpoint directory to be shared by both streams.

    The proposed directory structure is displayed below. Which statement describes whether this checkpoint directory structure is valid for the given scenario, and why? (A per-stream checkpoint sketch follows the options.)

    A. No; Delta Lake manages streaming checkpoints in the transaction log.

    B. Yes; both of the streams can share a single checkpoint directory.

    C. No; only one stream can write to a Delta Lake table.

    D. Yes; Delta Lake supports infinite concurrent writers.

    E. No; each of the streams needs to have its own checkpoint directory.
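
    The proposed directory structure is not reproduced in this dump. For contrast, a minimal sketch in which each stream keeps its own checkpoint directory; broker addresses, topic names, and paths are assumptions:

        # Hypothetical sketch: two Kafka-sourced streams writing to one bronze Delta
        # table, each with a dedicated checkpoint location.
        for topic in ["orders", "returns"]:
            (spark.readStream
                  .format("kafka")
                  .option("kafka.bootstrap.servers", "broker:9092")
                  .option("subscribe", topic)
                  .load()
                  .writeStream
                  .format("delta")
                  .option("checkpointLocation", f"/mnt/bronze/_checkpoints/{topic}")
                  .toTable("bronze"))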

  • Question 20:

    An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to load data with the following code:

    df = spark.read.format("parquet").load(f"/mnt/source/{date}")

    Which code block should be used to create the date Python variable used in the above code block? (A widget-based sketch follows the options.)

    A. date = spark.conf.get("date")

    B. input_dict = input()
       date = input_dict["date"]

    C. import sys
       date = sys.argv[1]

    D. date = dbutils.notebooks.getParam("date")

    E. dbutils.widgets.text("date", "null")
       date = dbutils.widgets.get("date")
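
    A minimal sketch of the widget-based approach from option E wired into the load shown above; the default value "null" mirrors the option text:

        # Hypothetical sketch: read the job parameter via a notebook widget.
        dbutils.widgets.text("date", "null")   # creates the widget if it does not exist
        date = dbutils.widgets.get("date")     # value passed by the Jobs API at run time

        df = spark.read.format("parquet").load(f"/mnt/source/{date}")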

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important and are required by more and more enterprises when hiring. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Databricks exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER exam preparation or your Databricks certification application, do not hesitate to visit Vcedump.com to find your solutions.