Exam Details

  • Exam Code: DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Databricks Certified Data Engineer Professional
  • Certification: Databricks Certifications
  • Vendor: Databricks
  • Total Questions: 120 Q&As
  • Last Updated: Jul 02, 2025

Databricks Certifications: DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 61:

    Review the following error traceback:

    Which statement describes the error being raised?

    A. The code executed was PySpark but was executed in a Scala notebook.

    B. There is no column in the table named heartrateheartrateheartrate

    C. There is a type error because a column object cannot be multiplied.

    D. There is a type error because a DataFrame object cannot be multiplied.

    E. There is a syntax error because the heartrate column is not correctly identified as a column.
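    The traceback image referenced above is not reproduced here. As a hedged illustration of the kind of expression this question typically probes (the column name heartrate comes from the options; the DataFrame itself is hypothetical), note how a bare Python string behaves differently from a Column object:

    ```python
    # Hypothetical illustration only, not the exam's actual snippet.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1, 72.0)], ["device_id", "heartrate"])

    # "heartrate" * 3 is plain Python string repetition, producing the string
    # "heartrateheartrateheartrate" rather than a reference to the column.
    # Identifying the column explicitly yields a Column expression instead:
    df.select((col("heartrate") * 3).alias("heartrate_x3")).show()
    ```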

  • Question 62:

    An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:

    Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

    If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

    A. Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.

    B. Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.

    C. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.

    D. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.

    E. Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.
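    The ingest code referenced above is not reproduced here. A plausible sketch of such a nightly batch ingest, assuming a hypothetical source path and an orders target table, deduplicates only within the incoming batch before appending:

    ```python
    # Hypothetical reconstruction; the path and table name are assumptions.
    date = "2025-07-01"

    (spark.read
          .format("parquet")
          .load(f"/mnt/raw_orders/{date}")
          .dropDuplicates(["customer_id", "order_id"])  # dedupes the batch only
          .write
          .mode("append")                               # no comparison with the target table
          .saveAsTable("orders"))
    ```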

  • Question 63:

    A Databricks job has been configured with 3 tasks, each of which is a Databricks notebook. Task A does not depend on other tasks. Tasks B and C run in parallel, with each having a serial dependency on Task A.

    If task A fails during a scheduled run, which statement describes the results of this run?

    A. Because all tasks are managed as a dependency graph, no changes will be committed to the Lakehouse until all tasks have successfully been completed.

    B. Tasks B and C will attempt to run as configured; any changes made in task A will be rolled back due to task failure.

    C. Unless all tasks complete successfully, no changes will be committed to the Lakehouse; because task A failed, all commits will be rolled back automatically.

    D. Tasks B and C will be skipped; some logic expressed in task A may have been committed before task failure.

    E. Tasks B and C will be skipped; task A will not commit any changes because of stage failure.
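    As a hedged sketch of how such a dependency graph is expressed through the Jobs API (task keys and notebook paths are assumptions, not the exam's configuration):

    ```python
    # Hypothetical Jobs API 2.1-style settings for the job described above.
    job_settings = {
        "name": "three_task_example",
        "tasks": [
            {"task_key": "task_A",
             "notebook_task": {"notebook_path": "/Jobs/task_a"}},
            {"task_key": "task_B",
             "depends_on": [{"task_key": "task_A"}],
             "notebook_task": {"notebook_path": "/Jobs/task_b"}},
            {"task_key": "task_C",
             "depends_on": [{"task_key": "task_A"}],
             "notebook_task": {"notebook_path": "/Jobs/task_c"}},
        ],
    }
    # Tasks whose depends_on edge is unsatisfied are skipped when the upstream
    # task fails; work already committed by the failed task is not rolled back.
    ```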

  • Question 64:

    A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an external orchestration tool to trigger job runs through the REST API. Both users authorized the REST API calls using their personal access tokens.

    Which statement describes the contents of the workspace audit logs concerning these events?

    A. Because the REST API was used for job creation and triggering runs, a Service Principal will be automatically used to identify these events.

    B. Because User B last configured the jobs, their identity will be associated with both the job creation events and the job run events.

    C. Because these events are managed separately, User A will have their identity associated with the job creation events and User B will have their identity associated with the job run events.

    D. Because the REST API was used for job creation and triggering runs, user identity will not be captured in the audit logs.

    E. Because User A created the jobs, their identity will be associated with both the job creation events and the job run events.
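    For context, a minimal sketch of triggering a run over the REST API with a personal access token (the workspace URL, token, and job_id are placeholders):

    ```python
    # Hypothetical example; replace the placeholders before running.
    import requests

    response = requests.post(
        "https://<workspace-url>/api/2.1/jobs/run-now",
        headers={"Authorization": "Bearer <personal-access-token>"},
        json={"job_id": 123},
    )
    response.raise_for_status()
    # The audit log entry for this call is attributed to the user whose token
    # authorized it, independent of who created the job.
    ```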

  • Question 65:

    A team of data engineers is adding tables to a DLT pipeline that contain repetitive expectations for many of the same data quality checks.

    One member of the team suggests reusing these data quality rules across all tables defined for this pipeline.

    What approach would allow them to do this?

    A. Maintain data quality rules in a Delta table outside of this pipeline's target schema, providing the schema name as a pipeline parameter.

    B. Use global Python variables to make expectations visible across DLT notebooks included in the same pipeline.

    C. Add data quality constraints to tables in this pipeline using an external job with access to pipeline configuration files.

    D. Maintain data quality rules in a separate Databricks notebook that each DLT notebook or file references.
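    As a hedged sketch of the reuse pattern under discussion, data quality rules can be kept in a single Python structure (or loaded from a table or pipeline parameter) and applied to several DLT tables with expect_all; the table and rule names below are assumptions:

    ```python
    # Hypothetical example of sharing expectations across DLT table definitions.
    import dlt

    shared_rules = {
        "valid_id": "id IS NOT NULL",
        "positive_amount": "amount > 0",
    }

    @dlt.table
    @dlt.expect_all(shared_rules)
    def orders_clean():
        return spark.read.table("orders_raw")

    @dlt.table
    @dlt.expect_all(shared_rules)
    def returns_clean():
        return spark.read.table("returns_raw")
    ```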

  • Question 66:

    Spill occurs as a result of executing various wide transformations. However, diagnosing spill requires one to proactively look for key indicators.

    Where in the Spark UI are two of the primary indicators that a partition is spilling to disk?

    A. Stage's detail screen and Executor's files

    B. Stage's detail screen and Query's detail screen

    C. Driver's and Executor's log files

    D. Executor's detail screen and Executor's log files

  • Question 67:

    A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that, for tasks in a particular stage, the minimum and median task durations are roughly the same, but the maximum duration is roughly 100 times as long as the minimum.

    Which situation is causing increased duration of the overall job?

    A. Task queueing resulting from improper thread pool assignment.

    B. Spill resulting from attached volume storage being too small.

    C. Network latency due to some cluster nodes being in different regions from the source data

    D. Skew caused by more data being assigned to a subset of Spark partitions.

    E. Credential validation errors while pulling data from an external system.
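    A hedged sketch of confirming and mitigating skew (the DataFrame and key column are assumptions): inspecting the row count per key shows whether a few keys dominate, and Adaptive Query Execution can split oversized partitions at runtime.

    ```python
    # Hypothetical diagnostics for skew; df and customer_id are placeholders.
    from pyspark.sql.functions import count, col

    (df.groupBy("customer_id")
       .agg(count("*").alias("rows"))
       .orderBy(col("rows").desc())
       .show(10))

    # AQE skew handling (available in recent Spark/Databricks runtimes):
    spark.conf.set("spark.sql.adaptive.enabled", "true")
    spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")
    ```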

  • Question 68:

    A data engineer wants to refactor the following DLT code, which includes multiple table definitions with very similar code:

    In an attempt to programmatically create these tables using a parameterized table definition, the data engineer writes the following code.

    The pipeline runs an update with this refactored code, but generates a different DAG showing incorrect configuration values for tables. How can the data engineer fix this?

    A. Convert the list of configuration values to a dictionary of table settings, using table names as keys.

    B. Convert the list of configuration values to a dictionary of table settings, using different inputs for the for loop.

    C. Load the configuration values for these tables from a separate file, located at a path provided by a pipeline parameter.

    D. Wrap the loop inside another table definition, using generalized names and properties to replace with those from the inner table.
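    The original and refactored code are not reproduced here. As a hedged sketch of the pitfall this question describes: in plain Python, a function defined in a loop closes over the loop variable, so every generated table can end up with the last iteration's configuration. Binding the value per iteration (shown below via a default argument, which is one common idiom and not necessarily the option the exam expects) avoids that. Table names and paths are assumptions.

    ```python
    # Hypothetical parameterized DLT table definitions.
    import dlt

    table_configs = [
        ("bronze_orders", "/mnt/raw/orders"),
        ("bronze_customers", "/mnt/raw/customers"),
    ]

    for table_name, source_path in table_configs:
        @dlt.table(name=table_name)
        def build_table(path=source_path):  # bind the current value, not the loop variable
            return (spark.read
                         .format("json")
                         .load(path))
    ```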

  • Question 69:

    The data engineer is using Spark's MEMORY_ONLY storage level.

    Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

    A. Size on Disk is > 0

    B. The number of Cached Partitions > the number of Spark Partitions

    C. The RDD Block Name includes the '' annotation signaling a failure to cache

    D. On Heap Memory Usage is within 75% of Off Heap Memory Usage
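    For context, a minimal sketch of caching with the MEMORY_ONLY storage level (the table name is an assumption); with MEMORY_ONLY, partitions that do not fit in memory are recomputed rather than spilled, and the Storage tab reports how much of the dataset is actually cached:

    ```python
    # Hypothetical caching example.
    from pyspark.storagelevel import StorageLevel

    df = spark.read.table("sensor_readings")
    df.persist(StorageLevel.MEMORY_ONLY)
    df.count()  # action that materializes the cache

    # The Storage tab then shows metrics such as Fraction Cached and
    # Size in Memory for this cached DataFrame.
    ```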

  • Question 70:

    The data engineering team has been tasked with configuring connections to an external database that does not have a native connector supported by Databricks. The external database already has data security configured by group membership. These groups map directly to user groups already created in Databricks that represent various teams within the company.

    A new login credential has been created for each group in the external database. The Databricks Utilities Secrets module will be used to make these credentials available to Databricks users.

    Assuming that all the credentials are configured correctly on the external database and group membership is properly configured in Databricks, which statement describes how teams can be granted the minimum necessary access to use these credentials?

    A. `'Read'' permissions should be set on a secret key mapped to those credentials that will be used by a given team.

    B. No additional configuration is necessary as long as all users are configured as administrators in the workspace where secrets have been added.

    C. "Read" permissions should be set on a secret scope containing only those credentials that will be used by a given team.

    D. "Manage" permission should be set on a secret scope containing only those credentials that will be used by a given team.

Tips on How to Prepare for the Exams

Certification exams are becoming increasingly important and are required by more and more enterprises when hiring. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Databricks exam questions, answers, and explanations, but also complete assistance with your exam preparation and certification application. If you are unsure about your DATABRICKS-CERTIFIED-PROFESSIONAL-DATA-ENGINEER exam preparation or your Databricks certification application, do not hesitate to visit Vcedump.com to find your solutions.