Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 71:

    Suppose you have a dataset of images that are each labeled as to whether or not they contain a human face. To create a neural network that recognizes human faces in images using this labeled dataset, what approach would likely be the most effective?

    A. Use K-means Clustering to detect faces in the pixels.

    B. Use feature engineering to add features for eyes, noses, and mouths to the input data.

    C. Use deep learning by creating a neural network with multiple hidden layers to automatically detect features of faces.

    D. Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two categories.
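    The deep-learning approach in option C can be sketched in a few lines of TensorFlow/Keras. This is a minimal illustration, not part of the exam material; the input size (64x64 grayscale) and layer widths are assumptions.

    ```python
    import tensorflow as tf

    # Hypothetical input: 64x64 grayscale images with binary face / no-face labels.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(64, 64, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),   # hidden layers learn face features
        tf.keras.layers.Dense(1, activation="sigmoid"),  # probability the image contains a face
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=10)   # train on the labeled dataset
    ```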

  • Question 72:

    Which SQL keyword can be used to reduce the number of columns processed by BigQuery?

    A. BETWEEN

    B. WHERE

    C. SELECT

    D. LIMIT
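    Because BigQuery stores data in columnar format, naming only the columns you need in the SELECT clause reduces the data processed. A minimal sketch with the google-cloud-bigquery Python client, querying a public sample table:

    ```python
    from google.cloud import bigquery

    client = bigquery.Client()

    # Naming columns in SELECT means BigQuery scans only those columns;
    # SELECT * would scan every column in the table.
    sql = """
        SELECT name, number
        FROM `bigquery-public-data.usa_names.usa_1910_2013`
        LIMIT 10
    """
    for row in client.query(sql).result():
        print(row.name, row.number)
    ```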

  • Question 73:

    Which of the following is NOT true about Dataflow pipelines?

    A. Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner

    B. Dataflow pipelines can consume data from other Google Cloud services

    C. Dataflow pipelines can be programmed in Java

    D. Dataflow pipelines use a unified programming model, so can work both with streaming and batch data sources
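    Dataflow pipelines are written with the Apache Beam SDK, whose unified model also runs on other runners (Direct, Flink, Spark), which is why statement A is the untrue one. A minimal Beam Python sketch; the file paths are hypothetical:

    ```python
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # The same pipeline code runs locally with the DirectRunner or on Dataflow
    # with --runner=DataflowRunner, and on other Beam runners such as Flink or Spark.
    options = PipelineOptions()  # e.g. PipelineOptions(["--runner=DataflowRunner", ...])

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")  # hypothetical path
            | "LineLengths" >> beam.Map(lambda line: str(len(line)))
            | "Write" >> beam.io.WriteToText("gs://my-bucket/lengths")    # hypothetical path
        )
    ```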

  • Question 74:

    Which of the following statements about Legacy SQL and Standard SQL is not true?

    A. Standard SQL is the preferred query language for BigQuery.

    B. If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.

    C. One difference between the two query languages is how you specify fully-qualified table names (i.e. table names that include their associated project name).

    D. You need to set a query language for each dataset and the default is Standard SQL.
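    In BigQuery the dialect is chosen per query, not per dataset, which is why statement D is the untrue one. A short sketch showing both fully-qualified table-name styles and the per-query dialect switch, using a public sample table:

    ```python
    from google.cloud import bigquery

    client = bigquery.Client()

    # Standard SQL (the default): backtick-quoted project.dataset.table
    standard_sql = "SELECT word FROM `bigquery-public-data.samples.shakespeare` LIMIT 5"
    client.query(standard_sql).result()

    # Legacy SQL: bracketed project:dataset.table, enabled per query, not per dataset
    legacy_sql = "SELECT word FROM [bigquery-public-data:samples.shakespeare] LIMIT 5"
    client.query(legacy_sql, job_config=bigquery.QueryJobConfig(use_legacy_sql=True)).result()
    ```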

  • Question 75:

    You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To separate storage from compute, you decided to use the Cloud Storage connector to store all input data, output data, and intermediary data. However, you noticed that one Hadoop job runs very slowly on Cloud Dataproc compared with the on-premises bare-metal Hadoop environment (8-core nodes with 100 GB RAM). Analysis shows that this particular Hadoop job is disk I/O intensive. You want to resolve the issue.

    What should you do?

    A. Allocate sufficient memory to the Hadoop cluster, so that the intermediary data of that particular Hadoop job can be held in memory

    B. Allocate sufficient persistent disk space to the Hadoop cluster, and store the intermediate data of that particular Hadoop job on native HDFS

    C. Allocate more CPU cores of the virtual machine instances of the Hadoop cluster so that the networking bandwidth for each instance can scale up

    D. Allocate additional network interface card (NIC), and configure link aggregation in the operating system to use the combined throughput when working with Cloud Storage
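    In the spirit of option B, a hedged sketch with the google-cloud-dataproc Python client that provisions larger worker persistent disks so the I/O-intensive job's intermediate data can stay on native HDFS; all project, region, and sizing values are assumptions:

    ```python
    from google.cloud import dataproc_v1

    region = "us-central1"  # hypothetical region
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": "my-project",      # hypothetical project
        "cluster_name": "hadoop-lake",   # hypothetical cluster name
        "config": {
            "worker_config": {
                "num_instances": 8,
                "machine_type_uri": "n1-highmem-16",
                # Enough persistent disk for the job's intermediate data on HDFS.
                "disk_config": {"boot_disk_size_gb": 1000},
            },
        },
    }
    operation = client.create_cluster(
        request={"project_id": "my-project", "region": region, "cluster": cluster}
    )
    operation.result()  # block until the cluster is ready
    ```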

  • Question 76:

    Which software libraries are supported by Cloud Machine Learning Engine?

    A. Theano and TensorFlow

    B. Theano and Torch

    C. TensorFlow

    D. TensorFlow and Torch

  • Question 77:

    Your chemical company needs to manually check documentation for customer orders. You use a Pub/Sub pull subscription so that sales agents receive the order details. You must ensure that an order is not processed twice by different sales agents and that you do not add more complexity to this workflow. What should you do?

    A. Create a transactional database that monitors the pending messages.

    B. Create a new Pub/Sub push subscription to monitor the orders processed in the agent's system.

    C. Use Pub/Sub exactly-once delivery in your pull subscription.

    D. Use a Deduplicate PTransform in Dataflow before sending the messages to the sales agents.
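    Pub/Sub supports exactly-once delivery on pull subscriptions (option C), which prevents redelivery of a message once it has been successfully acknowledged. A hedged sketch with the Python client; the project, topic, and subscription names are placeholders:

    ```python
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    topic_path = subscriber.topic_path("my-project", "orders")                    # hypothetical
    subscription_path = subscriber.subscription_path("my-project", "orders-sub")  # hypothetical

    # With exactly-once delivery enabled, an acknowledged order message is not
    # redelivered to another sales agent, with no extra deduplication machinery.
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "enable_exactly_once_delivery": True,
        }
    )
    print(f"Created: {subscription.name}")
    ```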

  • Question 78:

    Your company is currently setting up data pipelines for its campaign. For all the Google Cloud Pub/Sub streaming data, one important business requirement is the ability to periodically identify the inputs and their timings during the campaign. Engineers have decided to use windowing and transformation in Google Cloud Dataflow for this purpose. However, when testing this feature, they find that the Cloud Dataflow job fails for all streaming inserts.

    What is the most likely cause of this problem?

    A. They have not assigned the timestamp, which causes the job to fail

    B. They have not set the triggers to accommodate the data coming in late, which causes the job to fail

    C. They have not applied a global windowing function, which causes the job to fail when the pipeline is created

    D. They have not applied a non-global windowing function, which causes the job to fail when the pipeline is created
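    Context for the options: grouping an unbounded (streaming) source under the default global window and trigger is rejected when the pipeline is constructed, so a non-global window must be applied first. A minimal Beam Python sketch with a hypothetical topic:

    ```python
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as p:
        (
            p
            | beam.io.ReadFromPubSub(topic="projects/my-project/topics/campaign")  # hypothetical
            | beam.Map(lambda msg: (msg, 1))
            # Grouping an unbounded source needs a non-global window (or triggers);
            # without this step, pipeline construction fails under the global window.
            | beam.WindowInto(beam.window.FixedWindows(60))  # 60-second fixed windows
            | beam.CombinePerKey(sum)
        )
    ```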

  • Question 79:

    You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for implementing these transformations include:

    1. Executing the transformations on a schedule

    2. Enabling non-developer analysts to modify transformations

    3. Providing a graphical tool for designing transformations

    What should you do?

    A. Use Cloud Dataprep to build and maintain the transformation recipes, and execute them on a scheduled basis

    B. Load each month's CSV data into BigQuery, and write a SQL query to transform the data to a standard schema. Merge the transformed tables together with a SQL query

    C. Help the analysts write a Cloud Dataflow pipeline in Python to perform the transformation. The Python code should be stored in a revision control system and modified as the incoming data's schema changes

    D. Use Apache Spark on Cloud Dataproc to infer the schema of the CSV file before creating a Dataframe. Then implement the transformations in Spark SQL before writing the data out to Cloud Storage and loading into BigQuery

  • Question 80:

    You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?

    A. Increase the cluster size with more non-preemptible workers.

    B. Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.

    C. Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.

    D. Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.
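    For the graceful-decommissioning option, a hedged sketch with the google-cloud-dataproc Python client that adds secondary (preemptible) workers and sets a graceful decommission timeout so work in progress can finish before nodes are reclaimed; the names, sizes, and timeout are assumptions:

    ```python
    from google.cloud import dataproc_v1
    from google.protobuf import duration_pb2

    region = "us-central1"  # hypothetical region
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    operation = client.update_cluster(
        request={
            "project_id": "my-project",    # hypothetical project
            "region": region,
            "cluster_name": "my-cluster",  # hypothetical cluster name
            "cluster": {
                # Secondary workers are preemptible by default: cheaper scale-out.
                "config": {"secondary_worker_config": {"num_instances": 10}}
            },
            "update_mask": {"paths": ["config.secondary_worker_config.num_instances"]},
            # Graceful decommissioning: wait up to an hour for running work to finish.
            "graceful_decommission_timeout": duration_pb2.Duration(seconds=3600),
        }
    )
    operation.result()
    ```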

Tips on How to Prepare for the Exams

Nowadays, certification exams have become more and more important and are required by more and more enterprises when hiring. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.