Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications: PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 121:

    You need to set access to BigQuery for different departments within your company.

    Your solution should comply with the following requirements:

    1. Each department should have access only to their data. Each department will have one or more leads who need to be able to create and update tables and provide them to their team.

    2. Each department has data analysts who need to be able to query but not modify data.

    How should you set access to the data in BigQuery?

    A. Create a dataset for each department. Assign the department leads the role of OWNER, and assign the data analysts the role of WRITER on their dataset.

    B. Create a dataset for each department. Assign the department leads the role of WRITER, and assign the data analysts the role of READER on their dataset.

    C. Create a table for each department. Assign the department leads the role of Owner, and assign the data analysts the role of Editor on the project the table is in.

    D. Create a table for each department. Assign the department leads the role of Editor, and assign the data analysts the role of Viewer on the project the table is in.
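
    For reference, a minimal sketch of granting dataset-level WRITER and READER access with the google-cloud-bigquery Python client; the project, dataset, and group names below are placeholders, not part of the question.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")   # placeholder project ID
        dataset = client.get_dataset("finance_dept")     # one dataset per department

        entries = list(dataset.access_entries)
        # Department leads: can create and update tables in their dataset.
        entries.append(bigquery.AccessEntry(
            role="WRITER", entity_type="groupByEmail",
            entity_id="finance-leads@example.com"))
        # Data analysts: can query but not modify data.
        entries.append(bigquery.AccessEntry(
            role="READER", entity_type="groupByEmail",
            entity_id="finance-analysts@example.com"))

        dataset.access_entries = entries
        client.update_dataset(dataset, ["access_entries"])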

  • Question 122:

    You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

    A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.

    B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.

    C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.

    D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.
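
    To illustrate the Dataflow approach with default autoscaling, a hedged sketch of a streaming Beam pipeline that parses JSON from Pub/Sub and writes it to BigQuery; the project, topic, bucket, and table names are assumptions.

        import json

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions(
            streaming=True,
            runner="DataflowRunner",
            project="my-project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )

        with beam.Pipeline(options=options) as p:
            (p
             | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                   topic="projects/my-project/topics/events")
             | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
             | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                   "my-project:analytics.events",
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))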

  • Question 123:

    You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?

    A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.

    B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

    C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

    D. Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.
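
    For context, a minimal sketch of creating a key in Cloud Key Management Service with the google-cloud-kms client and referencing it in an encrypt call; the project, location, key ring, and key names are placeholders.

        from google.cloud import kms

        client = kms.KeyManagementServiceClient()
        parent = client.key_ring_path("my-project", "us-central1", "my-key-ring")

        # Create a symmetric key that you can later rotate or destroy as needed.
        key = client.create_crypto_key(request={
            "parent": parent,
            "crypto_key_id": "cluster-data-key",
            "crypto_key": {"purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT},
        })

        # Reference the key when encrypting data destined for the cluster instances.
        response = client.encrypt(request={"name": key.name, "plaintext": b"sensitive payload"})
        ciphertext = response.ciphertext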

  • Question 124:

    You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster's local Hadoop Distributed File System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

    A. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.

    B. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.

    C. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.

    D. Leverage Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive tables to the native ones.

    E. Load the ORC files into BigQuery. Leverage BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate external Hive tables to the native ones.
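
    To show how external and native Hive tables fit together on Dataproc, a hedged sketch that submits a Hive job through the google-cloud-dataproc client; the region, cluster, bucket, and table names are assumptions.

        from google.cloud import dataproc_v1

        job_client = dataproc_v1.JobControllerClient(
            client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"})

        # Mount the ORC files in Cloud Storage as an external table, then copy
        # ("replicate") them into a native, HDFS-backed table.
        hive_query = """
        CREATE EXTERNAL TABLE orders_ext (order_id STRING, amount DOUBLE)
        STORED AS ORC
        LOCATION 'gs://my-bucket/orc/orders/';

        CREATE TABLE orders STORED AS ORC AS SELECT * FROM orders_ext;
        """

        job = {
            "placement": {"cluster_name": "hive-cluster"},
            "hive_job": {"query_list": {"queries": [hive_query]}},
        }
        job_client.submit_job(project_id="my-project", region="us-central1", job=job)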

  • Question 125:

    You are designing the architecture to process your data from Cloud Storage to BigQuery by using Dataflow. The network team provided you with the Shared VPC network and subnetwork to be used by your pipelines. You need to enable the deployment of the pipeline on the Shared VPC network. What should you do?

    A. Assign the compute.networkUser role to the Dataflow service agent.

    B. Assign the compute.networkUser role to the service account that executes the Dataflow pipeline.

    C. Assign the dataflow.admin role to the Dataflow service agent.

    D. Assign the dataflow.admin role to the service account that executes the Dataflow pipeline.
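
    As a sketch only, granting roles/compute.networkUser on a Shared VPC subnetwork through the Compute Engine API (google-api-python-client); the host project, region, subnetwork, and service account are placeholders.

        from googleapiclient import discovery

        compute = discovery.build("compute", "v1")
        params = dict(project="shared-vpc-host-project", region="us-central1",
                      resource="shared-subnet")

        # Read the subnetwork's IAM policy, append the binding, and write it back.
        policy = compute.subnetworks().getIamPolicy(**params).execute()
        policy.setdefault("bindings", []).append({
            "role": "roles/compute.networkUser",
            "members": ["serviceAccount:dataflow-pipeline@project-a.iam.gserviceaccount.com"],
        })
        compute.subnetworks().setIamPolicy(body={"policy": policy}, **params).execute()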

  • Question 126:

    You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real time while the packages are in transit. Which solution should you choose?

    A. Use BigQuery machine learning to be able to train the model at scale, so you can analyze the packages in batches.

    B. Train an AutoML model on your corpus of images, and build an API around that model to integrate with the package tracking applications.

    C. Use the Cloud Vision API to detect for damage, and raise an alert through Cloud Functions. Integrate the package tracking applications with this function.

    D. Use TensorFlow to create a model that is trained on your corpus of images. Create a Python notebook in Cloud Datalab that uses this model so you can analyze for damaged packages.
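
    For reference, a hedged sketch of calling the pretrained Cloud Vision API (label detection) on a camera frame stored in Cloud Storage, one of the building blocks the options describe; the bucket and object names are placeholders, and a custom-trained model would expose a different prediction call.

        from google.cloud import vision

        client = vision.ImageAnnotatorClient()
        image = vision.Image(
            source=vision.ImageSource(image_uri="gs://my-bucket/frames/package-001.jpg"))

        # Pretrained labels; detecting "damage" specifically would require custom training.
        response = client.label_detection(image=image)
        for label in response.label_annotations:
            print(label.description, round(label.score, 2))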

  • Question 127:

    Your business users need a way to clean and prepare data before using the data for analysis. Your business users are less technically savvy and prefer to work with graphical user interfaces to define their transformations. After the data has been transformed, the business users want to perform their analysis directly in a spreadsheet. You need to recommend a solution that they can use. What should you do?

    A. Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.

    B. Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.

    C. Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.

    D. Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.

  • Question 128:

    Your new customer has requested daily reports that show their net consumption of Google Cloud compute resources and who used the resources. You need to quickly and efficiently generate these daily reports. What should you do?

    A. Do daily exports of Cloud Logging data to BigQuery. Create views filtering by project, log type, resource, and user.

    B. Filter data in Cloud Logging by project, resource, and user; then export the data in CSV format.

    C. Filter data in Cloud Logging by project, log type, resource, and user; then import the data into BigQuery.

    D. Export Cloud Logging data to Cloud Storage in CSV format. Cleanse the data using Dataprep, filtering by project, resource, and user.
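
    As an illustrative sketch, creating a Cloud Logging sink that exports log entries to a BigQuery dataset with the google-cloud-logging client, after which daily views can be defined over the exported tables; the project, dataset, and filter are assumptions.

        from google.cloud import logging

        client = logging.Client(project="my-project")
        sink = client.sink(
            "daily-usage-sink",
            filter_='resource.type="gce_instance"',   # assumed filter; refine by resource and user
            destination=("bigquery.googleapis.com/projects/"
                         "my-project/datasets/usage_logs"),
        )
        sink.create()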

  • Question 129:

    You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

    A. Organize your data in a single table, export, and compress and store the BigQuery data in Cloud Storage.

    B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.

    C. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.

    D. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.
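
    For reference, a minimal sketch of exporting one monthly table to compressed files in Cloud Storage with the google-cloud-bigquery client; the table and bucket names are placeholders.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")
        job_config = bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON,
            compression=bigquery.Compression.GZIP,
        )

        extract_job = client.extract_table(
            "my-project.analytics.sales_2024_05",             # one table per month
            "gs://my-backup-bucket/sales_2024_05/*.json.gz",  # low-cost backup location
            job_config=job_config,
        )
        extract_job.result()   # wait for the export to finish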

  • Question 130:

    You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to connection failure. You verified that VPC Service Controls and shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?

    A. Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.

    B. Turn off the external IP addresses on the Dataflow workers. Enable Cloud NAT in Project A.

    C. Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.

    D. Add the external IP addresses of the Dataflow workers as authorized networks in the Cloud SQL instance.
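
    To show the networking options involved, a hedged sketch of Dataflow pipeline options that keep workers on internal IP addresses and pin them to a specific subnetwork; the project, bucket, and subnetwork names are assumptions.

        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions([
            "--runner=DataflowRunner",
            "--project=project-a",
            "--region=us-central1",
            "--temp_location=gs://project-a-dataflow/tmp",
            # Workers get internal IPs only, keeping traffic off the public internet.
            "--no_use_public_ips",
            "--subnetwork=regions/us-central1/subnetworks/peered-subnet",
        ])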

Tips on How to Prepare for the Exams

Certification exams have become increasingly important, and more and more employers require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or Google certification application, do not hesitate to visit Vcedump.com to find your solutions.