Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications: PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 121:

    You need to set access to BigQuery for different departments within your company.

    Your solution should comply with the following requirements:

    1. Each department should have access only to their data. Each department will have one or more leads who need to be able to create and update tables and provide them to their team.

    2. Each department has data analysts who need to be able to query but not modify data.

    How should you set access to the data in BigQuery?

    A. Create a dataset for each department. Assign the department leads the role of OWNER, and assign the data analysts the role of WRITER on their dataset.

    B. Create a dataset for each department. Assign the department leads the role of WRITER, and assign the data analysts the role of READER on their dataset.

    C. Create a table for each department. Assign the department leads the role of Owner, and assign the data analysts the role of Editor on the project the table is in.

    D. Create a table for each department. Assign the department leads the role of Editor, and assign the data analysts the role of Viewer on the project the table is in.
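
    For reference, a minimal sketch of granting dataset-level WRITER and READER access with the google-cloud-bigquery Python client; the project, dataset, and group names below are placeholders, not part of the question.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")   # placeholder project ID
        dataset = client.get_dataset("finance_dept")     # one dataset per department

        entries = list(dataset.access_entries)
        # Department leads: can create and update tables in their dataset.
        entries.append(bigquery.AccessEntry(
            role="WRITER", entity_type="groupByEmail",
            entity_id="finance-leads@example.com"))
        # Data analysts: can query but not modify data.
        entries.append(bigquery.AccessEntry(
            role="READER", entity_type="groupByEmail",
            entity_id="finance-analysts@example.com"))

        dataset.access_entries = entries
        client.update_dataset(dataset, ["access_entries"])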

  • Question 122:

    You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also want to monitor and accommodate input data volume that will vary in size with minimal manual intervention. What should you do?

    A. Use Cloud Dataproc to run your transformations. Monitor CPU utilization for the cluster. Resize the number of worker nodes in your cluster via the command line.

    B. Use Cloud Dataproc to run your transformations. Use the diagnose command to generate an operational output archive. Locate the bottleneck and adjust cluster resources.

    C. Use Cloud Dataflow to run your transformations. Monitor the job system lag with Stackdriver. Use the default autoscaling setting for worker instances.

    D. Use Cloud Dataflow to run your transformations. Monitor the total execution time for a sampling of jobs. Configure the job to use non-default Compute Engine machine types when needed.
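
    To illustrate the Dataflow approach with default autoscaling, a hedged sketch of a streaming Beam pipeline that parses JSON from Pub/Sub and writes it to BigQuery; the project, topic, bucket, and table names are assumptions.

        import json

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions(
            streaming=True,
            runner="DataflowRunner",
            project="my-project",
            region="us-central1",
            temp_location="gs://my-bucket/tmp",
        )

        with beam.Pipeline(options=options) as p:
            (p
             | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                   topic="projects/my-project/topics/events")
             | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
             | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                   "my-project:analytics.events",
                   write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))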

  • Question 123:

    You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption keys that you can create, rotate, and destroy as needed. What should you do?

    A. Create a dedicated service account, and use encryption at rest to reference your data stored in your Compute Engine cluster instances as part of your API service calls.

    B. Create encryption keys in Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

    C. Create encryption keys locally. Upload your encryption keys to Cloud Key Management Service. Use those keys to encrypt your data in all of the Compute Engine cluster instances.

    D. Create encryption keys in Cloud Key Management Service. Reference those keys in your API service calls when accessing the data in your Compute Engine cluster instances.
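
    For context, a minimal sketch of creating a key in Cloud Key Management Service with the google-cloud-kms client and referencing it in an encrypt call; the project, location, key ring, and key names are placeholders.

        from google.cloud import kms

        client = kms.KeyManagementServiceClient()
        parent = client.key_ring_path("my-project", "us-central1", "my-key-ring")

        # Create a symmetric key that you can later rotate or destroy as needed.
        key = client.create_crypto_key(request={
            "parent": parent,
            "crypto_key_id": "cluster-data-key",
            "crypto_key": {"purpose": kms.CryptoKey.CryptoKeyPurpose.ENCRYPT_DECRYPT},
        })

        # Reference the key when encrypting data destined for the cluster instances.
        response = client.encrypt(request={"name": key.name, "plaintext": b"sensitive payload"})
        ciphertext = response.ciphertext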

  • Question 124:

    You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster's local Hadoop Distributed File System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc? (Choose two.)

    A. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.

    B. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.

    C. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.

    D. Leverage Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate external Hive tables to the native ones.

    E. Load the ORC files into BigQuery. Leverage BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate external Hive tables to the native ones.
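
    To show how external and native Hive tables fit together on Dataproc, a hedged sketch that submits a Hive job through the google-cloud-dataproc client; the region, cluster, bucket, and table names are assumptions.

        from google.cloud import dataproc_v1

        job_client = dataproc_v1.JobControllerClient(
            client_options={"api_endpoint": "us-central1-dataproc.googleapis.com:443"})

        # Mount the ORC files in Cloud Storage as an external table, then copy
        # ("replicate") them into a native, HDFS-backed table.
        hive_query = """
        CREATE EXTERNAL TABLE orders_ext (order_id STRING, amount DOUBLE)
        STORED AS ORC
        LOCATION 'gs://my-bucket/orc/orders/';

        CREATE TABLE orders STORED AS ORC AS SELECT * FROM orders_ext;
        """

        job = {
            "placement": {"cluster_name": "hive-cluster"},
            "hive_job": {"query_list": {"queries": [hive_query]}},
        }
        job_client.submit_job(project_id="my-project", region="us-central1", job=job)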

  • Question 125:

    You are designing the architecture to process your data from Cloud Storage to BigQuery by using Dataflow. The network team provided you with the Shared VPC network and subnetwork to be used by your pipelines. You need to enable the deployment of the pipeline on the Shared VPC network. What should you do?

    A. Assign the compute.networkUser role to the Dataflow service agent.

    B. Assign the compute.networkUser role to the service account that executes the Dataflow pipeline.

    C. Assign the dataflow.admin role to the Dataflow service agent.

    D. Assign the dataflow.admin role to the service account that executes the Dataflow pipeline.
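
    As a sketch only, granting roles/compute.networkUser on a Shared VPC subnetwork through the Compute Engine API (google-api-python-client); the host project, region, subnetwork, and service account are placeholders.

        from googleapiclient import discovery

        compute = discovery.build("compute", "v1")
        params = dict(project="shared-vpc-host-project", region="us-central1",
                      resource="shared-subnet")

        # Read the subnetwork's IAM policy, append the binding, and write it back.
        policy = compute.subnetworks().getIamPolicy(**params).execute()
        policy.setdefault("bindings", []).append({
            "role": "roles/compute.networkUser",
            "members": ["serviceAccount:dataflow-pipeline@project-a.iam.gserviceaccount.com"],
        })
        compute.subnetworks().setIamPolicy(body={"policy": policy}, **params).execute()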

  • Question 126:

    You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines to detect and track any visual damage to the packages in transit. You need to create a way to automate the detection of damaged packages and flag them for human review in real time while the packages are in transit. Which solution should you choose?

    A. Use BigQuery machine learning to be able to train the model at scale, so you can analyze the packages in batches.

    B. Train an AutoML model on your corpus of images, and build an API around that model to integrate with the package tracking applications.

    C. Use the Cloud Vision API to detect for damage, and raise an alert through Cloud Functions. Integrate the package tracking applications with this function.

    D. Use TensorFlow to create a model that is trained on your corpus of images. Create a Python notebook in Cloud Datalab that uses this model so you can analyze for damaged packages.
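
    For reference, a hedged sketch of calling the pretrained Cloud Vision API (label detection) on a camera frame stored in Cloud Storage, one of the building blocks the options describe; the bucket and object names are placeholders, and a custom-trained model would expose a different prediction call.

        from google.cloud import vision

        client = vision.ImageAnnotatorClient()
        image = vision.Image(
            source=vision.ImageSource(image_uri="gs://my-bucket/frames/package-001.jpg"))

        # Pretrained labels; detecting "damage" specifically would require custom training.
        response = client.label_detection(image=image)
        for label in response.label_annotations:
            print(label.description, round(label.score, 2))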

  • Question 127:

    Your business users need a way to clean and prepare data before using the data for analysis. Your business users are less technically savvy and prefer to work with graphical user interfaces to define their transformations. After the data has been transformed, the business users want to perform their analysis directly in a spreadsheet. You need to recommend a solution that they can use. What should you do?

    A. Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.

    B. Use Dataprep to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.

    C. Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Connected Sheets.

    D. Use Dataflow to clean the data, and write the results to BigQuery. Analyze the data by using Looker Studio.

  • Question 128:

    Your new customer has requested daily reports that show their net consumption of Google Cloud compute resources and who used the resources. You need to quickly and efficiently generate these daily reports. What should you do?

    A. Do daily exports of Cloud Logging data to BigQuery. Create views filtering by project, log type, resource, and user.

    B. Filter data in Cloud Logging by project, resource, and user; then export the data in CSV format.

    C. Filter data in Cloud Logging by project, log type, resource, and user; then import the data into BigQuery.

    D. Export Cloud Logging data to Cloud Storage in CSV format. Cleanse the data using Dataprep, filtering by project, resource, and user.
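
    As an illustrative sketch, creating a Cloud Logging sink that exports log entries to a BigQuery dataset with the google-cloud-logging client, after which daily views can be defined over the exported tables; the project, dataset, and filter are assumptions.

        from google.cloud import logging

        client = logging.Client(project="my-project")
        sink = client.sink(
            "daily-usage-sink",
            filter_='resource.type="gce_instance"',   # assumed filter; refine by resource and user
            destination=("bigquery.googleapis.com/projects/"
                         "my-project/datasets/usage_logs"),
        )
        sink.create()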

  • Question 129:

    You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL pipeline is regularly modified and can generate errors, but sometimes the errors are detected only after 2 weeks. You need to provide a method to recover from these errors, and your backups should be optimized for storage costs. How should you organize your data in BigQuery and store your backups?

    A. Organize your data in a single table, export, and compress and store the BigQuery data in Cloud Storage.

    B. Organize your data in separate tables for each month, and export, compress, and store the data in Cloud Storage.

    C. Organize your data in separate tables for each month, and duplicate your data on a separate dataset in BigQuery.

    D. Organize your data in separate tables for each month, and use snapshot decorators to restore the table to a time prior to the corruption.
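
    For reference, a minimal sketch of exporting one monthly table to compressed files in Cloud Storage with the google-cloud-bigquery client; the table and bucket names are placeholders.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")
        job_config = bigquery.ExtractJobConfig(
            destination_format=bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON,
            compression=bigquery.Compression.GZIP,
        )

        extract_job = client.extract_table(
            "my-project.analytics.sales_2024_05",             # one table per month
            "gs://my-backup-bucket/sales_2024_05/*.json.gz",  # low-cost backup location
            job_config=job_config,
        )
        extract_job.result()   # wait for the export to finish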

  • Question 130:

    You are developing an Apache Beam pipeline to extract data from a Cloud SQL instance by using JdbcIO. You have two projects running in Google Cloud. The pipeline will be deployed and executed on Dataflow in Project A. The Cloud SQL instance is running in Project B and does not have a public IP address. After deploying the pipeline, you noticed that the pipeline failed to extract data from the Cloud SQL instance due to connection failure. You verified that VPC Service Controls and shared VPC are not in use in these projects. You want to resolve this error while ensuring that the data does not go through the public internet. What should you do?

    A. Set up VPC Network Peering between Project A and Project B. Add a firewall rule to allow the peered subnet range to access all instances on the network.

    B. Turn off the external IP addresses on the Dataflow workers. Enable Cloud NAT in Project A.

    C. Set up VPC Network Peering between Project A and Project B. Create a Compute Engine instance without external IP address in Project B on the peered subnet to serve as a proxy server to the Cloud SQL database.

    D. Add the external IP addresses of the Dataflow workers as authorized networks in the Cloud SQL instance.
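
    To show the networking options involved, a hedged sketch of Dataflow pipeline options that keep workers on internal IP addresses and pin them to a specific subnetwork; the project, bucket, and subnetwork names are assumptions.

        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions([
            "--runner=DataflowRunner",
            "--project=project-a",
            "--region=us-central1",
            "--temp_location=gs://project-a-dataflow/tmp",
            # Workers get internal IPs only, keeping traffic off the public internet.
            "--no_use_public_ips",
            "--subnetwork=regions/us-central1/subnetworks/peered-subnet",
        ])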

Tips on How to Prepare for the Exams

Certification exams have become increasingly important, and more and more employers require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or Google certification application, do not hesitate to visit Vcedump.com to find your solutions.