PROFESSIONAL-DATA-ENGINEER Practice Questions & Online Exam Preparation

PROFESSIONAL-DATA-ENGINEER Exam Details

Exam Code
:PROFESSIONAL-DATA-ENGINEER
Exam Name
:Professional Data Engineer on Google Cloud Platform
Certification
:Google Certifications
Vendor
:Google
Total Questions
:331 Q&As
Last Updated
:Jun 24, 2026

Google PROFESSIONAL-DATA-ENGINEER Online Questions & Answers

Question 271:

You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These ops are used inside your main training loop and are performing bulky matrix multiplications. It currently takes up to several days to train a model. You want to decrease this time significantly and keep the cost low by using an accelerator on Google Cloud. What should you do?
A. Use Cloud TPUs without any additional adjustment to your code.
B. Use Cloud TPUs after implementing GPU kernel support for your customs ops.
C. Use Cloud GPUs after implementing GPU kernel support for your customs ops.
D. Stay on CPUs, and increase the size of the cluster you're training your model on.

Correct Answer: B
Question 272:

You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30?0 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?
A. Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE Type.
B. Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
C. Modify your pipeline to maintain the last 30?0 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
D. Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.

Correct Answer: C
Question 273:

You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of you Cloud Bigtable cluster. Which two actions can you take to accomplish this? Choose 2 answers.
A. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Read pressure index is above 100.
B. Review Key Visualizer metrics. Increase the size of the Cloud Bigtable cluster when the Write pressure index is above 100.
C. Monitor the latency of write operations. Increase the size of the Cloud Bigtable cluster when there is a sustained increase in write latency.
D. Monitor storage utilization. Increase the size of the Cloud Bigtable cluster when utilization increases above 70% of max capacity.
E. Monitor latency of read operations. Increase the size of the Cloud Bigtable cluster of read operations take longer than 100 ms.

Correct Answer: AC
Question 274:

You have a data pipeline with a Dataflow job that aggregates and writes time series metrics to Bigtable. You notice that data is slow to update in Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. What should you do?
Choose 2 answers
A. Configure your Dataflow pipeline to use local execution.
B. Modify your Dataflow pipeline lo use the Flatten transform before writing to Bigtable.
C. Modify your Dataflow pipeline to use the CoGrcupByKey transform before writing to Bigtable.
D. Increase the maximum number of Dataflow workers by setting maxNumWorkers in PipelineOptions.
E. Increase the number of nodes in the Bigtable cluster.

Correct Answer: DE
https://cloud.google.com/bigtable/docs/performance#performance-write-throughput https://cloud.google.com/dataflow/docs/reference/pipeline-options
Question 275:

You are integrating one of your internal IT applications and Google BigQuery, so users can query BigQuery from the application's interface. You do not want individual users to authenticate to BigQuery and you do not want to give them access to the dataset. You need to securely access BigQuery from your IT application.
What should you do?
A. Create groups for your users and give those groups access to the dataset
B. Integrate with a single sign-on (SSO) platform, and pass each user's credentials along with the query request
C. Create a service account and grant dataset access to that account. Use the service account's private key to access the dataset
D. Create a dummy user and grant dataset access to that user. Store the username and password for that user in a file on the files system, and use those credentials to access the BigQuery dataset

Correct Answer: C
Question 276:

You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. You also need to monitor and adjust for null values, which must remain real-valued and cannot be removed. What should you do?
A. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to `none' using a Cloud Dataproc job.
B. Use Cloud Dataprep to find null values in sample source data. Convert all nulls to 0 using a Cloud Dataprep job.
C. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to `none' using a Cloud Dataprep job.
D. Use Cloud Dataflow to find null values in sample source data. Convert all nulls to using a custom script.

Correct Answer: C
Question 277:

You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?
A. Cloud Scheduler
B. Cloud Dataflow
C. Cloud Functions
D. Cloud Composer

Correct Answer: A
Question 278:

You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying aggregate values for multiple users who will query the data in Cloud Storage with multiple engines. Which storage service and schema design should you use?
A. Use Cloud Bigtable for storage. Install the HBase shell on a Compute Engine instance to query the Cloud Bigtable data.
B. Use Cloud Bigtable for storage. Link as permanent tables in BigQuery for query.
C. Use Cloud Storage for storage. Link as permanent tables in BigQuery for query.
D. Use Cloud Storage for storage. Link as temporary tables in BigQuery for query.

Correct Answer: A
Question 279:

You work for a large ecommerce company. You are using Pub/Sub to ingest the clickstream data to Google Cloud for analytics. You observe that when a new subscriber connects to an existing topic to analyze data, they are unable to subscribe to older data for an upcoming yearly sale event in two months, you need a solution that, once implemented, will enable any new subscriber to read the last 30 days of data. What should you do?
A. Create a new topic, and publish the last 30 days of data each time a new subscriber connects to an existing topic.
B. Set the topic retention policy to 30 days.
C. Set the subscriber retention policy to 30 days.
D. Ask the source system to re-push the data to Pub/Sub, and subscribe to it.

Correct Answer: B
By setting the topic retention policy to 30 days, you can ensure that any new subscriber can access the messages that were published to the topic within the last 30 days1. This feature allows you to replay previously acknowledged messages or initialize new subscribers with historical data2. You can configure the topic retention policy by using the Cloud Console, the gcloud command-line tool, or the Pub/Sub API1. Option A is not efficient, as it requires creating a new topic and duplicating the data for each new subscriber, which would increase the storage costs and complexity. Option C is not effective, as it only affects the unacknowledged messages in a subscription, and does not allow new subscribers to access older data3. Option D is not feasible, as it depends on the source system's ability and willingness to re-push the data, and it may cause data duplication or inconsistency. References:
1: Create a topic | Cloud Pub/Sub Documentation | Google Cloud
2: Replay and purge messages with seek | Cloud Pub/Sub Documentation | Google Cloud
3: When is a PubSub Subscription considered to be inactive?
Question 280:

You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across the organization. You need to support additional concurrent users and reduce the amount of time required to write the data. Which two actions should you take? (Choose two.)
A. Configure your Cloud Dataflow pipeline to use local execution
B. Increase the maximum number of Cloud Dataflow workers by setting maxNumWorkers in PipelineOptions
C. Increase the number of nodes in the Cloud Bigtable cluster
D. Modify your Cloud Dataflow pipeline to use the Flatten transform before writing to Cloud Bigtable
E. Modify your Cloud Dataflow pipeline to use the CoGroupByKey transform before writing to Cloud Bigtable

Correct Answer: BC

Related Exams:

Tips on How to Prepare for the Exams

Nowadays, the certification exams become more and more important and required by more and more enterprises when applying for a job. But how to prepare for the exam effectively? How to prepare for the exam in a short time with less efforts? How to get a ideal result and how to find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provide not only Google exam questions, answers and explanations but also complete assistance on your exam preparation and certification application. If you are confused on your PROFESSIONAL-DATA-ENGINEER exam preparations and Google certification application, do not hesitate to visit our Vcedump.com to find your solutions here.

PROFESSIONAL-DATA-ENGINEER Exam Details

Exam Code

Exam Name

Certification

Vendor

Total Questions

Last Updated

Google PROFESSIONAL-DATA-ENGINEER Online Questions & Answers

Question 271:

Question 272:

Question 273:

Question 274:

Question 275:

Question 276:

Question 277:

Question 278:

Question 279:

Question 280:

Related Exams:

ADWORDS-DISPLAY

ADWORDS-FUNDAMENTALS

ADWORDS-MOBILE

ADWORDS-REPORTING

ADWORDS-SEARCH

ADWORDS-SHOPPING

ADWORDS-VIDEO

APIGEE-API-ENGINEER

ASSOCIATE-ANDROID-DEVELOPER

ASSOCIATE-CLOUD-ENGINEER

Tips on How to Prepare for the Exams

Google PROFESSIONAL-DATA-ENGINEER Online Practice Questions and Exam Preparation

PROFESSIONAL-DATA-ENGINEER Exam Details

Exam Code

Exam Name

Certification

Vendor

Total Questions

Last Updated

Google PROFESSIONAL-DATA-ENGINEER Online Questions & Answers

Question 271:

Question 272:

Question 273:

Question 274:

Question 275:

Question 276:

Question 277:

Question 278:

Question 279:

Question 280:

Related Exams:

Tips on How to Prepare for the Exams