Professional Data Engineer on Google Cloud Platform
Exam Details
Exam Code: PROFESSIONAL-DATA-ENGINEER
Exam Name: Professional Data Engineer on Google Cloud Platform
Certification: Google Certifications
Vendor: Google
Total Questions: 331 Q&As
Last Updated: May 19, 2025
Google | Google Certifications | PROFESSIONAL-DATA-ENGINEER Questions & Answers
Question 41:
What Dataflow concept determines when a Window's contents should be output based on certain criteria being met?
A. Sessions
B. OutputCriteria
C. Windows
D. Triggers
Correct Answer: D
Triggers control when the elements for a specific key and window are output. As elements arrive, they are assigned to one or more windows by a Window transform and its associated WindowFn, and are then passed to the associated Trigger, which determines whether the window's contents should be output. Reference: https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/transforms/windowing/Trigger
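As an illustrative sketch only (this assumes Apache Beam's Python SDK, and `events` stands in for an existing unbounded PCollection; it is not a complete pipeline), a trigger is attached at the point where windowing is applied:

```
import apache_beam as beam
from apache_beam.transforms import window
from apache_beam.transforms.trigger import (
    AfterWatermark, AfterProcessingTime, AccumulationMode)

# Fixed 60-second windows whose contents are emitted once the watermark
# passes the end of the window, with early speculative panes every 30 s.
windowed = events | beam.WindowInto(
    window.FixedWindows(60),
    trigger=AfterWatermark(early=AfterProcessingTime(30)),
    accumulation_mode=AccumulationMode.DISCARDING)
```

The trigger, not the window itself, decides when output is produced; the window only decides how elements are grouped.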
Question 42:
Scaling a Cloud Dataproc cluster typically involves ____.
A. increasing or decreasing the number of worker nodes
B. increasing or decreasing the number of master nodes
C. moving memory to run more applications on a single node
D. deleting applications from unused nodes periodically
Correct Answer: A
After creating a Cloud Dataproc cluster, you can scale it by increasing or decreasing the number of worker nodes at any time, even while jobs are running on the cluster. Cloud Dataproc clusters are typically scaled to:
1) increase the number of workers to make a job run faster
2) decrease the number of workers to save money
3) increase the number of nodes to expand available Hadoop Distributed File System (HDFS) storage
Reference: https://cloud.google.com/dataproc/docs/concepts/scaling-clusters
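As a sketch with the gcloud CLI (the cluster name, region, and worker count here are hypothetical), scaling comes down to updating the worker count:

```
# Resize an existing cluster to 5 worker nodes.
gcloud dataproc clusters update my-cluster \
    --region=us-central1 \
    --num-workers=5
```

The same command with a smaller `--num-workers` value scales the cluster down.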
Question 43:
Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so they can execute jobs?
A. Dataproc Worker
B. Dataproc Viewer
C. Dataproc Runner
D. Dataproc Editor
Correct Answer: A
Service accounts used with Cloud Dataproc must have the Dataproc Worker role (or have all of the permissions granted by the Dataproc Worker role). Reference: https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes
Question 44:
Which of the following IAM roles does your Compute Engine account require to be able to run pipeline jobs?
A. dataflow.worker
B. dataflow.compute
C. dataflow.developer
D. dataflow.viewer
Correct Answer: A
The dataflow.worker role provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline. Reference: https://cloud.google.com/dataflow/access-control
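A sketch of granting the role with the gcloud CLI (the project ID and service-account name are hypothetical placeholders):

```
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:my-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/dataflow.worker"
```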
Question 45:
Cloud Bigtable is Google's ______ Big Data database service.
A. Relational
B. mySQL
C. NoSQL
D. SQL Server
Correct Answer: C
Cloud Bigtable is Google's NoSQL Big Data database service. It is the same database that Google uses for services such as Search, Analytics, Maps, and Gmail. It is used for workloads that require low latency and high throughput, including Internet of Things (IoT), user analytics, and financial data analysis.
Reference: https://cloud.google.com/bigtable/
Question 46:
By default, which of the following windowing behavior does Dataflow apply to unbounded data sets?
A. Windows at every 100 MB of data
B. Single, Global Window
C. Windows at every 1 minute
D. Windows at every 10 minutes
Correct Answer: B
Dataflow's default windowing behavior is to assign all elements of a PCollection to a single, global window, even for unbounded PCollections.
Question 47:
How would you query specific partitions in a BigQuery table?
A. Use the DAY column in the WHERE clause
B. Use the EXTRACT(DAY) clause
C. Use the _PARTITIONTIME pseudo-column in the WHERE clause
D. Use DATE BETWEEN in the WHERE clause
Correct Answer: C
Partitioned tables include a pseudo-column named _PARTITIONTIME that contains a date-based timestamp for data loaded into the table. To limit a query to particular partitions (such as January 1st and 2nd of 2017), use a clause similar to this:
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01') AND TIMESTAMP('2017-01-02')
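In context, a full query might look like the following (the table name `mydataset.events` is a hypothetical placeholder):

```
SELECT *
FROM `mydataset.events`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2017-01-01')
                         AND TIMESTAMP('2017-01-02')
```

Filtering on _PARTITIONTIME prunes the scan to the named partitions, which reduces both query time and the number of bytes billed.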
Question 48:
When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these four values are required: project, region, name, and ____.
A. zone
B. node
C. label
D. type
Correct Answer: A
At a minimum, you must specify four values when creating a new cluster with the projects.regions.clusters.create operation:
1) the project in which the cluster will be created
2) the region to use
3) the name of the cluster
4) the zone in which the cluster will be created
You can specify many more details beyond these minimum requirements. For example, you can also specify the number of workers, whether preemptible compute instances should be used, and the network settings. Reference: https://cloud.google.com/dataproc/docs/tutorials/python-library-example#create_a_new_cloud_dataproc_cluste
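The same four values map directly onto a gcloud CLI invocation, sketched here with hypothetical project, cluster, region, and zone names:

```
gcloud dataproc clusters create my-cluster \
    --project=my-project \
    --region=us-central1 \
    --zone=us-central1-a
```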
Question 49:
Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?
A. A sequential numeric ID
B. A timestamp followed by a stock symbol
C. A non-sequential numeric ID
D. A stock symbol followed by a timestamp
Correct Answer: AB
Using a timestamp as the first element of a row key can cause a variety of problems. In brief, when a row key for a time series includes a timestamp, all of your writes will target a single node, fill that node, and then move on to the next node in the cluster, resulting in hotspotting. Similarly, suppose your system assigns a numeric ID to each of your application's users. You might be tempted to use the user's numeric ID as the row key for your table. However, since new users are more likely to be active users, this approach is likely to push most of your traffic to a small number of nodes. References: https://cloud.google.com/bigtable/docs/schema-design and https://cloud.google.com/bigtable/docs/schema-design-timeseries#ensure_that_your_row_key_avoids_hotspotting
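The effect can be illustrated with a toy model (this is a simulation for intuition only, not the Bigtable API): each node serves one contiguous, lexicographically sorted slice of the row-key space, and the split points, symbols, and dates below are hypothetical.

```python
import bisect

# Hypothetical split points dividing the key space among 4 nodes:
# node 0: keys < "C"; node 1: < "H"; node 2: < "R"; node 3: the rest.
SPLITS = ["C", "H", "R"]

def node_for(row_key):
    """Return the index of the node whose key range contains row_key."""
    return bisect.bisect_right(SPLITS, row_key)

symbols = ["AMZN", "GOOG", "MSFT", "TSLA"]
timestamps = ["20250101", "20250102", "20250103", "20250104"]

# Timestamp-first keys: every key starts with "2025...", so every
# write lands on the same node -- a hotspot.
ts_first_nodes = {node_for(f"{t}#{s}") for t in timestamps for s in symbols}

# Symbol-first keys: keys are spread through the key space, so writes
# are distributed across the cluster.
sym_first_nodes = {node_for(f"{s}#{t}") for s in symbols for t in timestamps}

print(sorted(ts_first_nodes))   # a single node takes all traffic
print(sorted(sym_first_nodes))  # traffic spreads over all four nodes
```

This is why answers A and B are the hotspotting patterns: sequential IDs and timestamp-first keys both concentrate new writes into one narrow, sorted key range.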
Question 50:
To give a user read permission for only the first three columns of a table, which access control method would you use?
A. Primitive role
B. Predefined role
C. Authorized view
D. It's not possible to give access to only the first three columns of a table.
Correct Answer: C
An authorized view allows you to share query results with particular users and groups without giving them read access to the underlying tables. Authorized views can only be created in a dataset that does not contain the tables queried by the view. When you create an authorized view, you use the view's SQL query to restrict access to only the rows and columns you want the users to see. Reference: https://cloud.google.com/bigquery/docs/views#authorized-views
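As a sketch of the column-limiting step (the project, dataset, table, and column names are hypothetical), the view simply selects the three columns to expose:

```
CREATE VIEW `myproject.shared_views.employees_limited` AS
SELECT name, department, start_date
FROM `myproject.private_data.employees`;
```

The user is then granted access to the view's dataset, and the view is authorized on the source dataset, so the user never needs read access to the underlying table.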