Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 131:

    You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query:

    SELECT country, state, city FROM [myproject:mydataset.mytable] GROUP BY country

    You check the query plan for the query and see the following output in the Read section of Stage 1:

    What is the most likely cause of the delay for this query?

    A. Users are running too many concurrent queries in the system

    B. The [myproject:mydataset.mytable] table has too many partitions

    C. Either the state or the city columns in the [myproject:mydataset.mytable] table have too many NULL values

    D. Most rows in the [myproject:mydataset.mytable] table have the same value in the country column, causing data skew
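
    For the scenario above, a minimal Python sketch (using the google-cloud-bigquery client) of how the data skew suspected in option D could be confirmed; the project, dataset, and table identifiers are the placeholders from the question, and an installed client library with default credentials is assumed:

    # Count rows per country to see whether one value dominates (data skew).
    # Uses standard SQL naming for the same table the question references.
    from google.cloud import bigquery

    client = bigquery.Client(project="myproject")

    skew_check_sql = """
        SELECT country, COUNT(*) AS row_count
        FROM `myproject.mydataset.mytable`
        GROUP BY country
        ORDER BY row_count DESC
        LIMIT 10
    """

    for row in client.query(skew_check_sql).result():
        print(f"{row.country}: {row.row_count} rows")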

  • Question 132:

    You are using Cloud Bigtable to persist and serve stock market data for each of the major indices. To serve the trading application, you need to access only the most recent stock prices that are streaming in. How should you design your row key and tables to ensure that you can access the data with the simplest query?

    A. Create one unique table for all of the indices, and then use the index and timestamp as the row key design

    B. Create one unique table for all of the indices, and then use a reverse timestamp as the row key design.

    C. For each index, have a separate table and use a timestamp as the row key design

    D. For each index, have a separate table and use a reverse timestamp as the row key design
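
    For the scenario above, a sketch of the reverse-timestamp row key idea behind options B and D, using the google-cloud-bigtable Python client; the project, instance, table, and column family names are hypothetical placeholders:

    # Reverse-timestamp row keys: newer prices sort first, so the latest price
    # for an index is simply the first row returned by a prefix scan.
    import sys
    import time

    from google.cloud import bigtable

    client = bigtable.Client(project="my-project")
    table = client.instance("market-data").table("stock-prices")

    def make_row_key(index: str, event_time_ms: int) -> bytes:
        reverse_ts = sys.maxsize - event_time_ms  # later time -> smaller key
        return f"{index}#{reverse_ts}".encode()

    def write_price(index: str, price: float) -> None:
        row = table.direct_row(make_row_key(index, int(time.time() * 1000)))
        row.set_cell("quotes", b"price", str(price).encode())
        row.commit()

    def latest_price_row(index: str):
        # With reversed timestamps, the first row under the prefix is the newest.
        rows = table.read_rows(
            start_key=f"{index}#".encode(),
            end_key=f"{index}$".encode(),  # '$' sorts right after '#'
            limit=1,
        )
        return next(iter(rows), None)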

  • Question 133:

    You are designing a cloud-native historical data processing system to meet the following conditions:

    1. The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools, including Cloud Dataproc, BigQuery, and Compute Engine.

    2. A streaming data pipeline stores new data daily.

    3. Performance is not a factor in the solution.

    4. The solution design should maximize availability.

    How should you design data storage for this solution?

    A. Create a Cloud Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

    B. Store the data in BigQuery. Access the data using the BigQuery Connector on Cloud Dataproc and Compute Engine.

    C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Cloud Dataproc, BigQuery, and Compute Engine.

    D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Cloud Dataproc, BigQuery, and Compute Engine.
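
    A brief Python sketch of the multi-regional bucket approach in option D, using the google-cloud-storage client; the bucket name, object path, and local file are hypothetical:

    # Create a multi-region Cloud Storage bucket and upload a daily data file.
    from google.cloud import storage

    client = storage.Client(project="my-project")

    # "US" is a multi-region location, which maximizes availability of the data.
    bucket = client.create_bucket("historical-data-archive-example", location="US")

    blob = bucket.blob("daily/2024-01-01/events.avro")
    blob.upload_from_filename("events.avro")
    print(f"Uploaded gs://{bucket.name}/{blob.name}")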

  • Question 134:

    Your startup has a web application that currently serves customers out of a single region in Asia. You are targeting funding that will allow your startup to serve customers globally. Your current goal is to optimize for cost, and your post-funding goal is to optimize for global presence and performance. You must use a native JDBC driver. What should you do?

    A. Use Cloud Spanner to configure a single-region instance initially, and then configure multi-region Cloud Spanner instances after securing funding.

    B. Use a Cloud SQL for PostgreSQL highly available instance first, and Bigtable with US, Europe, and Asia replication after securing funding.

    C. Use a Cloud SQL for PostgreSQL zonal instance first, and Bigtable with US, Europe, and Asia after securing funding.

    D. Use a Cloud SQL for PostgreSQL zonal instance first, and Cloud SQL for PostgreSQL with a highly available configuration after securing funding.

  • Question 135:

    Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?

    A. Migrate the workload to Google Cloud Dataflow

    B. Use pre-emptible virtual machines (VMs) for the cluster

    C. Use a higher-memory node so that the job runs faster

    D. Use SSDs on the worker nodes so that the job can run faster
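
    For the scenario above, a rough sketch of a Dataproc cluster whose secondary workers are preemptible VMs (option B), using the google-cloud-dataproc Python client; the project, cluster name, machine types, and worker counts are hypothetical, and the exact config field names should be checked against the client library version in use:

    # Create a Dataproc cluster with preemptible secondary workers to cut the
    # cost of a short, weekly Spark batch job.
    from google.cloud import dataproc_v1

    region = "us-central1"
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )

    cluster = {
        "project_id": "my-project",
        "cluster_name": "weekly-spark-model",
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
            # Preemptible secondary workers provide the bulk of the capacity.
            "secondary_worker_config": {
                "num_instances": 13,
                "preemptibility": "PREEMPTIBLE",
            },
        },
    }

    operation = client.create_cluster(
        request={"project_id": "my-project", "region": region, "cluster": cluster}
    )
    print(operation.result().cluster_name)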

  • Question 136:

    Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers process those bids. Each bid event contains the item, amount, user, and timestamp. You want to collate those bid events into a single location in real time to determine which user bid first.

    What should you do?

    A. Create a file on a shared file system and have the application servers write all bid events to that file. Process the file with Apache Hadoop to identify which user bid first.

    B. Have each application server write the bid events to Cloud Pub/Sub as they occur. Push the events from Cloud Pub/Sub to a custom endpoint that writes the bid event information into Cloud SQL.

    C. Set up a MySQL database for each application server to write bid events into. Periodically query each of those distributed MySQL databases and update a master MySQL database with bid event information.

    D. Have each application server write the bid events to Google Cloud Pub/Sub as they occur. Use a pull subscription to pull the bid events using Google Cloud Dataflow. Give the bid for each item to the user in the bid event that is processed first.
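
    A minimal sketch of the ingestion step shared by options B and D, where each application server publishes bid events to a single Cloud Pub/Sub topic as they occur; the project and topic names are hypothetical:

    # Publish each bid event to Pub/Sub with the item, user, amount, and timestamp.
    import json
    import time

    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("my-project", "bid-events")

    def publish_bid(item_id: str, user_id: str, amount: float) -> None:
        event = {
            "item": item_id,
            "user": user_id,
            "amount": amount,
            # Event time recorded at the application server; a downstream
            # consumer (e.g. a Dataflow pipeline) uses it to decide who bid first.
            "timestamp": time.time(),
        }
        future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
        future.result()  # block until Pub/Sub accepts the message

    publish_bid("item-42", "user-7", 19.99)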

  • Question 137:

    Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload, they will add new data from existing on-premises applications every day. What should they do?

    A. Execute gsutil rsync from the on-premises servers.

    B. Use Cloud Dataflow and write the data to Cloud Storage.

    C. Write a job template in Cloud Dataproc to perform the data transfer.

    D. Install an FTP server on a Compute Engine VM to receive the files and move them to Cloud Storage.
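
    For the daily incremental upload in option A, a thin Python wrapper around gsutil rsync run from the on-premises servers (outbound-only, so no inbound access from external IPs is needed); the local directory and bucket name are hypothetical, and gsutil is assumed to be installed and authenticated:

    # Sync an on-premises export directory to Cloud Storage once a day.
    import subprocess

    LOCAL_DIR = "/data/exports"                # hypothetical export directory
    DEST_URI = "gs://historic-uploads-example"

    subprocess.run(
        ["gsutil", "-m", "rsync", "-r", LOCAL_DIR, DEST_URI],
        check=True,
    )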

  • Question 138:

    You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?

    A. Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

    B. Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

    C. Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

    D. Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.
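
    A sketch of the timestamp-suffixed backup copy described in options B and D, using the google-cloud-bigquery client; the project, dataset, and table names are hypothetical, and in practice the copy would be driven by a scheduled query or a scheduler job:

    # Copy a table to a backup table whose name carries the backup date.
    from datetime import datetime, timezone

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    source = "my-project.analytics.events"
    suffix = datetime.now(timezone.utc).strftime("%Y%m%d")
    destination = f"my-project.analytics_backup.events_{suffix}"

    copy_job = client.copy_table(source, destination)
    copy_job.result()  # wait for the copy to finish
    print(f"Backed up {source} to {destination}")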

  • Question 139:

    You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformations that will not require programming or knowledge of SQL.

    What should you do?

    A. Use Cloud Dataflow with Beam to detect errors and perform transformations.

    B. Use Cloud Dataprep with recipes to detect errors and perform transformations.

    C. Use Cloud Dataproc with a Hadoop job to detect errors and perform transformations.

    D. Use federated tables in BigQuery with queries to detect errors and perform transformations.

  • Question 140:

    You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping the costs low. You also want to create snapshots of the data periodically, so that you can make a point-in-time (PIT) recovery, or clone a copy of the data for Cloud Datastore in a different environment. You want to archive these snapshots for a long time. Which two methods can accomplish this? Choose 2 answers.

    A. Use managed export, and store the data in a Cloud Storage bucket using Nearline or Coldline class.

    B. Use managed export, and then import to Cloud Datastore in a separate project under a unique namespace reserved for that export.

    C. Use managed export, and then import the data into a BigQuery table created just for that export, and delete temporary export files.

    D. Write an application that uses Cloud Datastore client libraries to read all the entities. Treat each entity as a BigQuery table row via BigQuery streaming insert. Assign an export timestamp for each export, and attach it as an extra column for each row. Make sure that the BigQuery table is partitioned using the export timestamp column.

    E. Write an application that uses Cloud Datastore client libraries to read all the entities. Format the exported data into a JSON file. Apply compression before storing the data in Cloud Source Repositories.
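
    For option A above, a sketch that creates a Nearline-class Cloud Storage bucket for the snapshots and then triggers a Datastore managed export into it via the gcloud CLI; the project and bucket names are hypothetical, and gcloud is assumed to be installed and authenticated:

    # Create a low-cost archive bucket, then run a managed export into it.
    import subprocess

    from google.cloud import storage

    client = storage.Client(project="my-project")

    bucket = client.bucket("datastore-snapshots-example")
    bucket.storage_class = "NEARLINE"  # low-cost class for long-term snapshots
    client.create_bucket(bucket, location="US")

    subprocess.run(
        ["gcloud", "datastore", "export", "gs://datastore-snapshots-example",
         "--project", "my-project"],
        check=True,
    )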

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more important and are required by more and more enterprises when hiring. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.