Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google PROFESSIONAL-DATA-ENGINEER Questions & Answers (Google Certifications)

  • Question 221:

    Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

    A. Issue a command to restart the database servers.

    B. Retry the query with exponential backoff, up to a cap of 15 minutes.

    C. Retry the query every second until it comes back online to minimize staleness of data.

    D. Reduce the query frequency to once every hour until the database comes back online.
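
    For context, the capped-backoff pattern the options refer to doubles the wait between retries up to a fixed ceiling. A minimal Python sketch, assuming a hypothetical query_fn that raises ConnectionError while the database is down:

        import random
        import time

        def query_with_backoff(query_fn, max_backoff_s=15 * 60):
            # Retry a failing query, doubling the wait each attempt (plus
            # jitter) and never waiting longer than the 15-minute cap.
            delay_s = 1
            while True:
                try:
                    return query_fn()
                except ConnectionError:
                    time.sleep(delay_s + random.uniform(0, 1))
                    delay_s = min(delay_s * 2, max_backoff_s)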

  • Question 222:

    You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)

    A. There are very few occurrences of mutations relative to normal samples.

    B. There are roughly equal occurrences of both normal and mutated samples in the database.

    C. You expect future mutations to have different features from the mutated samples in the database.

    D. You expect future mutations to have similar features to the mutated samples in the database.

    E. You already have labels for which samples are mutated and which are normal in the database.
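
    As background, an unsupervised anomaly detector learns what "normal" looks like and flags outliers, so it needs no labels and works best when anomalies are rare. A minimal sketch using scikit-learn's IsolationForest on a hypothetical feature matrix:

        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(0)
        X = rng.random((1000, 16))  # hypothetical sample-by-feature matrix

        # contamination encodes the expectation that anomalies are rare.
        detector = IsolationForest(contamination=0.01, random_state=0)
        detector.fit(X)              # no labels required: unsupervised
        flags = detector.predict(X)  # -1 = anomaly (possible mutation), 1 = normal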

  • Question 223:

    Your company is using wildcard tables to query data across multiple tables with similar names. The SQL statement below is currently failing with the following error:

        # Syntax error: Expected end of statement but got "-" at [4:11]

        SELECT age
        FROM bigquery-public-data.noaa_gsod.gsod
        WHERE age != 99
        AND _TABLE_SUFFIX = '1929'
        ORDER BY age DESC

    Which table name will make the SQL statement work correctly?

    A. `bigquery-public-data.noaa_gsod.gsod`

    B. bigquery-public-data.noaa_gsod.gsod*

    C. `bigquery-public-data.noaa_gsod.gsod`*

    D. `bigquery-public-data.noaa_gsod.gsod*`
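
    Whichever option you pick, two BigQuery rules are worth recalling: identifiers containing hyphens must be wrapped in backticks, and a trailing * inside the quoted name enables the _TABLE_SUFFIX pseudo-column. A minimal sketch running such a query with the google-cloud-bigquery client (default credentials assumed):

        from google.cloud import bigquery

        client = bigquery.Client()  # assumes default project credentials

        # Backticks around the whole name let BigQuery parse the hyphenated
        # project id and the trailing * wildcard; _TABLE_SUFFIX picks shards.
        sql = """
            SELECT age
            FROM `bigquery-public-data.noaa_gsod.gsod*`
            WHERE age != 99 AND _TABLE_SUFFIX = '1929'
            ORDER BY age DESC
        """
        for row in client.query(sql).result():
            print(row.age)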

  • Question 224:

    You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update. What should you do?

    A. Update the current pipeline and use the drain flag.

    B. Update the current pipeline and provide the transform mapping JSON object.

    C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.

    D. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.
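
    For reference, the drain and cancel operations the options mention are issued against a running job: drain lets in-flight elements finish before the job stops, while cancel stops it immediately. A minimal sketch shelling out to gcloud, with a hypothetical job id and region:

        import subprocess

        # Drain the old job so buffered elements are processed to completion.
        # JOB_ID and the region below are placeholder values.
        subprocess.run(
            ["gcloud", "dataflow", "jobs", "drain", "JOB_ID",
             "--region=us-central1"],
            check=True,
        )
        # To stop immediately instead, replace "drain" with "cancel".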

  • Question 225:

    You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

    A. Continuously retrain the model on just the new data.

    B. Continuously retrain the model on a combination of existing data and the new data.

    C. Train on the existing data while using the new data as your test set.

    D. Train on the new data while using the existing data as your test set.
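
    To make the trade-off concrete, one of the listed strategies, retraining on a combination of existing and new data, keeps long-term preferences while absorbing drift. A minimal sketch with hypothetical stand-in arrays:

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        # Hypothetical stand-ins for the historical set and a streamed batch.
        X_old, y_old = rng.random((500, 8)), rng.integers(0, 2, 500)
        X_new, y_new = rng.random((50, 8)), rng.integers(0, 2, 50)

        # Combine old and new examples before each retraining pass.
        X_train = np.concatenate([X_old, X_new])
        y_train = np.concatenate([y_old, y_new])
        model = LogisticRegression().fit(X_train, y_train)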

  • Question 226:

    You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time. What should you do?

    A. Send the data to Google Cloud Datastore and then export to BigQuery.

    B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.

    C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.

    D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.
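
    The Pub/Sub-to-Dataflow-to-BigQuery pattern named in option B looks roughly like the Apache Beam sketch below; the subscription, table, and schema names are hypothetical:

        import json
        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions(streaming=True)
        with beam.Pipeline(options=options) as p:
            (
                p
                | beam.io.ReadFromPubSub(
                    subscription="projects/my-project/subscriptions/temps")
                | beam.Map(json.loads)  # each message is assumed to be JSON
                | beam.io.WriteToBigQuery(
                    "my-project:warehouse.temperatures",
                    schema="device_id:STRING,temp_c:FLOAT,ts:TIMESTAMP")
            )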

  • Question 227:

    Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations. What should you do?

    A. Add a node to the MySQL cluster and build an OLAP cube there.

    B. Use an ETL tool to load the data from MySQL into Google BigQuery.

    C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.

    D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.
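
    An ETL pass like the one in option B can be as small as extract-to-DataFrame, then load. A minimal sketch, assuming a hypothetical replica connection string (so the extract avoids the busy primary) plus pandas and the BigQuery client:

        import pandas as pd
        from google.cloud import bigquery
        from sqlalchemy import create_engine

        # Hypothetical connection to a read replica; the table and project
        # names below are also placeholders.
        engine = create_engine("mysql+pymysql://etl:pw@replica-host/shop")
        df = pd.read_sql("SELECT * FROM orders", engine)

        client = bigquery.Client()
        client.load_table_from_dataframe(
            df, "my-project.analytics.orders").result()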

  • Question 228:

    You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports, because they either take too long or they encounter errors with insufficient compute resources. How should you adjust the database design?

    A. Add capacity (memory and disk space) to the database server by the order of 200.

    B. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.

    C. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.

    D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
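
    The normalization described in option C splits the one wide table so reports join two narrow tables instead of self-joining. A minimal sketch of the resulting schema, using sqlite3 as stand-in DDL with hypothetical column names:

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE patients (
                patient_id INTEGER PRIMARY KEY,
                name       TEXT,
                clinic     TEXT
            );
            CREATE TABLE visits (
                visit_id   INTEGER PRIMARY KEY,
                patient_id INTEGER REFERENCES patients(patient_id),
                visit_date TEXT,
                notes      TEXT
            );
        """)
        # Reports now join patients to visits on patient_id rather than
        # self-joining one master table.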

  • Question 229:

    You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps receiving an inordinate number of duplicate messages. What is the most likely cause of these duplicate messages?

    A. The message body for the sensor event is too large.

    B. Your custom endpoint has an out-of-date SSL certificate.

    C. The Cloud Pub/Sub topic has too many messages published to it.

    D. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
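
    As background, for a push subscription the acknowledgement is the HTTP response itself: returning a 2xx status before the ack deadline acknowledges the message, and slow or non-2xx responses cause Pub/Sub to redeliver. A minimal Flask sketch, with a hypothetical hand-off function:

        from flask import Flask, request

        app = Flask(__name__)

        def enqueue_for_processing(message):
            # Hypothetical hand-off to a background worker or queue.
            print("queued message", message.get("messageId"))

        @app.route("/pubsub/push", methods=["POST"])
        def handle_push():
            envelope = request.get_json()  # payload arrives as {"message": ...}
            enqueue_for_processing(envelope["message"])
            # Respond 2xx quickly; doing heavy work before responding risks
            # missing the ack deadline, which shows up as duplicates.
            return ("", 204)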

  • Question 230:

    Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?

    A. Threading

    B. Serialization

    C. Dropout Methods

    D. Dimensionality Reduction
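
    For context, dropout randomly zeroes a fraction of activations during training, which regularizes a large network that fits the training data but generalizes poorly. A minimal Keras sketch with hypothetical layer sizes:

        import tensorflow as tf

        model = tf.keras.Sequential([
            tf.keras.Input(shape=(32,)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dropout(0.5),  # drop half the activations per step
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(0.5),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(optimizer="adam", loss="binary_crossentropy")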

Tips on How to Prepare for the Exams

Certification exams have become increasingly important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort, achieve an ideal result, and find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.