Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google PROFESSIONAL-DATA-ENGINEER Questions & Answers (Google Certifications)

  • Question 81:

    You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and issue an order to the backend systems. Which solution should you choose?

    A. Cloud Speech-to-Text API

    B. Cloud Natural Language API

    C. Dialogflow Enterprise Edition

    D. Cloud AutoML Natural Language
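
    For context, the conversational-agent option above (Dialogflow) is normally driven through its detect-intent API once an agent with order-related intents has been built in the console. A minimal sketch using the google-cloud-dialogflow Python client; the project ID, session ID, and sample utterance are illustrative assumptions:

      from google.cloud import dialogflow

      # Project ID, session ID, and the utterance below are illustrative only;
      # the agent and its intents (e.g. an "order.create" intent) are assumed
      # to have been defined in the Dialogflow console.
      session_client = dialogflow.SessionsClient()
      session = session_client.session_path("my-project", "session-1234")

      # Assistant integrations pass the recognized text of the voice command;
      # Dialogflow matches an intent and extracts the parameters the backend
      # ordering system needs.
      text_input = dialogflow.TextInput(text="order two cartons of oat milk",
                                        language_code="en-US")
      query_input = dialogflow.QueryInput(text=text_input)
      response = session_client.detect_intent(
          request={"session": session, "query_input": query_input}
      )

      print(response.query_result.intent.display_name)  # matched intent
      print(response.query_result.parameters)           # extracted order details
      print(response.query_result.fulfillment_text)     # reply to speak back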

  • Question 82:

    You have a streaming pipeline that ingests data from Pub/Sub in production. You need to update this streaming pipeline with improved business logic. You need to ensure that the updated pipeline reprocesses the previous two days of delivered Pub/Sub messages.

    What should you do? Choose two answers.

    A. Use Pub/Sub Seek with a timestamp.

    B. Use the Pub/Sub subscription clear-retry-policy flag.

    C. Create a new Pub/Sub subscription two days before the deployment.

    D. Use the Pub/Sub subscription retain-acked-messages flag.

    E. Use a Pub/Sub snapshot captured two days before the deployment.
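
    For context, the snapshot and seek features referenced in the options above are exposed through the Pub/Sub admin API. A minimal sketch using the google-cloud-pubsub Python client; the project, subscription, and snapshot names are illustrative assumptions:

      from google.cloud import pubsub_v1

      subscriber = pubsub_v1.SubscriberClient()
      subscription = subscriber.subscription_path("my-project", "orders-sub")
      snapshot = subscriber.snapshot_path("my-project", "pre-deploy")

      # Capture the subscription's unacknowledged backlog in a snapshot before
      # deploying the updated pipeline.
      subscriber.create_snapshot(
          request={"name": snapshot, "subscription": subscription}
      )

      # After deployment, seek the subscription to the snapshot so the new
      # business logic reprocesses those messages. Seeking to a timestamp
      # instead requires the subscription to retain acknowledged messages for
      # at least that long.
      subscriber.seek(request={"subscription": subscription, "snapshot": snapshot})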

  • Question 83:

    You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

    A. Convert your PySpark commands into SparkSQL queries to transform the data, and then run your pipeline on Dataproc to write the data into BigQuery.

    B. Ingest your data into Cloud SQL, convert your PySpark commands into SparkSQL queries to transform the data, and then use federated queries from BigQuery for machine learning.

    C. Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

    D. Use the Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery.
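
    For context, the load-then-transform-in-SQL pattern mentioned in the options above can be driven entirely from the BigQuery API. A minimal sketch using the google-cloud-bigquery Python client; the bucket, dataset, table, and column names are illustrative assumptions:

      from google.cloud import bigquery

      client = bigquery.Client()

      # Load the raw structured files from Cloud Storage into a staging table.
      client.load_table_from_uri(
          "gs://my-raw-bucket/sales/*.csv",
          "my-project.staging.raw_sales",
          job_config=bigquery.LoadJobConfig(
              source_format=bigquery.SourceFormat.CSV,
              autodetect=True,
              skip_leading_rows=1,
          ),
      ).result()

      # Express the former PySpark transformations as SQL; BigQuery runs the
      # query serverlessly and writes the result to a new table.
      client.query("""
          CREATE OR REPLACE TABLE `my-project.analytics.sales_clean` AS
          SELECT order_id, customer_id, SUM(amount) AS total_amount
          FROM `my-project.staging.raw_sales`
          GROUP BY order_id, customer_id
      """).result()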

  • Question 84:

    Your United States-based company has created an application for assessing and responding to user actions. The primary table's data volume grows by 250,000 records per second. Many third parties use your application's APIs to build the functionality into their own frontend applications. Your application's APIs should comply with the following requirements:

    1. Single global endpoint

    2. ANSI SQL support

    3. Consistent access to the most up-to-date data

    What should you do?

    A. Implement BigQuery with no region selected for storage or processing.

    B. Implement Cloud Spanner with the leader in North America and read-only replicas in Asia and Europe.

    C. Implement Cloud SQL for PostgreSQL with the master in North America and read replicas in Asia and Europe.

    D. Implement Cloud Bigtable with the primary cluster in North America and secondary clusters in Asia and Europe.
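
    For context, the multi-region database option above maps to creating an instance in a multi-region configuration, which exposes a single endpoint with strongly consistent ANSI SQL reads and writes. A minimal sketch using the google-cloud-spanner Python client; the project, instance, configuration, and database names are illustrative assumptions:

      from google.cloud import spanner

      client = spanner.Client(project="my-project")

      # "nam-eur-asia1" is an example multi-region instance configuration;
      # choose one that matches the required geography and replication.
      instance = client.instance(
          "orders-instance",
          configuration_name="projects/my-project/instanceConfigs/nam-eur-asia1",
          display_name="orders",
          node_count=3,
      )
      operation = instance.create()
      operation.result(300)  # wait for provisioning

      # Queries use standard ANSI SQL and always see the latest committed data
      # (assumes the database and its schema already exist).
      database = instance.database("orders-db")
      with database.snapshot() as snap:
          rows = snap.execute_sql("SELECT COUNT(*) FROM user_actions")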

  • Question 85:

    You are developing a new deep learning model that predicts a customer's likelihood to buy on your e-commerce site. After running an evaluation of the model against both the original training data and new test data, you find that your model is overfitting the data. You want to improve the accuracy of the model when predicting new data. What should you do?

    A. Increase the size of the training dataset, and increase the number of input features.

    B. Increase the size of the training dataset, and decrease the number of input features.

    C. Reduce the size of the training dataset, and increase the number of input features.

    D. Reduce the size of the training dataset, and decrease the number of input features.

  • Question 86:

    You've migrated a Hadoop job from an on-premises cluster to Dataproc and Cloud Storage (GCS). Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data is Parquet files (on average 200-400 MB each). You see some performance degradation after the migration to Dataproc, so you'd like to optimize it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptible VMs (with only 2 non-preemptible workers) for this workload.

    What should you do?

    A. Increase the size of your Parquet files to ensure that each is at least 1 GB.

    B. Switch to the TFRecord format (approximately 200 MB per file) instead of Parquet files.

    C. Switch from HDDs to SSDs, copy the initial data from GCS to HDFS, run the Spark job, and copy the results back to GCS.

    D. Switch from HDDs to SSDs, and override the preemptible VMs' configuration to increase the boot disk size.

  • Question 87:

    An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?

    A. Create and share an authorized view that provides the aggregate results.

    B. Create and share a new dataset and view that provides the aggregate results.

    C. Create and share a new dataset and table that contains the aggregate results.

    D. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.
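
    For context, the authorized-view pattern referenced in the options above is configured by creating the aggregate view in a separate, shareable dataset and then adding it to the source dataset's access list. A minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and column names are illustrative assumptions:

      from google.cloud import bigquery

      client = bigquery.Client(project="data-owner-project")

      # Create a view that exposes only aggregates, in a dataset that other
      # projects can be granted access to.
      client.query("""
          CREATE OR REPLACE VIEW `data-owner-project.shared_reports.daily_activity` AS
          SELECT activity_date, COUNT(DISTINCT user_id) AS active_users
          FROM `data-owner-project.private_events.user_events`
          GROUP BY activity_date
      """).result()

      # Authorize the view against the source dataset so it can read the
      # user-level table without readers needing access to that dataset.
      source = client.get_dataset("data-owner-project.private_events")
      source.access_entries = list(source.access_entries) + [
          bigquery.AccessEntry(
              role=None,
              entity_type="view",
              entity_id={
                  "projectId": "data-owner-project",
                  "datasetId": "shared_reports",
                  "tableId": "daily_activity",
              },
          )
      ]
      client.update_dataset(source, ["access_entries"])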

  • Question 88:

    You are building an application to share financial market data with consumers, who will receive data feeds. Data is collected from the markets in real time. Consumers will receive the data in the following ways:

    1. Real-time event stream

    2. ANSI SQL access to real-time stream and historical data

    3. Batch historical exports

    Which solution should you use?

    A. Cloud Dataflow, Cloud SQL, Cloud Spanner

    B. Cloud Pub/Sub, Cloud Storage, BigQuery

    C. Cloud Dataproc, Cloud Dataflow, BigQuery

    D. Cloud Pub/Sub, Cloud Dataproc, Cloud SQL

  • Question 89:

    You are migrating an application that tracks library books and information about each book, such as author or year published, from an on-premises data warehouse to BigQuery. In your current relational database, the author information is kept in a separate table and joined to the book information on a common key. Based on Google's recommended practice for schema design, how would you structure the data to ensure optimal speed of queries about the author of each book that has been borrowed?

    A. Keep the schema the same, maintain the different tables for the book and each of the attributes, and query as you are doing today.

    B. Create a table that is wide and includes a column for each attribute, including the author's first name, last name, date of birth, etc.

    C. Create a table that includes information about the books and authors, but nest the author fields inside the author column.

    D. Keep the schema the same, create a view that joins all of the tables, and always query the view.
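
    For context, the nested-field approach described in the options above corresponds to a denormalized table with a STRUCT column for the author, so author lookups need no join. A minimal sketch of the DDL and a query, issued through the google-cloud-bigquery Python client; the dataset, table, and field names are illustrative assumptions:

      from google.cloud import bigquery

      client = bigquery.Client()

      # One row per book, with the author's attributes nested in a STRUCT.
      client.query("""
          CREATE TABLE `my-project.library.books` (
            book_id        STRING,
            title          STRING,
            year_published INT64,
            author         STRUCT<
              first_name    STRING,
              last_name     STRING,
              date_of_birth DATE
            >
          )
      """).result()

      # Author attributes are addressed with dot notation, no join required.
      rows = client.query(
          """
          SELECT title, author.first_name, author.last_name
          FROM `my-project.library.books`
          WHERE book_id = @book_id
          """,
          job_config=bigquery.QueryJobConfig(
              query_parameters=[
                  bigquery.ScalarQueryParameter("book_id", "STRING", "b-42")
              ]
          ),
      ).result()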

  • Question 90:

    You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel load from the input locations using Google recommended practices. What should you do?

    A. Transform text files to compressed Avro using Cloud Dataflow. Use BigQuery for storage and query.

    B. Transform text files to compressed Avro using Cloud Dataflow. Use Cloud Storage and BigQuery permanent linked tables for query.

    C. Compress text files to gzip using the Grid Computing Tools. Use BigQuery for storage and query.

    D. Compress text files to gzip using the Grid Computing Tools. Use Cloud Storage, and then import into Cloud Bigtable for query.
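
    For context, the two Avro-based approaches above correspond to either loading the compressed Avro files into native BigQuery storage or querying them in place through a permanent external (linked) table. A minimal sketch of both using the google-cloud-bigquery Python client; the bucket, dataset, and table names are illustrative assumptions:

      from google.cloud import bigquery

      client = bigquery.Client()

      # Load compressed Avro from Cloud Storage into BigQuery; Avro files are
      # splittable, so the load runs in parallel across the input files.
      client.load_table_from_uri(
          "gs://my-pipeline-bucket/avro/*.avro",
          "my-project.pipeline.events",
          job_config=bigquery.LoadJobConfig(
              source_format=bigquery.SourceFormat.AVRO
          ),
      ).result()

      # Alternatively, keep the Avro files in Cloud Storage and expose them
      # through a permanent external table that BigQuery SQL can query.
      table = bigquery.Table("my-project.pipeline.events_external")
      external_config = bigquery.ExternalConfig("AVRO")
      external_config.source_uris = ["gs://my-pipeline-bucket/avro/*.avro"]
      table.external_data_configuration = external_config
      client.create_table(table)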

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important, and more and more enterprises require them when you apply for a job. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.