Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 08, 2024

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 31:

    Your company is in the process of migrating its on-premises data warehousing solutions to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply updates from multiple transactional database sources on a daily basis. With BigQuery, your company hopes to improve its handling of CDC so that changes to the source systems are available to query in BigQuery in near-real time using log-based CDC streams, while also optimizing for the performance of applying changes to the data warehouse.

    Which two steps should they take to ensure that changes are available in the BigQuery reporting table with minimal latency while reducing compute overhead? (Choose two.)

    A. Perform a DML INSERT, UPDATE, or DELETE to replicate each individual CDC record in real time directly on the reporting table.

    B. Insert each new CDC record and corresponding operation type to a staging table in real time.

    C. Periodically DELETE outdated records from the reporting table.

    D. Periodically use a DML MERGE to perform several DML INSERT, UPDATE, and DELETE operations at the same time on the reporting table.

    E. Insert each new CDC record and corresponding operation type in real time to the reporting table, and use a materialized view to expose only the newest version of each unique record.
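
    If the intended pattern here is the staging-table approach (options B and D), the idea is to stream each CDC record into a staging table as it arrives, then periodically fold the accumulated changes into the reporting table with a single MERGE, so many row-level changes are applied as one batched DML statement. A minimal sketch using the google-cloud-bigquery client; the project, dataset, table, and column names are all hypothetical:

        # pip install google-cloud-bigquery
        from google.cloud import bigquery

        client = bigquery.Client()

        # Hypothetical names: cdc_staging receives streamed CDC records in real
        # time, each tagged with an operation type ('UPSERT' or 'DELETE').
        merge_sql = """
        MERGE `my_project.warehouse.reporting` AS r
        USING (
          -- Keep only the newest CDC record per key from the staging table.
          SELECT * EXCEPT(rn) FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY change_ts DESC) AS rn
            FROM `my_project.warehouse.cdc_staging`
          ) WHERE rn = 1
        ) AS s
        ON r.id = s.id
        WHEN MATCHED AND s.op = 'DELETE' THEN
          DELETE
        WHEN MATCHED THEN
          UPDATE SET r.value = s.value, r.updated_at = s.change_ts
        WHEN NOT MATCHED AND s.op != 'DELETE' THEN
          INSERT (id, value, updated_at) VALUES (s.id, s.value, s.change_ts)
        """

        # Run periodically (e.g. every few minutes) so one MERGE applies many
        # pending inserts, updates, and deletes in a single pass.
        client.query(merge_sql).result()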

  • Question 32:

    You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which strategy should you choose?

    A. The current epoch time

    B. A concatenation of the product name and the current epoch time

    C. A random universally unique identifier number (version 4 UUID)

    D. The original order identification number from the sales system, which is a monotonically increasing integer
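
    The performance concern behind this question is key hotspotting: Cloud Spanner splits data by primary-key ranges, so monotonically increasing keys such as epoch timestamps or sequential order numbers funnel every new write to the same split, while a random version 4 UUID scatters writes evenly. A minimal sketch with the google-cloud-spanner client; the instance, database, table, and column names are hypothetical:

        # pip install google-cloud-spanner
        import uuid
        from google.cloud import spanner

        client = spanner.Client()
        instance = client.instance("my-instance")   # hypothetical instance ID
        database = instance.database("sales-db")    # hypothetical database ID

        # A version 4 UUID is random, so consecutive inserts land on different
        # key ranges (splits) instead of piling onto the tail of the keyspace.
        with database.batch() as batch:
            batch.insert(
                table="ProductSales",
                columns=("SaleId", "ProductName", "Amount"),
                values=[(str(uuid.uuid4()), "widget", 19.99)],
            )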

  • Question 33:

    You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?

    A. Create a table in BigQuery, and append the new samples for CPU and memory to the table

    B. Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second

    C. Create a narrow table in Cloud Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second

    D. Create a wide table in Cloud Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.
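
    The narrow-table design sketched in option C stores one row per sample, keyed by machine identifier plus timestamp, which keeps rows small and lets the dataset grow indefinitely by appending new row keys. A minimal sketch with the google-cloud-bigtable client; the project, instance, table, column family, and key layout are hypothetical:

        # pip install google-cloud-bigtable
        import time
        from google.cloud import bigtable

        client = bigtable.Client(project="my-project")   # hypothetical project
        instance = client.instance("metrics-instance")   # hypothetical instance
        table = instance.table("machine-metrics")        # hypothetical table

        # Row key: <machine id>#<sample time>, one row per one-second sample.
        # Leading with the high-cardinality machine ID keeps writes evenly
        # distributed across tablets even at millions of machines.
        row_key = f"machine-0042#{int(time.time())}".encode()

        row = table.direct_row(row_key)
        row.set_cell("stats", "cpu", b"0.42")   # 'stats' is a hypothetical family
        row.set_cell("stats", "mem", b"0.71")
        row.commit()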

  • Question 34:

    An online retailer has built their current application on Google App Engine. A new initiative at the company mandates that they extend their application to allow their customers to transact directly via the application.

    They need to manage their shopping transactions and analyze combined data from multiple datasets using a business intelligence (BI) tool. They want to use only a single database for this purpose. Which Google Cloud database should they choose?

    A. BigQuery

    B. Cloud SQL

    C. Cloud Bigtable

    D. Cloud Datastore

  • Question 35:

    You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jobs involve executing shell scripts, running Hadoop jobs, and running queries in BigQuery. The jobs are expected to run for many minutes up to several hours. If the steps fail, they must be retried a fixed number of times. Which service should you use to manage the execution of these jobs?

    A. Cloud Scheduler

    B. Cloud Dataflow

    C. Cloud Functions

    D. Cloud Composer
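
    Cloud Composer runs Apache Airflow, where interdependent steps become tasks in a DAG, execution order is declared with dependencies, and a fixed retry count is a per-task setting. A minimal sketch; the schedule, retry policy, and commands are hypothetical, and real jobs would more likely use the Google provider's Dataproc and BigQuery operators than plain bash:

        # Runs on Cloud Composer (managed Apache Airflow).
        from datetime import datetime, timedelta

        from airflow import DAG
        from airflow.operators.bash import BashOperator

        default_args = {
            "retries": 3,                        # fixed number of retries per task
            "retry_delay": timedelta(minutes=5),
        }

        with DAG(
            dag_id="nightly_batch",              # hypothetical DAG name
            start_date=datetime(2024, 1, 1),
            schedule_interval="0 2 * * *",       # hypothetical nightly schedule
            default_args=default_args,
            catchup=False,
        ) as dag:
            prep = BashOperator(task_id="prep", bash_command="./prep.sh")
            hadoop = BashOperator(task_id="hadoop_job", bash_command="./run_hadoop.sh")
            bq = BashOperator(task_id="bq_query",
                              bash_command="bq query --use_legacy_sql=false 'SELECT 1'")

            # Enforce the required execution order.
            prep >> hadoop >> bq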

  • Question 36:

    You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do?

    A. Create a Cloud SQL instance in one zone, and create a failover replica in another zone within the same region.

    B. Create a Cloud SQL instance in one zone, and create a read replica in another zone within the same region.

    C. Create a Cloud SQL instance in one zone, and configure an external read replica in a zone in a different region.

    D. Create a Cloud SQL instance in a region, and configure automatic backup to a Cloud Storage bucket in the same region.
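
    For reference, current Cloud SQL expresses cross-zone high availability as a regional instance (a primary plus a standby in another zone of the same region); the MySQL "failover replica" wording in option A is the legacy form of the same idea. A sketch that shells out to gcloud; the instance name, region, and tier are hypothetical:

        import subprocess

        # Create a MySQL instance whose standby lives in a second zone of the
        # same region; Cloud SQL fails over automatically on a zone outage.
        subprocess.run(
            [
                "gcloud", "sql", "instances", "create", "orders-db",  # hypothetical
                "--database-version=MYSQL_8_0",
                "--region=us-central1",
                "--tier=db-n1-standard-2",
                "--availability-type=REGIONAL",  # HA: primary + cross-zone standby
            ],
            check=True,
        )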

  • Question 37:

    You are migrating your data warehouse to Google Cloud and decommissioning your on-premises data center. Because this is a priority for your company, you know that bandwidth will be made available for the initial data load to the cloud.

    The files being transferred are not large in number, but each file is 90 GB. Additionally, you want your transactional systems to continually update the warehouse on Google Cloud in real time. What tools should you use to migrate the data and ensure that it continues to write to your warehouse?

    A. Storage Transfer Service for the migration, Pub/Sub and Cloud Data Fusion for the real-time updates

    B. BigQuery Data Transfer Service for the migration, Pub/Sub and Dataproc for the real-time updates

    C. gsutil for the migration; Pub/Sub and Dataflow for the real-time updates

    D. gsutil for both the migration and the real-time updates
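
    Whichever tool handles the bulk load, the real-time half of options A-C follows the same shape: transactional systems publish change events to Pub/Sub, and a streaming consumer (Dataflow in option C) writes them into the warehouse. A minimal publisher sketch; the project, topic, and payload are hypothetical:

        # pip install google-cloud-pubsub
        import json
        from google.cloud import pubsub_v1

        publisher = pubsub_v1.PublisherClient()
        topic_path = publisher.topic_path("my-project", "warehouse-updates")  # hypothetical

        # Each transactional change goes out as a small JSON event; a streaming
        # Dataflow pipeline subscribed to this topic applies it to the warehouse.
        event = {"order_id": 1234, "status": "SHIPPED"}
        future = publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
        print("published message", future.result())  # blocks until publish succeeds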

  • Question 38:

    You are operating a streaming Cloud Dataflow pipeline. Your engineers have a new version of the pipeline with a different windowing algorithm and triggering strategy. You want to update the running pipeline with the new version. You want to ensure that no data is lost during the update. What should you do?

    A. Update the Cloud Dataflow pipeline inflight by passing the --update option with the --jobName set to the existing job name

    B. Update the Cloud Dataflow pipeline inflight by passing the --update option with the --jobName set to a new unique job name

    C. Stop the Cloud Dataflow pipeline with the Cancel option. Create a new Cloud Dataflow job with the updated code

    D. Stop the Cloud Dataflow pipeline with the Drain option. Create a new Cloud Dataflow job with the updated code
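
    For context on the update path: passing the update flag with the existing job name asks Dataflow to transfer in-flight state to the replacement job, and Dataflow rejects the update if the new code is incompatible (as some windowing and triggering changes are), in which case draining and resubmitting is the fallback. A minimal Apache Beam sketch; note the options show the Java-style --jobName while the Python SDK spells it --job_name, and all names here are hypothetical:

        # pip install 'apache-beam[gcp]'
        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        options = PipelineOptions(
            runner="DataflowRunner",
            project="my-project",             # hypothetical project
            region="us-central1",
            job_name="clickstream-pipeline",  # must match the running job's name
            update=True,                      # request an in-flight update
            temp_location="gs://my-bucket/tmp",  # hypothetical bucket
        )

        with beam.Pipeline(options=options) as p:
            # ... the updated transforms with the new windowing and triggering ...
            pass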

  • Question 39:

    Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requirements are:

    1. The ability to seek to a particular offset in a topic, possibly back to the start of all data ever captured

    2. Support for publish/subscribe semantics on hundreds of topics

    3. Retain per-key ordering

    Which system should you choose?

    A. Apache Kafka

    B. Cloud Storage

    C. Cloud Pub/Sub

    D. Firebase Cloud Messaging
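
    Among the listed systems, replayable offsets are the distinctive Kafka capability: a consumer can rewind a partition to any retained offset, including the very beginning, and per-key ordering holds because a key always maps to the same partition. A minimal sketch with the kafka-python client; the broker address, topic, and group are hypothetical:

        # pip install kafka-python
        from kafka import KafkaConsumer, TopicPartition

        consumer = KafkaConsumer(
            bootstrap_servers="broker:9092",  # hypothetical broker
            group_id="replay-demo",
            enable_auto_commit=False,
        )

        # Manually assign a partition so we control the read position ourselves.
        tp = TopicPartition("ingest-events", 0)  # hypothetical topic, partition 0
        consumer.assign([tp])

        # Jump back to the first retained record; consumer.seek(tp, offset)
        # likewise jumps to any specific offset.
        consumer.seek_to_beginning(tp)

        for record in consumer:
            print(record.offset, record.key, record.value)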

  • Question 40:

    An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while still controlling access to the user-level data. Additionally, they need to minimize their overall storage cost and ensure the analysis cost for other projects is assigned to those projects. What should they do?

    A. Create and share an authorized view that provides the aggregate results.

    B. Create and share a new dataset and view that provides the aggregate results.

    C. Create and share a new dataset and table that contains the aggregate results.

    D. Create dataViewer Identity and Access Management (IAM) roles on the dataset to enable sharing.
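
    The authorized-view pattern in option A places an aggregate view in a separate dataset that other projects can query, then grants the view itself (not its users) read access to the private source dataset; querying projects are billed for their own queries and never see raw rows. A sketch with google-cloud-bigquery; all project, dataset, and column names are hypothetical:

        # pip install google-cloud-bigquery
        from google.cloud import bigquery

        client = bigquery.Client()

        # 1. Create the aggregate view in a dataset shared with other projects.
        view = bigquery.Table("my-project.shared_views.daily_aggregates")  # hypothetical
        view.view_query = """
            SELECT country, DATE(event_ts) AS day, COUNT(*) AS events
            FROM `my-project.user_data.user_events`
            GROUP BY country, day
        """
        view = client.create_table(view)

        # 2. Authorize the view on the private dataset so the view can read the
        #    user-level table even though consumers of the view cannot.
        private = client.get_dataset("my-project.user_data")
        entries = list(private.access_entries)
        entries.append(bigquery.AccessEntry(None, "view", view.reference.to_api_repr()))
        private.access_entries = entries
        client.update_dataset(private, ["access_entries"])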

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important, and more and more enterprises require them when you apply for a job. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations, but also complete assistance with your exam preparation and certification application. If you are confused about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.