Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 08, 2024

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 261:

    You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update. What should you do? (An illustrative sketch follows the answer choices.)

    A. Update the current pipeline and use the drain flag.

    B. Update the current pipeline and provide the transform mapping JSON object.

    C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.

    D. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.
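
    Note: the drain operation referenced in option A is exposed through the Dataflow CLI; draining stops the job from pulling new Pub/Sub messages, finishes processing in-flight elements, and then terminates the job, while unacknowledged messages remain in the subscription. Below is a minimal sketch of a drain-then-relaunch flow; the job ID, region, project, and pipeline script are hypothetical placeholders, not values from the question.

        import subprocess

        # Drain the running job: Dataflow stops reading from Pub/Sub,
        # finishes processing in-flight elements, then terminates. The
        # job ID and region are hypothetical placeholders.
        subprocess.run(
            ["gcloud", "dataflow", "jobs", "drain", "2024-05-08_example_job_id",
             "--region=us-central1"],
            check=True,
        )

        # After the drain completes, launch the updated (incompatible)
        # pipeline. Messages published in the meantime are retained by
        # the Pub/Sub subscription and will be delivered to the new job.
        subprocess.run(
            ["python", "updated_pipeline.py",
             "--runner=DataflowRunner",
             "--project=my-project",
             "--region=us-central1",
             "--streaming"],
            check=True,
        )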

  • Question 262:

    Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks. What should you do?

    A. Run a local version of Jupyter on the laptop.

    B. Grant the user access to Google Cloud Shell.

    C. Host a visualization tool on a VM on Google Compute Engine.

    D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.

  • Question 263:

    You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store, and analyze these very large datasets in real time. What should you do? (An illustrative sketch follows the answer choices.)

    A. Send the data to Google Cloud Datastore and then export to BigQuery.

    B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.

    C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.

    D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.
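
    As a rough illustration of the Pub/Sub-to-Dataflow-to-BigQuery pattern named in option B, here is a minimal Apache Beam sketch; the project, subscription, table, and schema names are hypothetical, and the parsing is simplified.

        import json

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        # Hypothetical resource names, for illustration only.
        SUBSCRIPTION = "projects/my-project/subscriptions/device-temps"
        TABLE = "my-project:warehouse.temperatures"

        options = PipelineOptions(streaming=True)

        with beam.Pipeline(options=options) as p:
            (
                p
                | "ReadFromPubSub" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
                | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
                | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                    TABLE,
                    schema="device_id:STRING,temp_c:FLOAT,event_ts:TIMESTAMP",
                )
            )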

  • Question 264:

    Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to reuse the Hadoop jobs they have already created and minimize management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do? (An illustrative sketch follows the answer choices.)

    A. Create a Google Cloud Dataflow job to process the data.

    B. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.

    C. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.

    D. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.

    E. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.
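
    To make the Cloud Storage connector in option D concrete: Dataproc clusters can read and write gs:// paths directly from existing Hadoop jobs, so the data outlives any individual cluster. A hypothetical job submission might look like the sketch below (the cluster, bucket, and jar names are placeholders).

        import subprocess

        # Submit an existing Hadoop jar to a Dataproc cluster, pointing the
        # job's input and output at Cloud Storage (gs://) rather than HDFS.
        # All resource names are hypothetical placeholders.
        subprocess.run(
            ["gcloud", "dataproc", "jobs", "submit", "hadoop",
             "--cluster=my-cluster",
             "--region=us-central1",
             "--jar=gs://my-bucket/jobs/wordcount.jar",
             "--",
             "gs://my-bucket/input/",
             "gs://my-bucket/output/"],
            check=True,
        )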

  • Question 265:

    Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other's data. You want to ensure appropriate access to the data. Which three steps should you take? (Choose three. An illustrative sketch follows the answer choices.)

    A. Load data into different partitions.

    B. Load data into a different dataset for each client.

    C. Put each client's BigQuery dataset into a different table.

    D. Restrict a client's dataset to approved users.

    E. Only allow a service account to access the datasets.

    F. Use the appropriate identity and access management (IAM) roles for each client's users.
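
    Dataset-level access control in BigQuery can be managed programmatically. The sketch below uses the google-cloud-bigquery client to grant one approved user read access to a single client's dataset; the project, dataset, and email address are hypothetical.

        from google.cloud import bigquery

        client = bigquery.Client(project="my-project")  # hypothetical project

        # One dataset per client; this dataset name is hypothetical.
        dataset = client.get_dataset("client_acme")

        # Grant a single approved user read access to this client's dataset.
        entries = list(dataset.access_entries)
        entries.append(
            bigquery.AccessEntry(
                role="READER",
                entity_type="userByEmail",
                entity_id="analyst@acme.example.com",  # hypothetical user
            )
        )
        dataset.access_entries = entries
        client.update_dataset(dataset, ["access_entries"])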

  • Question 266:

    Your company is using wildcard tables to query data across multiple tables with similar names. The following SQL statement is currently failing with this error: Syntax error: Expected end of statement but got "-" at [4:11]

    SELECT age
    FROM bigquery-public-data.noaa_gsod.gsod
    WHERE age != 99
    AND _TABLE_SUFFIX = '1929'
    ORDER BY age DESC

    Which table name will make the SQL statement work correctly? (A corrected sketch follows the answer choices.)

    A. `bigquery-public-data.noaa_gsod.gsod`

    B. bigquery-public-data.noaa_gsod.gsod*

    C. `bigquery-public-data.noaa_gsod.gsod`*

    D. `bigquery-public-data.noaa_gsod.gsod*`
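
    For reference, a correctly quoted wildcard query is sketched below via the google-cloud-bigquery client: backticks must wrap the entire wildcard table path, and _TABLE_SUFFIX selects which of the matching tables are scanned. The client usage around the SQL is illustrative.

        from google.cloud import bigquery

        client = bigquery.Client()  # uses default project and credentials

        # Backticks quote the whole wildcard table path; _TABLE_SUFFIX
        # restricts the query to the tables whose suffix matches.
        sql = """
            SELECT age
            FROM `bigquery-public-data.noaa_gsod.gsod*`
            WHERE age != 99
              AND _TABLE_SUFFIX = '1929'
            ORDER BY age DESC
        """

        for row in client.query(sql).result():
            print(row.age)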

  • Question 267:

    You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling. Which Google database service should you use?

    A. Cloud SQL

    B. BigQuery

    C. Cloud Bigtable

    D. Cloud Datastore

  • Question 268:

    You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two. An illustrative sketch follows the answer choices.)

    A. There are very few occurrences of mutations relative to normal samples.

    B. There are roughly equal occurrences of both normal and mutated samples in the database.

    C. You expect future mutations to have different features from the mutated samples in the database.

    D. You expect future mutations to have similar features to the mutated samples in the database.

    E. You already have labels for which samples are mutated and which are normal in the database.
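
    As a loose illustration of unsupervised anomaly detection in general (not necessarily the exact method the question has in mind), scikit-learn's IsolationForest flags rare outliers without using labels; the feature matrix below is synthetic.

        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(0)

        # Synthetic stand-in for tissue-sample feature vectors: many
        # "normal" samples plus a few shifted "mutated" ones.
        normal = rng.normal(0.0, 1.0, size=(990, 8))
        mutated = rng.normal(4.0, 1.0, size=(10, 8))
        X = np.vstack([normal, mutated])

        # contamination encodes the assumption that anomalies are rare.
        model = IsolationForest(contamination=0.01, random_state=0)
        model.fit(X)

        # predict() returns +1 for inliers (normal) and -1 for outliers
        # (candidate mutations).
        labels = model.predict(X)
        print((labels == -1).sum(), "samples flagged as anomalous")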

  • Question 269:

    You designed a database for patient records as a pilot project to cover a few hundred patients in three clinics. Your design used a single database table to represent all patients and their visits, and you used self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the project has expanded. The database must now store 100 times more patient records. You can no longer run the reports because they either take too long or encounter errors due to insufficient compute resources. How should you adjust the database design? (An illustrative sketch follows the answer choices.)

    A. Add capacity (memory and disk space) to the database server by the order of 200.

    B. Shard the tables into smaller ones based on date ranges, and only generate reports with prespecified date ranges.

    C. Normalize the master patient-record table into the patient table and the visits table, and create other necessary tables to avoid self-join.

    D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table pairs, and use unions for consolidated reports.
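
    To make the normalization idea in option C concrete, here is a minimal sketch using SQLite; the column set is hypothetical and far smaller than a real patient-record schema.

        import sqlite3

        conn = sqlite3.connect(":memory:")

        # Split the single wide table into a patients table and a visits
        # table, so reports join the two instead of self-joining one
        # master table. Columns are hypothetical.
        conn.executescript("""
            CREATE TABLE patients (
                patient_id INTEGER PRIMARY KEY,
                name       TEXT,
                clinic_id  INTEGER
            );
            CREATE TABLE visits (
                visit_id   INTEGER PRIMARY KEY,
                patient_id INTEGER REFERENCES patients(patient_id),
                visit_date TEXT,
                diagnosis  TEXT
            );
        """)

        # A report is now a plain join rather than a self-join.
        report = conn.execute("""
            SELECT p.name, COUNT(v.visit_id) AS visit_count
            FROM patients p
            LEFT JOIN visits v ON v.patient_id = p.patient_id
            GROUP BY p.patient_id
        """).fetchall()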

Tips on How to Prepare for the Exams

Nowadays, certification exams have become increasingly important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time, with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.