Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 08, 2024

Google Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 11:

    A live TV show asks viewers to cast votes using their mobile phones. The event generates a large volume of data during a 3-minute period. You are in charge of the voting infrastructure and must ensure that the platform can handle the load and that all votes are processed. You must display partial results while voting is open. After voting closes, you need to count the votes exactly once while optimizing cost. What should you do?

    A. Create a Memorystore instance with a high availability (HA) configuration

    B. Write votes to a Pub/Sub topic and have Cloud Functions subscribe to it and write votes to BigQuery

    C. Write votes to a Pub/Sub topic and load them into both Bigtable and BigQuery via a Dataflow pipeline. Query Bigtable for real-time results and BigQuery for later analysis. Shut down the Bigtable instance when voting concludes

    D. Create a Cloud SQL for PostgreSQL database with high availability (HA) configuration and multiple read replicas
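
    For reference, a minimal Apache Beam (Python) sketch of the dual-sink approach in option C. The topic, table, and field names are illustrative assumptions, not details from the question.

    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
    from apache_beam.io.gcp.bigtableio import WriteToBigTable
    from google.cloud.bigtable import row as bt_row


    def to_bigtable_row(vote):
        """Convert a parsed vote dict into a Bigtable DirectRow keyed by vote_id."""
        direct_row = bt_row.DirectRow(row_key=vote["vote_id"].encode("utf-8"))
        direct_row.set_cell("votes", b"candidate", vote["candidate"].encode("utf-8"))
        return direct_row


    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        votes = (
            p
            | "ReadVotes" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/votes")
            | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        )

        # Low-latency branch: partial results while voting is open.
        _ = (
            votes
            | "ToBigtableRow" >> beam.Map(to_bigtable_row)
            | "WriteBigtable" >> WriteToBigTable(
                project_id="my-project", instance_id="voting", table_id="votes"
            )
        )

        # Durable branch: exact final count in BigQuery after voting closes.
        _ = votes | "WriteBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:voting.votes",
            schema="vote_id:STRING,candidate:STRING",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )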

  • Question 12:

    MJTelco's Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations.

    You want to allow Cloud Dataflow to scale its compute power up as required.

    Which Cloud Dataflow pipeline configuration setting should you update?

    A. The zone

    B. The number of workers

    C. The disk size per worker

    D. The maximum number of workers
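
    As a sketch of option D (with assumed project, region, and bucket values), the maximum worker count is a pipeline option in the Beam Python SDK; it is the ceiling Dataflow autoscaling can reach, whereas the plain worker count only sets the starting size.

    from apache_beam.options.pipeline_options import PipelineOptions, WorkerOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/temp",
    )

    worker_options = options.view_as(WorkerOptions)
    worker_options.autoscaling_algorithm = "THROUGHPUT_BASED"  # let Dataflow scale up
    worker_options.max_num_workers = 100  # ceiling the service can scale out to
    worker_options.num_workers = 5        # initial workers only; not the scaling limit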

  • Question 13:

    You are collecting IoT sensor data from millions of devices across the world and storing the data in BigQuery. Your access pattern is based on recent data filtered by location_id and device_version with the following query:

    You want to optimize your queries for cost and performance. How should you structure your data?

    A. Partition table data by create_date, location_id and device_version

    B. Partition table data by create_date; cluster table data by location_id and device_version

    C. Cluster table data by create_date, location_id, and device_version

    D. Cluster table data by create_date, partition by location and device_version
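
    A minimal google-cloud-bigquery sketch of option B: partition on create_date and cluster on location_id and device_version. Dataset, table, and schema details are assumptions for illustration.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    table = bigquery.Table(
        "my-project.sensors.readings",
        schema=[
            bigquery.SchemaField("create_date", "DATE"),
            bigquery.SchemaField("location_id", "STRING"),
            bigquery.SchemaField("device_version", "STRING"),
            bigquery.SchemaField("reading", "FLOAT"),
        ],
    )
    # Partition pruning keeps recent-data queries cheap; clustering co-locates rows
    # that share the filtered columns.
    table.time_partitioning = bigquery.TimePartitioning(field="create_date")
    table.clustering_fields = ["location_id", "device_version"]

    client.create_table(table)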

  • Question 14:

    An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiently import the data into BigQuery while consuming as few resources as possible.

    What should you do?

    A. Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.

    B. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source

    C. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

    D. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
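
    A minimal sketch in the spirit of option D: a hypothetical parser for the proprietary format feeding a streaming Dataflow pipeline that writes to BigQuery. The parser, subscription, and table names are placeholders; a real connector would implement the vendor's format specification.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


    class ParseProprietaryRecord(beam.DoFn):
        """Placeholder: decode one proprietary-format payload into a BigQuery row dict."""

        def process(self, payload):
            # Real decoding logic for the proprietary format would go here.
            yield {"flight_id": payload[:8].decode("ascii", "replace"),
                   "raw_size": len(payload)}


    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | "ReadFeed" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/flight-data")
            | "Decode" >> beam.ParDo(ParseProprietaryRecord())
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "my-project:aviation.flight_data",
                schema="flight_id:STRING,raw_size:INTEGER",
            )
        )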

  • Question 15:

    You are working on a linear regression model in BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictive variables. What should you do?

    A. Create a new view with BigQuery that does not include a column with city information.

    B. Use SQL in BigQuery to transform the state column using a one-hot encoding method, and make each city a column with binary values.

    C. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file and upload that as part of your model to BigQuery ML.

    D. Use Cloud Data Fusion to assign each city to a region that is labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
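
    A minimal sketch of the one-hot encoding idea in option B, run through the BigQuery Python client: each city value becomes its own binary column. The table, view, and city values are illustrative assumptions.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    one_hot_sql = """
    CREATE OR REPLACE VIEW `my-project.sales.training_data` AS
    SELECT
      purchase_amount,
      IF(city = 'London', 1, 0) AS city_london,
      IF(city = 'Paris',  1, 0) AS city_paris,
      IF(city = 'Tokyo',  1, 0) AS city_tokyo
    FROM `my-project.sales.customers`
    """

    client.query(one_hot_sql).result()  # each city is now its own 0/1 column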

  • Question 16:

    You are migrating a table to BigQuery and are deciding on the data model. Your table stores information related to purchases made across several store locations and includes information like the time of the transaction, items purchased, the store ID, and the city and state in which the store is located. You frequently query this table to see how many of each item were sold over the past 30 days and to look at purchasing trends by state, city, and individual store. You want to model this table to minimize query time and cost. What should you do?

    A. Partition by transaction time; cluster by state first, then city, then store ID

    B. Partition by transaction time; cluster by store ID first, then city, then state

    C. Top-level cluster by state first, then city, then store

    D. Top-level cluster by store ID first, then city, then state.
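
    A minimal DDL sketch of option A, executed through the BigQuery client: partition on the transaction timestamp and cluster from the coarsest filter (state) down to store ID. Dataset, table, and column names are assumptions.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")

    ddl = """
    CREATE TABLE `my-project.retail.purchases`
    (
      transaction_time TIMESTAMP,
      item             STRING,
      store_id         STRING,
      city             STRING,
      state            STRING
    )
    PARTITION BY DATE(transaction_time)
    CLUSTER BY state, city, store_id
    """

    client.query(ddl).result()  # 30-day queries prune partitions; trend queries hit clusters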

  • Question 17:

    You are designing a pipeline that publishes application events to a Pub/Sub topic. You need to aggregate events across hourly intervals before loading the results to BigQuery for analysis. Your solution must be scalable so it can process and load large volumes of events to BigQuery. What should you do?

    A. Create a Cloud Function to perform the necessary data processing that executes using the Pub/Sub trigger every time a new message is published to the topic.

    B. Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.

    C. Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.

    D. Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.
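
    A minimal sketch of option D: a streaming Beam pipeline that counts events per hourly tumbling (fixed) window before loading the results into BigQuery. Topic, table, and field names are assumptions.

    import json

    import apache_beam as beam
    from apache_beam import window
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    options = PipelineOptions()
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        _ = (
            p
            | "ReadEvents" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/app-events")
            | "Parse" >> beam.Map(lambda m: json.loads(m.decode("utf-8")))
            | "KeyByType" >> beam.Map(lambda e: (e["event_type"], 1))
            | "HourlyWindow" >> beam.WindowInto(window.FixedWindows(60 * 60))
            | "CountPerType" >> beam.CombinePerKey(sum)
            | "ToRow" >> beam.Map(lambda kv: {"event_type": kv[0], "event_count": kv[1]})
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "my-project:analytics.hourly_event_counts",
                schema="event_type:STRING,event_count:INTEGER",
            )
        )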

  • Question 18:

    Your company currently runs a large on-premises cluster using Spark, Hive, and Hadoop Distributed File System (HDFS) in a colocation facility. The cluster is designed to support peak usage on the system; however, many jobs are batch in nature, and usage of the cluster fluctuates quite dramatically.

    Your company is eager to move to the cloud to reduce the overhead associated with on-premises infrastructure and maintenance and to benefit from the cost savings. They are also hoping to modernize their existing infrastructure to use more serverless offerings in order to take advantage of the cloud. Because of the timing of their contract renewal with the colocation facility, they have only 2 months for their initial migration.

    How should you recommend they approach the upcoming migration strategy so they can maximize their cost savings in the cloud while still executing the migration in time?

    A. Migrate the workloads to Dataproc plus HDFS; modernize later

    B. Migrate the workloads to Dataproc plus Cloud Storage; modernize later

    C. Migrate the Spark workload to Dataproc plus HDFS, and modernize the Hive workload for BigQuery

    D. Modernize the Spark workload for Dataflow and the Hive workload for BigQuery
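
    A minimal PySpark sketch of the lift-and-shift in option B: the existing Spark job keeps its logic but reads from and writes to Cloud Storage (gs://) instead of HDFS, so the Dataproc cluster can be short-lived. Bucket paths and column names are assumptions.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("batch-report").getOrCreate()

    # Previously: spark.read.parquet("hdfs:///data/events/")
    events = spark.read.parquet("gs://my-company-datalake/events/")

    daily_totals = events.groupBy("event_date").count()

    # Previously: daily_totals.write.parquet("hdfs:///reports/daily_totals/")
    daily_totals.write.mode("overwrite").parquet(
        "gs://my-company-datalake/reports/daily_totals/")

    spark.stop()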

  • Question 19:

    Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow Google-recommended practices and use service accounts to control access to PII.

    What should you do?

    A. Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access protected resources

    B. Use one service account to access a Cloud SQL database and use separate service accounts for each human user

    C. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users

    D. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group
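
    A minimal sketch related to option D: granting a Cloud Storage role to a group rather than to individual accounts, so access to PII buckets is managed per group. The bucket and group names are assumptions.

    from google.cloud import storage

    client = storage.Client(project="my-project")
    bucket = client.bucket("pii-client-records")

    policy = bucket.get_iam_policy(requested_policy_version=3)
    policy.bindings.append(
        {
            "role": "roles/storage.objectViewer",
            "members": {"group:pii-analysts@example.com"},
        }
    )
    bucket.set_iam_policy(policy)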

  • Question 20:

    You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Your subscriber is not set up to retain acknowledged messages. What should you do to ensure that you can recover from errors after deployment?

    A. Use Cloud Build for your deployment. If an error occurs after deployment, use a Seek operation to locate a timestamp logged by Cloud Build at the start of the deployment

    B. Create a Pub/Sub snapshot before deploying new subscriber code. Use a Seek operation to re-deliver messages that became available after the snapshot was created

    C. Set up the Pub/Sub emulator on your local machine. Validate the behavior of your new subscriber logic before deploying it to production

    D. Enable dead-lettering on the Pub/Sub topic to capture messages that aren't successfully acknowledged. If an error occurs after deployment, re-deliver any messages captured by the dead-letter queue
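
    A minimal google-cloud-pubsub sketch of option B: snapshot the subscription before deploying, then seek back to the snapshot if the new subscriber mis-acknowledges messages. Project, subscription, and snapshot names are assumptions.

    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path("my-project", "events-sub")
    snapshot_path = subscriber.snapshot_path("my-project", "pre-deploy")

    # Before rolling out the new subscriber code: capture the current backlog state.
    subscriber.create_snapshot(
        request={"name": snapshot_path, "subscription": subscription_path}
    )

    # If the deployment goes wrong: rewind the subscription to the snapshot so the
    # affected messages are redelivered.
    subscriber.seek(request={"subscription": subscription_path, "snapshot": snapshot_path})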

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more important and are required by more and more enterprises when hiring. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or Google certification application, do not hesitate to visit Vcedump.com to find your solutions.