Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 281:

    You are deploying MariaDB SQL databases on GCE VM instances and need to configure monitoring and alerting. You want to collect metrics, including network connections, disk IO, and replication status, from MariaDB with minimal development effort, and you want to use Stackdriver for dashboards and alerts.

    What should you do?

    A. Install the OpenCensus Agent and create a custom metric collection application with a Stackdriver exporter.

    B. Place the MariaDB instances in an Instance Group with a Health Check.

    C. Install the Stackdriver Logging Agent and configure the fluentd in_tail plugin to read MariaDB logs.

    D. Install the Stackdriver Agent and configure the MySQL plugin.
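
    A note on option D's moving parts: the Stackdriver (Cloud Monitoring) agent's collectd-based MySQL plugin also works against MariaDB, and once the agent is installed the collected metrics can be read back through the Cloud Monitoring API for dashboards and alerts. The sketch below lists those agent metrics with the Python client; the metric-type prefix and project ID are assumptions, not values given in the question.

    # Minimal sketch: read back the MariaDB metrics published by the agent's
    # MySQL plugin. The metric-type prefix and project ID are assumptions.
    import time

    from google.cloud import monitoring_v3

    PROJECT_ID = "my-project"  # hypothetical project

    client = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        end_time={"seconds": now},
        start_time={"seconds": now - 3600},  # look back one hour
    )

    results = client.list_time_series(
        request={
            "name": f"projects/{PROJECT_ID}",
            # Assumed prefix for agent-collected MySQL/MariaDB metrics.
            "filter": 'metric.type = starts_with("agent.googleapis.com/mysql")',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        print(series.metric.type, dict(series.resource.labels))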

  • Question 282:

    Government regulations in your industry mandate that you have to maintain an auditable record of access to certain types of data. Assuming that all expiring logs will be archived correctly, where should you store data that is subject to that mandate?

    A. Encrypted on Cloud Storage with user-supplied encryption keys. A separate decryption key will be given to each authorized user.

    B. In a BigQuery dataset that is viewable only by authorized personnel, with the Data Access log used to provide the auditability.

    C. In Cloud SQL, with separate database user names to each user. The Cloud SQL Admin activity logs will be used to provide the auditability.

    D. In a bucket on Cloud Storage that is accessible only by an AppEngine service that collects user information and logs the access before providing a link to the bucket.
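
    If the Data Access audit logs are what provides the auditable record (as option B suggests), they can be pulled programmatically for review. Below is a minimal sketch using the Cloud Logging Python client; the project ID is hypothetical, and the filter uses the standard audit-log field names.

    # Minimal sketch: review BigQuery Data Access audit log entries.
    # The project ID is hypothetical; the filter uses standard audit-log fields.
    from google.cloud import logging as cloud_logging

    PROJECT_ID = "my-project"  # hypothetical project

    client = cloud_logging.Client(project=PROJECT_ID)
    log_filter = (
        f'logName="projects/{PROJECT_ID}/logs/'
        'cloudaudit.googleapis.com%2Fdata_access" '
        'AND protoPayload.serviceName="bigquery.googleapis.com"'
    )

    for entry in client.list_entries(filter_=log_filter, page_size=50):
        # For audit entries the payload is the AuditLog proto rendered as a dict.
        who = entry.payload.get("authenticationInfo", {}).get("principalEmail")
        print(entry.timestamp, who, entry.payload.get("methodName"))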

  • Question 283:

    You are working on a linear regression model in BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictive variables. What should you do?

    A. Use SQL in BigQuery to transform the state column using a one-hot encoding method, and make each city a column with binary values.

    B. Create a new view with BigQuery that does not include a column with city information.

    C. Use Cloud Data Fusion to assign each city to a region labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.

    D. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file and upload that as part of your model to BigQuery ML.
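
    The one-hot-encoding idea in option A amounts to plain SQL that can be run through the BigQuery Python client. A minimal sketch follows; the dataset, table, and city values are hypothetical placeholders, and a real table would enumerate every city seen in training.

    # Minimal sketch of one-hot encoding a city column directly in BigQuery SQL.
    # Dataset, table, and city values are hypothetical placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    query = """
    CREATE OR REPLACE TABLE `my_dataset.training_data_encoded` AS
    SELECT
      * EXCEPT(city),
      IF(city = 'London', 1, 0) AS city_london,
      IF(city = 'Paris',  1, 0) AS city_paris,
      IF(city = 'Tokyo',  1, 0) AS city_tokyo
    FROM `my_dataset.training_data`
    """

    client.query(query).result()  # waits for the job to finish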

  • Question 284:

    You need to deploy additional dependencies to all nodes of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet, so public initialization actions cannot fetch resources. What should you do?

    A. Deploy the Cloud SQL Proxy on the Cloud Dataproc master

    B. Use an SSH tunnel to give the Cloud Dataproc cluster access to the Internet

    C. Copy all dependencies to a Cloud Storage bucket within your VPC security perimeter

    D. Use Resource Manager to add the service account used by the Cloud Dataproc cluster to the Network User role
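
    Copying the initialization action and its dependencies into a Cloud Storage bucket inside the security perimeter (option C) is mostly an object-upload exercise. A minimal sketch with the Cloud Storage Python client is shown below; the bucket name and file list are hypothetical.

    # Minimal sketch: stage an initialization action and its dependencies in a
    # Cloud Storage bucket that sits inside the VPC Service Controls perimeter,
    # so cluster nodes never need Internet access. All names are hypothetical.
    from google.cloud import storage

    BUCKET = "my-private-init-actions"  # bucket inside the security perimeter
    FILES = ["install_deps.sh", "deps/library.tar.gz"]  # hypothetical artifacts

    client = storage.Client()
    bucket = client.bucket(BUCKET)

    for path in FILES:
        blob = bucket.blob(path)
        blob.upload_from_filename(path)
        print(f"Staged gs://{BUCKET}/{path}")

    # The cluster would then reference the copied script, e.g.
    # --initialization-actions gs://my-private-init-actions/install_deps.sh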

  • Question 285:

    A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictions for an individual user ID with latency under 100 milliseconds. You use the following query to generate predictions: SELECT predicted_label, user_id FROM ML.PREDICT(MODEL `dataset.model`, TABLE user_features). How should you create the ML pipeline?

    A. Add a WHERE clause to the query, and grant the BigQuery Data Viewer role to the application service account.

    B. Create an Authorized View with the provided query. Share the dataset that contains the view with the application service account.

    C. Create a Cloud Dataflow pipeline using BigQueryIO to read results from the query. Grant the Dataflow Worker role to the application service account.

    D. Create a Cloud Dataflow pipeline using BigQueryIO to read predictions for all users from the query. Write the results to Cloud Bigtable using BigtableIO. Grant the Bigtable Reader role to the application service account so that the application can read predictions for individual users from Cloud Bigtable.
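
    Option D describes a batch Dataflow job that materializes the ML.PREDICT results into Bigtable so the REST API can do sub-100 ms point reads by user ID. A minimal Apache Beam (Python) sketch is below; the project, Bigtable instance, table, and column-family names are assumptions, and the pipeline options would still need project, region, and temp_location set.

    # Minimal sketch: batch-read ML.PREDICT() output from BigQuery with Dataflow
    # and write it to Bigtable keyed by user_id for low-latency point lookups.
    # All resource names are hypothetical.
    import datetime

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.io.gcp.bigtableio import WriteToBigTable
    from google.cloud.bigtable.row import DirectRow

    QUERY = """
    SELECT predicted_label, user_id
    FROM ML.PREDICT(MODEL `dataset.model`, TABLE dataset.user_features)
    """

    def to_bigtable_row(record):
        # One Bigtable row per user, keyed by user_id.
        row = DirectRow(row_key=str(record["user_id"]).encode("utf-8"))
        row.set_cell(
            "predictions",                     # hypothetical column family
            b"predicted_label",
            str(record["predicted_label"]).encode("utf-8"),
            timestamp=datetime.datetime.utcnow(),
        )
        return row

    options = PipelineOptions(runner="DataflowRunner")  # plus project/region/temp_location
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadPredictions" >> beam.io.ReadFromBigQuery(query=QUERY, use_standard_sql=True)
            | "ToBigtableRows" >> beam.Map(to_bigtable_row)
            | "WriteToBigtable" >> WriteToBigTable(
                project_id="my-project",
                instance_id="my-bigtable-instance",
                table_id="user_predictions",
            )
        )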

  • Question 286:

    Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you do?

    A. Create a Stackdriver Monitoring dashboard based on the BigQuery metric query/scanned_bytes

    B. Create a Stackdriver Monitoring dashboard based on the BigQuery metric slots/allocated_for_project

    C. Create a log export for each project, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric

    D. Create an aggregated log export at the organization level, capture the BigQuery job execution logs, create a custom metric based on the totalSlotMs, and create a Stackdriver Monitoring dashboard based on the custom metric
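
    Options C and D hinge on a log-based metric that extracts totalSlotMs from BigQuery job-completion audit logs. The sketch below creates such a distribution metric in a single project with the Cloud Logging API; the filter and extractor paths assume the legacy BigQuery AuditData log format, and the project and metric names are hypothetical.

    # Minimal sketch: create a log-based distribution metric that extracts
    # totalSlotMs from BigQuery job-completion audit logs in one project.
    # The filter and extractor paths assume the legacy BigQuery AuditData format.
    from google.api import distribution_pb2, metric_pb2
    from google.cloud.logging_v2.services.metrics_service_v2 import MetricsServiceV2Client
    from google.cloud.logging_v2.types import LogMetric

    PROJECT_ID = "analytics-team-project"  # hypothetical project

    client = MetricsServiceV2Client()

    metric = LogMetric(
        name="bigquery_total_slot_ms",
        description="Slot-milliseconds consumed by completed BigQuery jobs",
        filter=(
            'resource.type="bigquery_resource" '
            'AND protoPayload.methodName="jobservice.jobcompleted"'
        ),
        value_extractor=(
            "EXTRACT(protoPayload.serviceData.jobCompletedEvent"
            ".job.jobStatistics.totalSlotMs)"
        ),
        metric_descriptor=metric_pb2.MetricDescriptor(
            metric_kind=metric_pb2.MetricDescriptor.MetricKind.DELTA,
            value_type=metric_pb2.MetricDescriptor.ValueType.DISTRIBUTION,
            unit="ms",
        ),
        bucket_options=distribution_pb2.Distribution.BucketOptions(
            exponential_buckets=distribution_pb2.Distribution.BucketOptions.Exponential(
                num_finite_buckets=64, growth_factor=2, scale=1
            )
        ),
    )

    client.create_log_metric(parent=f"projects/{PROJECT_ID}", metric=metric)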

  • Question 287:

    You want to optimize your queries for cost and performance. How should you structure your data?

    A. Partition table data by create_date, location_id, and device_version

    B. Partition table data by create_date; cluster table data by location_id and device_version

    C. Cluster table data by create_date, location_id, and device_version

    D. Cluster table data by create_date; partition by location_id and device_version
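
    The partition-then-cluster layout described in option B translates directly into BigQuery DDL. A minimal sketch via the Python client follows; the dataset, table, and column names are hypothetical.

    # Minimal sketch: partition by the date column and cluster by the
    # high-cardinality filter columns. Table and column names are hypothetical.
    from google.cloud import bigquery

    client = bigquery.Client()

    ddl = """
    CREATE TABLE `my_dataset.device_events`
    PARTITION BY DATE(create_date)
    CLUSTER BY location_id, device_version AS
    SELECT * FROM `my_dataset.device_events_raw`
    """

    client.query(ddl).result()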

  • Question 288:

    You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.

    You have the following requirements:

    1. You will batch-load the posts once per day and run them through the Cloud Natural Language API.

    2. You will extract topics and sentiment from the posts.

    3. You must store the raw posts for archiving and reprocessing.

    4. You will create dashboards to be shared with people both inside and outside your organization.

    You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving. What should you do?

    A. Store the social media posts and the data extracted from the API in BigQuery.

    B. Store the social media posts and the data extracted from the API in Cloud SQL.

    C. Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.

    D. Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.
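
    Option C splits storage by purpose: raw posts archived in Cloud Storage, extracted results loaded into BigQuery for analysis and shared dashboards. A minimal daily-batch sketch is below; the bucket, prefix, table, and schema are hypothetical, and a production job would also extract topics, not just sentiment.

    # Minimal sketch: keep raw posts in Cloud Storage, run them through the
    # Natural Language API, and load the extracted fields into BigQuery.
    # Bucket, prefix, and table names are hypothetical.
    from google.cloud import bigquery, language_v1, storage

    BUCKET = "raw-social-posts"           # archive of raw posts
    TABLE = "my_dataset.post_sentiment"   # analysis table in BigQuery

    storage_client = storage.Client()
    language_client = language_v1.LanguageServiceClient()
    bq_client = bigquery.Client()

    rows = []
    for blob in storage_client.list_blobs(BUCKET, prefix="posts/"):
        text = blob.download_as_text()
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        sentiment = language_client.analyze_sentiment(document=document).document_sentiment
        rows.append(
            {
                "post_uri": f"gs://{BUCKET}/{blob.name}",
                "score": sentiment.score,
                "magnitude": sentiment.magnitude,
            }
        )

    # Load the extracted results into BigQuery (table schema assumed to exist).
    errors = bq_client.insert_rows_json(TABLE, rows)
    if errors:
        raise RuntimeError(errors)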

  • Question 289:

    An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to import the data into BigQuery efficiently while consuming as few resources as possible.

    What should you do?

    A. Use a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.

    B. Write a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source

    C. Use Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

    D. Use an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format
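
    Option D's custom-connector approach can be approximated as a streaming Dataflow pipeline that decodes the proprietary records with custom code and streams rows into BigQuery. In the sketch below, Pub/Sub as the transport, the decode function, and every name are assumptions; a production pipeline would package the decoder as a proper Apache Beam connector.

    # Minimal sketch: decode proprietary flight records with custom code and
    # stream them into BigQuery. Transport, schema, and names are assumptions.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    def decode_proprietary(record_bytes):
        # Hypothetical decoder for the vendor format; returns a BigQuery-ready dict.
        fields = record_bytes.decode("utf-8").split("|")
        return {"tail_number": fields[0], "altitude_ft": int(fields[1])}

    options = PipelineOptions(runner="DataflowRunner")
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadRaw" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/flight-data")
            | "Decode" >> beam.Map(decode_proprietary)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-project:flight_dataset.flight_data",
                schema="tail_number:STRING,altitude_ft:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )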

  • Question 290:

    You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once, and must be ordered within windows of 1 hour. How should you design the solution?

    A. Use Apache Kafka for message ingestion and use Cloud Dataproc for streaming analysis.

    B. Use Apache Kafka for message ingestion and use Cloud Dataflow for streaming analysis.

    C. Use Cloud Pub/Sub for message ingestion and Cloud Dataproc for streaming analysis.

    D. Use Cloud Pub/Sub for message ingestion and Cloud Dataflow for streaming analysis.
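
    Option D pairs Cloud Pub/Sub ingestion with a Dataflow pipeline; autoscaling and at-least-once delivery come from those services, and the 1-hour ordering requirement is handled by fixed windows plus a per-key sort. A minimal sketch is below; the subscription name and the entity_id/ts fields are hypothetical.

    # Minimal sketch: Pub/Sub ingestion into a streaming Dataflow pipeline with
    # 1-hour fixed windows and per-key ordering within each window.
    import json

    import apache_beam as beam
    from apache_beam import window
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    options = PipelineOptions(runner="DataflowRunner")
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/events-sub")
            | "Parse" >> beam.Map(json.loads)
            | "HourlyWindows" >> beam.WindowInto(window.FixedWindows(60 * 60))
            | "KeyByEntity" >> beam.Map(lambda e: (e["entity_id"], e))
            | "GroupPerWindow" >> beam.GroupByKey()
            | "SortPerWindow" >> beam.MapTuple(
                lambda key, events: (key, sorted(events, key=lambda e: e["ts"])))
        )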

Tips on How to Prepare for the Exams

Nowadays, certification exams have become increasingly important and are required by more and more enterprises when hiring. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.