Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 211:

    You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

    A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.

    B. In the Stackdriver logging admin interface, enable a log sink export to BigQuery.

    C. In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.

    D. Using the Stackdriver API, create a project sink with an advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
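
    For context, option D's approach can be sketched with the google-cloud-logging Java client. This is a minimal, illustrative sketch: the project, topic, sink name, table name, and the exact audit-log field paths in the filter are assumptions to be verified against your own log entries, not values taken from the question.

    import com.google.cloud.logging.Logging;
    import com.google.cloud.logging.LoggingOptions;
    import com.google.cloud.logging.Sink;
    import com.google.cloud.logging.SinkInfo;
    import com.google.cloud.logging.SinkInfo.Destination.TopicDestination;

    public class CreateBigQueryInsertSink {
      public static void main(String[] args) throws Exception {
        // Advanced log filter: only completed load/insert jobs that target one specific table.
        // The field paths below are assumptions; check them against a real BigQuery audit log entry.
        String filter =
            "resource.type=\"bigquery_resource\""
                + " AND protoPayload.methodName=\"jobservice.jobcompleted\""
                + " AND protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration"
                + ".load.destinationTable.tableId=\"my_table\"";

        try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
          SinkInfo sinkInfo =
              SinkInfo.newBuilder(
                      "bq-insert-notifications",                       // hypothetical sink name
                      TopicDestination.of("my-project", "bq-inserts")) // hypothetical project/topic
                  .setFilter(filter)
                  .build();
          Sink sink = logging.create(sinkInfo); // project-level sink exporting to Pub/Sub
          System.out.println("Created sink: " + sink.getName());
        }
      }
    }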

  • Question 212:

    Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

    A. Use a row key of the form <timestamp>.

    B. Use a row key of the form <sensorid>.

    C. Use a row key of the form <timestamp>#<sensorid>.

    D. Use a row key of the form >#<sensorid>#<timestamp>.
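
    As background on why the key ordering matters, here is a tiny illustrative sketch (the sensor ID format is a made-up assumption) of building a <sensorid>#<timestamp> style row key: leading with a well-distributed sensor ID spreads writes across the Bigtable key space, whereas a key that starts with a monotonically increasing timestamp funnels every new write to the same part of the key range.

    public class SensorRowKey {
      // The sensor ID comes first so concurrent writes from many sensors land on different
      // tablets (no hot-spotting); the trailing timestamp keeps each sensor's readings in
      // scan order for dashboard range queries.
      static String rowKey(String sensorId, long epochMillis) {
        return sensorId + "#" + epochMillis;
      }

      public static void main(String[] args) {
        System.out.println(rowKey("sensor-0042", System.currentTimeMillis()));
      }
    }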

  • Question 213:

    You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:

    No interaction by the user on the site for 1 hour
    Has added more than $30 worth of products to the basket
    Has not completed a transaction

    You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?

    A. Use a fixed-time window with a duration of 60 minutes.

    B. Use a sliding time window with a duration of 60 minutes.

    C. Use a session window with a gap time duration of 60 minutes.

    D. Use a global window with a time based trigger with a delay of 60 minutes.
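
    As a point of reference for the windowing options above, a session window groups one user's events together and closes only after a gap of inactivity, which is the natural fit for a "no interaction for 1 hour" rule. A minimal Apache Beam (Java) sketch of option C follows; the keyed input collection and the BasketEvent placeholder type are assumptions for illustration.

    import org.apache.beam.sdk.transforms.windowing.Sessions;
    import org.apache.beam.sdk.transforms.windowing.Window;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;
    import org.joda.time.Duration;

    public class BasketSessions {
      // Placeholder event type for the sketch.
      public static class BasketEvent implements java.io.Serializable {}

      // Assumes `events` is keyed by user ID with basket activity as the value.
      static PCollection<KV<String, BasketEvent>> windowByInactivity(
          PCollection<KV<String, BasketEvent>> events) {
        return events.apply(
            "SessionPerUser",
            Window.<KV<String, BasketEvent>>into(
                Sessions.withGapDuration(Duration.standardMinutes(60))));
      }
    }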

  • Question 214:

    Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

    The data scientists have written the following code to read the data for new key features in the logs.

    BigQueryIO.Read

    .named("ReadLogData")

    .from("clouddataflow-readonly:samples.log_data")

    You want to improve the performance of this data read. What should you do?

    A. Specify the TableReference object in the code.

    B. Use .fromQuery operation to read specific fields from the table.

    C. Use of both the Google BigQuery TableSchema and TableFieldSchema classes.

    D. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
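
    In the same abbreviated Dataflow SDK style as the snippet in the question, option B would look roughly like the fragment below; the projected column names are assumptions for illustration only.

    BigQueryIO.Read
        .named("ReadLogData")
        // Reading only the needed columns through a query avoids pulling the full,
        // rapidly growing table into the pipeline.
        .fromQuery("SELECT timestamp, new_feature FROM [clouddataflow-readonly:samples.log_data]")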

  • Question 215:

    Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data. Which three machine learning applications can you use? (Choose three.)

    A. Supervised learning to determine which transactions are most likely to be fraudulent.

    B. Unsupervised learning to determine which transactions are most likely to be fraudulent.

    C. Clustering to divide the transactions into N categories based on feature similarity.

    D. Supervised learning to predict the location of a transaction.

    E. Reinforcement learning to predict the location of a transaction.

    F. Unsupervised learning to predict the location of a transaction.

  • Question 216:

    Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other's data. You want to ensure appropriate access to the data. Which three steps should you take? (Choose three.)

    A. Load data into different partitions.

    B. Load data into a different dataset for each client.

    C. Put each client's BigQuery dataset into a different table.

    D. Restrict a client's dataset to approved users.

    E. Only allow a service account to access the datasets.

    F. Use the appropriate identity and access management (IAM) roles for each client's users.
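
    A rough sketch of what options B, D, and F look like together, using the google-cloud-bigquery Java client; the dataset name and the user email are hypothetical, and in practice each client's access list would be managed through IAM.

    import com.google.cloud.bigquery.Acl;
    import com.google.cloud.bigquery.BigQuery;
    import com.google.cloud.bigquery.BigQueryOptions;
    import com.google.cloud.bigquery.Dataset;
    import com.google.cloud.bigquery.DatasetInfo;
    import java.util.Collections;

    public class ClientDatasetSetup {
      public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // One dataset per client, restricted to that client's approved users.
        Dataset dataset =
            bigquery.create(
                DatasetInfo.newBuilder("client_acme") // hypothetical per-client dataset
                    .setAcl(
                        Collections.singletonList(
                            Acl.of(new Acl.User("analyst@acme.example.com"), Acl.Role.READER)))
                    .build());
        System.out.println("Created dataset: " + dataset.getDatasetId());
      }
    }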

  • Question 217:

    Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully. What should you do next?

    A. Check the dashboard application to see if it is not displaying correctly.

    B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.

    C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.

    D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.
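
    Option B, pushing a known fixed dataset through the pipeline, can be sketched with Beam's Create transform and PAssert on the direct runner; the JSON payloads and the stand-in transform below are placeholders for whatever the real pipeline does.

    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.testing.PAssert;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.MapElements;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.TypeDescriptors;

    public class DashboardPipelineSmokeTest {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

        // A small fixed input standing in for the live Pub/Sub subscription.
        PCollection<String> messages =
            pipeline.apply(Create.of("{\"user\":\"a\"}", "{\"user\":\"b\"}"));

        // Stand-in for the real parsing/aggregation transforms feeding the dashboard.
        PCollection<Integer> sizes =
            messages.apply(
                MapElements.into(TypeDescriptors.integers()).via((String m) -> m.length()));

        // With every input known in advance, the output can be checked exactly, which
        // separates pipeline bugs from Pub/Sub delivery or dashboard rendering issues.
        PAssert.that(sizes).containsInAnyOrder(12, 12);

        pipeline.run().waitUntilFinish();
      }
    }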

  • Question 218:

    Your company's on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50 TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block storage. You want to minimize the storage cost of the migration. What should you do?

    A. Put the data into Google Cloud Storage.

    B. Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.

    C. Tune the Cloud Dataproc cluster so that there is just enough disk for all data.

    D. Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.
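
    Option A relies on the fact that Dataproc clusters can read Cloud Storage directly through the built-in GCS connector, so jobs reference gs:// paths in place of HDFS paths and the cluster needs only minimal Persistent Disk. A small Spark (Java) sketch, with a made-up bucket path:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class GcsLineCount {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("gcs-read-example");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
          // gs:// paths work like hdfs:// paths on Dataproc, so the bulk of the data can
          // stay in Cloud Storage instead of being copied onto Persistent Disk.
          JavaRDD<String> lines = sc.textFile("gs://example-bucket/logs/*.txt"); // hypothetical path
          System.out.println("Line count: " + lines.count());
        }
      }
    }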

  • Question 219:

    An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

    A. Use federated data sources, and check data in the SQL query.

    B. Enable BigQuery monitoring in Google Stackdriver and create an alert.

    C. Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.

    D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.
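
    A condensed sketch of the dead-letter pattern behind option D, using Apache Beam's multi-output ParDo; the expected column count is an assumption, and the two outputs would typically be written to the main BigQuery table and to a separate dead-letter table with BigQueryIO.

    import org.apache.beam.sdk.transforms.DoFn;
    import org.apache.beam.sdk.transforms.ParDo;
    import org.apache.beam.sdk.values.PCollection;
    import org.apache.beam.sdk.values.PCollectionTuple;
    import org.apache.beam.sdk.values.TupleTag;
    import org.apache.beam.sdk.values.TupleTagList;

    public class CsvDeadLetter {
      // Rows that look well formed go to the main output; anything else goes to a
      // dead-letter output that can be loaded into its own table for analysis.
      static final TupleTag<String> VALID = new TupleTag<String>() {};
      static final TupleTag<String> DEAD_LETTER = new TupleTag<String>() {};

      static PCollectionTuple splitValidAndBad(PCollection<String> csvLines) {
        return csvLines.apply(
            "ValidateCsv",
            ParDo.of(
                    new DoFn<String, String>() {
                      @ProcessElement
                      public void process(@Element String line, MultiOutputReceiver out) {
                        // Assumed layout: exactly four comma-separated columns per valid row.
                        if (line.split(",", -1).length == 4) {
                          out.get(VALID).output(line);
                        } else {
                          out.get(DEAD_LETTER).output(line); // keep the raw row for debugging
                        }
                      }
                    })
                .withOutputTags(VALID, TupleTagList.of(DEAD_LETTER)));
      }
    }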

  • Question 220:

    Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

    A. Redefine the schema by evenly distributing reads and writes across the row space of the table.

    B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.

    C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.

    D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.
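
    To illustrate the idea behind option A, one common way to spread reads and writes evenly across the row space is to avoid keys that begin with a sequential value, for example by leading the key with a short, well-distributed prefix; the hash prefix and field layout below are illustrative assumptions, not a prescribed schema.

    public class FeatureRowKey {
      // A fixed-width hash prefix spreads otherwise sequential user IDs across the key
      // space, while the remaining components keep one user's feature values scannable together.
      static String rowKey(long userId, String featureName, long epochMillis) {
        String prefix = String.format("%08x", Long.hashCode(userId)); // assumed salting scheme
        return prefix + "#" + userId + "#" + featureName + "#" + epochMillis;
      }

      public static void main(String[] args) {
        System.out.println(rowKey(123456L, "offer_clicks", System.currentTimeMillis()));
      }
    }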

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming increasingly important and are required by more and more enterprises when you apply for a job. But how do you prepare for the exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or your Google certification application, do not hesitate to visit Vcedump.com to find your solutions.