Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 19, 2025

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 91:

    You have a table that contains millions of rows of sales data, partitioned by date. Various applications and users query this data many times a minute. The query requires aggregating values by using AVG, MAX, and SUM, and does not require joining to other tables. The required aggregations are only computed over the past year of data, though you need to retain full historical data in the base tables. You want to ensure that the query results always include the latest data from the tables, while also reducing computation cost, maintenance overhead, and duration. What should you do?

    A. Create a materialized view to aggregate the base table data. Configure a partition expiration on the base table to retain only the last one year of partitions.

    B. Create a materialized view to aggregate the base table data. Include a filter clause to specify the last one year of partitions.

    C. Create a new table that aggregates the base table data. Include a filter clause to specify the last year of partitions. Set up a scheduled query to recreate the new table every hour.

    D. Create a view to aggregate the base table data. Include a filter clause to specify the last year of partitions.
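
    For reference, a minimal sketch (the project, dataset, and column names are assumptions) of the materialized-view approach mentioned in options A and B: BigQuery keeps the aggregation incrementally up to date, so readers always see fresh results without rebuilding tables, and the one-year restriction is applied here as a filter at query time.

      from google.cloud import bigquery

      client = bigquery.Client()

      # One-time DDL: pre-aggregate the date-partitioned base table.
      client.query("""
          CREATE MATERIALIZED VIEW `my_project.sales.daily_totals_mv` AS
          SELECT sale_date,
                 AVG(amount) AS avg_amount,
                 MAX(amount) AS max_amount,
                 SUM(amount) AS sum_amount
          FROM `my_project.sales.transactions`
          GROUP BY sale_date
      """).result()

      # Applications query the view, restricted to the last year of partitions.
      rows = client.query("""
          SELECT * FROM `my_project.sales.daily_totals_mv`
          WHERE sale_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 YEAR)
      """).result()
      for row in rows:
          print(row.sale_date, row.avg_amount, row.max_amount, row.sum_amount)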

  • Question 92:

    Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encrypted, and compliant with major data protection standards. In addition to using Cloud Data Loss Prevention (Cloud DLP), you want to follow Google-recommended practices and use service accounts to control access to PII. What should you do?

    A. Assign the required Identity and Access Management (IAM) roles to every employee, and create a single service account to access protected resources.

    B. Use one service account to access a Cloud SQL database, and use separate service accounts for each human user.

    C. Use Cloud Storage to comply with major data protection standards. Use one service account shared by all users.

    D. Use Cloud Storage to comply with major data protection standards. Use multiple service accounts attached to IAM groups to grant the appropriate access to each group.
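
    As an illustration of the group-based pattern described in option D, here is a minimal sketch (the bucket name, group address, and role are assumptions) that grants a dedicated group read-only access to a PII bucket instead of sharing one service account across every user; service accounts for individual workloads can be added as members of such a group.

      from google.cloud import storage

      client = storage.Client()
      bucket = client.get_bucket("pii-protected-data")  # hypothetical bucket

      policy = bucket.get_iam_policy(requested_policy_version=3)
      policy.bindings.append(
          {
              "role": "roles/storage.objectViewer",           # read-only access
              "members": {"group:pii-analysts@example.com"},  # hypothetical group
          }
      )
      bucket.set_iam_policy(policy)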

  • Question 93:

    You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit recipients' personally identifiable information (PII) to analytics systems, which violates user privacy rules. You want to quickly build a scalable solution using cloud-native managed services to prevent exposure of PII to the analytics systems. What should you do?

    A. Create an authorized view in BigQuery to restrict access to tables with sensitive data.

    B. Install a third-party data validation tool on Compute Engine virtual machines to check the incoming data for sensitive information.

    C. Use Stackdriver logging to analyze the data passed through the total pipeline to identify transactions that may contain sensitive information.

    D. Build a Cloud Function that reads the topics and makes a call to the Cloud Data Loss Prevention API. Use the tagging and confidence levels to either pass or quarantine the data in a bucket for review.
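
    To make the flow in option D concrete, here is a minimal sketch (the project ID, bucket names, and Pub/Sub trigger wiring are assumptions): a Cloud Function inspects each message with the Cloud DLP API and quarantines anything that looks like PII for review.

      import base64

      from google.cloud import dlp_v2, storage

      dlp = dlp_v2.DlpServiceClient()
      gcs = storage.Client()

      def inspect_scan(event, context):
          """Pub/Sub-triggered Cloud Function (1st gen background signature)."""
          payload = base64.b64decode(event["data"]).decode("utf-8")

          response = dlp.inspect_content(
              request={
                  "parent": "projects/my-project",  # hypothetical project
                  "inspect_config": {
                      "info_types": [{"name": "PERSON_NAME"}, {"name": "EMAIL_ADDRESS"}],
                      "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
                  },
                  "item": {"value": payload},
              }
          )

          # Route on the findings: quarantine flagged messages for manual review.
          target = "quarantine-bucket" if response.result.findings else "clean-bucket"
          gcs.bucket(target).blob(context.event_id).upload_from_string(payload)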

  • Question 94:

    You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable information) data, into Cloud Storage and eventually into BigQuery. You want to ensure that the sensitive data is masked but still maintains referential integrity, because names and emails are often used as join keys. How should you use the Cloud Data Loss Prevention API (DLP API) to ensure that the PII data is not accessible by unauthorized individuals?

    A. Create a pseudonym by replacing the PII data with cryptographic tokens, and store the non-tokenized data in a locked-down bucket.

    B. Redact all PII data, and store a version of the unredacted data in a locked-down bucket.

    C. Scan every table in BigQuery, and mask the data it finds that has PII.

    D. Create a pseudonym by replacing PII data with a cryptographic format-preserving token.
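
    The format-preserving tokenization named in option D keeps referential integrity because the same input always maps to the same token. Below is a minimal sketch (the project ID, info types, and KMS-wrapped key details are assumptions) using the DLP API's de-identification with CryptoReplaceFfxFpeConfig.

      from google.cloud import dlp_v2

      dlp = dlp_v2.DlpServiceClient()

      def pseudonymize(text: str) -> str:
          """Replace detected PII with deterministic, format-preserving tokens."""
          response = dlp.deidentify_content(
              request={
                  "parent": "projects/my-project",  # hypothetical project
                  "inspect_config": {
                      "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PERSON_NAME"}],
                  },
                  "deidentify_config": {
                      "info_type_transformations": {
                          "transformations": [{
                              "primitive_transformation": {
                                  "crypto_replace_ffx_fpe_config": {
                                      "crypto_key": {
                                          "kms_wrapped": {
                                              "wrapped_key": b"...",  # placeholder: key wrapped by Cloud KMS
                                              "crypto_key_name": "projects/my-project/locations/global/keyRings/dlp/cryptoKeys/fpe",
                                          }
                                      },
                                      "common_alphabet": "ALPHA_NUMERIC",
                                  }
                              }
                          }]
                      }
                  },
                  "item": {"value": text},
              }
          )
          return response.item.value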

  • Question 95:

    You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of the data appears to be corrupt. You need to modify the Cloud Dataflow pipeline to filter out this corrupt data. What should you do?

    A. Add a SideInput that returns a Boolean if the element is corrupt.

    B. Add a ParDo transform in Cloud Dataflow to discard corrupt elements.

    C. Add a Partition transform in Cloud Dataflow to separate valid data from corrupt data.

    D. Add a GroupByKey transform in Cloud Dataflow to group all of the valid data together and discard the rest.
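
    A ParDo as in option B can drop bad elements inline. Here is a minimal sketch (the message format and validity check are assumptions) using the Apache Beam Python SDK, which Dataflow runs:

      import json

      import apache_beam as beam

      class DropCorrupt(beam.DoFn):
          def process(self, element):
              try:
                  record = json.loads(element)   # assumes JSON payloads
                  if "device_id" in record:      # hypothetical validity check
                      yield record               # emit only valid records
              except (json.JSONDecodeError, TypeError):
                  pass                           # silently discard corrupt elements

      with beam.Pipeline() as pipeline:
          (
              pipeline
              | "Read" >> beam.Create(['{"device_id": "a1"}', "not-json"])  # stand-in for Pub/Sub
              | "FilterCorrupt" >> beam.ParDo(DropCorrupt())
              | "Print" >> beam.Map(print)
          )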

  • Question 96:

    As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments and target audiences. Each project requires unique access control configurations. The central IT team needs to have access to all projects. Furthermore, data from Cloud Storage buckets and BigQuery datasets must be shared for use in other projects in an ad hoc way. You want to simplify access control management by minimizing the number of policies. Which two steps should you take? Choose 2 answers.

    A. Use Cloud Deployment Manager to automate access provision.

    B. Introduce resource hierarchy to leverage access control policy inheritance.

    C. Create distinct groups for various teams, and specify groups in Cloud IAM policies.

    D. Only use service accounts when sharing data for Cloud Storage buckets and BigQuery datasets.

    E. For each Cloud Storage bucket or BigQuery dataset, decide which projects need access. Find all the active members who have access to these projects, and create a Cloud IAM policy to grant access to all these users.

  • Question 97:

    You need to choose a database for a new project that has the following requirements:

    1. Fully managed

    2. Able to automatically scale up

    3. Transactionally consistent

    4. Able to scale up to 6 TB

    5. Able to be queried using SQL

    Which database do you choose?

    A. Cloud SQL

    B. Cloud Bigtable

    C. Cloud Spanner

    D. Cloud Datastore
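
    To illustrate option C, here is a minimal sketch (the instance, database, table, and column names are assumptions) of querying Cloud Spanner, a fully managed, transactionally consistent database that supports SQL, with the Python client:

      from google.cloud import spanner

      client = spanner.Client()
      instance = client.instance("prod-instance")   # hypothetical instance
      database = instance.database("orders-db")     # hypothetical database

      # Strongly consistent read using standard SQL with a query parameter.
      with database.snapshot() as snapshot:
          rows = snapshot.execute_sql(
              "SELECT order_id, total FROM Orders WHERE customer_id = @cid",
              params={"cid": "C-1001"},
              param_types={"cid": spanner.param_types.STRING},
          )
          for order_id, total in rows:
              print(order_id, total)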

  • Question 98:

    You want to store your team's shared tables in a single dataset to make data easily accessible to various analysts. You want to make this data readable but unmodifiable by analysts. At the same time, you want to provide the analysts with individual workspaces in the same project, where they can create and store tables for their own use, without the tables being accessible by other analysts. What should you do?

    A. Give analysts the BigQuery Data Viewer role at the project level. Create one other dataset, and give the analysts the BigQuery Data Editor role on that dataset.

    B. Give analysts the BigQuery Data Viewer role at the project level. Create a dataset for each analyst, and give each analyst the BigQuery Data Editor role at the project level.

    C. Give analysts the BigQuery Data Viewer role on the shared dataset. Create a dataset for each analyst, and give each analyst the BigQuery Data Editor role at the dataset level for their assigned dataset.

    D. Give analysts the BigQuery Data Viewer role on the shared dataset. Create one other dataset, and give the analysts the BigQuery Data Editor role on that dataset.
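
    To illustrate the dataset-level grants described in option C, here is a minimal sketch (the dataset IDs and user email are assumptions) that gives one analyst read-only access to the shared dataset and editor rights only on their own workspace dataset:

      from google.cloud import bigquery

      client = bigquery.Client()

      def grant(dataset_id: str, role: str, user_email: str) -> None:
          """Append a dataset-level access entry for a single user."""
          dataset = client.get_dataset(dataset_id)
          entries = list(dataset.access_entries)
          entries.append(
              bigquery.AccessEntry(role=role, entity_type="userByEmail", entity_id=user_email)
          )
          dataset.access_entries = entries
          client.update_dataset(dataset, ["access_entries"])

      grant("my_project.shared_tables", "READER", "analyst1@example.com")       # read-only shared data
      grant("my_project.analyst1_workspace", "WRITER", "analyst1@example.com")  # private workspace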

  • Question 99:

    You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thousands of transactions are updated with a new status. The size of the initial dataset is 1.5 PB, and it will grow by 3 TB per day. The data is heavily structured, and your data science team will build machine learning models based on this data. You want to maximize performance and usability for your data science team. Which two strategies should you adopt? Choose 2 answers.

    A. Denormalize the data as much as possible.

    B. Preserve the structure of the data as much as possible.

    C. Use BigQuery UPDATE to further reduce the size of the dataset.

    D. Develop a data pipeline where status updates are appended to BigQuery instead of updated.

    E. Copy a daily snapshot of transaction data to Cloud Storage and store it as an Avro file. Use BigQuery's support for external data sources to query it.
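
    Option D's append-only pattern avoids costly UPDATE operations on a petabyte-scale table. Here is a minimal sketch (the table ID and row schema are assumptions) that streams each status change as a new row and leaves deduplication to query time:

      import datetime

      from google.cloud import bigquery

      client = bigquery.Client()
      table_id = "my_project.transactions.status_events"  # hypothetical table

      rows = [{
          "transaction_id": "txn-42",
          "status": "SHIPPED",
          "event_time": datetime.datetime.utcnow().isoformat(),
      }]
      errors = client.insert_rows_json(table_id, rows)  # streaming insert, append-only
      if errors:
          raise RuntimeError(f"Insert failed: {errors}")

      # The latest status per transaction can then be read with a window function:
      #   SELECT * EXCEPT(rn) FROM (
      #     SELECT *, ROW_NUMBER() OVER (PARTITION BY transaction_id
      #                                  ORDER BY event_time DESC) AS rn
      #     FROM `my_project.transactions.status_events`)
      #   WHERE rn = 1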

  • Question 100:

    You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process, and there is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table, and these jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?

    A. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators. 2. Use a single shared DAG for all tables that need to go through the pipeline. 3. Schedule the DAG to run hourly.

    B. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators. 2. Create a separate DAG for each table that needs to go through the pipeline. 3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.

    C. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Cloud Storage, Dataproc, and BigQuery operators. 2. Create a separate DAG for each table that needs to go through the pipeline. 3. Schedule the DAGs to run hourly.

    D. 1. Create an Apache Airflow directed acyclic graph (DAG) in Cloud Composer with sequential tasks by using the Dataproc and BigQuery operators. 2. Use a single shared DAG for all tables that need to go through the pipeline. 3. Use a Cloud Storage object trigger to launch a Cloud Function that triggers the DAG.
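
    To make the Composer-based options concrete, here is a minimal sketch (the project, region, cluster, job URIs, and table names are assumptions) of an event-triggered, per-table DAG along the lines of option B, with a Dataproc task followed by a BigQuery transformation:

      import datetime

      from airflow import DAG
      from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
      from airflow.providers.google.cloud.operators.dataproc import DataprocSubmitJobOperator

      with DAG(
          dag_id="load_and_transform_orders",  # one DAG per table in this design
          start_date=datetime.datetime(2024, 1, 1),
          schedule_interval=None,              # run only when triggered (e.g. by a Cloud Function)
          catchup=False,
      ) as dag:
          dataproc_transform = DataprocSubmitJobOperator(
              task_id="dataproc_transform",
              project_id="my-project",
              region="us-central1",
              job={
                  "placement": {"cluster_name": "etl-cluster"},
                  "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/transform_orders.py"},
              },
          )

          bq_transform = BigQueryInsertJobOperator(
              task_id="bq_transform",
              configuration={
                  "query": {
                      "query": "CALL `my_project.transforms.refresh_orders`()",
                      "useLegacySql": False,
                  }
              },
          )

          dataproc_transform >> bq_transform  # sequential: Dataproc first, then BigQuery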

Tips on How to Prepare for the Exams

Nowadays, certification exams are becoming more and more important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and how do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are confused about your PROFESSIONAL-DATA-ENGINEER exam preparation or Google certification application, do not hesitate to visit Vcedump.com to find your solutions.