Exam Details

  • Exam Code: PROFESSIONAL-DATA-ENGINEER
  • Exam Name: Professional Data Engineer on Google Cloud Platform
  • Certification: Google Certifications
  • Vendor: Google
  • Total Questions: 331 Q&As
  • Last Updated: May 08, 2024

Google Certifications PROFESSIONAL-DATA-ENGINEER Questions & Answers

  • Question 251:

    Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

    A. Redefine the schema by evenly distributing reads and writes across the row space of the table.

    B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.

    C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.

    D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.
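
    If the schema were redefined as option A describes, the goal is row keys that spread sequential traffic across the table's row space instead of hotspotting one tablet. A minimal Python sketch, assuming the google-cloud-bigtable client library; the project, instance, table, and column-family names are hypothetical, and the short hash prefix is just one illustrative way to distribute keys:

      import hashlib

      from google.cloud import bigtable

      # Hypothetical identifiers; substitute your own project, instance, and table.
      client = bigtable.Client(project="my-project", admin=False)
      table = client.instance("campaign-instance").table("offer-signals")

      def distributed_row_key(user_id: str, event_ts: int) -> bytes:
          # A short hash prefix spreads per-user sequential writes across
          # the row space instead of concentrating them on one tablet.
          prefix = hashlib.sha256(user_id.encode()).hexdigest()[:4]
          return f"{prefix}#{user_id}#{event_ts}".encode()

      row = table.direct_row(distributed_row_key("user-123", 1714000000))
      row.set_cell("signals", b"offer_viewed", b"holiday-promo-17")
      row.commit()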

  • Question 252:

    You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?

    A. Grant the consultant the Viewer role on the project.

    B. Grant the consultant the Cloud Dataflow Developer role on the project.

    C. Create a service account and allow the consultant to log on with it.

    D. Create an anonymized sample of the data for the consultant to work with in a different project.
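
    Option D, if chosen, can be carried out entirely inside BigQuery before anything is shared. A minimal sketch, assuming the google-cloud-bigquery client library; the project, dataset, and column names are hypothetical, and SHA256 pseudonymization plus a 1% sample stand in for whatever de-identification policy the project actually requires:

      from google.cloud import bigquery

      client = bigquery.Client(project="internal-project")  # hypothetical project

      # Write a de-identified sample into a dataset in a separate project.
      sql = """
      CREATE OR REPLACE TABLE `shared-project.consultant_sandbox.events_sample` AS
      SELECT
        TO_HEX(SHA256(CAST(user_id AS STRING))) AS user_pseudo_id,  -- irreversible pseudonym
        event_type,
        event_timestamp
      FROM `internal-project.private_data.events`
      WHERE RAND() < 0.01  -- roughly a 1% sample
      """
      client.query(sql).result()  # blocks until the job finishes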

  • Question 253:

    You are designing a basket abandonment system for an ecommerce company. The system will send a message to a user based on these rules:

    1. No interaction by the user on the site for 1 hour

    2. Has added more than $30 worth of products to the basket

    3. Has not completed a transaction

    You use Google Cloud Dataflow to process the data and decide if a message should be sent. How should you design the pipeline?

    A. Use a fixed-time window with a duration of 60 minutes.

    B. Use a sliding time window with a duration of 60 minutes.

    C. Use a session window with a gap time duration of 60 minutes.

    D. Use a global window with a time-based trigger with a delay of 60 minutes.
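
    For context on how these window types differ in code: a session window groups a user's events until a 60-minute gap of inactivity closes the session, which lines up with rule 1. A minimal Apache Beam sketch in Python; the topic name and the JSON event schema (user_id, basket_total, completed) are assumptions for illustration:

      import json

      import apache_beam as beam
      from apache_beam import window

      def parse_event(raw: bytes):
          # Assumed event format: JSON with user_id, basket_total, completed.
          event = json.loads(raw)
          return (event["user_id"], event)

      def should_send_message(keyed_session):
          # Rules 2 and 3: over $30 in the basket and no completed transaction.
          _, events = keyed_session
          events = list(events)
          return (
              max(e.get("basket_total", 0) for e in events) > 30
              and not any(e.get("completed") for e in events)
          )

      with beam.Pipeline() as pipeline:
          _ = (
              pipeline
              | "ReadEvents" >> beam.io.ReadFromPubSub(
                  topic="projects/my-project/topics/site-events")
              | "KeyByUser" >> beam.Map(parse_event)
              # Rule 1: the session closes after 60 minutes without activity.
              | "SessionWindow" >> beam.WindowInto(window.Sessions(60 * 60))
              | "PerUserSession" >> beam.GroupByKey()
              | "ApplyBasketRules" >> beam.Filter(should_send_message)
              # Downstream: publish an abandonment message per surviving session.
          )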

  • Question 254:

    You want to use Google Stackdriver Logging to monitor Google BigQuery usage. You need an instant notification to be sent to your monitoring tool when new data is appended to a certain table using an insert job, but you do not want to receive notifications for other tables. What should you do?

    A. Make a call to the Stackdriver API to list all logs, and apply an advanced filter.

    B. In the Stackdriver logging admin interface, enable a log sink export to BigQuery.

    C. In the Stackdriver logging admin interface, enable a log sink export to Google Cloud Pub/Sub, and subscribe to the topic from your monitoring tool.

    D. Using the Stackdriver API, create a project sink with an advanced log filter to export to Pub/Sub, and subscribe to the topic from your monitoring tool.
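
    The project sink in option D can be created programmatically rather than in the console. A minimal sketch, assuming the google-cloud-logging client library; the sink name, topic, table ID, and the exact audit-log field paths in the filter are assumptions to verify against your own log entries:

      from google.cloud import logging

      client = logging.Client(project="my-project")  # hypothetical project

      # Advanced filter: only completed BigQuery load (insert) jobs into one table.
      log_filter = (
          'resource.type="bigquery_resource" '
          'protoPayload.methodName="jobservice.jobcompleted" '
          'protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.'
          'load.destinationTable.tableId="my_table"'
      )

      sink = client.sink(
          "bq-insert-notifications",
          filter_=log_filter,
          destination="pubsub.googleapis.com/projects/my-project/topics/bq-inserts",
      )
      sink.create()  # the monitoring tool then subscribes to the topic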

  • Question 255:

    You create an important report for your large team in Google Data Studio 360. The report uses Google BigQuery as its data source. You notice that visualizations are not showing data that is less than 1 hour old. What should you do?

    A. Disable caching by editing the report settings.

    B. Disable caching in BigQuery by editing table details.

    C. Refresh your browser tab showing the visualizations.

    D. Clear your browser history for the past hour, then reload the tab showing the visualizations.

  • Question 256:

    Your company built a TensorFlow neural-network model with a large number of neurons and layers. The model fits well for the training data. However, when tested against new data, it performs poorly. What method can you employ to address this?

    A. Threading

    B. Serialization

    C. Dropout Methods

    D. Dimensionality Reduction
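
    For reference, option C's dropout layers regularize an overfit network by randomly zeroing activations during training, which discourages the model from memorizing the training set. A minimal Keras sketch; the layer sizes, input shape, and the 0.5 rate are illustrative rather than tuned values:

      import tensorflow as tf

      model = tf.keras.Sequential([
          tf.keras.layers.Dense(512, activation="relu", input_shape=(100,)),
          tf.keras.layers.Dropout(0.5),  # drops 50% of units each training step
          tf.keras.layers.Dense(256, activation="relu"),
          tf.keras.layers.Dropout(0.5),
          tf.keras.layers.Dense(1, activation="sigmoid"),
      ])
      model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["accuracy"])
      # Dropout is active only during fit(); it is disabled automatically
      # at inference time.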

  • Question 257:

    You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

    A. Continuously retrain the model on just the new data.

    B. Continuously retrain the model on a combination of existing data and the new data.

    C. Train on the existing data while using the new data as your test set.

    D. Train on the new data while using the existing data as your test set.
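
    The distinction option B draws is that retraining on old plus new data preserves what the model learned from historical preferences while absorbing the new signal. A minimal sketch with hypothetical NumPy arrays and a scikit-learn model; in practice the combination might be weighted or windowed rather than a plain concatenation:

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      # Hypothetical feature/label arrays; shapes chosen only for illustration.
      rng = np.random.default_rng(0)
      existing_X, existing_y = rng.random((10000, 20)), rng.integers(0, 2, 10000)
      new_X, new_y = rng.random((500, 20)), rng.integers(0, 2, 500)

      # Retrain on existing + new data so older preferences are not forgotten.
      combined_X = np.concatenate([existing_X, new_X])
      combined_y = np.concatenate([existing_y, new_y])
      model = LogisticRegression(max_iter=1000).fit(combined_X, combined_y)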

  • Question 258:

    You are building a model to predict whether or not it will rain on a given day. You have thousands of input features and want to see if you can improve training speed by removing some features while having a minimum effect on model accuracy. What can you do?

    A. Eliminate features that are highly correlated to the output labels.

    B. Combine highly co-dependent features into one representative feature.

    C. Instead of feeding in each feature individually, average their values in batches of 3.

    D. Remove the features that have null values for more than 50% of the training records.
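
    Option B can be prototyped quickly with pandas: find feature pairs whose correlation exceeds a threshold, then replace each pair with one representative column. A minimal sketch with a synthetic DataFrame; the 0.9 threshold and the plain mean are illustrative choices (a PCA component of the pair would be another option):

      import numpy as np
      import pandas as pd

      # Synthetic weather features; dewpoint is built to correlate with humidity.
      rng = np.random.default_rng(0)
      df = pd.DataFrame({
          "humidity_pct": rng.uniform(20, 100, 1000),
          "pressure_hpa": rng.uniform(980, 1040, 1000),
      })
      df["dewpoint"] = df["humidity_pct"] * 0.3 + rng.normal(0, 1, 1000)

      corr = df.corr().abs()
      # Keep the upper triangle so each pair is inspected once.
      upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

      for i, j in zip(*np.where(upper.values > 0.9)):
          a, b = corr.columns[i], corr.columns[j]
          if a in df.columns and b in df.columns:  # skip pairs already merged
              df[f"{a}_{b}_mean"] = df[[a, b]].mean(axis=1)
              df = df.drop(columns=[a, b])

      print(df.columns.tolist())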

  • Question 259:

    Your startup has never implemented a formal security policy. Currently, everyone in the company has access to the datasets stored in Google BigQuery. Teams have freedom to use the service as they see fit, and they have not documented their use cases. You have been asked to secure the data warehouse. You need to discover what everyone is doing. What should you do first?

    A. Use Google Stackdriver Audit Logs to review data access.

    B. Get the Identity and Access Management (IAM) policy of each table.

    C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.

    D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.
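
    The discovery step in option A can also be scripted. A minimal sketch, assuming the google-cloud-logging client library; the filter targets BigQuery data-access audit entries, and the payload field paths are assumptions to verify against your own log entries:

      from google.cloud import logging

      client = logging.Client(project="my-project")  # hypothetical project

      # Data-access audit entries for BigQuery: who touched which resources.
      log_filter = (
          'logName="projects/my-project/logs/'
          'cloudaudit.googleapis.com%2Fdata_access" '
          'protoPayload.serviceName="bigquery.googleapis.com"'
      )

      for entry in client.list_entries(filter_=log_filter, page_size=50):
          payload = entry.payload or {}  # audit entries carry a dict payload
          who = payload.get("authenticationInfo", {}).get("principalEmail")
          print(entry.timestamp, who)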

  • Question 260:

    You work for a car manufacturer and have set up a data pipeline using Google Cloud Pub/Sub to capture anomalous sensor events. You are using a push subscription in Cloud Pub/Sub that calls a custom HTTPS endpoint that you have created to take action on these anomalous events as they occur. Your custom HTTPS endpoint keeps getting an inordinate amount of duplicate messages. What is the most likely cause of these duplicate messages?

    A. The message body for the sensor event is too large.

    B. Your custom endpoint has an out-of-date SSL certificate.

    C. The Cloud Pub/Sub topic has too many messages published to it.

    D. Your custom endpoint is not acknowledging messages within the acknowledgement deadline.
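
    Behind option D: a push subscription treats any response other than a success status returned within the acknowledgement deadline as a negative ack and redelivers the message, which surfaces as duplicates. A minimal Flask sketch of an endpoint that acknowledges quickly by deferring the slow anomaly handling; the route and queueing scheme are hypothetical:

      import queue
      import threading

      from flask import Flask, request

      app = Flask(__name__)
      work_queue: "queue.Queue[bytes]" = queue.Queue()

      def worker():
          while True:
              body = work_queue.get()
              # Slow anomaly handling happens here, off the request path.
              work_queue.task_done()

      threading.Thread(target=worker, daemon=True).start()

      @app.route("/pubsub/push", methods=["POST"])  # hypothetical route
      def pubsub_push():
          work_queue.put(request.get_data())
          # Returning 2xx before the ack deadline acknowledges the message;
          # handlers that exceed the deadline trigger redelivery (duplicates).
          return ("", 204)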

Tips on How to Prepare for the Exams

Nowadays, certification exams have become increasingly important, and more and more enterprises require them when you apply for a job. But how do you prepare for an exam effectively? How do you prepare in a short time with less effort? How do you get an ideal result, and where do you find the most reliable resources? Here on Vcedump.com, you will find all the answers. Vcedump.com provides not only Google exam questions, answers, and explanations but also complete assistance with your exam preparation and certification application. If you are unsure about your PROFESSIONAL-DATA-ENGINEER exam preparation or Google certification application, do not hesitate to visit Vcedump.com to find your solutions.