Learn how to efficiently provide access to real-time customer order data for data scientists without negatively affecting your ecommerce website’s performance. Discover the best practices for using BigQuery and Datastream to replicate Cloud SQL tables seamlessly.
Question
You are a developer at a company that operates an ecommerce website. The website stores the customer order data in a Cloud SQL for PostgreSQL database. Data scientists on the marketing team access this data to run their reports. Every time they run these reports, the website’s performance is negatively affected. You want to provide access to up-to-date customer order datasets without affecting your website. What should you do?
A. Configure Cloud Scheduler to run an hourly Cloud Function that exports the data from the Cloud SQL database into CSV format and sends the data to a Cloud Storage bucket.
B. Set up a Bigtable table for the data science team. Configure the application to perform dual writes to both Cloud SQL and Bigtable simultaneously.
C. Set up a BigQuery dataset for the data science team. Configure Datastream to replicate the relevant Cloud SQL tables in BigQuery.
D. Create a clone of the PostgreSQL database instance for the data science team. Schedule a job to create a new clone every 15 minutes.
Answer
C. Set up a BigQuery dataset for the data science team. Configure Datastream to replicate the relevant Cloud SQL tables in BigQuery.
Explanation
The most effective solution to provide access to up-to-date customer order datasets without affecting your ecommerce website’s performance is to set up a BigQuery dataset for the data science team and configure Datastream to replicate the relevant Cloud SQL tables in BigQuery (Option C).
Here’s why this approach is best:
- BigQuery is a highly scalable and fast data warehouse designed for analytical workloads. It can handle complex queries and large datasets without impacting the performance of your transactional database (Cloud SQL).
- Datastream is a serverless change data capture (CDC) and replication service that synchronizes data from Cloud SQL to BigQuery in near real time. This ensures that the data scientists always have access to the most recent customer order data without any manual intervention (see the configuration sketch after this list).
- By replicating only the relevant tables, you minimize the data transfer and storage costs while ensuring that the data scientists have access to the necessary information for their reports.
- BigQuery’s separation from the transactional database ensures that the data scientists’ queries do not compete for resources with your website’s database, eliminating the performance impact on your ecommerce site.
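As a concrete illustration, the following is a minimal sketch of this configuration using the `google-cloud-datastream` Python client. It assumes the Cloud SQL (PostgreSQL) source and BigQuery destination connection profiles already exist; the project, region, profile, replication slot, publication, dataset, and table names are all hypothetical placeholders, so treat this as an outline rather than a drop-in script.

```python
from google.cloud import datastream_v1

# Hypothetical project/region; the connection profiles referenced below are
# assumed to have been created beforehand (one for Cloud SQL, one for BigQuery).
PARENT = "projects/my-project/locations/us-central1"

client = datastream_v1.DatastreamClient()

# Replicate only the relevant table(s) to limit transfer and storage costs.
source_config = datastream_v1.SourceConfig(
    source_connection_profile=f"{PARENT}/connectionProfiles/cloudsql-pg-profile",
    postgresql_source_config=datastream_v1.PostgresqlSourceConfig(
        replication_slot="datastream_slot",  # hypothetical logical replication slot
        publication="datastream_pub",        # hypothetical publication
        include_objects=datastream_v1.PostgresqlRdbms(
            postgresql_schemas=[
                datastream_v1.PostgresqlSchema(
                    schema="public",
                    postgresql_tables=[
                        datastream_v1.PostgresqlTable(table="customer_orders"),
                    ],
                )
            ]
        ),
    ),
)

destination_config = datastream_v1.DestinationConfig(
    destination_connection_profile=f"{PARENT}/connectionProfiles/bigquery-profile",
    bigquery_destination_config=datastream_v1.BigQueryDestinationConfig(
        single_target_dataset=datastream_v1.BigQueryDestinationConfig.SingleTargetDataset(
            # Hypothetical target dataset; see the Datastream docs for the exact ID format.
            dataset_id="my-project:orders_analytics",
        ),
    ),
)

stream = datastream_v1.Stream(
    display_name="orders-to-bigquery",
    source_config=source_config,
    destination_config=destination_config,
    # Backfill existing rows once, then stream ongoing changes (CDC).
    backfill_all=datastream_v1.Stream.BackfillAllStrategy(),
)

# create_stream returns a long-running operation; wait for it to complete.
operation = client.create_stream(
    request=datastream_v1.CreateStreamRequest(
        parent=PARENT,
        stream_id="orders-stream",
        stream=stream,
    )
)
print(operation.result().name)

# Note: a newly created stream must still be started (its state set to
# RUNNING) before replication actually begins.
```

In practice, you would scope `include_objects` to exactly the order tables the marketing team needs, which is what keeps transfer and storage costs down, as noted above.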
The other options have the following drawbacks:
- Option A (exporting data to CSV format hourly) does not provide up-to-date data, still runs export queries against the production database every hour, and may not scale well for large datasets.
- Option B (dual writes to Cloud SQL and Bigtable) increases the complexity of your application and may lead to data inconsistencies if not implemented correctly.
- Option D (creating database clones every 15 minutes) is resource-intensive and does not provide real-time data. It may also lead to increased costs due to frequent cloning operations.
By leveraging BigQuery and Datastream, you can efficiently provide access to up-to-date customer order data for your data science team while ensuring that your ecommerce website’s performance remains unaffected.
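For example, once replication is running, the marketing team’s reports can run directly against BigQuery rather than the production database. Below is a minimal sketch using the `google-cloud-bigquery` Python client; the project, dataset, table, and column names are hypothetical placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

# Aggregate the last 30 days of orders from the replicated table.
query = """
    SELECT customer_id,
           COUNT(*) AS order_count,
           SUM(total_amount) AS revenue
    FROM `my-project.orders_analytics.customer_orders`
    WHERE order_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY customer_id
    ORDER BY revenue DESC
    LIMIT 100
"""

# The query executes entirely in BigQuery, so it never touches Cloud SQL
# and cannot affect the ecommerce website's performance.
for row in client.query(query).result():
    print(row.customer_id, row.order_count, row.revenue)
```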