Question
The Data Engineering team at a large manufacturing company needs to engineer data coming from many sources to support a wide variety of use cases and data consumer requirements which include:
- Finance and Vendor Management team members who require reporting and visualization.
- Data Science team members who require access to raw data for ML model development.
- Sales team members who require engineered and protected data for data monetization.
What Snowflake data modeling approaches will meet these requirements? (Choose two.)
A. Consolidate data in the company’s data lake and use EXTERNAL TABLES.
B. Create a raw database for landing and persisting raw data entering the data pipelines.
C. Create a set of profile-specific databases that aligns data with usage patterns.
D. Create a single star schema in a single database to support all consumers’ requirements.
E. Create a Data Vault as the sole data pipeline endpoint and have all consumers directly access the Vault.
Correct Answer
B. Create a raw database for landing and persisting raw data entering the data pipelines.
C. Create a set of profile-specific databases that aligns data with usage patterns.
Explanation
According to the Snowflake website, data modeling is the process of organizing and mapping data using simplified diagrams, symbols, and text to represent data associations and flow. There are four primary approaches to data modeling: hierarchical, relational, entity-relationship, and graph. There are also three primary types of data models: conceptual, logical, and physical.
Based on the scenario, the two Snowflake data modeling approaches that meet the requirements are:
- B. Create a raw database for landing and persisting raw data entering the data pipelines.
- C. Create a set of profile-specific databases that aligns data with usage patterns.
The reasoning is as follows:
- Option B is a good choice because it lets the Data Engineering team persist the raw data coming from many sources in a single database, without any transformation or manipulation. The Data Science team members can then access the raw data for ML model development, and the Data Engineering team can use the same raw data as the source for further processing and engineering downstream.
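A raw landing database of this kind can be sketched in Snowflake SQL roughly as follows (the database, schema, table, and stage names are hypothetical, as is the JSON file format; the point is landing unmodified payloads in a VARIANT column):

```sql
-- Hypothetical raw landing database for unmodified source data.
CREATE DATABASE IF NOT EXISTS raw_db;
CREATE SCHEMA IF NOT EXISTS raw_db.landing;

-- Persist each source payload as-is in a VARIANT column,
-- keeping lineage metadata (source file, load time) alongside it.
CREATE TABLE IF NOT EXISTS raw_db.landing.sensor_events (
    src_file  STRING,
    loaded_at TIMESTAMP_LTZ DEFAULT CURRENT_TIMESTAMP(),
    payload   VARIANT
);

-- Load staged files without transformation.
COPY INTO raw_db.landing.sensor_events (src_file, payload)
  FROM (SELECT METADATA$FILENAME, $1
        FROM @raw_db.landing.sensor_stage)
  FILE_FORMAT = (TYPE = JSON);
```

Because nothing is reshaped at load time, the same table can feed both ML feature extraction and downstream engineering pipelines.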
- Option C is also a good choice because it lets the Data Engineering team create separate databases for different use cases and consumer profiles: for example, a database with reporting and visualization tables for the Finance and Vendor Management team members, and a database with engineered and protected data for the Sales team members. Each database can then be optimized for its specific purpose and audience, and each group of data consumers can access the data they need easily and efficiently.
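Profile-specific databases are typically paired with role-based access and data protection. A minimal sketch, assuming hypothetical database and role names (the masking policy is one illustrative way to deliver "protected" data to the Sales profile):

```sql
-- Hypothetical databases aligned to consumer profiles.
CREATE DATABASE IF NOT EXISTS finance_db;   -- reporting & visualization
CREATE DATABASE IF NOT EXISTS science_db;   -- raw data access for ML
CREATE DATABASE IF NOT EXISTS sales_db;     -- engineered, protected data

-- Each profile's role sees only its own database.
GRANT USAGE ON DATABASE finance_db TO ROLE finance_analyst;
GRANT USAGE ON DATABASE science_db TO ROLE data_scientist;
GRANT USAGE ON DATABASE sales_db   TO ROLE sales_consumer;

-- Example protection for monetized data: mask a sensitive column
-- for everyone except an administrative role.
CREATE MASKING POLICY sales_db.public.mask_email AS (val STRING)
  RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() = 'SALES_ADMIN' THEN val
       ELSE '***MASKED***'
  END;
```

The grants keep each audience inside its own profile database, while policies like the one above let the Sales database expose engineered data without leaking sensitive values.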
The other options are not as suitable because:
- Option A is not a good choice because it relies on external tables, which are pointers to files stored outside of Snowflake in external cloud storage (such as Amazon S3, Azure Blob Storage, or Google Cloud Storage). External tables are useful for querying data in place without loading it into Snowflake, but they have limitations compared to native tables: they are read-only and do not support DML operations (INSERT, UPDATE, DELETE), Time Travel, or Fail-safe. They also depend on the availability and performance of the external storage service, which may not be as consistent as Snowflake-managed storage. Therefore, external tables alone are not suitable for engineering data to support such a wide variety of use cases and consumer requirements.
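For reference, an external table is defined over files in an external stage, roughly as follows (the stage URL, table, and column names here are hypothetical):

```sql
-- Hypothetical stage over data-lake files (credentials omitted).
CREATE STAGE lake_stage
  URL = 's3://example-bucket/lake/';

-- External table: columns are expressions over the file contents.
CREATE EXTERNAL TABLE ext_orders (
    order_id NUMBER AS (VALUE:order_id::NUMBER),
    amount   NUMBER AS (VALUE:amount::NUMBER)
)
  LOCATION = @lake_stage/orders/
  FILE_FORMAT = (TYPE = PARQUET);

-- Queries work, but the table is read-only:
-- no INSERT/UPDATE/DELETE, no Time Travel, no Fail-safe.
SELECT order_id, amount FROM ext_orders;
```

This read-only, storage-dependent design is precisely why external tables fall short as the foundation for the diverse requirements in the scenario.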
- Option D is not a good choice because it suggests creating a single star schema in a single database to support all consumers' requirements. A star schema is a relational data model consisting of one or more fact tables referencing any number of dimension tables. It is useful for simplifying queries and improving query performance, but it has disadvantages as a one-size-fits-all model: it typically requires denormalizing or duplicating data, which can increase storage costs and introduce inconsistencies, and it struggles to capture complex relationships or hierarchies among data elements, which limits its flexibility and scalability. A single star schema therefore cannot serve raw-data access for data science, protected data for monetization, and reporting all at once.
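A minimal star-schema sketch makes the trade-off concrete: the shape is ideal for the Finance team's reporting queries, but it is already transformed and aggregated, so it cannot double as the raw data the Data Science team needs (all names below are hypothetical):

```sql
-- One fact table surrounded by dimension tables.
CREATE TABLE dim_vendor (
    vendor_key  NUMBER,
    vendor_name STRING
);
CREATE TABLE dim_date (
    date_key      NUMBER,
    calendar_date DATE
);
CREATE TABLE fact_purchases (
    vendor_key NUMBER,        -- references dim_vendor
    date_key   NUMBER,        -- references dim_date
    amount     NUMBER(12,2)   -- pre-engineered measure
);

-- Typical reporting query the schema is optimized for:
SELECT d.vendor_name, SUM(f.amount) AS total_spend
FROM fact_purchases f
JOIN dim_vendor d ON d.vendor_key = f.vendor_key
GROUP BY d.vendor_name;
```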
- Option E is not a good choice because it suggests creating a Data Vault as the sole data pipeline endpoint and having all consumers access the Vault directly. A Data Vault is a hybrid data model that combines aspects of third normal form (3NF) and star schema, and it consists of three main table types: hubs (unique lists of business keys), links (unique lists of associations or transactions), and satellites (descriptive data for hubs and links). A Data Vault is useful for tracking historical changes, ensuring auditability, and enabling parallel loading, but it also has challenges: it can require complex ETL processes to load and extract data, which increases development time and maintenance costs, and its structure is not easy for non-technical users to understand or query, which reduces usability. A Data Vault is therefore best used as an integration layer feeding consumer-facing marts, not as the sole endpoint that all consumers access directly.
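The hub/link/satellite structure described above can be sketched as follows (all table and column names are hypothetical). Note how even one business relationship spreads across four tables, which is why direct access by non-technical consumers is impractical:

```sql
-- Hubs: one row per unique business key.
CREATE TABLE hub_vendor (
    vendor_hk  VARCHAR,        -- hash of the business key
    vendor_id  VARCHAR,        -- business key
    load_dts   TIMESTAMP_LTZ,
    record_src VARCHAR
);
CREATE TABLE hub_part (
    part_hk    VARCHAR,
    part_id    VARCHAR,
    load_dts   TIMESTAMP_LTZ,
    record_src VARCHAR
);

-- Link: association between hubs (e.g. vendor supplies part).
CREATE TABLE link_vendor_part (
    link_hk    VARCHAR,
    vendor_hk  VARCHAR,
    part_hk    VARCHAR,
    load_dts   TIMESTAMP_LTZ,
    record_src VARCHAR
);

-- Satellite: descriptive attributes of a hub, tracked over time.
CREATE TABLE sat_vendor_details (
    vendor_hk   VARCHAR,
    load_dts    TIMESTAMP_LTZ, -- each change loads a new row
    vendor_name VARCHAR,
    country     VARCHAR,
    record_src  VARCHAR
);
```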
Reference
- Data Engineering Guide | Snowflake
- What is Data Modeling? | Snowflake
- Top 14 Snowflake Best Practices for Data Engineers — Analytics.Today
- Support Multiple Data Modeling Approaches with Snowflake
- Data Warehouse Design | Snowflake Guides | Snowflake
Snowflake SnowPro Advanced Architect certification exam practice question and answer (Q&A) dump with detailed explanation and references, available free; helpful for passing the Snowflake SnowPro Advanced Architect exam and earning the Snowflake SnowPro Advanced Architect certification.