Learn the optimal number of partition ranges for a fact table with 2.4 billion records in an Azure Synapse Analytics dedicated SQL pool to maximize compression and performance of the clustered columnstore index.
Question
You are designing a partition strategy for a fact table in an Azure Synapse Analytics dedicated SQL pool. The table has the following specifications:
- Contains sales data for 20,000 products.
- Uses hash distribution on a column named ProductID.
- Contains 2.4 billion records for the years 2019 and 2020.
Which number of partition ranges provides optimal compression and performance for the clustered columnstore index?
A. 40
B. 240
C. 400
D. 2,400
Answer
A. 40
Explanation
Partitioning a clustered columnstore table in an Azure Synapse Analytics dedicated SQL pool only helps if every partition still holds enough rows to build full, well-compressed rowgroups. A columnstore rowgroup holds up to 1,048,576 rows, so partitions that end up with far fewer rows than that compress poorly and scan less efficiently.
In this scenario, the fact table contains sales data for 20,000 products for the two years 2019 and 2020, with a total of 2.4 billion records. The table uses hash distribution on the ProductID column.
The key detail is that a dedicated SQL pool already divides every table into 60 distributions before any partitions are created. Microsoft's guidance is that, for optimal compression and performance of a clustered columnstore table, each distribution and partition combination needs at least 1 million rows. Spreading 2.4 billion rows across 60 distributions leaves 40 million rows per distribution (assuming the hash on ProductID spreads rows roughly evenly), and 40 million rows divided by 1 million rows per partition gives 40 partition ranges.
Any larger number of partitions leaves too few rows in each partition of each distribution. With 240 partitions, each one holds only about 166,667 rows per distribution; with 400 it holds 100,000; with 2,400 it holds about 16,667. All of these fall well below the 1 million row threshold, so compression and query performance suffer, and the extra partitions also add management overhead.
Monthly partitions for the two years would give 24 partitions (12 months per year * 2 years), which also satisfies the threshold, but 24 is not one of the choices. Among the options offered, 40 is the only one that keeps every partition at 1 million rows per distribution.
Therefore, the number of partition ranges that provides optimal compression and performance for the clustered columnstore index is 40 (Choice A).
In summary, when partitioning large fact tables in an Azure Synapse Analytics dedicated SQL pool, remember that the table is already split into 60 distributions, and choose a partition count that keeps total rows / (60 * partitions) at 1 million or more.
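The sizing arithmetic can be checked with a short script. This is only a sketch: it assumes the documented 60 distributions of a dedicated SQL pool, a target of at least 1 million rows per partition in each distribution, and an even spread of rows across distributions. The function and constant names are illustrative, not part of any Azure API.

```python
# Assumptions (not taken from any Azure SDK): a dedicated SQL pool splits
# every table into 60 distributions, and a clustered columnstore index wants
# at least 1 million rows per partition within each distribution.
DISTRIBUTIONS = 60
MIN_ROWS_PER_PARTITION = 1_000_000


def rows_per_partition(total_rows: int, partitions: int) -> float:
    """Average rows in one partition of one distribution, assuming even spread."""
    return total_rows / (DISTRIBUTIONS * partitions)


def max_partitions(total_rows: int) -> int:
    """Largest partition count that still meets the row threshold."""
    return total_rows // (DISTRIBUTIONS * MIN_ROWS_PER_PARTITION)


if __name__ == "__main__":
    total_rows = 2_400_000_000  # 2.4 billion records from the question
    for choice in (40, 240, 400, 2400):
        rows = rows_per_partition(total_rows, choice)
        verdict = "meets threshold" if rows >= MIN_ROWS_PER_PARTITION else "too few rows"
        print(f"{choice:>5} partitions: {rows:>12,.0f} rows each ({verdict})")
    print("Largest partition count meeting the threshold:", max_partitions(total_rows))
```

Under these assumptions only the 40-partition choice keeps every partition at the 1 million row mark; the other choices fall to roughly 167,000 rows or fewer per partition in each distribution.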
This practice question and explanation are intended as preparation for the Microsoft DP-203 certification exam.