Components of a Star Schema
-
Fact Table:
- The central table in the star schema.
- Contains quantitative data (measures or metrics) such as sales, revenue, or quantity.
- Each row in the fact table represents a specific event or transaction.
- Connected to dimension tables via foreign keys.
- A fact table for a retail store might store sales transactions:
Fact_Sales
:Transaction_ID
,Date_ID
,Product_ID
,Customer_ID
,Store_ID
,Quantity_Sold
,Total_Amount
.
-
Dimension Tables:
- Surround the fact table like the points of a star.
- Contain descriptive attributes (context or metadata) related to the facts.
- Used to filter, group, or label the data in the fact table.
Dim_Date
:Date_ID
,Date
,Month
,Quarter
,Year
,Day_of_Week
.Dim_Product
:Product_ID
,Product_Name
,Category
,Brand
,Price
.Dim_Customer
:Customer_ID
,Customer_Name
,City
,State
,Phone_Number
.Dim_Store
:Store_ID
,Store_Name
,City
,Manager_Name
.
Example: Star Schema for a Retail Store
Star Schema
Fact Table
Fact_Sales
Transaction_ID | Date_ID | Product_ID | Customer_ID | Store_ID | Quantity_Sold | Total_Amount |
---|---|---|---|---|---|---|
101 | 20250101 | 1 | 1001 | 501 | 2 | 5000 |
102 | 20250102 | 2 | 1002 | 502 | 1 | 1500 |
103 | 20250103 | 3 | 1003 | 503 | 5 | 1000 |
Dimension Tables
-
Dim_Date
:Date_ID Date Month Quarter Year Day_of_Week 20250101 2025-01-01 January Q1 2025 Wednesday 20250102 2025-01-02 January Q1 2025 Thursday 20250103 2025-01-03 January Q1 2025 Friday -
Dim_Product
:Product_ID Product_Name Category Brand Price 1 Saree Sarees Nalli 2500 2 Kurta-Pajama Ethnic Wear Fashion India 1500 3 Turmeric Powder Spices ABC 200 -
Dim_Customer
:Customer_ID Customer_Name City State Phone_Number 1001 Priya Sharma Mumbai Maharashtra 9876543210 1002 Rajesh Patel Ahmedabad Gujarat 8765432109 1003 Anjali Singh Delhi Delhi 7654321098 -
Dim_Store
:Store_ID Store_Name City Manager_Name 501 Nalli Silks Chennai Ramesh Kumar 502 Fashion India Bengaluru Sunita Reddy 503 ABC Spices Delhi Amit Sharma
How the Star Schema Works
-
Querying Data:
- Suppose you want to find the total sales of sarees in Mumbai for January 2025.
- The query would join the
Fact_Sales
table with theDim_Date
,Dim_Product
, andDim_Customer
tables using their respective keys. - Example SQL Query:
-
Benefits:
- Simplified Queries: The star schema makes it easy to write and understand queries.
- Improved Performance: Optimized for read-heavy analytical workloads.
- Scalability: Can handle large datasets efficiently.
Advantages of Star Schema
- Query Performance: Fewer joins are required compared to normalized schemas, leading to faster queries.
- Simplicity: Easy to design and understand, even for non-technical users.
- Flexibility: New dimensions can be added without disrupting existing queries.
- Business-Friendly: Aligns well with how business users think about data (e.g., sales, customers, products).
Disadvantages of Star Schema
- Data Redundancy: Dimension tables may contain redundant data (e.g., repeating city names in
Dim_Customer
). - Limited Flexibility for Complex Relationships: Not suitable for scenarios requiring many-to-many relationships between dimensions.
- Storage Overhead: Large dimension tables can consume significant storage space.