I was recently asked by a colleague about my opinions on unbalanced (sometimes called ragged) versus balanced hierarchies in data modelling. Other people have described the differences between unbalanced and balanced hierarchies before me so instead of me trying to explain them I'll point you here instead for an explanation. Here are the opinions that I proffered back to my colleague:
I’m 50-50 on ragged hierarchies. In some ways storing in parent-child fashion (which is how people typically store ragged hierarchies) is flexible because…well…because they’re ragged. However if you look at it from another direction they’re not flexible at all. Look at the following "balanced" structure:
| Product | Category | Colour |
| Apple | Fruit | Green |
| Banana | Fruit | Yellow |
| Custard | Dessert | Yellow |
In an OLAP or MDM product you can then build many hierarchies from that:
Category-->Product
Colour-->Product
Category-->Colour-->Product
Colour-->Category-->Product
Product-->Category-->Colour (a daft one, but there is actually nothing to stop us from doing this)
Thus – much more flexible.
It’s a daft example but I think you get the idea.
Headline is "Parent-child allows us to store any hierarchy for a dimension, but we can only store one. A list of attributes enables us to build any number of hierarchies in one dimension, as long as they are balanced." So, you have a choice to make.
I'm interested to know what other people's approaches to this problem are? Do you prefer to model your hierarchies as balanced or unbalanced? I'm personally of the opinion that by-and-large the business logic makes the decision for you - but then the world isn't always that simple is it?
-Jamie