Creating a hierarchy in a Power Pivot or Tabular data model is very easy, but you might experience performance issues if the hierarchy is not “natural”. This article explains how to create hierarchies avoiding these problems.
Natural and Unnatural Hierarchies
Consider the Date dimension in Adventure Works. We have a table with the following columns:
- Year
- Semester (2 values)
- Quarter (4 values)
- Month (12 values)
In order to facilitate navigation, you can create a hierarchy with these levels (maybe adding the Date as a fifth level). However, the same month appears in different branches of the hierarchy (one per year). For example, January is the same string for both 2005 and 2006 in the following screenshot.
This is an example of an unnatural hierarchy. When a value of the level of the hierarchy (e.g. January) is not unique for the entire level and is present in more than one branch of the hierarchy, then the identification of that member requires the knowledge of one or more parents of the same hierarchy. In other words, you have to describe the entire path (2005-1-1-January or 2006-1-1-January) in order to identify a single month.
You can define a natural hierarchy by using attribute that have unique values for the entire level of the hierarchy, without requiring the entire path to identify a member of the hierarchy in a correct way. For example, instead of using January, you can use January 2005, January 2006 and so on. This requires a higher number of distinct values in the Month column, because you will have 12 values for each year.
You can create these attributes using calculated columns:
[YearSemester] = "H" & DimDate[Semester] & " " & DimDate[Year] [YearQuarter] = "Q" & DimDate[Quarter] & " " & DimDate[Year] [YearMonth] = DimDate[Month] & " " & DimDate[Year]
You can see in the following screenshot the natural and unnatural hierarchies defined in the data model.
Navigating the natural hierarchy, you can see that each month, each quarter and each semester has a unique name across the hierarchy.
Using a natural hierarchy has many advantages:
- Readability: even if you do not display the Year attribute, you can understand which data you are seeing.
- Charting: you can easily create a line chart in Power View using the month granularity. When you only have 12 months, it is hard to combine year and month in order to obtain a single continuous line for more than one year.
- Performance: the engine automatically identify a natural hierarchy and this improves performance of many MDX queries that access the hierarchy. This is very important in an Excel PivotTable, which send MDX queries to the engine.
Hierarchies Performance in MDX
When you use MDX to query a Tabular or Power Pivot model, you might experience bad performance using unnatural hierarchy. Any PivotTable in Excel generates queries in MDX, so this optimization is very important also if you are using a relatively small data model in Power Pivot.
For example, you can import a few tables from Adventure Works DW: DimDate, DimProduct, DimCustomer, DimGeography, and FactInternetSales.
You create a first PivotTable based on this model, putting the Sum of SalesAmount as a measure and the following attributes on the rows of the PivotTable:
- DimDate – Year (filtered by 2007)
- DimDate – Semester (filtered by 1)
- DimDate – Quarter (filtered by 1)
- DimDate – Month (filtered by March)
- DimCustomer – LastName (filtered by Beck)
- DimProduct – ProductName (no filter)
- DimGeography – City (no filter)
You can see the result in the following screenshot. The performance are very good and you should not wait more than one second for every refresh.
Now, let’s try to do the same using the Unnatural hierarchy created in the DimDate table.
- DimDate – Unnatural (filter 2007-1-1-March)
- Explode levels Year – Semester – Quarter – Month
- DimCustomer – LastName (filtered by Beck)
- DimProduct – ProductName (no filter)
- DimGeography – City (no filter)
In reality, you will wait too much and you will probably get an out of memory error as soon as you add the City attribute from DimGeography. This is what you should see just before adding the DimGeography-City attribute to the rows of the PivotTable.
The reason for this behavior is that the MDX query generated by Excel requires a heavy computation that consumes a lot of memory. Maybe Microsoft will improve this in the future, but it is a best practice presenting natural hierarchies to MDX, so this is important for Excel PivotTables. You can repeat the same test using the Natural hierarchy:
- DimDate – Natural (filter 2007-H1 2007-Q1 2007-March 2007)
- Explode levels Year – Semester – Quarter – Month
- DimCustomer – LastName (filtered by Beck)
- DimProduct – ProductName (no filter)
- DimGeography – City (no filter)
In this case, the performance are very good, response time should be under one second without memory consumption for executing the query.
Always Define Natural Hierarchies
You should always define natural hierarchies in a Power Pivot or Analysis Services Tabular data model. Any unnatural hierarchy might generate bad performance in MDX queries, even with relatively small amount of data.
The data model and the Excel PivotTables shown in this article are included in the sample workbook you can download.