When you design a data model for Tabular you should pay attention to a though topic, which is that of circular dependencies in formulas. It is very important to learn how to handle circular dependencies now because in SQL 2012 (and in PowerPivot 2012) there will be a stronger validation of circular dependencies. Some of the checks described in this article have been introduced with the release RC0 of SQL Server 2012 and were not present before. Thus, you might author formulas in previous versions of SQL 2012 (or PowerPivot) and end up with a non-working solution when the final bits will be available, due to these additional (and perfectly understandable) constraints.
Before to start speaking about circular dependencies, it is worth to introduce simple, linear dependencies. Let us look at an example with the following calculated column on the classical AdventureWorks DimProduct table
DimProduct[Profit] := DimProduct[ListPrice] - DimProduct[StandardCost]
The formula depends on two columns. In such a case, we say that the column Profit depends from ListPrice and StandardCost. You might then create a new column, like ProfitPct with the formula:
DimProduct[ProfitPct] := DimProduct[Profit] / DimProduct[ListPrice]
Now, ProfitPct depends from Profit and ListPrice. Thus, when Analysis Services will need to compute the calculated columns, it knows that ProfitPct will need to be computed only after Profit has been calculated and stored. Otherwise, it will not be able to recover a valid value for the formula.
Linear dependency is not something you should normally worry about. It is used internally by the SSAS engine to detect the correct order of computation of calculated columns during the processing of the database. On a normal Tabular data model, with many calculated columns, the dependency of calculations turns into a complex graph which, again, SSAS handles gracefully.
Circular dependency, on the other hand, is a situation that happens when a loop appears in this graph. For example, a clear situation where circular dependency appears is if you try to modify the definition of Profit to this formula:
DimProduct[Profit] := DimProduct[ProfitPct] * DimProduct[StandardCost]
Because ProfitPct depends on Profit and, in this new formula, Profit depends on ProfitPct, SSDT refuses to modify the formula and shows the error “A circular dependency was detected”
All this describes what circular dependencies are from the point of view of columns, i.e. you have detected the existence of a dependency looking at the expression, without paying attention to the table structure. Nevertheless, there is a more subtle and complex type of dependency which is introduced by the usage of CALCULATE and/or filters inside any expression. Let us see the topic with an example, starting from a subset of columns of DimProduct in AdventureWorks:
We are interested in understanding the dependency list for a new calculated column which makes use of the CALCULATE function, like the following one:
SumOfListPrice := CALCULATE (SUM (DimProduct[ListPrice]))
At first glance, it may seem that the column depends from ListPrice only, as this is the only column used in the formula. Nevertheless, by using CALCULATE, we are asking to transform the current row context into a filter context and this modifies the dependency list. If we expand the meaning of the CALCULATE call, the formula really says:
Sum the value of ListPrice for all the row in the DimProduct table which have the same value for ProductKey, ProductAlternateKey, StandardCost and ListPrice.
Reading the formula in this way, it is now clear that the formula depends on all of the columns of DimProduct because, the newly introduced filter context will filter all the columns of the table, thus depending on their value. Nevertheless, creating the column, everything goes fine and you get this nice result
Now, we might try to define a new calculated column, using the very same formula, in the same table. Thus, we try to add NewSumOfListPrice with the following formula, which is identical to the previous one.
NewSumOfListPrice := CALCULATE (SUM (DimProduct[ListPrice]))
Surprisingly, SSDT refuses to create this new formula and returns an error
Analysis Services has detected a circular dependency in the formula, which was not detected before. And, because we did not change anything in the formula, the error seems a very strange one. Something has changed indeed, and it is the number of columns in the table. If we were able to add the NewSumOfListPrice to the table, we would reach a situation where the two formulas have these meanings:
- SumOfListPrice: “Sum the value of ListPrice for all the row in the DimProduct table which have the same value for ProductKey, ProductAlternateKey, StandardCost, ListPrice and NewSumOfListPrice”.
- NewSumOfListPrice: “Sum the value of ListPrice for all the row in the DimProduct table which have the same value for ProductKey, ProductAlternateKey, StandardCost, ListPrice and SumOfListPrice”.
Having added the calculated column, these columns become part of the filter context introduced by CALCULATE and, as a consequence, they are part of the dependency list. Reading the previous definition, it is clear that there is a circular dependency between the two formulas and this is the reason why Analysis Services refuses to allow us to create the NewSumOfListPrice column.
Understanding this error is not very easy. But, on the other hand, finding a solution is pretty straightforward, even if not very intuitive. The problem is that any calculated column containing CALCULATE (or a call to any measure, which adds an automatic CALCULATE) creates a dependency from all of the columns of the table.
The scenario would be different if the table had a row identifier (a primary key, in SQL terms). If the table has a column which acts as a row identifier, then all columns containing a CALCULATE could depend from the row identifier, thus reducing their dependency list to a single column which, by the way, is not a calculated one. Guess what? Tabular is smart enough to behave this way!
In the DimProduct table there is such a column: it is ProductKey. In order to mark the ProductKey as a row identifier you have two options:
- You can create a relationship from any table into DimProduct using ProductKey as the destination column. Performing this operation will ensure that ProductKey is a unique value for DimProduct.
- You can manually set the property of Row Identifier for ProductKey to TRUE using the Properties window inside SSDT or the corresponding feature in PowerPivot
One of these operations will make the engine learn that the table has a row identifier and, in such a scenario, you will be able to define the NewSumOfListPrice column avoiding circular dependency.
As a final advice, it is always a good idea to set the Row Identifier property of a table if such a row exists in the data model, because the Vertipaq engine will make use of this information to optimize all calculations. Nevertheless, row identifiers occupy space in the data model and their memory usage is pretty high because they have the maximum number of distinct values (a different value for each row). Thus, if a row identifier is not needed inside a table (for example for fact tables), it is always a good idea to avoid loading it inside the Tabular data model. Then, in case you face the circular dependency issue, it might be necessary to add the row identifier column to the table, so that the problem is solved.
There are other techniques to avoid circular dependencies. For example, using ALLEXCEPT to remove the calculated columns from the set of columns that become part of the dependency list is a viable option, but it makes all formulas more complicated. On the other hand, using ALLEXCEPT might be useful for very big tables, where the addition of a row identifier would cause memory footprint to grow too much.