There are two options to model many-to-many relationships in Tabular and Power BI: you can use either a regular bidirectional filter relationship or a limited unidirectional relationship. In this article, we compare the performance of both options.
Our readers know SQLBI’s position on bidirectional relationships: they are a powerful tool that should be used with great care and avoided in most scenarios. There actually is one scenario where bidirectional relationships are useful: when you need to create a model involving a many-to-many relationship between dimensions. In this scenario, using a bidirectional filter relationship is the suggested solution. Nonetheless, there are cases where a bidirectional relationship cannot be created – for example, because it would introduce ambiguity in the model. If you are facing this situation, you can use a different modeling technique based on a limited many-to-many cardinality relationship, which works even when the relationship is set as unidirectional. The choice between the two models is not an easy one to make: both come with advantages and disadvantages that need to be understood in depth in order to make the right decision.
In this article, we first provide a description of the two techniques; we then proceed with the performance analysis of both solutions, so as to provide guidance about which technique to use, and when.
The scenario
In the Contoso database, there are no many-to-many relationships. Therefore, we created a many-to-many relationship between the Customer and the Sport tables. We assigned to each customer a certain number of sports that they play. Because each customer may play multiple sports, and each sport may be played by multiple customers, the natural relationship between customers and sports is a many-to-many relationship.
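Just to give an idea of the shape of the bridge table, here is a minimal sketch of how a table like CustomerSport could be generated as a DAX calculated table. The RAND-based sampling is only a hypothetical placeholder to obtain a random assignment of sports; it is not necessarily how the actual table was populated:

CustomerSport =
SELECTCOLUMNS (
    FILTER (
        CROSSJOIN (
            VALUES ( Customer[CustomerKey] ),
            VALUES ( Sport[SportKey] )
        ),
        -- Hypothetical sampling: keep a random subset of the pairs, so that
        -- each customer ends up playing only a few sports – possibly none
        RAND () < 0.10
    ),
    "CustomerKey", Customer[CustomerKey],
    "SportKey", Sport[SportKey]
)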
The canonical way to model this relationship is shown below.
A filter from Sport flows to CustomerSport; it then goes through the bidirectional relationship to Customer, to finally reach Sales. By using this model, you can slice Sales by Sport, thus obtaining the usual many-to-many non-additive behavior. This is the way we have been teaching many-to-many relationships over the years.
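If you want to verify the non-additive behavior yourself, a quick test is to compare the sum of the per-sport amounts with the grand total. This is just a sketch, assuming the Sales Amount measure shown in the next section; because a customer playing two sports is counted under both, the first value is typically larger than the second whenever enough customers play more than one sport:

EVALUATE
ROW (
    "Sum of per-sport amounts", SUMX ( VALUES ( Sport[Sport] ), [Sales Amount] ),
    "Grand total", [Sales Amount]
)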
We have recently been discussing another possible option to model this scenario by using limited many-to-many cardinality relationships.
This second model is visually more appealing and – most importantly – it does not require bidirectional relationships. A filter from either Customer or Sport is applied first to CustomerSport, and is then moved to Sales through the limited many-to-many cross-filter relationship.
Because it contains no bidirectional relationships, this model is certainly more flexible than the previous one. That said, is it also better – or at least comparable – in terms of performance? Unfortunately, the quick answer is “it depends”. This second model produces different query plans, which can be better in some scenarios and not as good in others. It all depends on the tables involved in the specific query. Therefore, the new model is an option that should be tested thoroughly before you move it into production.
In order to prove the above statement, we obviously need to perform tests. Before going into the details of the tests, let us go through a few considerations. To distinguish between the two models, we call the first one the “canonical” model and the second one the “new” model.
The weak point of the canonical many-to-many pattern is the bidirectional relationship. It is a weak point from the modeling point of view, because it might introduce ambiguity. It is also a weak point from the performance point of view, because transferring a filter from the many-side to the one-side of a relationship is always more expensive than doing the opposite.
The new solution does not have bidirectional relationships, but it presents two weak points of its own. First, the relationship between CustomerSport and Sales is a limited relationship. Performance-wise, limited relationships are slow because they are not materialized. What we should measure is whether a bidirectional relationship is slower than a limited many-to-many cardinality relationship or not. Then there is another issue. In the canonical model, Customer is linked directly to Sales through a regular one-to-many relationship. In the new model, Customer is linked to Sales only through the bridge table. This means that the engine traverses the many-to-many relationship whenever a user slices by attributes of either Customer or Sport. In the canonical model, if a user browses by customer, the bridge table is not part of the game. In the new model, the bridge table always plays a role: a user browsing by customer runs a query that always uses CustomerSport to filter Sales. This important detail is the weakest point of the new model, as we will analyze further with the queries.
Testing performance
The considerations so far are useful to define the tests to perform. We will check the performance of computing Sales Amount (a measure that scans the Sales table) filtering by Sport and by Customer in both scenarios. The model we are using is a Contoso version with 1.4 billion rows. The CustomerSport table contains around 3 million rows: we created a model where each customer plays between zero and three different sports.
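For reference, we assume the canonical Contoso definition of Sales Amount, which matches the $Expr0 expression that appears in the storage engine queries shown later:

Sales Amount :=
SUMX ( Sales, Sales[Quantity] * Sales[Net Price] )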
The first query groups by Sport[Sport] and computes Sales Amount. Despite its simplicity, the query needs to traverse the CustomerSport relationship, using only the Sport table to group by:
EVALUATE
SUMMARIZECOLUMNS (
    Sport[Sport],
    "Amt", [Sales Amount]
)
In order to deepen our knowledge of how many-to-many relationships are evaluated, we also analyze a second, slightly more complex query that groups by both Customer and Sport:
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Continent],
    Sport[Sport],
    "Amt", [Sales Amount]
)
Finally, the last query we analyze groups by Customer only. In the canonical model, this query does not need to traverse the CustomerSport table. In the new model, on the other hand, this query needs to use CustomerSport anyway, because of the way relationships are laid out:
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Continent],
    "Amt", [Sales Amount]
)
Tests on the canonical model
First, this is the canonical model, with one bidirectional relationship between Customer and CustomerSport.
Let us now run the queries on the canonical model. The first query groups by Sport:
EVALUATE SUMMARIZECOLUMNS ( Sport[Sport], "Amt", [Sales Amount] )
First, the execution time. The query uses 385,984 milliseconds of storage engine CPU, with a tiny amount of formula engine time (10 milliseconds).
The query plan is interesting to analyze. The entire calculation is pushed down to the storage engine, with the aid of temporary tables. Here are the details:
--
-- First we retrieve the pairs of CustomerKey and Sport from CustomerSport
--
DEFINE TABLE '$TTable3' :=
SELECT
    'Customer'[CustomerKey],
    'Sport'[Sport]
FROM 'CustomerSport'
    LEFT OUTER JOIN 'Customer'
        ON 'CustomerSport'[CustomerKey]='Customer'[CustomerKey]
    LEFT OUTER JOIN 'Sport'
        ON 'CustomerSport'[SportKey]='Sport'[SportKey];

--
-- Here we extract the CustomerKey values in a bitmap
--
DEFINE TABLE '$TTable4' :=
SELECTSIMPLEINDEXN ( '$TTable3'[Customer$CustomerKey] )
FROM '$TTable3';

--
-- Here is where the actual computation happens.
-- Note that the first INNER JOIN is with Table3, to retrieve the Sport
-- Table4 is used to further filter the scanning of Sales
--
DEFINE TABLE '$TTable1' :=
SELECT
    '$TTable3'[Sport$Sport],
    SUM ( '$TTable2'[$Measure0] )
FROM '$TTable2'
    INNER JOIN '$TTable3'
        ON '$TTable2'[Customer$CustomerKey]='$TTable3'[Customer$CustomerKey]
REDUCED BY
'$TTable2' :=
WITH
    $Expr0 := 'Sales'[Quantity] * 'Sales'[Net Price]
SELECT
    'Customer'[CustomerKey],
    SUM ( @$Expr0 )
FROM 'Sales'
    LEFT OUTER JOIN 'Customer'
        ON 'Sales'[CustomerKey]='Customer'[CustomerKey]
WHERE
    'Customer'[CustomerKey] ININDEX '$TTable4'[$SemijoinProjection];
The engine performs its calculation in three steps, corresponding to the three storage engine queries:
- First, it retrieves the pairs of CustomerKey and Sport from CustomerSport.
- Next, it generates a bitmap containing the CustomerKey values present in CustomerSport. This table is used in the next step to filter the customers being scanned from Sales. Indeed, because we are only interested in slicing by sport, customers who do not play any sport are ignored.
- The last step is where the actual calculation happens. The engine scans Sales, retrieving only the customers that are in CustomerSport (by using the Table4 bitmap) and producing Table2. It then joins this result with Table3, which maps customers to sports.
Despite the complexity of the plan, the entire calculation has been pushed down to the storage engine. Nonetheless, the storage engine cannot handle bidirectional filters by itself; therefore, it needs to prepare these temporary tables to produce the correct joins.
The overall logic is also visible in the physical query plan. Indeed, the physical query plan is made of only four lines that retrieve the 45 rows corresponding to the sports, entirely computed by the storage engine.
Because the entire calculation is pushed down to the storage engine, the degree of parallelism used is very good: it shows x49.7 on a machine with 64 virtual cores, as a further demonstration that the engine is maximizing its use of CPU power.
We described the plan in such detail because it is important to be able to compare this first plan with the following ones. There are significant differences in the performance of different queries on different models, and the details matter.
Let us now add Customer[Continent] to the query. From a DAX point of view, the change is trivial: just one more column in the SUMMARIZECOLUMNS groupby list:
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Continent],
    Sport[Sport],
    "Amt", [Sales Amount]
)
Despite the query being very close to the previous one, the timings are now very different.
Before diving into the details, please note that the degree of parallelism is largely reduced: it now shows x23.5, less than half what we had in the previous query. Moreover, the formula engine time has greatly increased: it went from 10 to 3,120 milliseconds.
In order to understand the reasons for this performance degradation, we need to dive deeper. These are the storage engine queries:
--
-- Here we retrieve the Sales Amount for each customer, without reducing the
-- set to only the customers being filtered by the CustomerSport table, as
-- happened in the previous query
--
WITH
    $Expr0 := 'Sales'[Quantity] * 'Sales'[Net Price]
SELECT
    'Customer'[CustomerKey],
    'Customer'[Continent],
    SUM ( @$Expr0 )
FROM 'Sales'
    LEFT OUTER JOIN 'Customer'
        ON 'Sales'[CustomerKey]='Customer'[CustomerKey];

--
-- Here we match CustomerKey values with Customer[Continent] and Sport
-- by scanning CustomerSport, Customer and Sport together
--
SELECT
    'Customer'[CustomerKey],
    'Customer'[Continent],
    'Sport'[Sport]
FROM 'CustomerSport'
    LEFT OUTER JOIN 'Customer'
        ON 'CustomerSport'[CustomerKey]='Customer'[CustomerKey]
    LEFT OUTER JOIN 'Sport'
        ON 'CustomerSport'[SportKey]='Sport'[SportKey];
This time, the storage engine does not actually compute the values returned by the query. It first retrieves the sales amount for each customer key; it then retrieves another table that maps customer keys to continents and sports.
The absence of further calculations in the storage engine is a clear indication that the real many-to-many computation happens in the formula engine – hence the large increase in formula engine time. Moreover, because it is the formula engine that computes the actual values, the data structures containing the partial results need to be moved back to the formula engine. This is the biggest difference between the two query plans: the previous query plan performed all of its computation inside the storage engine, whereas this latter query plan requires moving large amounts of data back to the formula engine.
You can appreciate this “detail” by looking at the physical query plan, which now clearly shows that the table with Continent and Sport (135 rows) is joined in the formula engine with the larger table containing Sales Amount by CustomerKey (2,802,660 rows).
Because it involves the formula engine, this query plan is clearly suboptimal compared to the previous query plan.
Things are very different when the CustomerSport table is not part of the game. Indeed, the last query we analyze slices the sales amount by Customer[Continent] only. Because of the way we laid out the relationships, there is no need to use the CustomerSport table: Customer can apply its filter directly to Sales:
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Continent],
    "Amt", [Sales Amount]
)
The server timings are excellent.
No formula engine is involved, the degree of parallelism is an awesome x54.5, and the entire query is executed using only 16,609 milliseconds of storage engine CPU. This is the behavior you can expect of DAX with regular one-to-many relationships: the entire calculation is performed by its best engine, the storage engine.
Now that we have executed and analyzed the three queries, we can draw conclusions about the canonical model. When the query involves only the Sport table and the calculation can be pushed down to the storage engine, Tabular generates a very good query plan that uses the storage engine at its best. The query is heavy, but by using temporary tables that are not moved out of VertiPaq, the degree of parallelism is good. When the query becomes more complex, the formula engine is required – which greatly slows down the entire query. If the engine does not use the CustomerSport bridge table, then VertiPaq performs at its best: no formula engine is involved, and the query does not even need temporary structures.
Tests on the new model
Now that we have a good understanding of the way many-to-many relationships are resolved in the canonical model, we can perform the same three tests on the new model to check how it behaves. This is what the new model looks like.
There are no bidirectional relationships. This time, Customer is linked to Sales through CustomerSport, the same way Sport is.
We start with the first query, which groups by Sport only:
EVALUATE
SUMMARIZECOLUMNS (
    Sport[Sport],
    "Amt", [Sales Amount]
)
First, the timings:
The entire query is resolved in the storage engine. A few queries run on a single core, while the main query achieves a very good degree of parallelism. Overall, the total timing is very close to what we obtained with the canonical model.
Because the layout of the relationships is different, the query plan is also slightly different from the one obtained with the canonical model – although the overall logic is very close. Let us look at the storage engine queries:
--
-- Here we retrieve Sport and CustomerKey scanning CustomerSport
--
DEFINE TABLE '$TTable3' :=
SELECT
    'Sport'[Sport],
    'CustomerSport'[CustomerKey]
FROM 'CustomerSport'
    LEFT OUTER JOIN 'Sport'
        ON 'CustomerSport'[SportKey]='Sport'[SportKey];

--
-- Here we retrieve CustomerKey only from Table3, which is derived from CustomerSport
--
DEFINE TABLE '$TTable4' :=
SELECT '$TTable3'[CustomerSport$CustomerKey]
FROM '$TTable3';

--
-- Table5 contains the values of CustomerKey that are present in Sales; it will
-- be used later to reduce the scanning of Sales
--
DEFINE TABLE '$TTable5' :=
SELECTRJOIN ( '$TTable4'[CustomerSport$CustomerKey] )
FROM '$TTable4'
    REVERSE BITMAP JOIN 'Sales'
        ON '$TTable4'[CustomerSport$CustomerKey]='Sales'[CustomerKey];

--
-- In this final query, Table5 is used to reduce the scanning of Sales to retrieve
-- the final result
--
DEFINE TABLE '$TTable1' :=
SELECT
    '$TTable3'[Sport$Sport],
    SUM ( '$TTable2'[$Measure0] )
FROM '$TTable2'
    INNER JOIN '$TTable3'
        ON '$TTable2'[Sales$CustomerKey]='$TTable3'[CustomerSport$CustomerKey]
REDUCED BY
'$TTable2' :=
WITH
    $Expr0 := 'Sales'[Quantity] * 'Sales'[Net Price]
SELECT
    'Sales'[CustomerKey],
    SUM ( @$Expr0 )
FROM 'Sales'
WHERE
    'Sales'[CustomerKey] ININDEX '$TTable5'[$SemijoinProjection];
As you see, despite the number of storage engine queries being different, the overall logic is the same as in the canonical model. The physical query plan is the final demonstration that the plans are actually nearly identical, and that the entire work is done in the storage engine.
Time to look at the second query, involving both Customer and Sport:
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Continent],
    Sport[Sport],
    "Amt", [Sales Amount]
)
Here we can appreciate the first important difference between the canonical model and the new model. In this scenario, the canonical model required heavy use of the formula engine, because it could not rely on the approach used for the previous query: the shape of the relationships between Customer and Sales was not the same as the shape of the relationships between Sport and Sales.
In the new model, Customer and Sport are related to CustomerSport using the very same type of relationship. Consequently, the timings and the format of the internal queries are close to the ones observed for the previous query.
The considerations for this query are the same as for the previous query: nearly no formula engine is used, and the entire query is executed in the storage engine. This is different from what happened with the same query on the canonical model.
In the canonical model, adding Customer triggered a severe performance degradation, with a higher use of the formula engine. In the new model, adding more filters is not an issue. In this regard, the new model performs significantly better. Let us look at the details of the storage engine queries to obtain a better picture:
--
-- Here we retrieve Sport, Continent and CustomerKey scanning CustomerSport
--
DEFINE TABLE '$TTable3' :=
SELECT
    'Customer'[Continent],
    'Sport'[Sport],
    'CustomerSport'[CustomerKey]
FROM 'CustomerSport'
    LEFT OUTER JOIN 'Customer'
        ON 'CustomerSport'[CustomerKey]='Customer'[CustomerKey]
    LEFT OUTER JOIN 'Sport'
        ON 'CustomerSport'[SportKey]='Sport'[SportKey];

--
-- Here we retrieve CustomerKey only from Table3, which is derived from CustomerSport
--
DEFINE TABLE '$TTable4' :=
SELECT '$TTable3'[CustomerSport$CustomerKey]
FROM '$TTable3';

--
-- Table5 contains the values of CustomerKey that are present in Sales; it will
-- be used later to reduce the scanning of Sales
--
DEFINE TABLE '$TTable5' :=
SELECTRJOIN ( '$TTable4'[CustomerSport$CustomerKey] )
FROM '$TTable4'
    REVERSE BITMAP JOIN 'Sales'
        ON '$TTable4'[CustomerSport$CustomerKey]='Sales'[CustomerKey];

--
-- In this final query, Table5 is used to reduce the scanning of Sales to retrieve
-- the final result
--
DEFINE TABLE '$TTable1' :=
SELECT
    '$TTable3'[Customer$Continent],
    '$TTable3'[Sport$Sport],
    SUM ( '$TTable2'[$Measure0] )
FROM '$TTable2'
    INNER JOIN '$TTable3'
        ON '$TTable2'[Sales$CustomerKey]='$TTable3'[CustomerSport$CustomerKey]
REDUCED BY
'$TTable2' :=
WITH
    $Expr0 := 'Sales'[Quantity] * 'Sales'[Net Price]
SELECT
    'Sales'[CustomerKey],
    SUM ( @$Expr0 )
FROM 'Sales'
WHERE
    'Sales'[CustomerKey] ININDEX '$TTable5'[$SemijoinProjection];
As you see, apart from some differences in the columns being used, the overall structure is identical to that of the previous query. The query plan is – again – identical.
In this second query, the new model performs better than the canonical model. This is already extremely interesting: it shows that, depending on the kinds of queries you plan to execute on a model, one model performs better than the other.
Unfortunately, the new model falls short on the third query – the query that uses a filter on Customer to reach Sales. In the canonical model, the relationship between Customer and Sales is direct. Therefore, the entire query is executed using a single xmSQL query that performs great:
EVALUATE
SUMMARIZECOLUMNS (
    Customer[Continent],
    "Amt", [Sales Amount]
)
In the new model, the engine still needs to use CustomerSport to reach Sales. Therefore, the third query runs at nearly the same speed as the previous two, and the benefit of the direct relationship is lost. Here are the timings in the picture below.
Because Continent contains fewer values, and not all the customers play all the sports, the data structures used by the query are slightly smaller and the execution time is slightly shorter. The overall format of the storage engine queries is identical to the previous two, and not worth repeating a third time.
In this third scenario, the performance of the new model is much poorer than the performance of the canonical model. Again, this is an interesting result, as it shows that you need to make choices based on the kinds of queries being executed. If most of your calculations do not use the many-to-many relationship between Customer and Sport, and rely instead on the direct relationship between Customer and Sales, then the canonical model performs better. On the other hand, if the many-to-many structure is always part of the queries, then the new model comes with interesting advantages.
Conclusions
We performed a detailed analysis of a few queries on the two models, trying to find the best model. The final verdict, as is almost always the case in the world of data modeling, is “it depends”.
The canonical model is a good choice whenever the many-to-many relationship is seldom used. You pay a high price when you traverse the bridge table to reach Customer, because mixing a bidirectional filter relationship on the many-side with other relationships produces a poorer query plan. When the bridge table is not part of the equation, the canonical model is much faster. However, when the bridge table is being used, the canonical model is slower, and it makes quite a heavy use of the formula engine. Therefore, you observe a high variation in the speed of your queries, depending on whether you traverse the bridge table or not.
The new model is overall more consistent in its performance. The bridge table is always part of the query, but the combination of regular relationships with a limited many-to-many cardinality relationship to reach Sales produces query plans that always show the same level of performance. There is a more constant loss in performance, regardless of the dimensions used for filtering. This means that you never enjoy the great performance of a direct filter between Customer and Sales, because the only way Customer can reach Sales is through CustomerSport.
As is always the case, the right choice depends on the specifics of your model. In choosing between the canonical model and the new model, you should carefully evaluate the queries that will be executed, and make an educated choice based on user requirements. For extreme optimization scenarios, it might also be possible to keep both sets of relationships in the model and author code that uses IF conditions along with the CROSSFILTER and USERELATIONSHIP modifiers, in order to choose the best relationship layout depending on the query being executed. Again, this is not a suggested technique: changing the relationship layout at query time has its own effect on performance, and it disables several optimizations. Nonetheless, it is a technique worth trying when you strive for extreme optimization.
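As a purely illustrative sketch of this last idea – assuming a model that keeps the canonical relationships active, plus an inactive limited relationship between CustomerSport[CustomerKey] and Sales[CustomerKey] – such a measure could look like the following; the measure name and the ISFILTERED test are hypothetical:

Sales Amount (dual layout) :=
IF (
    ISFILTERED ( Sport ),
    -- When Sport filters the query, switch to the "new" layout: activate
    -- the limited many-to-many relationship to Sales and disable the
    -- bidirectional filter, to avoid ambiguous paths
    CALCULATE (
        [Sales Amount],
        USERELATIONSHIP ( CustomerSport[CustomerKey], Sales[CustomerKey] ),
        CROSSFILTER ( CustomerSport[CustomerKey], Customer[CustomerKey], NONE )
    ),
    -- Otherwise, rely on the regular direct relationship between
    -- Customer and Sales (the canonical layout)
    [Sales Amount]
)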