Channel: Tabular – SQLBI

Optimize Many-to-Many Calculation in DAX with SUMMARIZE and Cross Table Filtering


PowerPivot and Analysis Services 2012 Tabular do not support many-to-many (M2M) relationships directly in the data model. However, you can obtain the result of a many-to-many relationship by writing a DAX expression. For example, consider the classical M2M relationship between bank accounts and customers.

[Figure: data model of the many-to-many relationship between bank accounts and customers]

If you want to obtain the total Amount for all the accounts of the selected group of customers, you have to split the operation into two steps: first, select the accounts; then, apply the account filter to the Transaction table to obtain the desired result.

In order to filter the accounts, you can use the FILTER function, returning only those accounts that have at least one related customer active in the current filter context. Because the filter context automatically propagates along the one-to-many relationship, any selection applied to a Dim_Customer column propagates to the Bridge_AccountCustomer table. FILTER iterates the Dim_Account table and, for each account, the CALCULATE expression in its second parameter transforms the row context of Dim_Account into a filter context that propagates to Bridge_AccountCustomer. At this point, counting the rows of Bridge_AccountCustomer returns a number greater than 0 only if the account belongs to one or more of the selected customers. This calculation can be implemented with the following formula, which also works with the first version of PowerPivot (released with SQL Server 2008 R2).

AmountM2M :=

CALCULATE(
    SUM( Fact_Transaction[Amount] ),
    FILTER(
        Dim_Account,
        CALCULATE( COUNTROWS( Bridge_AccountCustomer ) ) > 0
    )
)

The operation performed by FILTER requires an iteration: if you have one million accounts, one million CALCULATE( COUNTROWS( Bridge_AccountCustomer ) ) calls are required. With PowerPivot 2012 and Analysis Services 2012 Tabular, it is possible to perform the same calculation in a more efficient way.
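The per-account iteration can be modeled with a small Python sketch. It only illustrates the semantics of the FILTER pattern and says nothing about how the DAX engine executes it internally; the sample rows, the customer names, and the function name `amount_m2m_filter` are assumptions made up for this illustration.

```python
# Hypothetical sample data standing in for the model tables.
dim_account = [1, 2, 3]                                        # Dim_Account[ID_Account]
bridge = [(1, "Mark"), (1, "Paul"), (2, "Paul"), (3, "Luke")]  # (ID_Account, customer)
transactions = [(1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0)]   # (ID_Account, Amount)


def amount_m2m_filter(selected_customers):
    """Rough model of CALCULATE( SUM(...), FILTER( Dim_Account, COUNTROWS(...) > 0 ) )."""
    inner_evaluations = 0
    accounts = set()
    for account in dim_account:        # FILTER visits every account
        inner_evaluations += 1         # one inner row count per account
        rows = sum(
            1
            for acc, cust in bridge
            if acc == account and cust in selected_customers
        )
        if rows > 0:                   # keep accounts with at least one selected customer
            accounts.add(account)
    total = sum(amount for acc, amount in transactions if acc in accounts)
    return total, inner_evaluations


total, evaluations = amount_m2m_filter({"Paul"})
print(total, evaluations)  # prints: 160.0 3
```

The number of inner evaluations always equals the number of accounts, regardless of how few customers are selected. Note also that an account shared by two selected customers is counted only once in the total.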

FILTER has to return a list of Dim_Account rows, but we can obtain the same result by using just the ID_Account column, which defines the link to both the Fact_Transaction and Bridge_AccountCustomer tables. The new SUMMARIZE DAX function can replace the previous FILTER. In SUMMARIZE, the first parameter is the table to summarize, followed by one or more parameters that identify the columns to group by; finally, you can add optional pairs of parameters that define the name and the expression to compute for each group. In other words, SUMMARIZE is semantically similar to a GROUP BY operation in SQL.

We can try to understand the improvement in efficiency by making a SQL comparison. The FILTER expression

FILTER(
    Dim_Account,
    CALCULATE( COUNTROWS( Bridge_AccountCustomer ) ) > 0
)

corresponds to the following SQL query:

SELECT a.ID_Account
FROM   Dim_Account a
WHERE  (SELECT COUNT(*)
        FROM Bridge_AccountCustomer b
        WHERE b.ID_Account = a.ID_Account
          AND <filter context applies here>) > 0
  AND  <filter context applies here>

We will replace the previous FILTER with the following SUMMARIZE:

SUMMARIZE(
    Bridge_AccountCustomer,
    Dim_Account[ID_Account]
)

This SUMMARIZE is semantically similar to the following SQL query:

SELECT a.ID_Account
FROM   Bridge_AccountCustomer b
INNER JOIN Dim_Account a
        ON a.ID_Account = b.ID_Account
WHERE <filter context applies here>
GROUP BY a.ID_Account

As you can see, SUMMARIZE is more concise and better expresses the type of operation. Of course, a smarter query optimizer might produce a similar query plan for both forms, but such an optimization is hard to obtain even in advanced RDBMSs, and it is not available in DAX engines today.
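As a rough illustration of the grouping approach, the following Python sketch collects the distinct account IDs in a single pass over the filtered bridge rows, instead of evaluating a count for every account. It models only the semantics, not the engine's actual query plan; the sample rows, the customer names, and the function name `amount_m2m_summarize` are assumptions made up for this illustration.

```python
# Hypothetical sample data standing in for the model tables.
bridge = [(1, "Mark"), (1, "Paul"), (2, "Paul"), (3, "Luke")]  # (ID_Account, customer)
transactions = [(1, 100.0), (2, 50.0), (3, 25.0), (1, 10.0)]   # (ID_Account, Amount)


def amount_m2m_summarize(selected_customers):
    """Rough model of CALCULATE( SUM(...), SUMMARIZE( bridge, ID_Account ) )."""
    # One pass over the bridge rows visible in the filter context,
    # grouped by account ID (the set removes duplicates, like GROUP BY).
    accounts = {acc for acc, cust in bridge if cust in selected_customers}
    # Apply the resulting account list as a filter on the transactions.
    return sum(amount for acc, amount in transactions if acc in accounts)


print(amount_m2m_summarize({"Paul"}))  # prints: 160.0
```

The work here depends on the number of bridge rows that survive the customer selection, not on the total number of accounts.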

For this reason, it is important to use SUMMARIZE to get the best performance when navigating M2M relationships in DAX. The following is the optimized DAX measure based on SUMMARIZE.

AmountM2M :=

CALCULATE(
    SUM( Fact_Transaction[Amount] ),
    SUMMARIZE(
        Bridge_AccountCustomer,
        Dim_Account[ID_Account]
    )
)

Finally, there is another syntax available in DAX that is harder to explain through a comparison with SQL, but that returns the same result as the SUMMARIZE technique when handling the many-to-many relationship. This approach was described by Gerhard Brueckl and produces a very elegant calculation that leverages cross table filtering. You simply have to include, in the CALCULATE filter parameters, the bridge table and the intermediate dimension table involved in the many-to-many relationship. In this example, these are the Bridge_AccountCustomer and Dim_Account tables.

AmountM2M :=

CALCULATE(
    SUM( Fact_Transaction[Amount] ),
    Bridge_AccountCustomer
)

I ran some tests on more complex models, and this approach provides performance similar to the SUMMARIZE version. It also has the big advantage of requiring less knowledge of the underlying model, which makes it easier to generate automatically from a client tool, or to write for a developer who does not want to worry about which column defines a relationship.

You can find a more complete and deep discussion about many-to-many patterns in DAX and MDX in “The Many-to-Many Revolution 2.0” whitepaper.

