Analyzing table and column size is an important step in optimizing a data model for Power Pivot, Power BI, or Analysis Services Tabular. This article describes VertiPaq Analyzer, an Excel workbook that analyzes detailed information extracted from Dynamic Management Views.
Analysis Services provides many Dynamic Management Views (DMVs) that collect information about the memory used by a data model. For example, DISCOVER_OBJECT_MEMORY_USAGE is a DMV that provides information about all the objects in memory. You can also use this DMV to monitor a Multidimensional instance of Analysis Services. Kasper de Jonge created a sample model (BISM Memory Report) that organizes this data hierarchically, making it easy to find the most expensive databases, tables, and columns on a server.
If you want to analyze a particular database, you probably want to look at more detailed information, which is available in other DMVs. However, querying individual DMVs manually is a time-consuming process, and the SQL dialect available for DMV queries is very limited: it supports neither grouping nor joins. For this reason, I created VertiPaq Analyzer, a Power Pivot data model that collects size information for all the objects in a specific database. The current version works only with SSAS Tabular. If you need to analyze a Power Pivot data model, you have to restore it to Analysis Services before performing the analysis. You can download the workbook from http://www.sqlbi.com/tools/vertipaq-analyzer/.
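Because DMV queries cannot group or join, that work has to happen on the client side, which is what the Power Pivot model takes care of. The following Python sketch only illustrates the idea and is not how VertiPaq Analyzer is implemented: it assumes two hypothetical, simplified result sets already fetched from DMVs (segment sizes and dictionary sizes), with illustrative field names rather than the real DMV column names. It sums segment sizes per column and then joins the result with the per-column dictionary sizes.

```python
from collections import defaultdict

# Hypothetical, simplified rows standing in for two DMV result sets.
# Field names are illustrative; real DMVs return many more columns.
segments = [
    {"table": "Sales", "column": "Quantity", "used_size": 1200},
    {"table": "Sales", "column": "Quantity", "used_size": 800},
    {"table": "Sales", "column": "ProductKey", "used_size": 3000},
    {"table": "Product", "column": "Color", "used_size": 150},
]
dictionaries = [
    {"table": "Sales", "column": "Quantity", "dictionary_size": 500},
    {"table": "Sales", "column": "ProductKey", "dictionary_size": 2500},
    {"table": "Product", "column": "Color", "dictionary_size": 350},
]


def data_size_by_column(rows):
    """Group segment rows by (table, column), summing used_size."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["table"], row["column"])] += row["used_size"]
    return dict(totals)


def join_dictionary_size(data_sizes, dict_rows):
    """Join per-column data size with per-column dictionary size."""
    result = {}
    for row in dict_rows:
        key = (row["table"], row["column"])
        result[key] = {
            "data_size": data_sizes.get(key, 0),
            "dictionary_size": row["dictionary_size"],
        }
    return result


combined = join_dictionary_size(data_size_by_column(segments), dictionaries)
for (table, column), sizes in sorted(combined.items()):
    print(f"{table}[{column}]: {sizes}")
```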
Connection
When you open the Excel file, you have to set up the connection to the database you want to monitor. After opening the Power Pivot window:
- Open Existing Connections
- Edit the SSAS connection
- Modify the instance name of Analysis Services (localhost\tabular in this example)
- Modify the database name (Contoso in this example)
- Test the connection (if it doesn’t work, check the connection string again)
- Save the connection
- Refresh the connection
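The instance and database names edited in these steps end up in an Analysis Services connection string. As a rough illustration, assuming the standard MSOLAP connection string properties (the string stored in the workbook may use a version-specific provider name and carry additional properties), the values from the example map to Data Source and Initial Catalog. The helper function name below is hypothetical.

```python
def build_ssas_connection_string(instance: str, database: str) -> str:
    """Hypothetical helper: build a minimal Analysis Services connection string.

    Provider, Data Source, and Initial Catalog are standard MSOLAP
    connection string properties; a real connection string may contain more.
    """
    parts = {
        "Provider": "MSOLAP",
        "Data Source": instance,   # instance name, e.g. localhost\tabular
        "Initial Catalog": database,  # database name, e.g. Contoso
    }
    # Dicts keep insertion order, so the properties appear in the order above.
    return ";".join(f"{key}={value}" for key, value in parts.items())


# A raw string keeps the backslash in the instance name literal.
print(build_ssas_connection_string(r"localhost\tabular", "Contoso"))
# Provider=MSOLAP;Data Source=localhost\tabular;Initial Catalog=Contoso
```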
Sample Visualizations
The Excel file includes several pivot tables that show useful information. The first is Tables, which displays table and column information hierarchically.
The second is Columns, which details the information for each column independently of its table. This visualization makes it easy to identify the most expensive columns in a database, regardless of the table they belong to.
Then we have User Hierarchies, which details the cost of the user hierarchies in each table.
The Relationships pivot table displays the size of the relationships associated with each table (on the foreign key side).
Finally, the Compression report is an example of the information you can display about the compression applied to tables and columns. This example summarizes the distribution of the segments and their size across different compression types and encoding bits.
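The Columns report amounts to ranking columns by size while ignoring table boundaries. A minimal Python sketch of that ranking follows, assuming made-up sizes in bytes and labeling each column as Table-Column; the data and the function name are hypothetical.

```python
# Hypothetical column sizes in bytes: (table, column, total size).
column_sizes = [
    ("Sales", "ProductKey", 5500),
    ("Sales", "Quantity", 2500),
    ("Product", "Color", 500),
    ("Customer", "Address", 4200),
]


def top_columns(rows, n):
    """Return the n largest columns across all tables, labeled Table-Column."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    return [(f"{table}-{column}", size) for table, column, size in ranked[:n]]


print(top_columns(column_sizes, 2))
# [('Sales-ProductKey', 5500), ('Customer-Address', 4200)]
```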
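The recap in the Compression report is a two-level grouping of segments: by compression type and by number of encoding bits, counting segments and summing their size. The Python sketch below assumes hypothetical segment rows; the compression type labels are placeholders, not values returned by the real DMVs.

```python
from collections import defaultdict

# Hypothetical segment rows: (compression type, bits count, size in bytes).
# TYPE_A and TYPE_B are placeholder labels, not real DMV values.
segments = [
    ("TYPE_A", 10, 4000),
    ("TYPE_B", 10, 1500),
    ("TYPE_B", 4, 300),
    ("TYPE_A", 10, 2000),
]


def compression_recap(rows):
    """Count segments and sum their size for each (compression type, bits) pair."""
    recap = defaultdict(lambda: {"segments": 0, "size": 0})
    for compression_type, bits, size in rows:
        entry = recap[(compression_type, bits)]
        entry["segments"] += 1
        entry["size"] += size
    return dict(recap)


for (compression_type, bits), stats in sorted(compression_recap(segments).items()):
    print(compression_type, bits, stats)
```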
Available Entities and Measures
The data model provides a number of measures:
- Data Size: bytes for all the compressed data in segments and partitions. It does not include dictionaries or column hierarchies.
- Cardinality: object’s cardinality (number of rows of a table or number of unique values of a column)
- Rows: number of rows of a table, partition, or segment
- Columns Hierarchies Size: bytes of automatically generated hierarchies for columns (used by MDX).
- User Hierarchies Size: bytes of user-defined hierarchies
- Relationship Size: bytes of relationships between tables
- Columns Total Size: bytes of all the structures related to a column (sum of Data Size, Dictionary Size, and Columns Hierarchies Size)
- Dictionary Size: bytes of dictionary structures
- Table Size: bytes of a table (sum of Columns Total Size, User Hierarchies Size, and Relationship Size)
- Table Size %: ratio of Columns Total Size to Table Size
- Database Size %: ratio of Table Size to Database Size (sum of Table Size of all the tables)
- Segments #: number of segments
- Partitions #: number of partitions
- Columns #: number of columns
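The size measures build on one another. The Python sketch below writes out that arithmetic with hypothetical byte counts and illustrative class names (ColumnSizes, TableSizes); it mirrors the definitions in the list, reading Table Size % as a column's share of its table, and is not the DAX used in the workbook.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ColumnSizes:
    data_size: int         # compressed data in segments and partitions
    dictionary_size: int   # dictionary structures
    hierarchies_size: int  # Columns Hierarchies Size

    @property
    def total_size(self) -> int:
        # Columns Total Size = Data Size + Dictionary Size + Columns Hierarchies Size
        return self.data_size + self.dictionary_size + self.hierarchies_size


@dataclass
class TableSizes:
    columns: Dict[str, ColumnSizes]
    user_hierarchies_size: int
    relationship_size: int

    @property
    def columns_total_size(self) -> int:
        return sum(c.total_size for c in self.columns.values())

    @property
    def table_size(self) -> int:
        # Table Size = Columns Total Size + User Hierarchies Size + Relationship Size
        return self.columns_total_size + self.user_hierarchies_size + self.relationship_size

    def table_size_pct(self, column: str) -> float:
        # Table Size %: a column's Columns Total Size relative to its Table Size
        return self.columns[column].total_size / self.table_size


def database_size_pct(tables: Dict[str, TableSizes], name: str) -> float:
    # Database Size %: Table Size relative to the sum of Table Size of all tables
    database_size = sum(t.table_size for t in tables.values())
    return tables[name].table_size / database_size


sales = TableSizes(
    columns={"Quantity": ColumnSizes(2000, 500, 100), "ProductKey": ColumnSizes(3000, 2500, 400)},
    user_hierarchies_size=0,
    relationship_size=1000,
)
product = TableSizes(columns={"Color": ColumnSizes(150, 350, 0)}, user_hierarchies_size=500, relationship_size=0)
tables = {"Sales": sales, "Product": product}

print(sales.table_size)  # 9500
print(round(database_size_pct(tables, "Product"), 4))
```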
You can browse the data model using the following entities:
- Columns: includes the Tables-Columns hierarchy (two levels: Table and Column) and the ColumnKey attribute (Table-Column as a single string, useful to browse columns regardless of the table they belong to)
- Columns Hierarchies: shows STRUCTURE_NAME with the name of the internal structure of a column hierarchy (usually ID_TO_POS and POS_TO_ID)
- Columns Segments: shows segments of columns in a table and includes a number of attributes:
  - BITS_COUNT
  - BOOKMARK_BITS_COUNT
  - COMPRESSION_TYPE
  - PARTITION_NAME
  - SEGMENT_NUMBER
  - TABLE_PARTITION_NUMBER
  - VERTIPAQ_STATE
- Relationships: shows the relationships related to a table. RELATIONSHIP_ID is the only attribute; unfortunately, it does not display the columns involved in the relationship. This entity cannot be crossed with columns and segments.
- User Hierarchies: shows the user-defined hierarchies of a table (it cannot be crossed with columns and segments) and includes two attributes:
  - STRUCTURE_NAME: name of the internal structure of a user hierarchy (usually CHILD_COUNT, FIRST_CHILD_POS, MULTILEVEL_ID, and PARENT_POS)
  - User Hierarchy: name of the user-defined hierarchy
Internal Structure
The following picture shows the entities imported in the data model from the DMVs. You can see that collecting the information in a meaningful way requires seven different queries, resulting in measures having different granularities.
The Tables and Columns Cardinality tables are hidden and used by measures and calculated columns, in order to provide a more intuitive experience when navigating data in pivot tables.
Many calculated columns defined in the data model provide numeric information that is useful if you want to display the data as tables in Excel without using a pivot table. You can find more information about the DMVs used in this data model, and about the meaning of certain columns and measures, in The Definitive Guide to DAX book.