Determining SSAS Aggregation Usage

Aggregations improve query performance by limiting the amount of partition data that needs to be scanned to answer a query. Instead of querying the fact level data of the partition, SSAS will query the aggregation and because the amount of data in the aggregation is smaller, the query can be answered more succinctly. However these benefits come with a trade-off, they take time to build which increases processing time and space. They can also have an adverse impact on performance where there too many designs. This post shows how to determine if an aggregation is being used by using trace events.

A quick overview: When SSAS receives an MDX request, the statement is passed to the formula engine (FE). The FE requests data from the cache. If the data is in the cache (readily available) it can be returned to the FE directly, if not, it the request is passed to the storage engine (SE) which retrieves the data. Note that, if the data is available in cache, there is no request to the SE.

This raises two side points. Firstly, there is no SE requirement for data that is in cache. Cache warming may improve query performance (without the need for the aggregation). However, to warm the cache, data must still be loaded (which may be improved with an aggregation). Secondly, the aggregation will only be effective when the query is executed against cold cache (requesting non-cached data).

In these examples, we are using the reseller sales partition of adventure works. There is a simple aggregation design that included [Month Name] which is only applied to the 2003 partition. Additionally, the relationships for the [date] dimension are shown below. Note that Calendar and Fiscal hierarchies can be defined from the [Month Name] attribute.

Trace Events

The trace is run with the following events;

Event Class Event
Progress Report Progress Report End
Query Events Query Event Begin
Query Event End
Query Processing Get Data From Aggregation
Get Data From Cache

Clearing the Cache

Unless otherwise stated, each query is executed after clearing the cache. Cache can be cleared from a database, cube or partition. For simplicity, cache is cleared from the database using the command.

<ClearCache xmlns=http://schemas.microsoft.com/analysisservices/2003/engine>
<Object>
<DatabaseID>Adventure Works</DatabaseID>
</Object>
</ClearCache>

A more comprehensive list of commands which cover how to specify other objects can be found at ssas-info;

http://www.ssas-info.com/analysis-services-faq/27-mdx/133-mdx-how-do-i-clear-analysis-services-ssas-database-cache

Query 1 – Data from the Partition

A query that requires data from 2002 will read the partition (recall there is no aggregation on this parition). We can see that the text data for the progress report end specifies the partition that is used (Reseller_Sales_2002).

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].&[2002] on 1
from [Adventure Works]

When the same query is run without clearing the cache, the progress report is replaced with the event get data from cache.

Query 2 – Data from the Aggregation
When the query is executed for 2003, the trace event includes the get data from aggregation event. Additionally the text data for the progress report shows that the Aggregation is used.

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].&[2002] on 1
from [Adventure Works]

Although not shown in the trace, the retrieval of data from the aggregation is faster than from the partition (duration of 7ms verse 0ms).

A Quick Note about the Benefits of Relationships

The only attribute defined in the aggregation was [Month Name], yet the queries, have used [Calendar Year] and hit the aggregation where possible. This is because there are relationships defined between [Month Name] and [Calendar Year] and, the aggregation can be used to answer queries for attributes from higher levels in the tree.

We can see the path by using the aggregation manager in BIDS helper (@ http://bidshelper.codeplex.com/)
Because the relationships for [Canendar] and [Fiscal] hierarchies are derived by the attribute [Month Name], both hierarchies can utilise the aggregation.
A query against the [Fiscal] hierarchy will also use the aggreagtion design.

Conclusion

This post has shown how to determine if aggregations are used to answer queries. When applied, they reduce the overhead associated with data retrieval. However, this reduction is only to the extent that it is requested from the SE. When data is in cache, there is no requirement on the SE (and hence neither the aggregation nor partition data will be used).

Moving SSAS Databases (Detach)

There are a few ways to move SSAS databases. This post looks at the detach-reattach method.

When SSAS creates a database, it stores all the files for the database in a folder under the instances data path (assuming that you haven’t specified alternate locations for any of the objects). The data folder for my instance is below; I have two versions of adventure works;

You can detach the database from the instance by simply right clicking on the database node as selecting detach. The only configuration option that you have for it is specifying a password.

Detaching the database creates a detach_log file in the database root folder (ie under ‘Adventure Works DW 2008R2.0.db’). This is needed to reattach the database. You can’t simply copy the files and then try to reattach.

The entire database directory can then be copied to a new location. There is no requirement for this to be the default SSAS (data) directory .

Re-Attaching the database is almost the reverse, just specify the Attach option and the folder name (for the database).

The Last Non Empty Dilemma

The quantity of stock on hand is a common reporting requirement for inventory systems where quantity of stock on hand at aggregate time levels is shown as the last value that was recorded. When time is aggregated, (eg the year as a parent of months), the value shown refers to the last recorded value and not the sum of months. In this post we look at the use of LastNonEmpty aggregation, and the implications that it has.

The data used in this post assumes the leaf data for the diagram above (products [a], [b], [c], [d] with moths [Oct], [Nov], [Dec]). When the November stock take is done, the year value for stock holding should display 19, and when December stock take is done the year value should show 12. Showing the year value as the sum of its children will give incorrect results because, (after Novembers stocktake) the stock holiding is not equal to October quantity plus November (45=26+19), only the last known value (Nov) 19.

Last Non Empty Aggregation

The SSAS aggregation function ‘LastNonEmpty’ can solve this issue automatically by showing the time aggregates as the last recorded value. By default aggregation is set to sum but it can be set on the properties tab for the measure (as below).

The higher levels of time now inherit the last known quantity for a given product which may seem a reasonable solution (below). However, this can cause an issue when stock values are not present for all stock items in the last period. When an stock item (sku) is excluded, the LastNonEmpty inherits the last known period value. In this situation, the LastNonEmpty can bypass the last known (or input) period and aggregate values across inconsistent periods. This is shown in the diagram below where the last recorded period is December and the yearly counts include November balances.

Furthermore, total balances are calcualted as the total for the LastNonEmpty period and so the total for the time aggregate doesnot equal the sum of its children. We can see that;

  1. [CY 2010] values at the product level inherits the last known value.
  2. [CY 2010] [Grand Total] inherits the [Grand Total] for [CY 2010 Dec] (so that 12=12) and
  3. The [Grand Total] for [CY 2010] does not equal the sum of each product ([a],[b],[c],[d]) for [CY 2010] (ie 12 is not the sum of 8+3+4+1). If we are to assume that the values for [CY 2010] for each product will aggregate to the [Grand Total] for the product, we should see 16 as the [Grand Total] for [CY 2010].

Clearly, the use of the LastNonEmpty as an aggregate function comes with some caveats.

Solution 1 (Fact Based)

One way to solve this issue is to include zero balances for inventory items where the value is zero (or would otherwise be null). This can be done by appending the a zero quantity products that are not counted (have zero balances or are null). In this case, a view which automatically derives missing data based on the theroitical space of (date x prodcut less non zero values).

As expected, the values for all products are known for a month and the time aggregate inherits the last materialised period amount.

Solution 2 (Script Based)

A different approach is to scipt a measure that satisifys the aggregation function and answers the LastNonEmpty based on the last non empty date period for all products. Leaving the existing LastNonEmpty measure inplace and using a calcualte member [Stock On Hand] to calculate time aggregates based on the [Prod Count] value. Essentially, we are looking for the [Prod Count] values on the last [date] member where any product had a value. In this context, the last date member refers to the last (maximum) date for any product that had stock.

The solution follows the following logic;

  1. What was the last date that products had value
  2. What was the value of product on this date?

The last date (that any product was sold) can be determined by the snippet below

TAIL(FILTER([Date].[Calander Years].[Cal Month], ([Measures].[Prod Count], ROOT([Product]))<>0), 1)

And the sum aggregates these values (for the last time period)

SUM(  TAIL(FILTER([Date].[Calander Years].[Cal Month], ([Measures].[Prod Count], ROOT([Product]))<>0), 1) 
 , [Measures].[Prod Count] 
 )

Other Calculation Considerations

Introducing the [Stock On Hand] calcualted member solves the aggregate time issues and [CY 2010] values will show correct (as below). The [Stock On Hand] value for [CY 2010] will show the stock as at the last recorded period ([CY 2010 Dec]).

One side-effect for this calcualtion (asis) is that preceeding periods will show values for post dated periods. For example (below), [CY 2010 Sep] values will be displayed on the basis of [CY 2010 Dec]. Clearly this is also not correct as the values for [Sep] should be null.

This behaviour can be corrected by only displaying the calcualted value when a LastNonEmpty value would have otherwise been shown. The calcuation becomes;

IIF([Measures].[Prod Count]=0  , NULL  , SUM(TAIL(FILTER([Date].[Calander Years].[Cal Month], ([Measures].[Prod Count], ROOT([Product]))<>0), 1)  , [Measures].[Prod Count]  ) 

And this satisfies all requirements

Conculsion

The use of LastNonEmpty aggregation offers a convenient way to aggregate measures such as product count across time hierarchies. However, there are some assumptions included with the use of the function that may cause output not to show as expected. This output can be controlled by either by including zero values on for products that are not counted or by scripting a calculated measure to manually aggregate across time.

 

Improving the performance of the sum:

As Chris has pointed out, the sum function will perform poorly in a large environment.  This can be greatly improved by defining the LNE over the tuple for the last date that has values.  That is, we derive the last date member (as below)

TAIL(Filter([Date].[Calander
Years].[Cal Month], (ROOT([Product]),[Measures].[Prod
Count] )> 0)

,1).ITEM(0)

Other methods of defining the last posting date may use attributes of the date dimension (see http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?List=83c71d76-a4d8-4197-9257-38d6b857634f&ID=16)
per Boyans post.

The stock on hand value is then defined as the tuple of the
member and our materialised value;

MEMBER
[Measures].[Stock Count] as

(

[Measures].[Prod Count]

,[Date].[Calander Years].LAST_MONTH_VALUE

)