Aggregations improve query performance by limiting the amount of partition data that needs to be scanned to answer a query. Instead of querying the fact level data of the partition, SSAS will query the aggregation and because the amount of data in the aggregation is smaller, the query can be answered more succinctly. However these benefits come with a trade-off, they take time to build which increases processing time and space. They can also have an adverse impact on performance where there too many designs. This post shows how to determine if an aggregation is being used by using trace events.
A quick overview: When SSAS receives an MDX request, the statement is passed to the formula engine (FE). The FE requests data from the cache. If the data is in the cache (readily available) it can be returned to the FE directly, if not, it the request is passed to the storage engine (SE) which retrieves the data. Note that, if the data is available in cache, there is no request to the SE.
This raises two side points. Firstly, there is no SE requirement for data that is in cache. Cache warming may improve query performance (without the need for the aggregation). However, to warm the cache, data must still be loaded (which may be improved with an aggregation). Secondly, the aggregation will only be effective when the query is executed against cold cache (requesting non-cached data).
In these examples, we are using the reseller sales partition of adventure works. There is a simple aggregation design that included [Month Name] which is only applied to the 2003 partition. Additionally, the relationships for the [date] dimension are shown below. Note that Calendar and Fiscal hierarchies can be defined from the [Month Name] attribute.
The trace is run with the following events;
|Progress Report||Progress Report End|
|Query Events||Query Event Begin|
|Query Event End|
|Query Processing||Get Data From Aggregation|
|Get Data From Cache|
Clearing the Cache
Unless otherwise stated, each query is executed after clearing the cache. Cache can be cleared from a database, cube or partition. For simplicity, cache is cleared from the database using the command.
A more comprehensive list of commands which cover how to specify other objects can be found at ssas-info;
Query 1 – Data from the Partition
A query that requires data from 2002 will read the partition (recall there is no aggregation on this parition). We can see that the text data for the progress report end specifies the partition that is used (Reseller_Sales_2002).
|select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].& on 1
from [Adventure Works]
When the same query is run without clearing the cache, the progress report is replaced with the event get data from cache.
Query 2 – Data from the Aggregation
When the query is executed for 2003, the trace event includes the get data from aggregation event. Additionally the text data for the progress report shows that the Aggregation is used.
select [Measures].[Reseller Sales Amount] on 0,
Although not shown in the trace, the retrieval of data from the aggregation is faster than from the partition (duration of 7ms verse 0ms).
A Quick Note about the Benefits of Relationships
The only attribute defined in the aggregation was [Month Name], yet the queries, have used [Calendar Year] and hit the aggregation where possible. This is because there are relationships defined between [Month Name] and [Calendar Year] and, the aggregation can be used to answer queries for attributes from higher levels in the tree.
|We can see the path by using the aggregation manager in BIDS helper (@ http://bidshelper.codeplex.com/)
Because the relationships for [Canendar] and [Fiscal] hierarchies are derived by the attribute [Month Name], both hierarchies can utilise the aggregation.
A query against the [Fiscal] hierarchy will also use the aggreagtion design.
This post has shown how to determine if aggregations are used to answer queries. When applied, they reduce the overhead associated with data retrieval. However, this reduction is only to the extent that it is requested from the SE. When data is in cache, there is no requirement on the SE (and hence neither the aggregation nor partition data will be used).