Archive

Posts Tagged ‘SSAS’

Accessing the Slicer through VBA

February 24, 2012 Leave a comment

There may be times when we want to programmatically control slicers through VBA. For example, we may want to default a date to the current date or set a cost centre depending on who has opened the book. This post looks how a slicer can be controlled through VBA.

In this example, we have added a slicer to a worksheet that uses a date hierarchy as its source. Because, we have included all levels of the hierarchy when the slicer was setup, we get three individual slicers for each level of the hierarchy.

If we look at the settings for the slicer (right click on the slicer and select slicer settings), we can see that the slicer has a name and each level of the slicer hierarchy maintains the hierarchy level name. For example, the Slicer_Dates_Hie below has a level Year, Month and Day. Although we can change the name (for example the name Year in the picture below), the slicer retains the mdx level that the slicer belongs to.

Accessing the Slicer

We can access the slicer through the SlicerCaches object. This is as simple as declaring the slicer cache and referencing it to the name of the slicer we want to use. For example;

Dim sC As SlicerCache
Set sC = ActiveWorkbook.SlicerCaches(“Slicer_Dates_Hie”)

Navigating the Structure of the Slicer

Once we have a reference to the slicer we can navigate its structure using SlicerCacheLevels. For example we can determine the number of levels of the slicer and iterate over them with the following code.

Dim sC As SlicerCache
Dim sL As SlicerCacheLevel
Set sC = ActiveWorkbook.SlicerCaches(“Slicer_Dates_Hie”)

For Each sL In sC.SlicerCacheLevels
Debug.Print “Level ” + CStr(sL.Ordinal) + ” –> ” + sL.Name
Next sL

Naturally, the level can be accessed through the cache level ordinal to produce the same result. The highest level (year) takes the value 1 which increments for each level from the first level. There is always a level (ie level 1) even if the slicer is based on a single attribute.

Set sC = ActiveWorkbook.SlicerCaches(“Slicer_Dates_Hie”)
For i = 1 To sC.SlicerCacheLevels.Count
Debug.Print “Level ” + CStr(i) + ” –> ” + sC.SlicerCacheLevels(i).Name
Next i

Slicer Data Members

We can gain access to the data items through slicer items, as mdx attributes, they have a caption, value and a key (member unique name). For example the year 2011 in this slicer has a value of 2011 and a name (MDX unique name) of [Dates].[Dates Hie].[Year].&[2011]

Dim sC As SlicerCache
Dim SL As SlicerCacheLevel
Dim sI As SlicerItem

Set sC = ActiveWorkbook.SlicerCaches(“Slicer_Dates_Hie”)
Set SL = sC.SlicerCacheLevels(1)
Debug.Print “——————————————————————————”

For Each sI In SL.SlicerItems
Debug.Print “Caption –> ” & sI.Caption
Debug.Print “Value –> ” + CStr(sI.Value)
Debug.Print “Unique Name –> ” + sI.Name
Debug.Print “——————————————————————————”

Next

Setting the Slicer Value

Slicer item selection must be set through the visible slicer items list and is specified using an array. For example, we could set the SlicerCache (selected items) to 2011 and 2012 with the following code;

sC.VisibleSlicerItemsList = Array(“[Dates].[Dates Hie].[Year].&[2011]“, “[Dates].[Dates Hie].[Year].&[2012]“)

The name selected must be a data member of the level. If not a runtime error will occur (as below)

Once the values are set, connected pivots are updated immediately

Member Iteration

Members can be easily iterated using the following code;

Dim sC As SlicerCache 
Dim SL As SlicerCacheLevel 
Dim sI As SlicerItem
Set sC = ActiveWorkbook.SlicerCaches("Slicer_Dates_Hie") 
Set SL = sC.SlicerCacheLevels(2)
For Each sI In SL.SlicerItems    
 sC.VisibleSlicerItemsList = Array(sI.Name) 
Next

Conclusion

The control of slicers through VBA could be used to provide some very nice personalisation to work books.

Categories: SSAS Tags: , ,

SSAS Dimension Error Processing & Duplicate Attributes

December 8, 2011 1 comment

SSAS error configurations allow for some flexible arrangements in object processing. Naively, these error configurations may be changed so that a dimension can be successfully processed however, the error configurations can have a dramatic effect on how the dimension and fact data is processed. This post examines the error configuration for duplicate attributes. In doing so, I address dimension design, its processing and the effect that the configuration has on fact and dimension data.

In this post, we use a simple geography model for illustration, with a single measure (value) against each leaf node. Note that the city Berlin appears in the states of Hessen and Saarland.

Structure

Data

 

The data for the model is also very simple (with both fact data and dimension in a single table). The row_key field has been added to the table to improve the dimension appearance (design appearance). Additionally, some will note that the row_key is an excellent candidate for the [city] attribute key in this scenario, however, to keep things simple, we use attribute names as keys.

Dimension Design

Default Design (no relationships)

When the dimension is created it has three attributes of [key], [State] and [City]. The default behaviour of these attributes is to relate each (non-key) attribute directly to the dimensions key as below;

Structure

Attribute Relationships

 

If a hierarchy is created so that state drills into city, the relationship view changes slightly and separates the attributes into distinctive nodes (as below). In this situation, the relationships have not changed as both [State] and [City] relate directly to the key. Note that the hierarchy provides a warning triangle () which states that attribute relationships do not exist between levels in the hierarchy and this may decrease performance.

Structure

Attribute Relationships

 

The importance of relationships is well understood and, is arguably the most important performance consideration in dimension design. Because, we expect cities to roll up to states, the relationship should be created as part of the design. This (second design) choice is for performance.

Performance Design (relationships)

When the dimension is designed with relationships (below), the warning indicator is removed from the hierarchy.

Structure

Attribute Relationships

 

 

Dimension Processing

Dimension processing involves many stages. For our purpose, we are primarily interested with the load of data (that is loading member data) and the effect that relationships have on processing.

The default design sends a query to the data source for each non key attribute in the dimension. These are highlighted in the first two queries in the trace below where the data for [city] and [state] is loaded. The final query holds the data for the dimension key and relates the key to the [state] and [city] attributes. Note that in this design, each attribute is directly related to the dimension key and therefore, the members for these attributes have a direct relation with the key, hence the third query must relate the key to both [state] and [city] attributes.

When the performance design dimension is processed (note the dimension fails processing), the queries sent for data mimic the attribute structure in the relationship design. That is;

  1. A query is sent to retrieve [State] data. This is the same as in the default design.
  2. A query is sent to retrieve the [city] and [state] data. Unlike the default design, this includes [state] because state has a relationship with city.
  3. Finally, a query is sent for [key] data and the [city] relationship. In the default design, this query also included the [state] attribute however, since [state] is related to city (and this relation is exposed in the prior query), there is no requirement for [state] data and it is excluded.

Dimension Processing Error Configuration

Under the default error configuration, the performance design dimension fails during processing with a ‘duplicate key’ error is displayed (as below).


 

This error relates to the expectation that a [city] will relate to one (and only one) [state]. The data shows that the city Berlin is found in both Hessen and Saarland.

 

In order to process the dimension, the KeyDuplicate error configuration may be changed to IgnoreError or ReportAndContinue from the default value of ReportAndStop. However, while this may permit the dimension to process, the change in error configuration can have adverse impacts on both dimension data and fact data.

 

Data Outcomes

When the performance dimension is processed by ignoring duplicate attribute errors, members which raise the error are dropped from the dimension. Not all members are dropped.

 

Berlin is not found under Hessen and the fact values do not match the expected [state] totals. Note that Hessen should sum to 50 and Saarland should sum to 70. The 20 difference relates to the 20 for Berlin which should reside in Hessen (but now resides in Saarland).

 

Correcting the Outcome

A standard way of correcting this type of error is to apply a composite key to the attribute that causes the issue. In this situation, the key of the city attribute now uses both the state and city (see below) and shows the expected structure.

 

Conclusion

While the use of error configurations may permit the processing of dimensions, the outcomes of the processing operation can have unexpected results. A more certain way of managing processing is to address the underlying issue of the error at the source or through other factors in dimension design.

 

Categories: SSAS Tags: ,

MDX – Counting Consecutive Days

November 2, 2011 Leave a comment

A recent blog post (here) questioned a method for returning the most consecutive days that a customer has brought product for. For example, given a particular customer what’s the largest continuous date span of their purchases? More realistically, a business question may be ‘what are the customers who purchase for more than 5 days in a row’?

In addition to the answer provided in the post, this blog provides a little more support to the solution. To make the data more accessible, the question is based on [Adventure Works] and considers months instead of days (ie, for a customer, what’s the most number of months they have continuously purchased for?). We are using the [Date].[Calendar] hierarchy and the [Internet Sales Amount] measure.

Approach

This approach looks at the problem in two ways. Firstly, we define a measure that can be applied to a specific period that counts the number of periods that have had (continuous) sales. Secondly, we apply this to all periods to determine the largest number of consecutive periods.

Monthly Continuous Periods

If we consider a single customer and their monthly sales (as below), the continuous sales for any month end on the first subsequent month of null sales. For example; the [July 2007] continuous periods end in [August 2007] and [October 2007] end in [December 2007]. The end of the continuous period is defined as the first null in the set which starts in the next month and on the last month.

The number of continuous periods for a month is also the number of members in the set which range from the first period until (but not including) the first (null) period.

select
[Measures].[Internet Sales Amount]
on 0
,[Date].[Calendar].[Month]
on 1
from    [Adventure Works]
where
(
   [Customer].[Customer].&[12301]
)

The end of the continuous period (or the first null member) is satisfied by the expression;

filter( {[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0
).item(0)
// note that we are only interested in the first null item (and use the .item(0) function to retireive this

The query above can be extended to show the ending month (name) of the continuous period;

with
member [Measures].[end of range member] as
filter(
{[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0
).item(0)
.member_name
select
[Measures].[Internet Sales Amount]
,[Measures].[end of range member]
}
on 0
,[Date].[Calendar].[Month]
on 1
from    [Adventure Works]
where
(
    [Customer].[Customer].&[12301]
)

Additionally, since a continuous period cannot exist when the current month has no sales, months without sales can be excluded from the calculation (they will be null).

with
member [Measures].[end of range member] as
iif ( [Measures].[Internet Sales Amount] = 0
       , null
       , filter( {[Date].[Calendar].currentmember.nextmember : null }
                   ,[Measures].[Internet Sales Amount] = 0).item(0)
                   .member_name
)

Given the end period (note that we know the start since it’s a relative period), we can specify the continuous range between the start and the end. This set includes an additional member (the last one is not really needed because its value is null), and so it can be excluded from the set. Although it can be excluded using the .prevmeber function, it can be left in the set and removed from the count. The set is simply defined within the iif statement;

with
member [Measures].[end of range member] as
iif ( [Measures].[Internet Sales Amount] = 0
, null
,         filter( {[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0).item(0)
 .member_name
)
// inclusive
member [Measures].[set (inclusive)] as
iif ( [Measures].[Internet Sales Amount] = 0
        , null
        , settostr(
                  [Date].[Calendar].currentmember :
filter( {[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0).item(0)
)                   )

// count (just reduce 1)
member [Measures].[set count] as
iif ( [Measures].[Internet Sales Amount] = 0
      , null
 ,   {
 [Date].[Calendar].currentmember :
            filter( {[Date].[Calendar].currentmember.nextmember : null }
    ,[Measures].[Internet Sales Amount] = 0).item(0)
}.count-1
      )

select
{
[Measures].[Internet Sales Amount]
,[Measures].[end of range member]
, [Measures].[set (inclusive)]
, [Measures].[set count]
on 0,

[Date].[Calendar].[Month] on 1
from    [Adventure Works]
where
(
[Customer].[Customer].&[12301]
  )

The Largest Number of Consecutive Periods

Having defined the number of consecutive periods for relative periods, we can define the maximum value for all periods using the max function. Simply specify range set (first argument of max) as the period members. Our query becomes;

with member [Measures].[Range Count] as
iif ( [Measures].[Internet Sales Amount] = 0
        , null
, {[Date].[Calendar].currentmember :
filter( {[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0).item(0)
}.count-1
    )
member [Measures].[Max Periods] as
max([Date].[Calendar].[Month], [Measures].[Range Count])
select [Measures].[Max Periods] on 0
from    [Adventure Works]
where
(    [Customer].[Customer].&[12301]   )

Other Considerations

The query searches for continuous periods until the last date period (regardless of year). Often, with this type of situation, comparatives are required on an annual basis and so there would be a need to alter the ‘end’ date to be the last data for the current year.

There are also some performance improvements that can be applied to the query; However, I think that this demonstrates the method rather simply and performance addressed as needed.

Categories: MDX, SSAS Tags: ,

SSAS Relationships and Granularity

October 3, 2011 Leave a comment

The creation of measure groups at varying levels of granularity is used when two related measures differ by some dimensional level. A classic example of this occurs in budget comparisons to actual reporting. In this situation, the budget data is typically stored at a higher level of granularity. For example, it may exclude product information (or be set at high level product classes) and be targeted at monthly totals. Actual data is presumed to be recorded at a detail sale level (where { date x product x store x ref# } gives { revenue & quantity sold }). This post looks at how attribute relationships determine how data is aggregated in dimensions where measure groups are related to dimensions through non key attributes.

Situation

To illustrate the significance of relationships in dimension design and the effect they have on aggregation, the examples shown in this post use a very simple cube structure. There are two measure groups (one for actual and one for budget) and only one dimension (product). We assume that actual data is recorded at the product level and budgets are set at subcategory. Note that the related attribute for the Actual measure group is [Product] and the related attribute for the budget measure group is [Sub Category] (as below);

The product dimension has similar characteristics to that of adventure works but includes only three attributes (a product, sub category and category). The natural (business) hierarchy for product is [product] à [sub category] à [category]. That is, that a group of products will role-up to a [sub category] and several [sub categories] will role-up into a [category].

In this example, we consider only two subcategories which are [Mountain Bikes] and [Forks]. [Mountain Bikes] maps to the category [Bikes] and [Forks] maps to [Components]. This can be seen in the following hierarchy browse and script;

To illustrate how the use of relationships effect aggregations, we will assume that the [Mountain Bikes] sub category has a budget of 100 and the [Forks] sub category has a budget of 200. With these values, the budget for the [Bikes] category should be 100 and [Components] should be 200.

These examples uses MDX update statements to update the underlying fact data however, this is only for simplicity. The examples in this blog will work the same had the underlying fact been updated by other means.


Defined Relationships

When relationships are correctly defined between attributes according to the business definition (below), values at the [Sub Category] attribute are correctly aggregated to the parent attribute. For example, when the [mountain bikes] subcategory is updated, the value is aggregated to the parent (Bikes). This is demonstrated in the script below and screen shots below. When [Mountain Bikes] is updated and category [Bikes] is updated by the same amount. This can be seen in the following example;

Relationship :
Update Script : /* updating script */
update [Granularity Example] set

([Product].[Product By Category].[Mountain Bikes],[Measures].[Bgt Amount])=100,
([Product].[Product By Category].[Forks],[Measures].[Bgt Amount])=200
;

Query / Result select [Measures].[Bgt Amount] on 0,
{ [Product].[Product By Category].[Bikes]
,[Product].[Product By Category].[Components]
} on 1
from [Granularity Example];

Non Defined Relationships

When relationships are not defined between product attributes, both parent categories (Bikes and Components) contain the same value. The value shown is the total for all members and can be seen below.

Relationship :
Query / Result

Clearly, without the relationship between the parent and child, the values for the [sub category] level are not aggregated to [category] as expected.

The reason for this is that a correct relationship (as it was displayed in the first example) specifies how members are aggregated to higher levels in the hierarchy. In this case, how a [category] is made up of several [sub categories]. Without an ‘upstream’ relationship to the [sub category] attribute, the [category] attribute can only show total values regardless of the member data is displayed through the hierarchy. In a previous post on aggregation usage, we showed how lower level aggregations could be used when higher levels of the relationship were required. This is exactly the same (except we are not concerned with aggregation use). Note that I have used the words relationship and hierarchy interchangeably. In this situation, we assume that a hierarchy mimics the relationship structure (business hierarchy).

Although aggregations and relationships improve performance, the problem does not present itself when the fact is joined to the key of the dimension. This is because there is an explicit relationship between the dimension key and every other attribute (hierarchy).

Conclusion

The ability to include measure groups from varying levels of aggregations is an important feature of SSAS because it permits the ability to combine data that would otherwise be disparate. However, this flexibility comes with caution because measure groups joined to dimension non-key attributes aggregates data only to the extent that relationships are defined in the dimension structure. If this structure is poorly defined, not understood by the user or not managed through perspectives, data may be misinterpreted where hierarchies are provided without supporting relationships and non-key attributes are used.

Categories: SSAS Tags:

BISM – Period To Date Calcs

August 21, 2011 7 comments

DAX has three periods to date functions which easily allow the creation of common the common measures (Month To Date, Quarter To Date and Year To Day). This post looks at period to date calculations and member sorting in the BISM tabular model.

Sorting

By default the tabular model sorts members by name. For example, clicking on the month field shows the alphabetical listing of months. Although these values are shown in the project model they are displayed in the same order when the project is deployed.The order of these members can be changed and sorted by an additional column by using the sort by column function (menu path: Column > Sort > Sort By Column).When this is done, we simply specify the sort column in the popup (below).

After sorting, members appear as expected in both the project design and deployed project.

Period to Date Calculations

Period to date can be easily created using the DAX functions TOTALYTD, TOTALMTD, TOTALQTD. Note that these functions are not dependent on sort order but on a related date field.

For example, a date field in the related DimTime table.Note that there is a primary relationship between FactResellerSales and DimTime and the inclusion of the TimeDate (which is a date data type) column in DimTime.

The general format of the function is TOTALPPP ( scalar_function , time_column) where the time_column is a primary related date field. To show YTD Values for SalesAmount we use the formula:

Sales YTD:=TOTALYTD(sum(FactResellerSales[SalesAmount]), DimTime[TimeDate])

Also note that measures (as they would be thought of in SSAS) must be added to the ‘Measure Grid’. If they are not, measures are not seen in the cube. The table FactResellerSales has columns hidden from client tools (so that no dimension members are created) and contains the measures [Sales] and [Sales YTD] (below).

Browsing the cube shows data (sorted) with YTD values as expected. Unlike SSAS, the tabular model will show the [YTD] measure defaulting to the last year when the (related) time table is not used.


MISC

The technet reference for these functions can be found at YTD , QTD , MTD .

Categories: BISM, SSAS Tags: , ,

DMX 102 – SSAS Time Series with Multiple Attributes

August 7, 2011 1 comment

This post looks at extending the time series prediction so that multiple values can be predicted. In the last post (DMX 101) only a single value was predicted and while this can be useful, the approach has obvious draw backs because there is often an expectation that different classes have different patterns. Consider seasonal fashion. We expect that seasonal products will have a distinct life-cycle and to group products (from multiple seasons) would distort a specific seasons sales.

In this post, we predict two sequences within the same model. For brevity, both series follow the function y=mx+b;

    y1 = ½ X + 20 (and)

y2 = 2x

There are two ways of dealing with multiple predictors. Firstly, we can treat each as a function value of the same sequence. In this method, we identify a value for y1 and a corresponding value for y2. In this approach, the series key is simply the x (time) value. The second method considers that only one value is being predicted (y) which is a value derived from the series key and an additional identifier which specifies the function (as either y1 or y2).

Method 1 – Multiple Y values

In this situation, our predicting data is represented as functions of (the time) series key (see right). This is represented by the SQL below.
with xs(x) as
( select convert(money,1) union  all
select x+1 from xs where x<20
)
select x, x/2+20 as y1, 2*x as y2
from xs

The model is created and trained with the following commands.

– MODEL CREATION : create single sequence key with mulitple prediction values
create mining model ts_method1
(   x long key time
, y1 double continuous predict
, y2 double continuous predict
) USING MICROSOFT_TIME_SERIES

– MODEL TRAINING : train the model
INSERT INTO ts_method1(x, y1, y2)
OPENROWSET(‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
 , ‘        with xs(x) as (select convert(money,1) union all select x+1 from xs where x<20)
select x, x/2+20 as y1, 2*x as y2 from xs’
)

Note with this method, the predicted values (function) must be explicitly identified. That is, we can predict y1 or y2 but not y1+y2. For example, we can predict y1 with;

– select flattned results for y1
select flattened predict([y1],5)
from [ts_method1]

Method 2 – Single Y Value and a Composite Key

The second approach incorporates the function (or class type) into the model key. We can see that the data for the model shows a y_value that is dependent on both the x and y_function columns.

with xs(x) as (select convert(money,1)
union all select x+1 from xs where x<20)
select x ‘y1′ as y_function , x/2+20 as y_value
from xs
union all
select x
, ‘y2′ as y_function, 2*x as y2
from xs

The model can be created and trained with the following commands;

– MODEL CREATION : create complex sequence key with single prediction values
create mining model ts_method2
 x long key time
, y_function text key
, y_value double continuous predict
) USING MICROSOFT_TIME_SERIES

– MODEL TRAINING : Train the model
insert into ts_method2(x, y_function, y_value)
OPENROWSET (‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
 , ‘        with xs(x) as ( select convert(money,1) union all select x+1 from xs where x<20)
select x, ”y1” as y_function, x/2+20 as y_value from xs
union all select x, ”y2” as y_function, 2*x as y2 from xs
order by 1
)

Additionally, it is important to note that training Time Series model requires ordered data.

Prediction from a model which includes a composite key requires that the function (class) be specified in the query. The first query (as the results on the right show) does not show what class relates to what predictions. Note that two series of data are returned

– Q1 : Prediction without class specification
select flattened Predict([y_value], 5)
from [ts_method2]


This can be overcome by showing the class in the result set (in which case all results are still returned-

– Q2 : Prediction showing class specification
select flattened [y_function], Predict([y_value], 5)
from [ts_method2]
or, by specifying the class that will be predicted.

– Q2 : Prediction specifying class
select flattened Predict([y_value], 5)
from [ts_method2]
where [y_function]=’y1′

The addition of class tags into time series models improves their useability by disseminating an aggregate sequence into components. This post has looked at the two methods for including class tags and the prediction implicatios of each.

Categories: Data Mining Tags: ,

Determining SSAS Aggregation Usage

July 30, 2011 1 comment

Aggregations improve query performance by limiting the amount of partition data that needs to be scanned to answer a query. Instead of querying the fact level data of the partition, SSAS will query the aggregation and because the amount of data in the aggregation is smaller, the query can be answered more succinctly. However these benefits come with a trade-off, they take time to build which increases processing time and space. They can also have an adverse impact on performance where there too many designs. This post shows how to determine if an aggregation is being used by using trace events.

A quick overview: When SSAS receives an MDX request, the statement is passed to the formula engine (FE). The FE requests data from the cache. If the data is in the cache (readily available) it can be returned to the FE directly, if not, it the request is passed to the storage engine (SE) which retrieves the data. Note that, if the data is available in cache, there is no request to the SE.

This raises two side points. Firstly, there is no SE requirement for data that is in cache. Cache warming may improve query performance (without the need for the aggregation). However, to warm the cache, data must still be loaded (which may be improved with an aggregation). Secondly, the aggregation will only be effective when the query is executed against cold cache (requesting non-cached data).

In these examples, we are using the reseller sales partition of adventure works. There is a simple aggregation design that included [Month Name] which is only applied to the 2003 partition. Additionally, the relationships for the [date] dimension are shown below. Note that Calendar and Fiscal hierarchies can be defined from the [Month Name] attribute.

Trace Events

The trace is run with the following events;

Event Class Event
Progress Report Progress Report End
Query Events Query Event Begin
Query Event End
Query Processing Get Data From Aggregation
Get Data From Cache

Clearing the Cache

Unless otherwise stated, each query is executed after clearing the cache. Cache can be cleared from a database, cube or partition. For simplicity, cache is cleared from the database using the command.

<ClearCache xmlns=http://schemas.microsoft.com/analysisservices/2003/engine>
<Object>
<DatabaseID>Adventure Works</DatabaseID>
</Object>
</ClearCache>

A more comprehensive list of commands which cover how to specify other objects can be found at ssas-info;

http://www.ssas-info.com/analysis-services-faq/27-mdx/133-mdx-how-do-i-clear-analysis-services-ssas-database-cache

Query 1 – Data from the Partition

A query that requires data from 2002 will read the partition (recall there is no aggregation on this parition). We can see that the text data for the progress report end specifies the partition that is used (Reseller_Sales_2002).

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].&[2002] on 1
from [Adventure Works]

When the same query is run without clearing the cache, the progress report is replaced with the event get data from cache.

Query 2 – Data from the Aggregation
When the query is executed for 2003, the trace event includes the get data from aggregation event. Additionally the text data for the progress report shows that the Aggregation is used.

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].&[2002] on 1
from [Adventure Works]

Although not shown in the trace, the retrieval of data from the aggregation is faster than from the partition (duration of 7ms verse 0ms).

A Quick Note about the Benefits of Relationships

The only attribute defined in the aggregation was [Month Name], yet the queries, have used [Calendar Year] and hit the aggregation where possible. This is because there are relationships defined between [Month Name] and [Calendar Year] and, the aggregation can be used to answer queries for attributes from higher levels in the tree.

We can see the path by using the aggregation manager in BIDS helper (@ http://bidshelper.codeplex.com/)
Because the relationships for [Canendar] and [Fiscal] hierarchies are derived by the attribute [Month Name], both hierarchies can utilise the aggregation.
A query against the [Fiscal] hierarchy will also use the aggreagtion design.

Conclusion

This post has shown how to determine if aggregations are used to answer queries. When applied, they reduce the overhead associated with data retrieval. However, this reduction is only to the extent that it is requested from the SE. When data is in cache, there is no requirement on the SE (and hence neither the aggregation nor partition data will be used).

Categories: SSAS Tags: ,

Moving SSAS Databases (Detach)

There are a few ways to move SSAS databases. This post looks at the detach-reattach method.

When SSAS creates a database, it stores all the files for the database in a folder under the instances data path (assuming that you haven’t specified alternate locations for any of the objects). The data folder for my instance is below; I have two versions of adventure works;

You can detach the database from the instance by simply right clicking on the database node as selecting detach. The only configuration option that you have for it is specifying a password.

Detaching the database creates a detach_log file in the database root folder (ie under ‘Adventure Works DW 2008R2.0.db’). This is needed to reattach the database. You can’t simply copy the files and then try to reattach.

The entire database directory can then be copied to a new location. There is no requirement for this to be the default SSAS (data) directory .

Re-Attaching the database is almost the reverse, just specify the Attach option and the folder name (for the database).

Categories: SSAS Tags: ,

The Last Non Empty Dilemma

July 10, 2011 7 comments

The quantity of stock on hand is a common reporting requirement for inventory systems where quantity of stock on hand at aggregate time levels is shown as the last value that was recorded. When time is aggregated, (eg the year as a parent of months), the value shown refers to the last recorded value and not the sum of months. In this post we look at the use of LastNonEmpty aggregation, and the implications that it has.

The data used in this post assumes the leaf data for the diagram above (products [a], [b], [c], [d] with moths [Oct], [Nov], [Dec]). When the November stock take is done, the year value for stock holding should display 19, and when December stock take is done the year value should show 12. Showing the year value as the sum of its children will give incorrect results because, (after Novembers stocktake) the stock holiding is not equal to October quantity plus November (45=26+19), only the last known value (Nov) 19.

Last Non Empty Aggregation

The SSAS aggregation function ‘LastNonEmpty’ can solve this issue automatically by showing the time aggregates as the last recorded value. By default aggregation is set to sum but it can be set on the properties tab for the measure (as below).

The higher levels of time now inherit the last known quantity for a given product which may seem a reasonable solution (below). However, this can cause an issue when stock values are not present for all stock items in the last period. When an stock item (sku) is excluded, the LastNonEmpty inherits the last known period value. In this situation, the LastNonEmpty can bypass the last known (or input) period and aggregate values across inconsistent periods. This is shown in the diagram below where the last recorded period is December and the yearly counts include November balances.

Furthermore, total balances are calcualted as the total for the LastNonEmpty period and so the total for the time aggregate doesnot equal the sum of its children. We can see that;

  1. [CY 2010] values at the product level inherits the last known value.
  2. [CY 2010] [Grand Total] inherits the [Grand Total] for [CY 2010 Dec] (so that 12=12) and
  3. The [Grand Total] for [CY 2010] does not equal the sum of each product ([a],[b],[c],[d]) for [CY 2010] (ie 12 is not the sum of 8+3+4+1). If we are to assume that the values for [CY 2010] for each product will aggregate to the [Grand Total] for the product, we should see 16 as the [Grand Total] for [CY 2010].

Clearly, the use of the LastNonEmpty as an aggregate function comes with some caveats.

Solution 1 (Fact Based)

One way to solve this issue is to include zero balances for inventory items where the value is zero (or would otherwise be null). This can be done by appending the a zero quantity products that are not counted (have zero balances or are null). In this case, a view which automatically derives missing data based on the theroitical space of (date x prodcut less non zero values).

As expected, the values for all products are known for a month and the time aggregate inherits the last materialised period amount.

Solution 2 (Script Based)

A different approach is to scipt a measure that satisifys the aggregation function and answers the LastNonEmpty based on the last non empty date period for all products. Leaving the existing LastNonEmpty measure inplace and using a calcualte member [Stock On Hand] to calculate time aggregates based on the [Prod Count] value. Essentially, we are looking for the [Prod Count] values on the last [date] member where any product had a value. In this context, the last date member refers to the last (maximum) date for any product that had stock.

The solution follows the following logic;

  1. What was the last date that products had value
  2. What was the value of product on this date?

The last date (that any product was sold) can be determined by the snippet below

TAIL(FILTER([Date].[Calander Years].[Cal Month], ([Measures].[Prod Count], ROOT([Product]))<>0), 1)

And the sum aggregates these values (for the last time period)

SUM(  TAIL(FILTER([Date].[Calander Years].[Cal Month], ([Measures].[Prod Count], ROOT([Product]))<>0), 1) 
 , [Measures].[Prod Count] 
 )

Other Calculation Considerations

Introducing the [Stock On Hand] calcualted member solves the aggregate time issues and [CY 2010] values will show correct (as below). The [Stock On Hand] value for [CY 2010] will show the stock as at the last recorded period ([CY 2010 Dec]).

One side-effect for this calcualtion (asis) is that preceeding periods will show values for post dated periods. For example (below), [CY 2010 Sep] values will be displayed on the basis of [CY 2010 Dec]. Clearly this is also not correct as the values for [Sep] should be null.

This behaviour can be corrected by only displaying the calcualted value when a LastNonEmpty value would have otherwise been shown. The calcuation becomes;

IIF([Measures].[Prod Count]=0  , NULL  , SUM(TAIL(FILTER([Date].[Calander Years].[Cal Month], ([Measures].[Prod Count], ROOT([Product]))<>0), 1)  , [Measures].[Prod Count]  ) 

And this satisfies all requirements

Conculsion

The use of LastNonEmpty aggregation offers a convenient way to aggregate measures such as product count across time hierarchies. However, there are some assumptions included with the use of the function that may cause output not to show as expected. This output can be controlled by either by including zero values on for products that are not counted or by scripting a calculated measure to manually aggregate across time.

 

Improving the performance of the sum:

As Chris has pointed out, the sum function will perform poorly in a large environment.  This can be greatly improved by defining the LNE over the tuple for the last date that has values.  That is, we derive the last date member (as below)

TAIL(Filter([Date].[Calander
Years].[Cal Month], (ROOT([Product]),[Measures].[Prod
Count] )> 0)

,1).ITEM(0)

Other methods of defining the last posting date may use attributes of the date dimension (see http://www.artisconsulting.com/blogs/greggalloway/Lists/Posts/Post.aspx?List=83c71d76-a4d8-4197-9257-38d6b857634f&ID=16)
per Boyans post.

The stock on hand value is then defined as the tuple of the
member and our materialised value;

MEMBER
[Measures].[Stock Count] as

(

[Measures].[Prod Count]

,[Date].[Calander Years].LAST_MONTH_VALUE

)

Categories: SSAS Tags: ,

Aggregating Member Properties

March 18, 2011 3 comments

One of the underutilised features of dimensions in SSAS is the use of member properties. These properties can be used to store information about member attributes in the dimension and add an additional layer of information to dimensional design. The property is stored against the members and exposed only to that member level. For example, a property set a customer attribute would not be visible at the state despite the two members being related. Additionally, the properties are not accessible from other dimensions.

However, it is common to want to see the aggregation of such properties by other dimensions or, the aggregation of the properties with a hierarchy of the attribute dimension. For example consider a situation where an employee dimension may have a property of annual income and we wish to show the total annual income by department.

Since the design of a dimension in the UDM can take such an approach and this choice of design may be appropriate with a SCD2, it seems like a logical query to aggregate a dimension value by reference to some other dimension. There is certainly an argument for storing this information in a measure group, but in this situation, we will examine where the dimension member has the associated property.

Using Adventure Works, we examine how to aggregate the customer property of ‘Total Children’ across another dimension (Sales Area). Our business question may take the form of what are the “Total Children” for each country?

 

Member Properties

We can retrieve the number of children that a customer has through its properties. Defining the measure MemberChildren will do this;

WITH
MEMBER Measures. MemberChildren AS [Customer].[Customer].CURRENTMEMBER.Properties( “Total Children” )

SELECT

Measures.MemberChildren ON 0,

[Customer].[Customer Geography].[Customer] ON 1

FROM [Adventure Works]

 

It would be convenient to stop there, expecting that the property could be aggregated by other dimensions. However, the property only applies to customers and is not aggregated in any customer hierarchies or across any other dimensions. This can be demonstrated by showing the members of other levels in the hierarchy a level other than Customer are displayed, there is no property to display. For example, listing the country members (in the customer dimension) will give nulls values for MemberChildren because the property is not associated with country members.


WITH

MEMBER Measures.MemberChildren AS [Customer].[Customer].CURRENTMEMBER.Properties( “Total Children” )

SELECT

{Measures.MemberChildren} ON 0,

[Customer].[Customer Geography].[Country] ON 1

FROM [Adventure Works]

Members In Country

The next question that needs to be addressed is “which customers reside in which area?” The EXISTS function returns the set of members that reside in the context of another, for example, which customers reside in which state. To list the customers the customers of France, we would write the following statement. Note that by specifying the measure group name, we restrict the context to a specific fact (table) relationship.

SELECT

{[Measures].[Internet Sales Amount]} ON 0,

EXISTS( [Customer].[Customer].[Customer]

        , [Sales Territory].[Sales Territory].[Country].&[France]

        , “Internet Customers”) ON 1

FROM [Adventure Works]

Bringing It Together

The two techniques discussed above can be used to combine member properties with the sets of members specific to some association. Our goal is to define the total children by country, so we need to sum the children for each member that exist in each country (or state etc.).

To do this, we define a measure called MemberChildren which shows a number of each customers children (this is only applicable to the leaf (customer) members). Note that since we need to use this value in a numerical context (we need to sum it), we must explicitly convert the property to a value using the STRTOVALUE function.

Next we define the measure TotalChildren which sums the MemberChildren for all customers that exist within our geographical context. That is, MemberChildren is summed for all customers that exist in the context of another member.

The following query shows how this is done.

WITH

MEMBER Measures.MemberChildren AS
STRTOVALUE([Customer].[Customer].CURRENTMEMBER.Properties( “Total Children” ))

MEMBER Measures.TotalChildren AS

SUM(    EXISTS( [Customer].[Customer].[Customer], [Sales Territory].[Sales Territory].CURRENTMEMBER, “Internet Customers”)

        , MemberChildren

    )

SELECT

{Measures.TotalChildren} ON 0,

[Sales Territory].[Sales Territory].[Country] ON 1

FROM [Adventure Works]

Conclusion

The dimension structure in SSAS is a powerful tool for managing subject area data (dimensions). The information contained in dimensions are often disregarded as the dimension members are only used to slice and dice. However the information can add valuable data to the cube through their properties which can also participate in traditional slicing and dicing.

 

Categories: SSAS Tags: ,
Follow

Get every new post delivered to your Inbox.

Join 40 other followers