Archive

Posts Tagged ‘SSAS’

Designing Good Dimensions in SSAS

March 15, 2013 2 comments

Designing good dimensions is a balancing act between at least two factors. Our dimension design must offer the user a good (and intuitive) way to navigate data, and it must also perform well in queries. This post looks at achieving that balance and the dangers that assumptions can produce for dimension design and result output.

While we can consider other factors (for example processing speed and property/attribute visibility) as important in our design, I consider these two the most important because the dimension must be used in an intuitive way by users in order to be successful, and the user experience is driven by quick results (especially in an ad hoc analytical environment).

Scenario

If we examine the adventure works database and the [Direct Sales] perspective, we note that the customer dimension has a [Post Code] attribute hierarchy (single level hierarchy) and a [Customer Geography] hierarchy which also shows a level [Postal Code]. This can be seen in the diagram below;

Now consider that someone wishes to know what sales were derived in a postcode. We assume that they could use either of these hierarchies to interrogate sales.

Although this is a trivial example, I have seen this type of question present itself in production environments that have "reported" incorrect data over the years. Consider a product hierarchy (which includes colour at a lower level) and an attribute hierarchy [colour] (that just shows colours).

I'll also point out that I highlighted reported above because the business expectation of the results differed from the technical output. In this instance, the client thought that they were viewing all "yellow" products when in fact they were viewing only a subset of the yellow products.

The dangers of this issue can also be magnified by the method the client tool uses to query the cube.

So What’s the Issue?

If we consider the postcode example, we ask ourselves the question: what are the sales for postcode "V9"? One might expect to use the [Postal Code] attribute hierarchy and just show "V9". If I create a pivot in Excel and filter on V9, I get the following results.

This output is not useful. Why? Because I expect to see a single value for V9 ($252K).

I could use the [Postal Code] level of the [Customer Geography] hierarchy by creating a set, but this is not that intuitive (and doesn't change the results) and is not the kind of interaction we want to give our users.

From the above results we know that there are two occurrences of the postcode (that is, two different areas with the same postcode) which have the value 'V9'. However, what I want is the total for V9!

This distinction becomes obvious when we look at the members in MDX and the [Postal Code] key definition (as below). Note that the key is a composite key (State, City, Postcode) which is also shown when we look at the MDX member key (by dragging members to the query grid in SSMS).
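If you would rather see this in a query than by dragging members in SSMS, the composite key can also be exposed through the intrinsic Key0/Key1/Key2 member properties. The query below is only a sketch against the [Direct Sales] perspective; it assumes the key columns are ordered (State, City, Postcode), so that Key2 holds the postcode portion;

with
member [Measures].[Member Unique Name] as
[Customer].[Postal Code].currentmember.uniquename
member [Measures].[Postcode Key Part] as
// Key2 assumes the postcode is the third column of the composite key
[Customer].[Postal Code].currentmember.properties("Key2")
select
{ [Measures].[Member Unique Name], [Measures].[Postcode Key Part] } on 0,
filter( [Customer].[Postal Code].members
, [Customer].[Postal Code].currentmember.member_caption = 'V9' ) on 1
from [Direct Sales]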

This type of design is perfectly acceptable when the attribute is used within a dimension hierarchy (as is the case in Adventure Works). I suggest that the [Postal Code] attribute should be hidden since it is available in the hierarchy. Furthermore, the creation of a hierarchy (with relationships between the levels) is an SSAS design best practice. Note that the relationships are defined between these attributes.

The problem is that we have not been able to answer a question about the sales for [Postal Code] V9! What we really need is another attribute that only shows distinct postcode values!
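For illustration, if such an attribute were added (call it [Postal Codes], keyed on the postcode alone; the name is hypothetical and does not exist in Adventure Works), the question could be answered with a single member reference;

select [Measures].[Internet Sales Amount] on 0,
[Customer].[Postal Codes].[V9] on 1
from [Direct Sales]

This would return the single total for V9 ($252K) rather than one of the two composite members.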

Dangerous Clients

Exposing the attribute hierarchy in this way can be very dangerous, especially when the client uses a name in the generation of MDX. For example, consider the first query, which shows all members that have a postcode of V9. Both members are shown as we would expect;

select [Measures].[Internet Sales Amount] on 0,
filter(
    [Customer].[Postal Code].members
    , [Customer].[Postal Code].currentmember.member_caption = 'V9'
)
on 1
from [Direct Sales]

We could aggregate these results to show the combined result (which is the answer we want);

with
member [Customer].[Postal Code].[MyAgg] as
aggregate(
    filter( [Customer].[Postal Code].members
          , [Customer].[Postal Code].currentmember.member_caption = 'V9'
          )
)
select [Measures].[Internet Sales Amount] on 0,
[Customer].[Postal Code].[MyAgg] on 1
from [Direct Sales]

Now consider the situation where the client simply references the member name [V9] in MDX. This is shown below. Note that only the first member is returned, which we might naïvely assume to be the sales for the postcode.

select [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[V9] on 1

from [Direct Sales]

Conclusion

It is easy to see how usability is an important consideration for attributes. What often surprises most, though, is the danger of clients that purport to show a member value (say V9) but return only one member.

Measure Selection in SSRS and SSAS

November 26, 2012 1 comment

Pivoting around the measure improves report usability by adding an additional dimension that can be utilised in reports. A common example of this is a trend report that allows the user to select the measure that they wish to see. There are a few ways to accomplish this in your solution, and this post looks at some of them. The methods examined are: as a static parameter within the report definition, as a flexible parameter within the report definition, and finally, as a utility dimension.

In this example, we will look at pivoting a monthly trend report around two measures (actual and budget). The output for our reports will look like the screenshot below. Notice that the measure is selectable by the user.

In these reports we have created a measure parameter (named @measure_name) with two possible values as follows;

Parameter Value                        Parameter Label
="[Measures].[Native Amt ACT]"         Native Amt ACT
="[Measures].[Native Amt BGT]"         Native Amt BGT

Note that the value of the parameter must be included as an expression (otherwise an error will occur).

Static Parameters

Static parameters pivot the data in the dataset result. All possible measures are returned by the query, and the dataset determines the pivot value based on a formula. The base query therefore looks like the one below; notice that both [Native Amt ACT] and [Native Amt BGT] are returned;

with
member measures.month_key as
[Dates].[Fin Years].currentmember.member_key
select
{
measures.month_key
,[Measures].[Native Amt ACT]
, [Measures].[Native Amt BGT]
} on 0,
strtomember(@year).children
on 1
from ….

Our dataset includes an additional field that determines which measure to show (as an iif function);

=iif(Parameters!measure_name.Label="Native Amt ACT"
    , Fields!Native_Amt_ACT.Value
    , Fields!Native_Amt_BGT.Value
    )

In order to have the dataset fields visible in the expression editor, the query must be saved and then reopened. Of course, our report pivots around the Measure_Val amount.

Flexible Parameters

Flexible parameters are transformed within the query; that is, the parameter is defined as a member (measure) within the query. The query converts the parameter value @measure_name to a measure which is defined in the query.

with
member measures.month_key as
[Dates].[Fin Years].currentmember.member_key
member measures.measure_val as
strtomember(@measure_name)
select
{
measures.month_key
, Measures.measure_val
} on 0,
strtomember(@year).children
on 1
from
The @measure_name parameter must be included in the query parameters (as below);

Utility Dimensions

Utility dimensions are probably best known for their use in time intelligence and for bypassing the SSAS time intelligence wizard (see here). Generally, unsophisticated users find individual measures relatively easy to use in their reports or pivots because it allows them to select a measure without thinking about the 'time dimension'. You can simply stack measures and create calculations by referring to individual measures rather than the tuples created by a measure and a dimension combination.

A measure utility dimension can be created in the following manner;

1. Add a table to the database (or a view with data) to produce the desired characteristics (i.e. the measure names that we want to allow users to select). As an advanced option, we can also include additional properties to allow the MDX query to return a subset of dimension members.

2. Add the table to the DSV

3. Create a dimension based on the table so that the measure_name field appears as the caption for members. There may be a logical case for setting the 'IsAggregatable' property to false and setting a default member; however, there is no need to do so because the user will select the member (or measure value) that they require and the default value for the Measure_Val measure is null.

4. Add the measure (Measure_Val) to the cube and scope the intersections for the selected [Reporting Measures] members.

create
member currentcube.Measures.[Measure_Val] as null
, visible=0;

scope(Measures.[Measure_Val]);
  scope([Reporting Measures].[Measure Name].&[Actual TM]);
    this = [Measures].[Native Amt ACT];
  end scope;
  scope([Reporting Measures].[Measure Name].&[Budget TM]);
    this = [Measures].[Native Amt BGT];
  end scope;
end scope;

Now when the report is created, the [Reporting Measures] dimension can be used as a parameter in the report. In fact, we can specify the member unique names as the values for the 'measure' parameter selection. In this case, our @measure_name parameter would have its available values set by a query, and the query would return the members of the [Reporting Measures] dimension;

with

member [measures].[measure_label] as

[Reporting Measures].[Measure Name].member_caption

member [measures].[measure_value] as
[Reporting Measures].[Measure Name].member_unique_name
select
{
[measures].[measure_label]
, [measures].[measure_value]
} on 0,
[Reporting Measures].[Measure Name].[Measure Name] on 1
From …

The report dataset query is then changed to use the [Reporting Measures] dimension as a parameter, as below. Note that in this situation the Measures.Measure_Val member is a measure within the cube and not a definition within the MDX.

with
member measures.month_key as
[Dates].[Fin Years].currentmember.member_key
select
{
measures.month_key
, Measures.measure_val
} on 0,
strtomember(@year).children
on 1
from
where
( strtomember(@measure_name)
, …
)

The utility dimension method is quite flexible because all values and dimensions can be selected from within the SSRS query designer without the need to write MDX.

Conclusion

There are a number of options for allowing the user to select measures in their reports. The method chosen may depend on the user's ability to write custom MDX and the need to duplicate measure selection throughout the reports (or the reporting environment). Additionally, the method chosen may depend on the rate of change for the 'reporting measure' selection. When a utility dimension is used, production versions of the cube can easily hide the [Measure_Val] measure and the [Reporting Measures] dimension, and this may remove confusion when the cube is browsed in an ad hoc manner.

Categories: MDX, SSAS Tags: , ,

Accessing the Slicer through VBA

February 24, 2012 28 comments

There may be times when we want to programmatically control slicers through VBA. For example, we may want to default a date to the current date or set a cost centre depending on who has opened the workbook. This post looks at how a slicer can be controlled through VBA.

In this example, we have added a slicer to a worksheet that uses a date hierarchy as its source. Because we included all levels of the hierarchy when the slicer was set up, we get three individual slicers, one for each level of the hierarchy.

If we look at the settings for the slicer (right click on the slicer and select slicer settings), we can see that the slicer has a name and each level of the slicer maintains the hierarchy level name. For example, the Slicer_Dates_Hie below has the levels Year, Month and Day. Although we can change the name (for example the name Year in the picture below), the slicer retains the MDX level that it belongs to.

Accessing the Slicer

We can access the slicer through the SlicerCaches object. This is as simple as declaring the slicer cache and referencing it to the name of the slicer we want to use. For example;

Dim sC As SlicerCache
Set sC = ActiveWorkbook.SlicerCaches("Slicer_Dates_Hie")

Navigating the Structure of the Slicer

Once we have a reference to the slicer we can navigate its structure using SlicerCacheLevels. For example we can determine the number of levels of the slicer and iterate over them with the following code.

Dim sC As SlicerCache
Dim sL As SlicerCacheLevel
Set sC = ActiveWorkbook.SlicerCaches("Slicer_Dates_Hie")

For Each sL In sC.SlicerCacheLevels
    Debug.Print "Level " + CStr(sL.Ordinal) + " --> " + sL.Name
Next sL

Naturally, a level can also be accessed through its cache level ordinal to produce the same result. The highest level (year) takes the value 1, and the ordinal increments for each level below it. There is always a level (i.e. level 1) even if the slicer is based on a single attribute.

Set sC = ActiveWorkbook.SlicerCaches("Slicer_Dates_Hie")
For i = 1 To sC.SlicerCacheLevels.Count
    Debug.Print "Level " + CStr(i) + " --> " + sC.SlicerCacheLevels(i).Name
Next i

Slicer Data Members

We can gain access to the data items through slicer items; as MDX members, they have a caption, a value and a key (member unique name). For example, the year 2011 in this slicer has a value of 2011 and a name (MDX unique name) of [Dates].[Dates Hie].[Year].&[2011].

Dim sC As SlicerCache
Dim SL As SlicerCacheLevel
Dim sI As SlicerItem

Set sC = ActiveWorkbook.SlicerCaches("Slicer_Dates_Hie")
Set SL = sC.SlicerCacheLevels(1)
Debug.Print "------------------------------------------------------------------------------"

For Each sI In SL.SlicerItems
    Debug.Print "Caption --> " & sI.Caption
    Debug.Print "Value --> " + CStr(sI.Value)
    Debug.Print "Unique Name --> " + sI.Name
    Debug.Print "------------------------------------------------------------------------------"
Next

Setting the Slicer Value

Slicer item selection must be set through the visible slicer items list and is specified using an array. For example, we could set the SlicerCache (selected items) to 2011 and 2012 with the following code;

sC.VisibleSlicerItemsList = Array("[Dates].[Dates Hie].[Year].&[2011]", "[Dates].[Dates Hie].[Year].&[2012]")

The name selected must be a data member of the level. If it is not, a runtime error will occur (as below).

Once the values are set, connected pivots are updated immediately.

Member Iteration

Members can be easily iterated using the following code;

Dim sC As SlicerCache
Dim SL As SlicerCacheLevel
Dim sI As SlicerItem

Set sC = ActiveWorkbook.SlicerCaches("Slicer_Dates_Hie")
Set SL = sC.SlicerCacheLevels(2)

For Each sI In SL.SlicerItems
    sC.VisibleSlicerItemsList = Array(sI.Name)
Next

Conclusion

The control of slicers through VBA can be used to provide some very nice personalisation to workbooks.

NB:  If you liked this post you might be interested in this one.  In it, I discuss setting slicers through cell association to pivot rows.

Categories: SSAS Tags: , ,

SSAS Dimension Error Processing & Duplicate Attributes

December 8, 2011 5 comments

SSAS error configurations allow for some flexible arrangements in object processing. Naïvely, these error configurations may be changed so that a dimension can be successfully processed; however, the error configuration can have a dramatic effect on how the dimension and fact data are processed. This post examines the error configuration for duplicate attributes. In doing so, I address dimension design, its processing and the effect that the configuration has on fact and dimension data.

In this post, we use a simple geography model for illustration, with a single measure (value) against each leaf node. Note that the city Berlin appears in the states of Hessen and Saarland.

Structure

Data

 

The data for the model is also very simple (with both fact data and dimension in a single table). The row_key field has been added to the table to improve the dimension appearance (design appearance). Additionally, some will note that the row_key is an excellent candidate for the [city] attribute key in this scenario, however, to keep things simple, we use attribute names as keys.

Dimension Design

Default Design (no relationships)

When the dimension is created it has three attributes: [key], [State] and [City]. The default behaviour is to relate each (non-key) attribute directly to the dimension's key, as below;

Structure

Attribute Relationships

 

If a hierarchy is created so that state drills into city, the relationship view changes slightly and separates the attributes into distinct nodes (as below). In this situation, the relationships have not changed as both [State] and [City] relate directly to the key. Note that the hierarchy displays a warning triangle which states that attribute relationships do not exist between levels in the hierarchy and that this may decrease performance.

Structure

Attribute Relationships

 

The importance of relationships is well understood, and they are arguably the most important performance consideration in dimension design. Because we expect cities to roll up to states, the relationship should be created as part of the design. This (second) design choice is made for performance.

Performance Design (relationships)

When the dimension is designed with relationships (below), the warning indicator is removed from the hierarchy.

Structure

Attribute Relationships

 

 

Dimension Processing

Dimension processing involves many stages. For our purpose, we are primarily interested in the load of data (that is, loading member data) and the effect that relationships have on processing.

The default design sends a query to the data source for each non-key attribute in the dimension. These are highlighted in the first two queries in the trace below, where the data for [city] and [state] is loaded. The final query holds the data for the dimension key and relates the key to the [state] and [city] attributes. Note that in this design, each attribute is directly related to the dimension key; therefore, the members for these attributes have a direct relation with the key, and hence the third query must relate the key to both the [state] and [city] attributes.

When the performance design dimension is processed (note that, under the default error configuration, this dimension actually fails processing; more on this below), the queries sent for data mimic the attribute structure in the relationship design. That is;

  1. A query is sent to retrieve [State] data. This is the same as in the default design.
  2. A query is sent to retrieve the [city] and [state] data. Unlike the default design, this includes [state] because state has a relationship with city.
  3. Finally, a query is sent for [key] data and the [city] relationship. In the default design, this query also included the [state] attribute however, since [state] is related to city (and this relation is exposed in the prior query), there is no requirement for [state] data and it is excluded.

Dimension Processing Error Configuration

Under the default error configuration, the performance design dimension fails during processing and a 'duplicate key' error is displayed (as below).


 

This error relates to the expectation that a [city] will relate to one (and only one) [state]. The data shows that the city Berlin is found in both Hessen and Saarland.

 

In order to process the dimension, the KeyDuplicate error configuration may be changed to IgnoreError or ReportAndContinue from the default value of ReportAndStop. However, while this may permit the dimension to process, the change in error configuration can have adverse impacts on both dimension data and fact data.

 

Data Outcomes

When the performance dimension is processed with duplicate attribute errors ignored, members which raise the error are dropped from the dimension. Note that not all members are dropped; only the duplicate occurrences are lost.

 

Berlin is not found under Hessen and the fact values do not match the expected [state] totals. Note that Hessen should sum to 50 and Saarland should sum to 70. The 20 difference relates to the 20 for Berlin which should reside in Hessen (but now resides in Saarland).
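A quick way to see this kind of silent reallocation is to query the state totals after processing and compare them with the source table. The query below is only a sketch; the dimension, hierarchy and measure names are illustrative and would need to match the actual model (the state members are keyed by name, as in the example data);

select [Measures].[Value] on 0,
{ [Geography].[Geography].[State].&[Hessen]
, [Geography].[Geography].[State].&[Saarland]
} on 1
from [Geography Example]
// expected from the source data: Hessen = 50, Saarland = 70
// with the duplicate error ignored, Berlin's 20 is reported under Saarland rather than Hessen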

 

Correcting the Outcome

A standard way of correcting this type of error is to apply a composite key to the attribute that causes the issue. In this situation, the key of the city attribute now uses both the state and city (see below) and shows the expected structure.

 

Conclusion

While the use of error configurations may permit the processing of dimensions, the outcomes of the processing operation can have unexpected results. A more certain way of managing processing is to address the underlying issue of the error at the source or through other factors in dimension design.

 

Categories: SSAS Tags: ,

MDX – Counting Consecutive Days

November 2, 2011 Leave a comment

A recent blog post (here) questioned a method for returning the most consecutive days that a customer has bought product for. For example, given a particular customer, what's the largest continuous date span of their purchases? More realistically, a business question may be 'which customers purchase for more than 5 days in a row?'

In addition to the answer provided in the post, this blog provides a little more support to the solution. To make the data more accessible, the question is based on [Adventure Works] and considers months instead of days (i.e. for a customer, what is the largest number of months they have continuously purchased for?). We are using the [Date].[Calendar] hierarchy and the [Internet Sales Amount] measure.

Approach

This approach looks at the problem in two ways. Firstly, we define a measure that can be applied to a specific period that counts the number of periods that have had (continuous) sales. Secondly, we apply this to all periods to determine the largest number of consecutive periods.

Monthly Continuous Periods

If we consider a single customer and their monthly sales (as below), the continuous sales for any month end on the first subsequent month of null sales. For example, the [July 2007] continuous periods end in [August 2007], and [October 2007] ends in [December 2007]. The end of the continuous period is defined as the first null in the set which starts at the next month and runs to the last month.

The number of continuous periods for a month is also the number of members in the set which range from the first period until (but not including) the first (null) period.

select
[Measures].[Internet Sales Amount]
on 0
,[Date].[Calendar].[Month]
on 1
from    [Adventure Works]
where
(
   [Customer].[Customer].&[12301]
)

The end of the continuous period (or the first null member) is satisfied by the expression;

filter( {[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0
).item(0)
// note that we are only interested in the first null item (and use the .item(0) function to retrieve this)

The query above can be extended to show the ending month (name) of the continuous period;

with
member [Measures].[end of range member] as
filter(
{[Date].[Calendar].currentmember.nextmember : null }
,[Measures].[Internet Sales Amount] = 0
).item(0)
.member_name
select
{
[Measures].[Internet Sales Amount]
,[Measures].[end of range member]
}
on 0
,[Date].[Calendar].[Month]
on 1
from    [Adventure Works]
where
(
    [Customer].[Customer].&[12301]
)

Additionally, since a continuous period cannot exist when the current month has no sales, months without sales can be excluded from the calculation (they will be null).

with
member [Measures].[end of range member] as
iif ( [Measures].[Internet Sales Amount] = 0
       , null
       , filter( {[Date].[Calendar].currentmember.nextmember : null }
                   ,[Measures].[Internet Sales Amount] = 0).item(0)
                   .member_name
)

Given the end period (note that we know the start, since it's a relative period), we can specify the continuous range between the start and the end. This set includes an additional member (the last one is not really needed because its value is null), and so it can be excluded from the set. Although it can be excluded using the .prevmember function, it can also be left in the set and simply removed from the count. The set is defined within the iif statement;

with
member [Measures].[end of range member] as
iif ( [Measures].[Internet Sales Amount] = 0
    , null
    , filter( {[Date].[Calendar].currentmember.nextmember : null }
            ,[Measures].[Internet Sales Amount] = 0).item(0)
      .member_name
    )
// inclusive
member [Measures].[set (inclusive)] as
iif ( [Measures].[Internet Sales Amount] = 0
    , null
    , settostr(
        [Date].[Calendar].currentmember :
        filter( {[Date].[Calendar].currentmember.nextmember : null }
              ,[Measures].[Internet Sales Amount] = 0).item(0)
      )
    )
// count (just reduce 1)
member [Measures].[set count] as
iif ( [Measures].[Internet Sales Amount] = 0
    , null
    , {
        [Date].[Calendar].currentmember :
        filter( {[Date].[Calendar].currentmember.nextmember : null }
              ,[Measures].[Internet Sales Amount] = 0).item(0)
      }.count - 1
    )

select
{
[Measures].[Internet Sales Amount]
,[Measures].[end of range member]
, [Measures].[set (inclusive)]
, [Measures].[set count]
} on 0,
[Date].[Calendar].[Month] on 1
from [Adventure Works]
where
(
[Customer].[Customer].&[12301]
)

The Largest Number of Consecutive Periods

Having defined the number of consecutive periods for relative periods, we can define the maximum value for all periods using the max function. Simply specify the range set (the first argument of max) as the period members. Our query becomes;

with member [Measures].[Range Count] as
iif ( [Measures].[Internet Sales Amount] = 0
    , null
    , {[Date].[Calendar].currentmember :
       filter( {[Date].[Calendar].currentmember.nextmember : null }
             ,[Measures].[Internet Sales Amount] = 0).item(0)
      }.count - 1
    )
member [Measures].[Max Periods] as
max([Date].[Calendar].[Month], [Measures].[Range Count])
select [Measures].[Max Periods] on 0
from [Adventure Works]
where ( [Customer].[Customer].&[12301] )
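To close the loop on the original business question (customers who purchase for more than five consecutive periods), the same measure can be wrapped in a filter over the customer attribute. This is a sketch only; it evaluates the range count for every customer and month, so expect it to be slow on a cube of any size;

with member [Measures].[Range Count] as
iif ( [Measures].[Internet Sales Amount] = 0
    , null
    , {[Date].[Calendar].currentmember :
       filter( {[Date].[Calendar].currentmember.nextmember : null }
             ,[Measures].[Internet Sales Amount] = 0).item(0)
      }.count - 1
    )
member [Measures].[Max Periods] as
max([Date].[Calendar].[Month], [Measures].[Range Count])
select [Measures].[Max Periods] on 0,
filter( [Customer].[Customer].[Customer].members
      , [Measures].[Max Periods] > 5 ) on 1
from [Adventure Works]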

Other Considerations

The query searches for continuous periods until the last date period (regardless of year). Often, with this type of situation, comparatives are required on an annual basis, and so there would be a need to alter the 'end' date to be the last date for the current year.

There are also some performance improvements that can be applied to the query; however, I think that this demonstrates the method simply enough, and performance can be addressed as needed.

Categories: MDX, SSAS Tags: ,

SSAS Relationships and Granularity

October 3, 2011 1 comment

The creation of measure groups at varying levels of granularity is useful when two related measures differ by some dimensional level. A classic example of this occurs in budget-to-actual reporting. In this situation, the budget data is typically stored at a higher level of granularity. For example, it may exclude product information (or be set at high-level product classes) and be targeted at monthly totals. Actual data is presumed to be recorded at a detail sale level (where { date x product x store x ref# } gives { revenue & quantity sold }). This post looks at how attribute relationships determine how data is aggregated in dimensions where measure groups are related to dimensions through non-key attributes.

Situation

To illustrate the significance of relationships in dimension design and the effect they have on aggregation, the examples shown in this post use a very simple cube structure. There are two measure groups (one for actual and one for budget) and only one dimension (product). We assume that actual data is recorded at the product level and budgets are set at subcategory. Note that the related attribute for the Actual measure group is [Product] and the related attribute for the budget measure group is [Sub Category] (as below);

The product dimension has similar characteristics to that of Adventure Works but includes only three attributes (a product, sub category and category). The natural (business) hierarchy for product is [product] -> [sub category] -> [category]. That is, a group of products will roll up to a [sub category] and several [sub categories] will roll up into a [category].

In this example, we consider only two subcategories which are [Mountain Bikes] and [Forks]. [Mountain Bikes] maps to the category [Bikes] and [Forks] maps to [Components]. This can be seen in the following hierarchy browse and script;

To illustrate how the use of relationships affects aggregations, we will assume that the [Mountain Bikes] sub category has a budget of 100 and the [Forks] sub category has a budget of 200. With these values, the budget for the [Bikes] category should be 100 and [Components] should be 200.

These examples use MDX update statements to update the underlying fact data; however, this is only for simplicity. The examples in this blog would work the same had the underlying fact been updated by other means.


Defined Relationships

When relationships are correctly defined between attributes according to the business definition (below), values at the [Sub Category] attribute are correctly aggregated to the parent attribute. This is demonstrated in the script and screenshots below: when [Mountain Bikes] is updated, the category [Bikes] is updated by the same amount;

Relationship :

Update Script :

/* updating script */
update [Granularity Example] set
([Product].[Product By Category].[Mountain Bikes],[Measures].[Bgt Amount])=100,
([Product].[Product By Category].[Forks],[Measures].[Bgt Amount])=200
;

Query / Result :

select [Measures].[Bgt Amount] on 0,
{ [Product].[Product By Category].[Bikes]
,[Product].[Product By Category].[Components]
} on 1
from [Granularity Example];

Non Defined Relationships

When relationships are not defined between product attributes, both parent categories (Bikes and Components) contain the same value. The value shown is the total for all members and can be seen below.

Relationship :
Query / Result

Clearly, without the relationship between the parent and child, the values for the [sub category] level are not aggregated to [category] as expected.

The reason for this is that a correct relationship (as displayed in the first example) specifies how members are aggregated to higher levels in the hierarchy; in this case, how a [category] is made up of several [sub categories]. Without an 'upstream' relationship to the [sub category] attribute, the [category] attribute can only show total values, regardless of which member's data is displayed through the hierarchy. In a previous post on aggregation usage, we showed how lower level aggregations could be used when higher levels of the relationship were required. This is exactly the same (except we are not concerned with aggregation use). Note that I have used the words relationship and hierarchy interchangeably. In this situation, we assume that a hierarchy mimics the relationship structure (business hierarchy).
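To make the distinction concrete, the two attribute hierarchies can be compared side by side. The query below is only a sketch; it assumes [Sub Category] and [Category] are exposed as attribute hierarchies on the product dimension (the screenshots above browse them through the [Product By Category] user hierarchy), and the commented values reflect the example data;

select [Measures].[Bgt Amount] on 0,
{ [Product].[Sub Category].[Mountain Bikes] // granularity attribute: returns 100 as expected
, [Product].[Category].[Bikes]              // no relationship defined: returns the all-up total of 300
} on 1
from [Granularity Example];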

Although aggregations and relationships improve performance, the problem does not present itself when the fact is joined to the key of the dimension. This is because there is an explicit relationship between the dimension key and every other attribute (hierarchy).

Conclusion

The ability to include measure groups at varying levels of granularity is an important feature of SSAS because it permits the combination of data that would otherwise be disparate. However, this flexibility comes with a caution: measure groups joined to non-key dimension attributes aggregate data only to the extent that relationships are defined in the dimension structure. If this structure is poorly defined, not understood by the user or not managed through perspectives, data may be misinterpreted where hierarchies are provided without supporting relationships and non-key attributes are used.

Categories: SSAS Tags:

BISM – Period To Date Calcs

August 21, 2011 7 comments

DAX has three period-to-date functions which easily allow the creation of the common measures (Month To Date, Quarter To Date and Year To Date). This post looks at period-to-date calculations and member sorting in the BISM tabular model.

Sorting

By default the tabular model sorts members by name. For example, clicking on the month field shows an alphabetical listing of months. Although these values are shown in the project model, they are displayed in the same order when the project is deployed. The order of these members can be changed so that they are sorted by an additional column, using the Sort By Column function (menu path: Column > Sort > Sort By Column). When this is done, we simply specify the sort column in the popup (below).

After sorting, members appear as expected in both the project design and deployed project.

Period to Date Calculations

Period-to-date measures can be easily created using the DAX functions TOTALYTD, TOTALMTD and TOTALQTD. Note that these functions are not dependent on sort order but on a related date field.

For example, a date field in the related DimTime table. Note that there is a primary relationship between FactResellerSales and DimTime, and the inclusion of the TimeDate column (which is a date data type) in DimTime.

The general format of the functions is TOTALxTD(scalar_expression, time_column), where the time_column is a primary related date field. To show YTD values for SalesAmount we use the formula:

Sales YTD:=TOTALYTD(sum(FactResellerSales[SalesAmount]), DimTime[TimeDate])

Also note that measures (as they would be thought of in SSAS) must be added to the ‘Measure Grid’. If they are not, measures are not seen in the cube. The table FactResellerSales has columns hidden from client tools (so that no dimension members are created) and contains the measures [Sales] and [Sales YTD] (below).

Browsing the cube shows data (sorted) with YTD values as expected. Unlike SSAS, the tabular model will show the [YTD] measure defaulting to the last year when the (related) time table is not used.


MISC

The technet reference for these functions can be found at YTD , QTD , MTD .

Categories: BISM, SSAS Tags: , ,

DMX 102 – SSAS Time Series with Multiple Attributes

August 7, 2011 1 comment

This post looks at extending the time series prediction so that multiple values can be predicted. In the last post (DMX 101) only a single value was predicted and, while this can be useful, the approach has obvious drawbacks because there is often an expectation that different classes have different patterns. Consider seasonal fashion: we expect that seasonal products will have a distinct life-cycle, and grouping products (from multiple seasons) would distort a specific season's sales.

In this post, we predict two sequences within the same model. For brevity, both series follow the function y=mx+b;

y1 = x/2 + 20

y2 = 2x

There are two ways of dealing with multiple predictors. Firstly, we can treat each as a function value of the same sequence. In this method, we identify a value for y1 and a corresponding value for y2. In this approach, the series key is simply the x (time) value. The second method considers that only one value is being predicted (y) which is a value derived from the series key and an additional identifier which specifies the function (as either y1 or y2).

Method 1 – Multiple Y values

In this situation, our predicting data is represented as functions of (the time) series key (see right). This is represented by the SQL below.
with xs(x) as
( select convert(money,1) union  all
select x+1 from xs where x<20
)
select x, x/2+20 as y1, 2*x as y2
from xs

The model is created and trained with the following commands.

-- MODEL CREATION : create single sequence key with multiple prediction values
create mining model ts_method1
(   x long key time
, y1 double continuous predict
, y2 double continuous predict
) USING MICROSOFT_TIME_SERIES

-- MODEL TRAINING : train the model
INSERT INTO ts_method1(x, y1, y2)
OPENROWSET('SQLOLEDB.1'
, 'Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2'
, 'with xs(x) as (select convert(money,1) union all select x+1 from xs where x<20)
select x, x/2+20 as y1, 2*x as y2 from xs'
)

Note with this method, the predicted values (function) must be explicitly identified. That is, we can predict y1 or y2 but not y1+y2. For example, we can predict y1 with;

-- select flattened results for y1
select flattened predict([y1],5)
from [ts_method1]

Method 2 – Single Y Value and a Composite Key

The second approach incorporates the function (or class type) into the model key. We can see that the data for the model shows a y_value that is dependent on both the x and y_function columns.

with xs(x) as (select convert(money,1)
union all select x+1 from xs where x<20)
select x, 'y1' as y_function, x/2+20 as y_value
from xs
union all
select x
, 'y2' as y_function, 2*x as y2
from xs

The model can be created and trained with the following commands;

-- MODEL CREATION : create complex sequence key with single prediction value
create mining model ts_method2
( x long key time
, y_function text key
, y_value double continuous predict
) USING MICROSOFT_TIME_SERIES

-- MODEL TRAINING : train the model
insert into ts_method2(x, y_function, y_value)
OPENROWSET ('SQLOLEDB.1'
, 'Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2'
, 'with xs(x) as ( select convert(money,1) union all select x+1 from xs where x<20)
select x, ''y1'' as y_function, x/2+20 as y_value from xs
union all select x, ''y2'' as y_function, 2*x as y2 from xs
order by 1'
)

Additionally, it is important to note that training a Time Series model requires ordered data.

Prediction from a model which includes a composite key requires that the function (class) be specified in the query. The first query (as the results on the right show) does not show which class relates to which predictions. Note that two series of data are returned.

-- Q1 : Prediction without class specification
select flattened Predict([y_value], 5)
from [ts_method2]


This can be overcome by showing the class in the result set (in which case all results are still returned);

-- Q2 : Prediction showing class specification
select flattened [y_function], Predict([y_value], 5)
from [ts_method2]

or, by specifying the class that will be predicted;

-- Q3 : Prediction specifying class
select flattened Predict([y_value], 5)
from [ts_method2]
where [y_function]='y1'

The addition of class tags into time series models improves their usability by disaggregating an aggregate sequence into its components. This post has looked at two methods for including class tags and the prediction implications of each.

Categories: Data Mining Tags: ,

Determining SSAS Aggregation Usage

July 30, 2011 3 comments

Aggregations improve query performance by limiting the amount of partition data that needs to be scanned to answer a query. Instead of querying the fact level data of the partition, SSAS queries the aggregation and, because the amount of data in the aggregation is smaller, the query can be answered more quickly. However, these benefits come with a trade-off: aggregations take time to build (which increases processing time) and they take space. They can also have an adverse impact on performance when there are too many designs. This post shows how to determine if an aggregation is being used by using trace events.

A quick overview: when SSAS receives an MDX request, the statement is passed to the formula engine (FE). The FE requests data from the cache. If the data is in the cache (readily available) it can be returned to the FE directly; if not, the request is passed to the storage engine (SE) which retrieves the data. Note that, if the data is available in cache, there is no request to the SE.

This raises two side points. Firstly, there is no SE requirement for data that is in cache. Cache warming may improve query performance (without the need for the aggregation). However, to warm the cache, data must still be loaded (which may be improved with an aggregation). Secondly, the aggregation will only be effective when the query is executed against cold cache (requesting non-cached data).

In these examples, we are using the reseller sales partitions of Adventure Works. There is a simple aggregation design that includes [Month Name], and it is applied only to the 2003 partition. Additionally, the relationships for the [Date] dimension are shown below. Note that the Calendar and Fiscal hierarchies can be derived from the [Month Name] attribute.

Trace Events

The trace is run with the following events;

Event Class         Event
Progress Report     Progress Report End
Query Events        Query Event Begin
                    Query Event End
Query Processing    Get Data From Aggregation
                    Get Data From Cache

Clearing the Cache

Unless otherwise stated, each query is executed after clearing the cache. Cache can be cleared for a database, cube or partition. For simplicity, the cache is cleared for the database using the following command;

<ClearCache xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID>Adventure Works</DatabaseID>
  </Object>
</ClearCache>

A more comprehensive list of commands which cover how to specify other objects can be found at ssas-info;

http://www.ssas-info.com/analysis-services-faq/27-mdx/133-mdx-how-do-i-clear-analysis-services-ssas-database-cache

Query 1 – Data from the Partition

A query that requires data from 2002 will read the partition (recall there is no aggregation on this partition). We can see that the text data for the progress report end specifies the partition that is used (Reseller_Sales_2002).

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].&[2002] on 1
from [Adventure Works]

When the same query is run without clearing the cache, the progress report is replaced with the event get data from cache.

Query 2 – Data from the Aggregation

When the query is executed for 2003, the trace includes the get data from aggregation event. Additionally, the text data for the progress report shows that the aggregation is used.

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Calendar].[Calendar Year].&[2003] on 1
from [Adventure Works]

Although not shown in the trace, the retrieval of data from the aggregation is faster than from the partition (a duration of 7ms versus 0ms).

A Quick Note about the Benefits of Relationships

The only attribute defined in the aggregation was [Month Name], yet the queries have used [Calendar Year] and hit the aggregation where possible. This is because there are relationships defined between [Month Name] and [Calendar Year], and the aggregation can be used to answer queries for attributes from higher levels in the tree.

We can see the path by using the aggregation manager in BIDS Helper (@ http://bidshelper.codeplex.com/).
Because the relationships for the [Calendar] and [Fiscal] hierarchies are derived from the attribute [Month Name], both hierarchies can utilise the aggregation.
A query against the [Fiscal] hierarchy will also use the aggregation design, as in the sketch below.
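Such a query might look like the following (a sketch only; it assumes the fiscal year member key follows the same &[2003] pattern as the calendar year, so adjust the member reference to suit the cube);

select [Measures].[Reseller Sales Amount] on 0,
[Date].[Fiscal].[Fiscal Year].&[2003] on 1
from [Adventure Works]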

Conclusion

This post has shown how to determine if aggregations are used to answer queries. When applied, they reduce the overhead associated with data retrieval. However, this reduction applies only to the extent that data is requested from the SE. When data is in cache, there is no requirement on the SE (and hence neither the aggregation nor the partition data will be used).

Categories: SSAS Tags: ,

Moving SSAS Databases (Detach)

July 25, 2011 2 comments

There are a few ways to move SSAS databases. This post looks at the detach-reattach method.

When SSAS creates a database, it stores all the files for the database in a folder under the instance's data path (assuming that you haven't specified alternate locations for any of the objects). The data folder for my instance is below; I have two versions of Adventure Works.

You can detach the database from the instance by simply right clicking on the database node and selecting detach. The only configuration option that you have is specifying a password.

Detaching the database creates a detach_log file in the database root folder (ie under ‘Adventure Works DW 2008R2.0.db’). This is needed to reattach the database. You can’t simply copy the files and then try to reattach.

The entire database directory can then be copied to a new location. There is no requirement for this to be the default SSAS (data) directory.

Re-attaching the database is almost the reverse; just specify the Attach option and the folder name (for the database).

Categories: SSAS Tags: ,