Designing Good Dimensions in SSAS


Designing good dimensions is a balancing act between at least two factors. Our dimension design must offer the user good (and intuitive) way to navigate data and it must also perform well in queries. This post looks at achieving that balance and the dangers that assumptions can produce for dimension design and result output.

While we can consider other factors (for example processing speed and property/attribute visibility) as important in our design, I considered these factors the most important because the dimension must be used in an intuitive way by users in order to be successful and, the user experience is driven by quick results (especially in an addhoc analytical environment).

Scenario

If we examine the adventure works database and the [Direct Sales] perspective, we note that the customer dimension has a [Post Code] attribute hierarchy (single level hierarchy) and a [Customer Geography] hierarchy which also shows a level [Postal Code]. This can be seen in the diagram below;

Now consider that someone wishes to know what sales were derived in a postcode. We assume that they could use either of these hierarchies to interrogate sales.

Although this is a trivial example, I have seen this type of question present itself in production environments that have “reported” incorrect data the years. Consider a product hierarchy (which includes colour at a lower level) and an attribute hierarchy [colour] (that just shows colours).

I’ll also point out that I highlighted reported above because the business expectation of the results different from the technical. In this instance, the client thought that they were viewing all “yellow” products when in fact they were viewing only a subset of the yellow products.

The dangers of this issue can also be magnified by method that client tool is used to query cube.

So What’s the Issue?

If we consider the postcode example, we ask ourselves the question what are the sales will postcode “V9″. If this were the case, one might expect to use the [Postal Code] attribute hierarchy and just show “V9″. If created pivot in Excel and filter on V9, I get the following results.

This output is not useful, why? Because I expect to see a single value for V9 ($252K).

I could use the [Postal code] level of the [Customer Geography] dimension by creating a set but this is not that intuitive (and doesn’t change results) and is not the kind of interaction we want to give our users.

From the above results we know that there are two occurrences of the postcode (that is two different areas with the same postcode) which have a value ‘V9′. However, what I want is the total for V9!

This distinction becomes obvious when we look at the members in MDX and the [Postal Code] key definition (as below). Note that the key is a composite key (State, City, Postcode) which is also shown when we look at the MDX member key (by dragging members to the query grid in SSMS).

This type of design is perfectly acceptable when the attribute is used within a dimension hierarchy (as is the case in adventure works). I suggest that the [Postal Code] attribute should be hidden since it is available in hierarchy. Furthermore, the creation of a hierarchy (with relationships between the levels) is a SSAS design best practice. Note that the relationships are defined between these attributes.

The problem is that we have not been able to answer a question about the sales for [Postal Code] V9! What we really need is another attribute that only shows distinct postcode values!

Dangerous Clients

Exposing the attribute hierarchy in this way can be a very dangerous situation, especially when the client uses a name in the generation of MDX. For example consider the first query which shows all members that have a postcode of V9. Both members are shown as we would expect;

select [Measures].[Internet Sales Amount] on 0,

filter(

[Customer].[Postal Code].members

, [Customer].[Postal Code].currentmember.member_caption = ‘V9′

)

on 1

from [Direct Sales]

We could aggregate these results to show the combined result (which is the answer we want);

with
member

[Customer].[Postal Code].[MyAgg] as

aggregate(

            filter([Customer].[Postal Code].members

                    , [Customer].[Postal Code].currentmember.member_caption = ‘V9′

                    )

        )

select     [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[MyAgg] on 1

from [Direct Sales]

Now consider the situation where the client simply references the member name [V9] in MDX. This is shown below. Note that only the first member is returned which we would naïvely assumed to be the sales for the postcode.

select [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[V9] on 1

from [Direct Sales]

Conclusion

It is easy to see how useability is an important consideration for attributes. What often surprises most though, are the dangers of clients which purport to show a member value (say V9) and only return 1 member.

About these ads

4 thoughts on “Designing Good Dimensions in SSAS

  1. Hi Paul,

    Havent seen you in a while! Hope Brissie is treating you well

    I have an issue im currently working on with a similar scenario. We have an SCD2 dimension where there are duplicate names caused by current version and older versions of a business unit which are part of a 3 level org hierarchy. The older versions belong to a different parent. The key is composite on 4 attributes.. the 3 “level” keys and an “is_current” flag.

    We need results across the version to show up on a single row (since the name is still the same), but not just for a chosen member. The query needs to return the whole org hierarchy, flattened to not have duplicates.

    Furthermore this issue is complicated by the fact that this is in an SSRS report and the users can select any items at any level from the org hierarchy as a parameter to the report. I currently have the parameter dataset restricted to just the current versions so as not to show the users duplicates.

    The problem I am have is how to get the main report query to accept those selections (in a STRTOSET), but return all the relevant data with no duplicates showing up.

    I haven’t spent much time on it but hit a few roadblocks and am tackling it again next week.
    Can you think of anything?

    • Hi Bhavik,

      Devils always in the detail! – But I think this is a classic type 6 dimension – no? So retain the type 2 data for the hierarchy (historic) and type 1 for the ‘current [level 3] org hierarchy’ (attribute hierarchy) based on the name.

      In your report, you could always filter by the hierarchy level and list the user attribute members that exist (which would ensure showing a single name).

      However (as you nodoubt know), this (in my opinion) is really a training exercise for adhoc users. They need to be made aware that the two sets of attributes are not the same.

      HTH, Paul

  2. Paul,

    I am struggling with same question, I mean how can we get single name across organisation hierarchy accross the board in an scd2 dimension. The filter solution you shown is providing for single entity.

    This is what I like to do

    1) Create a Dynamic Named Set in cube it self with aggregate function on their particular instances that data is pre-processed.

    2) Then from report simple filter out using date filter or other filter from report. Otherwise report is inefficient.

    I am struggling with first part. I am looking for something like this but available at cube preparepation time. You can see that on below MDX I need to know each of their name ( 432) and their particular instances ( 4168, 4169, 759)

    with member
    [Staff].[Staff Hierarchy].[AGENT-NAME1-432]
    as aggregate({[Staff].[Staff Hierarchy].&[4168],[Staff].[Staff Hierarchy].&[4169], [Staff].[Staff Hierarchy].&[759]})

    set CustomEmployeeSet as {[Staff].[Staff Hierarchy].[AGENT-NAME1-432] }
    SELECT { [ARR DAILY Set] } ON COLUMNS,
    CustomEmployeeSet on ROWS
    FROM [NA SCORECARD]

    Any help Much Appreciated.

  3. Hi Ani,

    I would not recommend aggregating the data through the use of calculated members – for a start its going to be impossible to maintain.

    The easiest way to show a single value or name is to have an attribute that is designed to do so. It’s a little hard to tell without knowing about the dimension design (and how your relationships are designed). A good example of what I mean can be found in date dimensions.

    If you’ve got a month_of_year attribute with a smart key (year&month) which appears in the hierarchy then you’ll need another different attribute to show the month (here of course you’ll have a different key event though the name might be the same). You will not expose the month_of_year attribute as an attribute hierarchy (it should be hidden) with the month attribute visible to the user.

    I hope this helps?

    Regards,

    Paul

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s