Home > MDX, Other Tools, SSAS > Designing Good Dimensions in SSAS

Designing Good Dimensions in SSAS


Designing good dimensions is a balancing act between at least two factors. Our dimension design must offer the user good (and intuitive) way to navigate data and it must also perform well in queries. This post looks at achieving that balance and the dangers that assumptions can produce for dimension design and result output.

While we can consider other factors (for example processing speed and property/attribute visibility) as important in our design, I considered these factors the most important because the dimension must be used in an intuitive way by users in order to be successful and, the user experience is driven by quick results (especially in an addhoc analytical environment).

Scenario

If we examine the adventure works database and the [Direct Sales] perspective, we note that the customer dimension has a [Post Code] attribute hierarchy (single level hierarchy) and a [Customer Geography] hierarchy which also shows a level [Postal Code]. This can be seen in the diagram below;

Now consider that someone wishes to know what sales were derived in a postcode. We assume that they could use either of these hierarchies to interrogate sales.

Although this is a trivial example, I have seen this type of question present itself in production environments that have “reported” incorrect data the years. Consider a product hierarchy (which includes colour at a lower level) and an attribute hierarchy [colour] (that just shows colours).

I’ll also point out that I highlighted reported above because the business expectation of the results different from the technical. In this instance, the client thought that they were viewing all “yellow” products when in fact they were viewing only a subset of the yellow products.

The dangers of this issue can also be magnified by method that client tool is used to query cube.

So What’s the Issue?

If we consider the postcode example, we ask ourselves the question what are the sales will postcode “V9″. If this were the case, one might expect to use the [Postal Code] attribute hierarchy and just show “V9″. If created pivot in Excel and filter on V9, I get the following results.

This output is not useful, why? Because I expect to see a single value for V9 ($252K).

I could use the [Postal code] level of the [Customer Geography] dimension by creating a set but this is not that intuitive (and doesn’t change results) and is not the kind of interaction we want to give our users.

From the above results we know that there are two occurrences of the postcode (that is two different areas with the same postcode) which have a value ‘V9′. However, what I want is the total for V9!

This distinction becomes obvious when we look at the members in MDX and the [Postal Code] key definition (as below). Note that the key is a composite key (State, City, Postcode) which is also shown when we look at the MDX member key (by dragging members to the query grid in SSMS).

This type of design is perfectly acceptable when the attribute is used within a dimension hierarchy (as is the case in adventure works). I suggest that the [Postal Code] attribute should be hidden since it is available in hierarchy. Furthermore, the creation of a hierarchy (with relationships between the levels) is a SSAS design best practice. Note that the relationships are defined between these attributes.

The problem is that we have not been able to answer a question about the sales for [Postal Code] V9! What we really need is another attribute that only shows distinct postcode values!

Dangerous Clients

Exposing the attribute hierarchy in this way can be a very dangerous situation, especially when the client uses a name in the generation of MDX. For example consider the first query which shows all members that have a postcode of V9. Both members are shown as we would expect;

select [Measures].[Internet Sales Amount] on 0,

filter(

[Customer].[Postal Code].members

, [Customer].[Postal Code].currentmember.member_caption = ‘V9′

)

on 1

from [Direct Sales]

We could aggregate these results to show the combined result (which is the answer we want);

with
member

[Customer].[Postal Code].[MyAgg] as

aggregate(

            filter([Customer].[Postal Code].members

                    , [Customer].[Postal Code].currentmember.member_caption = ‘V9′

                    )

        )

select     [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[MyAgg] on 1

from [Direct Sales]

Now consider the situation where the client simply references the member name [V9] in MDX. This is shown below. Note that only the first member is returned which we would naïvely assumed to be the sales for the postcode.

select [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[V9] on 1

from [Direct Sales]

Conclusion

It is easy to see how useability is an important consideration for attributes. What often surprises most though, are the dangers of clients which purport to show a member value (say V9) and only return 1 member.

About these ads
  1. bhavikmerchant
    April 11, 2013 at 2:10 pm | #1

    Hi Paul,

    Havent seen you in a while! Hope Brissie is treating you well

    I have an issue im currently working on with a similar scenario. We have an SCD2 dimension where there are duplicate names caused by current version and older versions of a business unit which are part of a 3 level org hierarchy. The older versions belong to a different parent. The key is composite on 4 attributes.. the 3 “level” keys and an “is_current” flag.

    We need results across the version to show up on a single row (since the name is still the same), but not just for a chosen member. The query needs to return the whole org hierarchy, flattened to not have duplicates.

    Furthermore this issue is complicated by the fact that this is in an SSRS report and the users can select any items at any level from the org hierarchy as a parameter to the report. I currently have the parameter dataset restricted to just the current versions so as not to show the users duplicates.

    The problem I am have is how to get the main report query to accept those selections (in a STRTOSET), but return all the relevant data with no duplicates showing up.

    I haven’t spent much time on it but hit a few roadblocks and am tackling it again next week.
    Can you think of anything?

    • April 12, 2013 at 5:03 pm | #2

      Hi Bhavik,

      Devils always in the detail! – But I think this is a classic type 6 dimension – no? So retain the type 2 data for the hierarchy (historic) and type 1 for the ‘current [level 3] org hierarchy’ (attribute hierarchy) based on the name.

      In your report, you could always filter by the hierarchy level and list the user attribute members that exist (which would ensure showing a single name).

      However (as you nodoubt know), this (in my opinion) is really a training exercise for adhoc users. They need to be made aware that the two sets of attributes are not the same.

      HTH, Paul

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 158 other followers

%d bloggers like this: