Archive

Posts Tagged ‘Attributes’

Designing Good Dimensions in SSAS

March 15, 2013 2 comments

Designing good dimensions is a balancing act between at least two factors. Our dimension design must offer the user good (and intuitive) way to navigate data and it must also perform well in queries. This post looks at achieving that balance and the dangers that assumptions can produce for dimension design and result output.

While we can consider other factors (for example processing speed and property/attribute visibility) as important in our design, I considered these factors the most important because the dimension must be used in an intuitive way by users in order to be successful and, the user experience is driven by quick results (especially in an addhoc analytical environment).

Scenario

If we examine the adventure works database and the [Direct Sales] perspective, we note that the customer dimension has a [Post Code] attribute hierarchy (single level hierarchy) and a [Customer Geography] hierarchy which also shows a level [Postal Code]. This can be seen in the diagram below;

Now consider that someone wishes to know what sales were derived in a postcode. We assume that they could use either of these hierarchies to interrogate sales.

Although this is a trivial example, I have seen this type of question present itself in production environments that have “reported” incorrect data the years. Consider a product hierarchy (which includes colour at a lower level) and an attribute hierarchy [colour] (that just shows colours).

I’ll also point out that I highlighted reported above because the business expectation of the results different from the technical. In this instance, the client thought that they were viewing all “yellow” products when in fact they were viewing only a subset of the yellow products.

The dangers of this issue can also be magnified by method that client tool is used to query cube.

So What’s the Issue?

If we consider the postcode example, we ask ourselves the question what are the sales will postcode “V9″. If this were the case, one might expect to use the [Postal Code] attribute hierarchy and just show “V9″. If created pivot in Excel and filter on V9, I get the following results.

This output is not useful, why? Because I expect to see a single value for V9 ($252K).

I could use the [Postal code] level of the [Customer Geography] dimension by creating a set but this is not that intuitive (and doesn’t change results) and is not the kind of interaction we want to give our users.

From the above results we know that there are two occurrences of the postcode (that is two different areas with the same postcode) which have a value ‘V9′. However, what I want is the total for V9!

This distinction becomes obvious when we look at the members in MDX and the [Postal Code] key definition (as below). Note that the key is a composite key (State, City, Postcode) which is also shown when we look at the MDX member key (by dragging members to the query grid in SSMS).

This type of design is perfectly acceptable when the attribute is used within a dimension hierarchy (as is the case in adventure works). I suggest that the [Postal Code] attribute should be hidden since it is available in hierarchy. Furthermore, the creation of a hierarchy (with relationships between the levels) is a SSAS design best practice. Note that the relationships are defined between these attributes.

The problem is that we have not been able to answer a question about the sales for [Postal Code] V9! What we really need is another attribute that only shows distinct postcode values!

Dangerous Clients

Exposing the attribute hierarchy in this way can be a very dangerous situation, especially when the client uses a name in the generation of MDX. For example consider the first query which shows all members that have a postcode of V9. Both members are shown as we would expect;

select [Measures].[Internet Sales Amount] on 0,

filter(

[Customer].[Postal Code].members

, [Customer].[Postal Code].currentmember.member_caption = ‘V9′

)

on 1

from [Direct Sales]

We could aggregate these results to show the combined result (which is the answer we want);

with
member

[Customer].[Postal Code].[MyAgg] as

aggregate(

            filter([Customer].[Postal Code].members

                    , [Customer].[Postal Code].currentmember.member_caption = ‘V9′

                    )

        )

select     [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[MyAgg] on 1

from [Direct Sales]

Now consider the situation where the client simply references the member name [V9] in MDX. This is shown below. Note that only the first member is returned which we would naïvely assumed to be the sales for the postcode.

select [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[V9] on 1

from [Direct Sales]

Conclusion

It is easy to see how useability is an important consideration for attributes. What often surprises most though, are the dangers of clients which purport to show a member value (say V9) and only return 1 member.

Attribute Discretisation and Bin Names

Attribute discretization (grouping or binning) can be used to group a large number of attributes into a single member. For example, we could create a group on the income attribute of the customer in AdventureWorks so that similar ranged incomes are grouped together. This post looks at how to implement attribute discretization and more importantly how to control the name of the attribute bins once they are created.

Creating Bins

Attribute Groups (or bins) can be created on any attribute through setting the DiscretizationBucketCount and DiscretizationMethod properties. Once the attribute is discretised, the detail values are no longer available as members for browsing and the detail data is placed aggregated into bins.

There are three methods available (AUTOMATIC, CLUSTERS and EQUALAREAS) which are discussed here. In this example, we will simply create 4 groups of EQUALAREAS on the [Yearly Income] attribute as shown in the picture below.

After processing, the [Yearly Income] attribute shows four values with the bounds of the group indentified in the name.

Before After Discretization

Controlling Bin Names

The bin names are created based on the bounds of the bins created. However, we may wish to control how the member names are displayed by adding some text so that the names are more user friendly.

This can be achieved by formatting the name the attribute, so we set the name property of the attribute (to ‘Yearly Income’) and specify the format content.The content allows the use of place holders that can retrieve information about the bins created. For example, we can retrieve the value of the bins lowest and highest value (or the next bins lowest value).

Furthermore, the format allows a different construct for the first, last and intermediate bins.

The naming template can specify the format for all members or, alternatively specify a template for each level (first, intermediate, last). For example, the template takes the form

<First definition> [;<Intermediate definition>;<Last definition>]

If the Intermediate and Last templates are not supplied, the first definition is applied to all members. Thus, the template

Between %{Bucket Min} and %{Bucket Max}
will produce the outcome shown below;

Before After Discretization First Template

An example of specifying each template level is;

Less Than %{Next bucket first member};Between %{Bucket Min} and %{Bucket Max}; Greater than %{Previous Bucket Max}

Which produces (the final result)

After Discretization First Template All Template

The allowed values for place holder (variables) are;

Variable Description
%{First bucket member} The member name of the first member to be included in the   current member group.
%{Last bucket member} The member name of the last member to be included in the   current member group.
%{Previous bucket last member} The member name of the last member assigned to the   previous member group.
%{Next bucket first member} The member name of the first member to be assigned to the   next member group.
%{Bucket Min} The minimum value of the members to be assigned to the   current member group.
%{Bucket Max} The maximum value of the members to be assigned to the   current member group.
%{Previous Bucket Max} The maximum value of the members to be assigned to the   previous member group.
%{Next Bucket Min} The minimum value of the members to be assigned to the   next member group.
Other Options
The use of discretised groups can reduce the amount of data that is shown to the user, and by using a naming template, we can have strict control over how these bins are displayed to the user. Further reading on TechNet can be found here.

Alternatively, groups can be created by including an additional attribute in the dimension. For example we could include a calculated member in the table or DSV or change the view to achieve the same effect (provided we knew what we wanted in advance). Another method may involve the use of a lookup table that specifies bands and their applicable values.

Categories: SSAS Tags:
Follow

Get every new post delivered to your Inbox.

Join 158 other followers