Extending Data Sets in SSRS with VB Functions

A common event that occurs rather often is when a report has to be altered to include some set of data that does not exist in the ‘corporate’ data warehouse. This type of situation is easy to imagine in sales environments where a campaign exists (one which was not planned at budget time) and someone* would like to see this data reported against actual sales (and possibly even a comparative to budget). I agree that it would be ideal to have this data ‘corporatized’; however, there are often competing forces and changes in priorities, which means that this data never makes it in. This post looks at how this data can be included into reports.

Consider the situation where and how someone has gone through and created a new budget for adventure works data (as below) and we want to include this in our report.

In order to include this will create a function which specifies what the new value should be given the inputs of year in category. Naturally, in a production environment, these combinations increase but the principle applied is exactly the same. You can read a little bit more about functions here but essentially what we’re trying to do is return some value given inputs. We can then append that to our data set as an additional field (i.e. another measure) and then use that field in the report.

Base Data

The base query for [reseller sales amount] by using categories would be;

select
[Measures].[Reseller Sales Amount] on 0,

[Date].[Calendar].[Calendar Year]
* [Product].[Category].[Category]
on 1

from [Channel Sales]

Creating the Function 20 oh

Having the data in the tabular format is makes it really easy to manipulate in Excel, in fact, we could add another column with a formula to specify what the return value should be given the year and category. In my workbook, the formula would be;

=”if (year_name=”””&A2&””” and category_name =”””&B2&”””) then return ” &D2

And it is apply to every row that we have data for (as in the screenshot below).

Now that we have these formulas, we can embed them in a property construct and add them to the report code. In order to do this you define the property name, the return value type, the input parameters (and their type).

Public Function UserValue(year_name As String, category_name As String) As Double

‘use if statements here
End Function

Note that we are defining a read-only property (UserValue) and specifying a ‘Get’ modifier for the property. All this means is that a function will be created (called UserValue) and require the input variables of year_name and category_name. The function will return a double type. A cut of the complete function looks like this;

Public Function UserValue(year_name As String, category_name As String) As Double
  if (year_name=”CY 2005″ and category_name =”Accessories”) then return 20000
  if (year_name=”CY 2005″ and category_name =”Bikes”) then return 7340000
  if (year_name=”CY 2005″ and category_name =”Clothing”) then return 34000
  …..

  if (year_name=”CY 2005″ and category_name =”Components”) then return 615000
  if (year_name=”CY 2006″ and category_name =”Accessories”) then return 92000
  if (year_name=”CY 2006″ and category_name =”Bikes”) then return 19000000
End Function

As a tip, it may be easier to create this in a text editor before adding it to report code.

Adding the Function to Report Code

There are a few ways to add code to a report, perhaps the easiest is to right click (outside the canvas) and select the report properties from the pop-up menu and then navigate to the code. You could then paste the code directly into the custom code box.

Calling the Function

Functions can be called by editing the expression (that we are interested in). Since we want to append to our original dataset, we edit the dataset and adding new column. Right click on the dataset and select edit. We then add a new calculated field (as below);

The field can be given a name (New_Value) and entered by clicking the expressions button. Note that I can use values from the current dataset as inputs from a function. Just as any other expression editor I simply need to double click on the items that I want to add.

My report now has a new measure (New_Value) which can be added to the report, just as any other measure.

Other Titbit’s and Gotchas

Although this function was added to the dataset, it could also be used as a cell in a matrix or table. The benefit of adding an additional column to the dataset is that the value can be aggregated in your report and used in totals. This would not occur if was added to the control directly.

Also note that the default MDX generated by the query designer in SSRS excludes non-empty data and this may lead to misleading results where no ‘sales’ have occurred, but there is a New_Value amount to be shown. In this situation it is advisable to remove the nonempty criteria query so that you can guarantee that data will be shown (regardless of the results returned through the underlying dataset).

A side benefit of using this technique is that the data then become (somewhat) available to other users.  Once the report is published, it can be consumed as an atom feed.

* The use of the word someone here is intentional because the person making the request for the change to a report does not often have control over the data that is entered into the data warehouse. While there is a business imperative to include this data, it is not seen as an IT imperative. There is simply not enough time to go through the appropriate channels.

Designing Good Dimensions in SSAS

Designing good dimensions is a balancing act between at least two factors. Our dimension design must offer the user good (and intuitive) way to navigate data and it must also perform well in queries. This post looks at achieving that balance and the dangers that assumptions can produce for dimension design and result output.

While we can consider other factors (for example processing speed and property/attribute visibility) as important in our design, I considered these factors the most important because the dimension must be used in an intuitive way by users in order to be successful and, the user experience is driven by quick results (especially in an addhoc analytical environment).

Scenario

If we examine the adventure works database and the [Direct Sales] perspective, we note that the customer dimension has a [Post Code] attribute hierarchy (single level hierarchy) and a [Customer Geography] hierarchy which also shows a level [Postal Code]. This can be seen in the diagram below;

Now consider that someone wishes to know what sales were derived in a postcode. We assume that they could use either of these hierarchies to interrogate sales.

Although this is a trivial example, I have seen this type of question present itself in production environments that have “reported” incorrect data the years. Consider a product hierarchy (which includes colour at a lower level) and an attribute hierarchy [colour] (that just shows colours).

I’ll also point out that I highlighted reported above because the business expectation of the results different from the technical. In this instance, the client thought that they were viewing all “yellow” products when in fact they were viewing only a subset of the yellow products.

The dangers of this issue can also be magnified by method that client tool is used to query cube.

So What’s the Issue?

If we consider the postcode example, we ask ourselves the question what are the sales will postcode “V9”. If this were the case, one might expect to use the [Postal Code] attribute hierarchy and just show “V9”. If created pivot in Excel and filter on V9, I get the following results.

This output is not useful, why? Because I expect to see a single value for V9 ($252K).

I could use the [Postal code] level of the [Customer Geography] dimension by creating a set but this is not that intuitive (and doesn’t change results) and is not the kind of interaction we want to give our users.

From the above results we know that there are two occurrences of the postcode (that is two different areas with the same postcode) which have a value ‘V9’. However, what I want is the total for V9!

This distinction becomes obvious when we look at the members in MDX and the [Postal Code] key definition (as below). Note that the key is a composite key (State, City, Postcode) which is also shown when we look at the MDX member key (by dragging members to the query grid in SSMS).

This type of design is perfectly acceptable when the attribute is used within a dimension hierarchy (as is the case in adventure works). I suggest that the [Postal Code] attribute should be hidden since it is available in hierarchy. Furthermore, the creation of a hierarchy (with relationships between the levels) is a SSAS design best practice. Note that the relationships are defined between these attributes.

The problem is that we have not been able to answer a question about the sales for [Postal Code] V9! What we really need is another attribute that only shows distinct postcode values!

Dangerous Clients

Exposing the attribute hierarchy in this way can be a very dangerous situation, especially when the client uses a name in the generation of MDX. For example consider the first query which shows all members that have a postcode of V9. Both members are shown as we would expect;

select [Measures].[Internet Sales Amount] on 0,

filter(

[Customer].[Postal Code].members

, [Customer].[Postal Code].currentmember.member_caption = ‘V9’

)

on 1

from [Direct Sales]

We could aggregate these results to show the combined result (which is the answer we want);

with
member

[Customer].[Postal Code].[MyAgg] as

aggregate(

            filter([Customer].[Postal Code].members

                    , [Customer].[Postal Code].currentmember.member_caption = ‘V9’

                    )

        )

select     [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[MyAgg] on 1

from [Direct Sales]

Now consider the situation where the client simply references the member name [V9] in MDX. This is shown below. Note that only the first member is returned which we would naïvely assumed to be the sales for the postcode.

select [Measures].[Internet Sales Amount] on 0,

[Customer].[Postal Code].[V9] on 1

from [Direct Sales]

Conclusion

It is easy to see how useability is an important consideration for attributes. What often surprises most though, are the dangers of clients which purport to show a member value (say V9) and only return 1 member.

SSIS 2012 Execution Reporting

SSIS 2012 has (arguably) two main improvements over prior versions. The first is a project deployment model where an entire project can be deployed and managed with a common structure. The second is a more favourable logging mechanism. This post looks at the latter and examines reporting on package and job execution in SSIS 2012.

There are standard inbuilt reports that show the execution of the project (and individual packages within the project). These can be seen in SQL Server Management Studio (SSMS) by right clicking on the SSIS catalogue (or node) and selecting the “All Executions” report from the reports menu (as below). Note that although you can get to the report through the path Reports > All Executions or Reports > Standard Reports > All Executions, the reports are the same (and shown below).

Accessing the Execution Reports through SSMS
Sample Report Output

One of the really great features of this report is that you can click on the Overview hyperlink and see what packages have executed as part of the (node) execution. So, for example, the packages that have executed under my ‘Catalog Testing’ (project) execution are listed since this is the node I selected (when I selected overview from the ‘All Executions’ report above. An example of this output is seen below. You may note that the overview includes the tasks that have executed within the package.

Overview Report (accessed by the Overview link in the ‘All Executions’ report)

But Wait – There’s More!

While these reports are good, they require SSMS to view the reports, so you might want to interrogate the SSIS database directly. Using this you can create some custom reports and expose these as normal SSRS reports.

Two views that are invaluable for examining package executions (and task executions) are catalog.executions
and catalog.executable_statistics. From MSDN the executions view shows “instances package execution in the Integration Services Catalogue”, therefore, we can use this to show (and identify the execution_id) of packages when execute. Consider the “All Executions” report above which shows the execution_id of 10275. This can be easily identified in the executions view with the query;

select  *
from catalog.executions
where execution_id = 10275

More importantly we can easily identify executions of packages (or groups of packages) by referring to the package_name field in this table.

Once we have the execution_id, we can then use this in the statistics view to see the detailed execution of the package. This is the same data that is shown in the ‘Overview’ report above.

select  *
from catalog.executable_statistics
where execution_id = 10275

Most of my projects utilise a parent/child architecture, so when I consider the natural drill path for package executions as the master_package à child_package à tasks. If I want to only show the packages that have executed as part of the Master package (note Master_1.dtsx package name in the executions above) I can simply use the following query;

select e.*
, CONVERT(datetimees.start_timeAS start_time
CONVERT(datetimees.end_timeAS end_time
es.execution_duration es.statistics_id
es.execution_result
case es.execution_result
when 0 then ‘Success’
when 1 then ‘Failure’
     when 2 then ‘Completion’
    when 3 then ‘Cancelled’
  end  as execution_result_description
from catalog.executables e
join catalog.executable_statistics es
on  e.executable_id = es.executable_id
 and e.execution_id = es.execution_id
where package_path  ‘\Package’
and  e.execution_id = 10275

A Word of Caution

while they SSISDB database can be used to provide some nice reporting, the executions view contain some inbuilt security which implements row security. If we examine the definition of the view we can see that the where clause includes the following conditions.

WHERE  opers.[operation_id] in  (SELECT id  FROM  [internal].current_user_readable_operations])
OR  (IS_MEMBER(‘ssis_admin’ = 1)
OR  (IS_SRVROLEMEMBER(‘sysadmin’= 1)

The credential for accessing the database should be a member of ssis_amdin or sysadmin role. If not, no rows are returned.

I will leave it to your own devices to come up with alternate methods for bypassing this requirement.


Creating Custom Calculations in Excel with Pivot Table Extensions

One of the biggest hurdles that users are faced with when using Pivot Tables is that all measures and dimensions must be defined within the cube. Having the defined data available is really convenient for drag and drop analysis however in most situations, the users requirements can quickly outgrow the bounds of the definition of the cube. I want to see such and such in my data is the usual type of requirement. Sure, developers and report writers can create calculations but what about users – the people that need to use the data? This post looks at OLAP PivotTable Extensions, a tool which increases the usability of pivot tables for end users.

Traditional PivotTables just don’t give the user enough flexibility. Sure Excel 2013 has some nice features for creating calculations in Pivots but I haven’t worked with any corporate client that is seriously considering using 2013 just yet. There is just too much corporate tape to cut through. So let me just say, if your users aren’t using this tool, they should be. You may argue that users could use Power Pivot but, that really defeats the purpose of having a cube doesn’t it? Besides, why should the users have to handle data just to do some simple analysis? Besides that, the addin is free, talk about a lower barrier to entry for self-service BI!

This post looks at creating calculations using the OLAP pivot table extensions addin.

A Quick Overview

Once the add-in has been installed it can be accessed from a pivot table by simply right clicking on that pivot table. For example, consider the pivot table below which shows the [sales amount] and [sales quota] by year in adventure works. To add a custom calculation to this pivot, all I need to do is right click and select “OLAP Pivot Table Extnesions …”

This opens the following window (note that the calculations tab is selected);

Calculations On Measures

The first example that we can use for creating a calculation involves creating a new measure that is based on other measures. For example based on the [Sales Amount] and [Sales Amount Quota], we might like to see what the variance is. We can express this as [Sales Amount] – [Sales Amount Quota] but we should remember two things;

Firstly, there are spaces in in the names so we need to surround the name with square bracket (ie [, ]).

Secondly, the pivot is based on MDX. This is a decorative language and has its own rules. While these rules can sometimes be bypassed (as we will see below), it’s probably the better idea to use the fully qualified name for items that are referring to.

If we consider the requirement for sales variance (lets call it [sales variance]) we could simply create the measure using the following formula [Sales Amount] – [Sales Amount Quota]. This is shown below as;

Once the measure has been defined I can added to the pivot table by clicking the ‘Add to Pivot Table’ button.

Fully Qualified Names

I stated above that MDX has requirements for how we refer to items in the language, and it is a good idea to use the fully qualified name. An exception to this is the requirement to use the fully qualified names for measures. For example if we are in cell C8 and we create a formula that references C5, we can see that Excel adds its own calculation (one that users the formula GETPIVOTDATA).

By examining this formula, we can see that the fully qualified name for the [Sales Amount] measure is [Meaures].[Sales Amount]. Also note that the fully qualified name for [CY 2008] is [Date].[Calendar Year].&[2008].

The use of the GETPIVOTDATA function is a very useful trick that you can use to determine fully qualified name for pivot cells.

Using Static References

One of the restrictions of pivot tables is that it’s very hard to reference individual cells in the pivot. For example, if we wished to show the sales amount for 2005 in the pivot we would generally filter columns or rows to 2005 and add the [sales amount] measure. We can use the pivot table extensions with a static reference to include the 2005 sales amount value as a column in the pivot. All we have to do is to create a new measure for the cell coordinates that we are interested in.

Consider the GETPIVOTDATA formula for the sales amount in 2005, that is;

=GETPIVOTDATA(“[Measures].[Sales Amount]”,$A$1,”[Date].[Calendar Year]”,”[Date].[Calendar Year].&[2005]”)

We can simply create a measure that references this coordinate. This is ( [Sales Amount] , [Date].[Calendar Year].&[2005] )

Note that coordinates should be enclosed in parenthesis and separated by commas.



A really cool thing about creating this new measure is that we can use it by other measures. So we could show the variance compared to 2005 (Sales Amount Measure) by creating a measure that refers to the [Sales Amount 2005] measure. This measure would be;
[Sales Amount] – [Sales Amount 2005]

Note, that we did not have to keep ‘referenced’ measures in the pivot table to use their outcome. For example we can remove the measure [Sales Amount 2005] from the pivot table and see the measure [Sales Variance 2005] in the pivot. Measures are removed just as any pivot measure, by dragging them from the ‘Values’ box in the field selection list.


The use of the word static to this section implies that we will only reference data from 2005 in our calculated measure. More often than not we want to refer to a coordinate in the context of another. This is usually the ‘next year’ or the ‘prior year’. For example we might like to see the [Sales Amount] variance compared to last year. In order to achieve this we will create a measure [Sales Amount LY] and [Sales Variance LY].

Calculations using Relative References

If we recall the GETPIVOTDATA formula for the sales amount in 2005, we can also discover that the ‘2005’ amount relates to a specific dimension, and the hierarchy within that dimension. We can see this in the highlighted sections below;

=GETPIVOTDATA(“[Measures].[Sales Amount]”,$A$1,”[Date].[Calendar Year]“,”[Date].[Calendar Year].&[2005]”)

These both give us the same information, specifically that the hierarchy [Date].[Calendar Year] is being used. Additionally, we should know that each row actually refers to a specific member (for example 2005). If we want to create a measure that uses a relative reference we have two to use the hierarchy name and the relative function.

Arguably the .prevmember function is one of the most useful relative functions available. This function takes the current coordinate (item) and substitutes the prior one for it. Therefore, if we are on the 2006 row, the .prevmember would give us the 2005 coordinate. Also note that because we are using a relative function, the ‘exact’ coordinate is not referred to, rather the hierarchy is used in formulas.

Therefore, we can get the prior year of a row using the following coordinate [Date].[Calendar Year].prevmember

This may seem a little bit complicated at first, and it may help to think that the rows shown are from the [Date].[Calendar Year] hierarchy, rather than specific values. For example [Date].[Calendar Year].&[2005].

If we wish to create a calculated measure to show what sells amount was last year we simply need to use the formula


The measure [Sales Var LY] is simply [Sales Amount] – [Sales Amount LY].

The results of these calculations are shown on screen shot below;


Conclusion
The use of the OLAP PivotTable Extensions offers a huge improvement to the usability of pivot tables. This post is looked at the ability to create custom calculations within the pivot so that the user can create their own analysis.

References
Download the addin here
Prevmember function reference on MSDN