Data Vault Training – Australia in 2015

The Genesee Academy’s Data Vault training has been announced for 2015 in the Australian region. 

Hans Hultgren will be running classes on Brisbane, Melbourne, Sydney and Auckland in July and August this year (the dates are below with links to the courses and registration). 

It’s a great class (I’ve heard multiple recommendations about this one), so if you’re interested in Data Vault, why not attend and learn from someone who literally wrote the book?


Jul 13 – 15th 3 Day Certification – Brisbane Australia Registration
Jul 22 – 24th 3 Day Certification – Melbourne Australia Registration
Jul 27 – 28th 2 Day Certification – Sydney Registration
Aug 3 – 4th 2 Day Certification – Auckland Registration

Forcing Values into SSRS Matrix Columns and Rows

The trusty SSRS matrix control is great for dynamically expanding columns and rows based on the object’s source data.  Its behaviour is very similar to a pivot table’s, that is, expanding and contracting based on the underlying data.

However, when you want to force data to be shown in rows or columns, the expanding nature of the control can present a problem where the data does not exist. Consider a report where we allow the user to select multiple subcategories and then we display the number of products in each category based on their selection.

The hierarchy for this relationship is product –> product subcategory –> product category, so if we allow the user to select a subcategory, and then restrict our result set based on this, the matrix will not show all available categories.

Consider the base (standard) query that uses a parameter for subcategory_id (it’s a multi-valued parameter).

select pc.Name as CategoryName
, count(*) as ProductCount
from [Production].[Product] p
join [Production].[ProductSubcategory] psc
  on p.ProductSubcategoryID = psc.ProductSubcategoryID
  and p.ProductSubcategoryID IN (@subcategory_id)
join [Production].[ProductCategory] pc
  on psc.ProductCategoryID = pc.ProductCategoryID
group by pc.Name


If we have a matrix (as in the image to the left), the category values that are shown depend on what values are in the dataset.

If we want to extend the control to show all values, we have to append the missing values to the dataset.  Therefore, our query would need to be changed to the following:

select pc.Name as CategoryName
, count(*) as ProductCount
from [Production].[Product] p
join [Production].[ProductSubcategory] psc
  on p.ProductSubcategoryID = psc.ProductSubcategoryID
  and p.ProductSubcategoryID IN (@subcategory_id)
join [Production].[ProductCategory] pc
  on psc.ProductCategoryID = pc.ProductCategoryID
group by pc.Name

union all

select Name as CategoryName
, null as ProductCount
from Production.ProductCategory
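The effect of the UNION ALL can be sketched with a toy example.  The following uses Python’s sqlite3 with a deliberately simplified schema (the table and column names here are illustrative, not the real AdventureWorks structure):

```python
import sqlite3

# A minimal, illustrative schema (not the full AdventureWorks tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table ProductCategory (CategoryID integer, Name text);
create table Product (ProductID integer, SubcategoryID integer, CategoryID integer);
insert into ProductCategory values (1,'Bikes'),(2,'Components'),(3,'Clothing');
insert into Product values (10,100,1),(11,100,1),(12,200,2);
""")

# Simulate the user selecting only subcategory 100: the plain query
# drops every category that has no matching products.
plain = conn.execute("""
    select pc.Name, count(*) from Product p
    join ProductCategory pc on p.CategoryID = pc.CategoryID
    where p.SubcategoryID in (100)
    group by pc.Name
""").fetchall()

# Appending every category via UNION ALL forces all three to appear.
padded = conn.execute("""
    select pc.Name, count(*) from Product p
    join ProductCategory pc on p.CategoryID = pc.CategoryID
    where p.SubcategoryID in (100)
    group by pc.Name
    union all
    select Name, null from ProductCategory
""").fetchall()

print(len(plain))   # only the category with matching data
print(len(padded))  # every category appears at least once
```

The matrix bound to the padded result set will render a row (or column) for every category, with nulls where no real data exists.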

With such a small example this approach works fine.  However, when the dataset includes additional fields, and the matrix groups on those fields, the grouping behaviour may produce undesired results because the union adds another layer of data to group on.

Consider, for example, if we included country; our dataset would look like:

[country name (as a value)],[category],[count]   -- for the 'real data'
[country name (as a null) ],[category],[count]   -- for the 'additional data'

If a report included a grouping on country name, the nulls would form their own group, and this is not what we want to display.

To solve this, we can apply a filter on the matrix to remove the blank countries.  This does not remove the additional categories (that is, the values being shown); it just affects which country values are shown.  A handy little trick 🙂

SSRS Report Parameters from Unfriendly Sources

We have recently developed some SSRS reports against non SQL Server sources and have run into a few issues regarding the use of multi-value parameters.  Since we are sure to meet this problem again (and variations of it), this post is just as much a bookmark for the workarounds we’ve encountered.

So what’s the problem?  Let’s assume we’re using a MySQL source with the same data as AdventureWorks, and we want to pick a subcategory and then have our report show a list of products.  This is pretty straightforward.  The products query uses a ? to identify a parameter, and that parameter is assigned a value in the dataset properties:

SELECT *
FROM adventure_works_2012.product
WHERE ProductSubCategoryID = ?

You can see how this is set up in the image below.  The SubCategory dataset is the source for the SubCategory parameter, and that is used as a parameter for the Products query.


Unfortunately, this only works if our SubCategory parameter does NOT allow multiple values.  If we enable this option for the parameter, we’ll get an error like the following when we try to view the report.


To make the products query run with multiple parameter values, we need to dynamically generate the SQL associated with the parameter.  Actually, there are a couple of variations of this, however they all use the same theme.

So, first, we change our query to recognise the need for multiple values from the parameter:

SELECT *
FROM adventure_works_2012.product
WHERE ProductSubCategoryID IN (?)

Then, instead of supplying the parameter value to the dataset, we use an expression that builds a delimited list of the parameter values.


Here, the source type is an integer, so we are expecting something like 1,2,3 to be passed as the parameter’s value.  If the source type was a string, we’d also have to wrap the values in quotes:

= "'" & Join(Parameters!SubCategory.Value, "','") & "'"
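What these Join expressions produce can be sketched in Python (an illustration of the string building only, not SSRS itself):

```python
# For an integer-typed parameter the values can be joined directly
# into a comma-separated list suitable for an IN (...) clause.
keys = [1, 2, 3]
int_list = ",".join(str(k) for k in keys)

# For a string-typed parameter each value also needs surrounding quotes.
names = ["Bikes", "Clothing"]
str_list = "'" + "','".join(names) + "'"

print(int_list)  # 1,2,3
print(str_list)  # 'Bikes','Clothing'
```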


We can use a few variations of this approach to generate the parameter list.  For example, we could exclude the parameter definition from the Products dataset and specify the query as an expression that returns a SQL string.  Using this approach, the parameter value is still referenced but the parameter object is not defined in the dataset.

We can also define some VB code that returns the SQL string (I find this a little more flexible).  Here, the string value of the parameter is passed to a VB function.  So, I’ve created a simple function and added it to the report;

Public Function SQL(ByVal Keys As String) As String
  Return "select * from adventure_works_2012.product where ProductSubCategoryID in (" & Keys & ")"
End Function

Now, our dataset is defined by the expression

=Code.SQL( Join(Parameters!SubCategory.Value, ",") )
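Because the keys are spliced into the SQL text (rather than bound as parameters), it is worth validating them.  Here is the same idea sketched in Python with a simple guard; the function name and validation rule are illustrative, not part of the report above:

```python
# A Python sketch of the report's SQL-building function, with a guard:
# since the keys are concatenated into the SQL string (not bound as
# parameters), rejecting non-numeric input avoids SQL injection.
def build_sql(keys: str) -> str:
    parts = [k.strip() for k in keys.split(",")]
    if not all(p.isdigit() for p in parts):  # illustrative guard
        raise ValueError("keys must be a comma-separated list of integers")
    return ("select * from adventure_works_2012.product "
            "where ProductSubCategoryID in (" + ",".join(parts) + ")")

print(build_sql("1, 2, 3"))
```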

Oracle Batches & Variables

More of a prompt for me than an informative post 🙂

Oracle parameters/variables can be defined in a batch through the variable keyword and then assigned a value using the exec (execute) command.  For example, we can define three variables and give them values with the statements below (note the use of := for assignment).

variable v1 nvarchar2(20);
variable v2 nvarchar2(20);
variable v3 nvarchar2(20);

exec :v1 := 'Val 1';
exec :v2 := 'Val 2';
exec :v3 := 'Val 3';

select :v1 as v1, :v3 as v3
from dual;

select :v2 as v2 , :v1 as v1
from dual;
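The same named-bind-variable idea exists in most database client APIs.  As a rough analogy (using Python’s sqlite3 rather than an Oracle client, purely for illustration), the statement references :v1/:v2 and the values are supplied at execute time:

```python
import sqlite3

# Named parameters play the role of Oracle bind variables here:
# the SQL text references :v1/:v2 and values are bound at execution.
conn = sqlite3.connect(":memory:")
row = conn.execute(
    "select :v1 as v1, :v2 as v2",
    {"v1": "Val 1", "v2": "Val 2"},
).fetchone()
print(row)
```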

When executing the script (I’ve tried this in TOAD and SQL Developer), the function key used will either

  1. Execute the whole batch (if F5 is pressed), or
  2. Execute only the current command (if F9 is pressed).  This relates to the statement surrounding the cursor position.

When a command is executed individually (F9), you are prompted to enter values for any variables it uses (as shown below).



Tableau 9 – Features for User Experience

There’s no doubt about it – Tableau is synonymous with best-practice data visualization.  Mention a visualization problem to any consultant and Tableau’s name will enter the conversation (well, in most independent cases).  Tableau’s approach is targeted at visual investigation – allowing the user to work with data in real time and draw conclusions from it.  This approach was the original intention of OLAP technology, and like OLAP, Tableau allows the developer/user to define a model within the (Tableau) workbook and create visualizations from that model.  This is an important call-out because it is not dependent on a single table import and can combine data from several disparate sources (I’ve heard some consultants say that you can only display data from a single table).

So with this in mind, what do I like in the next release (V9)?  I was originally going to publish this after the Singapore conference; however, a recent brief at the BBBT revealed some very nice features that are definitely worth a call-out.  Surprisingly, the features are targeted at the user experience and not so much at visualization improvements.

Importing/Connecting to Data

There’s a new connector for SAS and R data files, which may be great for passing data along a work stream.  But from my point of view the most useful improvement is the concept of a data interpreter.  This investigates a source (an Excel sheet) to look for and interpret the tabular data (or data set) that is the true intention of the import.  The feature has several nice consequences.  The data doesn’t have to start in cell A1; in fact, it could be a ‘report’ somewhere in the sheet.  Headings (non-tabular cells) are stripped, as are columns, so you can import source-system exports that are not in a strict tabular format.  The import can manage formatting and cell merging; for example, column headings that span two lines are imported without issue.  Finally, the interpreter applies its own delimiting function.  Fields that are delimited (or even the import of a delimited file) can be identified and broken down into multiple fields.
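The core of the idea can be illustrated with a toy sketch (this is my illustration of the concept, not Tableau’s actual algorithm): given sheet rows where the real table is buried under report headings and blank lines, find the first fully populated row and treat it as the header.

```python
# Sheet-like input: report title and blank rows above the real table.
sheet = [
    ["Monthly Sales Report", "", ""],
    ["", "", ""],
    ["Region", "Month", "Sales"],
    ["North", "Jan", "100"],
    ["South", "Jan", "80"],
]

def extract_table(rows):
    # The first row with every cell populated is taken as the header;
    # everything below it is the data block.
    for i, row in enumerate(rows):
        if all(cell != "" for cell in row):
            return rows[i], rows[i + 1:]
    return None, []

header, data = extract_table(sheet)
print(header)     # ['Region', 'Month', 'Sales']
print(len(data))  # 2
```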

Is this a mature ETL feature?  Certainly not.  There’s practically no way to control how the feature works, but that’s not the point.  The benefit from my point of view is that there is a somewhat intelligent way to use other sources, and the ‘month end’ report pack just became a lot more functional.

LOD Expressions & Windowing Functions

Proportions (and trends) are the bread and butter of analysis.  We naturally classify items into groups and show the significance of an item to its group by its proportion of the group total.  Of course, the calculation of the ratio requires 2 values (detail and group total), and the group total is an implicit windowing problem.  In order to derive the total, we have to define a range of data to perform a calculation over (that is, a window), and this is usually a problem because there is an associated hierarchy between an item and the group.

In MDX we can define the calculation by reference to the current member’s parent (provided that a hierarchy exists); however, in reports it’s not so easy, and the most common implementation is found in controls which allow this as part of their feature set (pie charts are a natural fit for this).  Unfortunately, the values (actually only the ratios) are artifacts of the control and unavailable outside its scope.

Tableau solves this issue in 9 with a scoped calculation called a Level of Detail (LOD) expression.  It’s a really impressive method for defining the scope of an aggregation.  This is a really cool addition because (in addition to ratios) those measures can be reused throughout the workbook.  You can define a calculation with respect to the scope of data that’s shown in a control.
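Setting Tableau syntax aside, the underlying calculation is simply a group total used as a window.  A plain-Python sketch of the ratio-to-group computation:

```python
# Each detail row is divided by its group's total: the aggregate is
# computed over a window (the group), not over the whole data set.
rows = [("Bikes", 30), ("Bikes", 70), ("Clothing", 50)]

# First pass: accumulate the total per group (the "window" aggregate).
group_total = {}
for cat, value in rows:
    group_total[cat] = group_total.get(cat, 0) + value

# Second pass: express each detail value as a proportion of its group.
ratios = [(cat, value / group_total[cat]) for cat, value in rows]
print(ratios)  # Bikes rows are 0.3 and 0.7 of their group; Clothing is 1.0
```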

The only consideration I have not tested is solve order …

Measure Editor

While we are on the subject of measures, there’s also a new in-line editing feature that allows you to define calculations in the worksheet.  You just start typing the (IntelliSense-enabled) formula in rows or columns and the calculation is added to the sheet.  Then, if you want to create a measure for the workbook, you can simply drag it to the measures tab.

Tableau Online

There have previously been two options for sharing workbooks in Tableau.  Firstly, you can run up your own instance of Tableau Server, the enterprise web server.  For those not familiar with it, I would liken it to SharePoint or Report Server for Tableau workbooks – something that you need to engage the IT department on to get up and running.  Secondly, there was Tableau Public – a no-security implementation of Tableau Server that is open to everyone – not really a reporting solution for a department.

This leaves some organisations that I speak to in an interesting predicament.  They like the desktop tool but can’t share the workbook because of challenges with their IT department (the direct sale of tools to business users seems to exacerbate any ill feelings between IT and business, doesn’t it?).  Tableau Public is just not an option, and a server won’t get past IT or the business.

Enter Tableau Online – a secure, pay-per-user hosted service of Tableau Server.  Actually, thinking about the offering, I’m surprised that it has taken this long to eventuate because (in hindsight) it seems like such a sweet spot for the sales model (well, at least for the clients I speak to).  It’s a secure, private implementation of Tableau Server that’s hosted by Tableau – a very nice offering indeed.

Another very nice feature included in the new server (all versions, I believe) is search functionality.  Workbooks can be tagged and searched within the site.  This type of functionality seems to be the new norm for finding what you want.

If you’re not interested in the server, you could try Tableau Reader for some method of distribution and collaboration.  This is the equivalent of a PDF reader (for Tableau files).


So, not much new (well, perhaps advertised) in the visualization space.  There are some nice features for grouping (lassoing) data points in maps, but for the most part I see the improvements relating to how the user interacts with data (and perhaps what they can define).

A Brief Look at Qlik

I was fortunate enough to attend a Qlik briefing last week, and this post is targeted at some of my observations about the product and the offering (or approach) that Qlik takes to the market.  To be explicit, these observations are my own thoughts and were in no way dependent on the briefing.

The perception of most folk that I discuss Qlik with is that the product is a departmental solution, and “hard core” BI pros dismiss it as a trivial product.  Similarly, these emotions are projected onto other self-service products (Tableau, Spotfire etc.).  They suggest that Qlik could never handle the scale that enterprise solutions require (or so the argument goes), and possibly this may have been the case in resource-constrained environments.  Is this consideration applicable only to Qlik?  I’m not so sure … actually, of course NOT.

The question of scalability is one that seems most relevant given Qlik’s history.  Why?  Because Qlik applications embed their data with the presentation engine (exactly the same approach as Power BI).  Looking at the industry, it is interesting to see that in the last 2-3 years this approach has become accepted as the norm (dare I suggest recommended), and the proponents of ‘it’s not enterprise’ are quiet.

Qlik seems to have this scalability concern covered with clustering and load-balancing capabilities (of course, I have not tested this against any large datasets).  Notwithstanding that, I have no reason to expect that it could not provide enterprise-grade solutions or performance.

Now onto the product.  Qlik’s approach is to build applications based around data – developers/users connect and import their data to the workbook (or application) and then choose various presentation methods to present it to end users.  Again, I think this is so similar to the concepts of Power BI, and specifically the development of an “Excel application”, that conceptually the two approaches are indistinguishable.  For me this is a very important clarification because a very large number of visitors to this site search for methods to integrate and create those Excel applications.  Managing worksheet objects through VBA is a very common requirement and suggests that the concept of an Excel application (or reporting pack) is alive and well (still).

The Data Driven App

One of the impressive features of Qlik is the idea of an associative data model.  If you look at the image below, you’ll notice that there are relationships between tables and (as we would expect) this causes filtering at the data level.  A really interesting feature of this model is that it is bi-directional, and the concept of a key (lookup) table with a unique key in a one-way relationship is not required.  This means that many-to-many relationships are handled intuitively by the data model.


Actually, the idea of ‘modelling’ is redundant in Qlik because there is really nothing for a traditional modeller to do.  Data is loaded and relationships are automatically created between tables based on field names.  This extends to relationships that would otherwise be created between composite fields (multiple columns).
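The bi-directional filtering idea can be sketched with a toy example (my illustration of the concept, not Qlik’s engine; the table and field names are invented): a selection made on either table narrows the other through the shared field, with no designated lookup side.

```python
# Two tables sharing an OrderID field; neither is the "lookup" side.
orders = [{"OrderID": 1, "Product": "Bike"},
          {"OrderID": 2, "Product": "Helmet"}]
shipments = [{"OrderID": 1, "Carrier": "FastPost"},
             {"OrderID": 2, "Carrier": "SlowShip"},
             {"OrderID": 2, "Carrier": "FastPost"}]

def associated(selection_ids):
    # Filtering flows both ways through the shared OrderID field.
    return ([o for o in orders if o["OrderID"] in selection_ids],
            [s for s in shipments if s["OrderID"] in selection_ids])

# Selecting carrier 'SlowShip' (a shipments-side value) also narrows
# the orders table - the association is bi-directional.
ids = {s["OrderID"] for s in shipments if s["Carrier"] == "SlowShip"}
sel_orders, sel_shipments = associated(ids)
print([o["Product"] for o in sel_orders])  # ['Helmet']
```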


Continuing with the idea of developing a data-driven application – controls access the underlying model and either restrict visibility (if data elements are chosen) or display the current data set (for example, display a visualisation based on the active filter).  Here is where I think the ‘developer’ has some real power.

Just as in the development of an Excel application, you can add tabs (additional pages with different controls) and, in a similar way to a slicer/pivot connection, apply filters across the entire application.  The great thing is that the controls are so powerful and customisable that the developer really has an open slate to design the application the way they want.  For example (as below), you can include a spark chart in a list box and set its order by any value.  Everything seems to be customisable visually: backgrounds, fonts (with visual cues), headings … it’s all there.



Other Cool Features

There were a few other cool features that I thought were worthy of mention.  The first is search controls.  Here (as below), you can filter your data by searching for values, and the search is applied to all data in the model.  Say, for example, I’m looking for ‘%ar%’.  Using the search control shows any occurrence of ‘ar’ anywhere in my entire model.


There is also a really neat screen-sharing option.  Here’s the business case: say I’m doing some analysis and I want to talk about an issue with an associate – I can send them a URL so they can see my Qlik application in a browser as I do (even as I make changes).  It’s effectively a terminal service of my screen.  Really cool 🙂

Finally (and I’m sure this list is not exhaustive), there is the ability to annotate and snapshot the application.  For example, you can send an email to your associate with comments/instructions (that are also stored in the app) and include a ‘snapshot’ that opens the application with the same filters that you had (and the same data).

The Conclusion?

There are some great features for developing an application, and I think that’s the key word.  It’s designed to build an application for users, so in my mind it’s not what we would call a data discovery tool.  But oh … such power, and (I’m sure) these features have only scratched the surface. 🙂