For those Aussie readers who would like to do some Data Vault training (and it is training I can thoroughly recommend), Hans will be running Data Vault training in November. The sessions (and registration links) are listed below:
There's no doubt about it – Tableau is synonymous with best-practice data visualization. Explain a visualization problem to any consultant and Tableau's name will enter the conversation (well, in most independent cases). Tableau's approach is targeted at visual investigation, allowing the user to work with data in real time and draw conclusions from it. That was the original intention of OLAP technology, and like OLAP, Tableau allows the developer/user to define a model within the (Tableau) workbook and create visualizations from that model. This is an important call-out because the workbook is not dependent on a single table import and can combine data from several disparate sources (I've heard some consultants say that you can only display data from a single table).
So with this in mind, what do I like in the next release (V9)? I was originally going to publish this after the Singapore conference; however, a recent briefing at the BBBT revealed some very nice features that are definitely worth a call-out. Surprisingly, the features are targeted at the user experience rather than at visualization improvements.
Importing/Connecting to Data
There's a new connector for SAS and R data files, which may be great for passing data along a work stream. But from my point of view the most useful improvement is the data interpreter. It investigates a source (an Excel sheet) to find and interpret the tabular data (or data set) that is the true intention of the import. This has several nice consequences. You don't have to have data starting in cell A1; in fact, it could be a 'report' somewhere in the sheet. Headings (non-tabular cells) are stripped, as are columns that sit outside the data, so you can import source system exports that are not in a strict tabular format. The import can also manage formatting and merged cells; for example, column headings that span two lines are imported without issue. Finally, the interpreter applies its own delimiting function, so delimited fields (or even an imported delimited file) can be identified and broken down into multiple fields.
Is this a mature ETL feature? Certainly not. There's practically no way to control how the feature works, but that's not the point. The benefit from my point of view is that there is now a somewhat intelligent way to use other sources, and the 'month end' report pack just became a lot more functional.
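This is not how Tableau implements the interpreter (I haven't seen its internals); it is only a rough Python sketch of the idea. It assumes the sheet has already been read into a list of rows with None for empty cells. The function names find_table and split_column are hypothetical, and the heuristic (the longest run of rows with at least two filled cells is the table, and its first row is the header) is my own assumption. It does not attempt merged or two-line headings.

```python
def is_empty(cell):
    """Treat None and whitespace-only strings as empty cells."""
    return cell is None or (isinstance(cell, str) and not cell.strip())


def find_table(grid, min_filled=2):
    """Return (headers, rows) for the longest run of 'dense' rows in grid.

    Hypothetical heuristic: a row is dense when it has at least min_filled
    non-empty cells, so single-cell titles, footers and blank spacer rows
    are skipped. The first dense row of the run is used as the header row.
    """
    best_start, best_len, start = 0, 0, None
    for i, row in enumerate(grid + [[]]):  # empty sentinel row closes the last run
        dense = sum(not is_empty(c) for c in row) >= min_filled
        if dense and start is None:
            start = i
        elif not dense and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        return [], []
    block = grid[best_start:best_start + best_len]
    width = max(len(r) for r in block)
    padded = [list(r) + [None] * (width - len(r)) for r in block]
    # Drop columns that are empty everywhere inside the table block.
    keep = [j for j in range(width) if any(not is_empty(r[j]) for r in padded)]
    headers = [str(padded[0][j]).strip() for j in keep]
    rows = [[r[j] for j in keep] for r in padded[1:]]
    return headers, rows


def split_column(headers, rows, col, sep=","):
    """Break one delimited column into numbered columns (Tags -> Tags_1, Tags_2)."""
    parts = [[] if is_empty(r[col]) else str(r[col]).split(sep) for r in rows]
    n = max((len(p) for p in parts), default=0)
    new_headers = headers[:col] + [f"{headers[col]}_{k + 1}" for k in range(n)] + headers[col + 1:]
    new_rows = [
        r[:col] + [p.strip() for p in ps] + [None] * (n - len(ps)) + r[col + 1:]
        for r, ps in zip(rows, parts)
    ]
    return new_headers, new_rows


if __name__ == "__main__":
    # A 'month end' style sheet: title, blank row, an offset table, a footer.
    sheet = [
        ["Month End Report", None, None, None],
        [None, None, None, None],
        [None, "Region", "Sales", "Tags"],
        [None, "North", 120, "a,b"],
        [None, "South", 80, "c"],
        [None, None, None, None],
        ["Prepared by Finance", None, None, None],
    ]
    headers, rows = find_table(sheet)
    print(split_column(headers, rows, 2))
```

Here the title, footer and empty first column are discarded and the table starting at B3 is recovered; the Tags column is then split into Tags_1 and Tags_2, padding short rows with None.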
LOD Expressions & Windowing Functions
Proportions (and trends) are the bread and butter of analysis. We naturally classify items into groups and show the significance of an item to its group by its proportion of the group total. Of course, calculating the ratio requires two values (the detail value and the group total), and the group total is implicitly a windowing problem. To derive the total, we have to define a range of data to perform the calculation over (that is, a window), and this is usually a problem because there is an associated hierarchy between an item and its group.
In MDX we can define the calculation by reference to the current member's parent (provided that a hierarchy exists); in reports, however, it's not so easy, and the most common implementation is found in controls that offer this as part of their features (pie charts are a natural fit for this). Unfortunately, the values (actually only the ratios) are artifacts of the control and unavailable outside its scope.
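To make the windowing point concrete, here is a minimal Python sketch (plain dictionaries, no particular tool assumed) that computes each item's share of its group total. The group total is the window: a sum over every row with the same group key. The function name share_of_group and the sample fields are illustrative only.

```python
from collections import defaultdict


def share_of_group(rows, group_key, value_key):
    """Add a 'share' field: the row's value divided by its group total.

    The group total is the window: the sum of value_key over every row
    that has the same group_key. Groups that total zero get share None.
    """
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_key]] += r[value_key]
    out = []
    for r in rows:
        total = totals[r[group_key]]
        out.append({**r, "share": r[value_key] / total if total else None})
    return out


if __name__ == "__main__":
    sales = [
        {"category": "Fruit", "product": "Apple", "sales": 30},
        {"category": "Fruit", "product": "Pear", "sales": 10},
        {"category": "Veg", "product": "Carrot", "sales": 50},
    ]
    for row in share_of_group(sales, "category", "sales"):
        print(row["product"], row["share"])
```

Note that two passes are needed: the window total must exist before any single row's ratio can be calculated, which is exactly why the ratio is awkward to produce row by row in a report.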
Tableau 9 solves this issue with a scoped calculation called a Level of Detail (LOD) expression. It's a really impressive way to define the scope of an aggregation, and a cool addition because (in addition to ratios) those measures can be reused throughout the workbook. You can define a calculation with respect to the scope of data that's shown in a control.
The only consideration I have not tested is solve order …
While we are on the subject of measures, there's also a new in-line editing feature that allows you to define calculations in the worksheet. You just start typing the (IntelliSense-enabled) formula in rows or columns and the calculation is added to the sheet. Then, if you want to create a measure for the workbook, you can simply drag it to the measures tab.
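In Tableau the FIXED form of an LOD expression is written along the lines of { FIXED [Customer] : SUM([Sales]) } (the field names here are hypothetical). The Python below is only an emulation of that idea, not Tableau's engine: the sum is computed once at the customer level, attached to each order row, and then reused in a view at a different level (average customer total by region). The helpers fixed_lod and avg_lod_by are hypothetical, and the sketch assumes each customer belongs to a single region.

```python
from collections import defaultdict
from statistics import mean


def fixed_lod(rows, dims, value_key):
    """Sum value_key at exactly the `dims` level and attach it to each row as 'lod',
    independent of how the rows are grouped afterwards."""
    def key(r):
        return tuple(r[d] for d in dims)

    totals = defaultdict(float)
    for r in rows:
        totals[key(r)] += r[value_key]
    return [{**r, "lod": totals[key(r)]} for r in rows]


def avg_lod_by(rows, view_dim, lod_dims):
    """Average the attached 'lod' per view_dim, counting each lod_dims member once,
    so a customer with many orders is not over-weighted.
    Assumes every lod_dims member maps to a single view_dim value."""
    per_member = {}
    for r in rows:
        per_member[tuple(r[d] for d in lod_dims)] = (r[view_dim], r["lod"])
    groups = defaultdict(list)
    for dim_value, lod in per_member.values():
        groups[dim_value].append(lod)
    return {k: mean(v) for k, v in groups.items()}


if __name__ == "__main__":
    orders = [
        {"region": "East", "customer": "A", "sales": 100},
        {"region": "East", "customer": "A", "sales": 50},
        {"region": "East", "customer": "B", "sales": 30},
        {"region": "West", "customer": "C", "sales": 80},
    ]
    with_lod = fixed_lod(orders, ["customer"], "sales")
    print(avg_lod_by(with_lod, "region", ["customer"]))
```

For East, the customer-scoped calculation gives an average customer total of 90.0 (customers A = 150 and B = 30), whereas simply averaging the order rows would give 60.0. The scope of the aggregation, not the view, decides the answer, and the same per-customer value can be reused in any other view.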
There have previously been two options for sharing workbooks in Tableau. Firstly, you could run up your own instance of Tableau Server, the enterprise web server. For those not familiar with it, I would liken it to SharePoint or Report Server for Tableau workbooks: something that you need to engage the IT department on to get up and running. Secondly, there was Tableau Public, a no-security implementation of Tableau Server that is open to everyone and not really a reporting solution for a department.
This leaves some organisations that I speak to in an interesting predicament. They like the desktop tool but can't share workbooks because of challenges with their IT department (the direct sale of tools to business users seems to exacerbate any ill feeling between IT and the business, doesn't it?). Tableau Public is just not an option, and a Server won't get past IT or the business.
Enter Tableau Online: a secure, pay-per-user hosted service of Tableau Server. Actually, thinking about the offering, I'm surprised that it has taken this long to eventuate because (in hindsight) it seems like such a sweet spot for the sales model (well, at least for the clients I speak to). It's a secure, private implementation of Tableau Server that's hosted by Tableau, and a very nice offering indeed.
Another very nice feature included in the new server (all versions I believe) is search functionality. Workbooks can be tagged and searched within the site. This type of functionality seems to be the new norm for finding what you want.
If you're not interested in the server, you could try Tableau Reader as a method of distribution and collaboration. It is the equivalent of a PDF reader for Tableau files.
So there's not much new (or at least not much advertised) in the visualization space. There are some nice features for grouping (lassoing) data points in maps, but for the most part I see the improvements relating to how the user interacts with data (and perhaps what they can define).
What does the data vault data warehousing methodology have in common with data mining?
There are two conferences/courses being run in Australia in November 2014.
The Data Vault data warehousing methodology has gained a lot of traction over the past few years. If you’ve never heard of it, I would suggest that it is somewhat of a cross between normalisation and star schema design (of course I am leaving myself open to a bit of criticism with that definition but it is only half a sentence).
Apart from the obligatory acknowledgement of the inventor (Dan Linstedt), the best book that I've read on the methodology was written by Hans Hultgren, and it is simply a must-have if you are interested in learning the method (you can check it out on Amazon here). I cannot recommend this book highly enough. Anyway, I digress.
Hans will be in Sydney conducting Data Vault training on 12th – 14th November (that is, training to become a certified Data Vault Modeller). He has partnered with Analytics8 and MIP (don't ask me why the prices are different), and while I would not usually promote vendors' training solutions, I can make an exception for training delivered by Hans. If you are interested, you can find out more here.
AusDM (the Australasian Data Mining Conference) is being run in Brisbane at Queensland University of Technology (Gardens Point) on 27th–28th November. The first time I went to this conference, it was purely an academic conference on data mining. If you've never been to an academic conference, they are a lot like other conferences, except that research papers are presented, so the presentations are focused and specific (you'll also need a bit of background in the subject area). To be honest, most industry participants find this pretty dry and boring. Research findings lead technical implementation by a good number of years, and the work seems mostly theoretical.
What I like about AusDM is that it has an industry focus in addition to the academic presentations. For example, there is a workshop on R which (IMO) makes the price of admission inconsequential (when compared to other R training).
If you would like to know more about AusDM, please check out the site http://ausdm14.ausdm.org/home
Usually, the cost of IT & reference books is … well pricey to say the least. Unfortunately, if you wait for a sale, you save some cash but end up with old tech books.
Well, luckily Packt Publishing is having a 10-year celebration: no e-book over $10. If you are in the market, it might be worthwhile checking it out. But here's the kicker: it's only lasting 10 days.
I have long been a fan of the Jedox product for its write-back and text capabilities in OLAP. Its ability to publish Excel pages to the web (server) allows reports (and input forms) to be created quickly. There are some very impressive methods for write back and it can be a great piece of technology (in the right environment of course)!
Now, thanks to Chris Mentor, there is a new practical blog about using Jedox, with tips, tricks and explanations. If you're interested in the product or want to extend your understanding of the toolset, then the site should be on your reading list.