The Jedox Roadshow comes to Brisbane

Jedox is one of the best relatively unknown data platforms I know of.  Calling them unknown may sound a little harsh because they are a major product; however, I say unknown because I don’t see them being considered for purchase in many companies where they are clearly a fit-for-purpose product – and the best one at that.

For budgeting applications and situations where users need two-way interaction with data, they ROCK.  Consider a budgeting example where you say something like ‘apportion $5,000 like last year’ or ‘apportion $5,000 evenly’ – Jedox just takes care of it.  And if you want to comment on why you decided to make that allocation, you can.  For financial budgeting, there is just no better product.  There’s automatic Excel integration and report authoring from Excel with publish to the web, so accountants, planners and analysts alike find it easy and natural to use.

I’m always keen when events are on in Brisbane, and they’ll be here on 19th May (2017).  You can find out more here.

Schema Definitions from an ODBC Text Source

Recently, I worked on a disjointed migration project where things just didn’t work the way you’d expect.  For example, you’d usually prefer to access data directly from its source (a relational database), but in this situation that was not allowed and we were given hundreds of text files to load.  Perhaps not so bad, you may say; unfortunately, though, the documentation around those files was incomplete and incorrect.  We spent far too much time confirming valid formats and going to and fro with vendors – it’s not value adding and, let’s face it, it’s not a good use of time to transfer files and manually configure text imports.  So what could we do?  All the files were deposited in a single directory, so why not use the schema collections from an ODBC source?

The idea is to set up an ODBC source that points to the directory containing the files, create a connection to that source and then use the metadata of that connection to iterate over each table in the source and get the column information for that table.  Note that for a text driver each file represents a table, which is why I refer to tables here.

1. Setting Up the Driver

I think this is relatively straightforward, so I’ll just add the key points with a word of warning.  There is no 64-bit text driver installed by default on Windows, so you may want to download the Access one from here if you need it.  Open the ODBC Data Sources from your settings (just type ODBC in the search bar and choose to set up a data source), then add a new source and choose the text driver.

[image: choosing the text driver when creating a new ODBC data source]

When you click Finish you’ll be presented with the window below, where you can specify the directory you want to use and give the source a name.  There’s no need to look at the options but, if you want to, you can restrict the import to specific file extensions; for now let’s assume that everything in the folder is for import.

[image: the text driver setup dialog showing the data source name and directory]
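As an aside, if you prefer not to create a DSN at all, the same text driver can usually be referenced with a DSN-less connection string.  The sketch below is only illustrative – the driver name and directory are assumptions and depend on which text driver is installed on your machine (check the ODBC administrator for the exact name).

using System.Data.Odbc;

// DSN-less sketch: substitute the text driver name shown in your ODBC administrator
string _connectionString =
    @"Driver={Microsoft Access Text Driver (*.txt, *.csv)};Dbq=C:\temp\text_files;";

using (OdbcConnection _con = new OdbcConnection(_connectionString))
{
    _con.Open();
    // the connection behaves exactly like the DSN-based one used later in this post
}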

2. Data Access

As a side note, once you’ve set up the driver you can import the data as a table into Excel.  The menu path is (Data tab) Get External Data > From Other Sources > From Microsoft Query.  I know you could import the file directly (i.e. Get External Data > From Text); however, here you want to use the text driver and test your ODBC source.  You’ll notice that the source is available as a database (see below), and each file in the directory is presented as a table.  The source that I previously created is shown below, along with its list of tables.

[image: the ODBC source listed as a database in Microsoft Query, and the files in the directory presented as tables]

3. Using the ODBC Schema Object

As a general rule, when we create a database connection in code, the connection object exposes schema information that we can use to interrogate the source.  As you can see from the snippet below, all we do is acquire the “Tables” schema (i.e. list the tables), then iterate over each table to get the columns in that table.  The sample output is shown below and, as you can see, it gives the column names, types and sizes.

// connect to the ODBC (text) source created earlier
OdbcConnection _con = new OdbcConnection("DSN=biml_text_files");
_con.Open();

// the generic and restriction schema collections are not strictly needed here,
// but show what other metadata is available from the connection
DataTable _schema = _con.GetSchema();
DataTable _restrictions = _con.GetSchema("Restrictions");

// each file in the directory is presented as a table
DataTable _tables = _con.GetSchema("Tables");

foreach (DataRow _tablerow in _tables.Rows)
{
    Console.WriteLine("{0}", _tablerow["TABLE_NAME"]);

    // restrict the "Columns" schema to the current table (file)
    string[] _table_restrictions = { null, null, _tablerow["TABLE_NAME"].ToString() };
    DataTable _columns = _con.GetSchema("Columns", _table_restrictions);
    foreach (DataRow _colrow in _columns.Rows)
    {
        Console.WriteLine("\t{0},{1},{2}", _colrow["COLUMN_NAME"], _colrow["TYPE_NAME"], _colrow["COLUMN_SIZE"]);
    }
}

_con.Close();

[image: sample output listing each file (table) with its column names, types and sizes]

Actually, you don’t even need to go to this much trouble.  You could just get the Columns schema without any restrictions, which gives you a complete list of columns for every table.  However, in our implementation the metadata was sent back to a database so others could use it.
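For completeness, a minimal sketch of that simpler approach (using the same DSN as above) might look like this;

using System;
using System.Data;
using System.Data.Odbc;

// one unrestricted call to the "Columns" schema returns every column of every table (file)
using (OdbcConnection _con = new OdbcConnection("DSN=biml_text_files"))
{
    _con.Open();
    DataTable _allColumns = _con.GetSchema("Columns");
    foreach (DataRow _row in _allColumns.Rows)
    {
        Console.WriteLine("{0}\t{1},{2},{3}",
            _row["TABLE_NAME"], _row["COLUMN_NAME"], _row["TYPE_NAME"], _row["COLUMN_SIZE"]);
    }
}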

Conclusion

Working with the schema information objects allows us to derive information about the source.  It’s conceptually similar to the INFORMATION_SCHEMA views in most databases; however, when we apply it to text files, we can use that metadata to build our ETL.
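As a rough illustration of that last point, the column metadata can be turned into DDL for staging tables.  The sketch below is only an assumption about how you might do it – the naive nvarchar type mapping and the stage schema name are mine for illustration, not what we actually deployed;

using System.Collections.Generic;
using System.Data;
using System.Data.Odbc;

// sketch only: build a CREATE TABLE statement for a staging database from the ODBC column metadata
static string BuildCreateTable(OdbcConnection con, string tableName)
{
    string[] restrictions = { null, null, tableName };
    DataTable columns = con.GetSchema("Columns", restrictions);

    List<string> columnDefs = new List<string>();
    foreach (DataRow row in columns.Rows)
    {
        // naive mapping: treat every column as text of the reported size
        columnDefs.Add(string.Format("[{0}] nvarchar({1})", row["COLUMN_NAME"], row["COLUMN_SIZE"]));
    }

    return string.Format("CREATE TABLE [stage].[{0}] ({1});",
        tableName.Replace(".csv", string.Empty).Replace(".txt", string.Empty),
        string.Join(", ", columnDefs));
}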

BI Success – Does the tail wag the dog?

I watched a recorded event this morning about the keys to Business Intelligence success. The top four points provided were;

  1. Finding the right (executive) sponsor
  2. Don’t build what they ask for, build what they need
  3. Offer up a buffet (deliver multiple options)
  4. Don’t try to boil the ocean.

Let’s be honest, there’s nothing new here – having the right sponsor has been listed as a critical success factor numerous times. So, to paraphrase, the presentation was basically about finding the correct person and delivering something that truly works for the business in an incremental approach. Why incremental? Well, with my skeptic’s hat on, I’d suggest it’s better to fail fast with small outcomes so the risk-averse manager doesn’t get tagged with a failed project. On a more positive note, it better aligns business outcomes with what’s delivered, reduces inertia to adoption and maintains focus on outcomes (from both the business and those delivering).

What raised my eyebrows was the first point, finding the right sponsor. This is so important that, without doubt, I’d suggest any rogue project will eventually be killed if it doesn’t have the right ‘executive’ support (if you don’t think so, tell me otherwise). I’ve delivered remarkable shadow IT projects that were canned and aggressively removed by IT departments. For example, I’ve had business users refuse to use the corporate system and ask IT to build systems that work like the existing one (the one built by my team).

Before you suggest that there is some valid reason why this needs to happen, let me assure you there’s not. Common arguments include statements like ‘it’s not valid because it lacks IT structure, backups, testing, validation’ and so on. These systems did have those things, so there was no technical reason for the tools to be killed. So I reflect on the first point raised in the webinar – executive sponsorship.

Now to the main point of this post: why is executive sponsorship so important? Common arguments are that it ensures appropriate resources and organizational focus; however, I can’t help wondering if these are recursive justifications – somewhat like the chicken and the egg, it’s hard to figure out which truly comes first. Is there some strategic and leadership focus here, or does it say more about the structure of organizations?

Could the requirement for executive support be an indication of organizational compliance rather than effective management? For example, if executive John Bloggs is behind the project, no one dares speak up about its shortcomings and so it can’t be a failure.  So we’ll change the definition of success (scope or cost) because that appeases John Bloggs’ expectations and influence.

I know that there are cases where focused executives have played an active role in delivery and had a positive impact on projects, but I can’t help feeling that, sometimes, the tail is wagging the dog.

Now for the ultra-positive take, and another factor where I can’t help wondering what comes first.  I’ve been lucky enough to work in a variety of organizations and have seen a lot of different management styles.  From the organizational point of view, does it not say something about the organization (and the type of people who work there) when executives actively back the project?  I am sure that something people feel passionate about and work towards will be successful, so is it not the organization, rather than the politics, that drives success?  That is, if executives are willing to get behind and support the project (rather than treat it like a political football), because that’s the type of people they are, then how can it not be a success?

What are your experiences?

Automation of Excel Macros

There is still huge value in reporting with Excel for end users.  Excel is simple to work with, allows data access and lets end users program with minimal learning.  As users develop their skills they inevitably look at automation through macros.  So, if we want to turn this into something a little more ‘supported’ (by the corporate IT department), so that it’s scheduled in some manner, we’ll have to turn to a bit of programming.  Let’s look at how we can do this: firstly some rudimentary code, and then scheduling.

Code

So here is a snippet to do that (including the interop reference).  You’ll notice a couple of key features: two parameters are passed (the file name and the macro name), then, in code, we simply open Excel, open the workbook, execute the macro and then save and close the workbook (and the application).

using xls = Microsoft.Office.Interop.Excel;

static void Main(string[] args)
{
    xls.Application _x = null;
    xls.Workbook _wb = null;

    // the workbook to open and the macro to run are passed as arguments
    string _filename = args[0];
    string _macro = args[1];

    // open Excel (hidden), open the workbook and run the macro
    _x = new xls.Application();
    _x.Visible = false;
    _wb = _x.Workbooks.Open(_filename);
    _x.Run(_macro);

    // save and close the workbook, then quit Excel
    _wb.Save();
    _wb.Close();
    _wb = null;
    _x.Quit();
    _x = null;
}

So, if you want to run the code from the command console, you’d just run the following (note the macro is called main);

ExcelManager.exe "this-is-the-excel-workbook-name" "main"

or, to run it from SQL Server, wrap the command in a batch file and then execute;

EXEC master..xp_CMDShell 'c:\temp\excel_manager.bat'

A Gotcha

In debug mode this all worked, and the task could be scheduled and executed in Windows Task Scheduler.  However, when the task was scheduled via SQL Server Agent, or scheduled to run when a user was not logged in, there would be errors.  These related to ‘file not found’ errors and other generally unhelpful and unreliable messages.  It turns out that some folders are required (don’t ask me why).  You may want to check whether the following folders exist and, if not, try adding them (there’s a small sketch for scripting this after the list);

C:\Windows\System32\config\systemprofile\Desktop
C:\Windows\SysWOW64\config\systemprofile\Desktop
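If you’d rather script that check than create the folders by hand, a minimal sketch is below (the paths are exactly those listed above; run it with administrator rights);

using System.IO;

// create the systemprofile Desktop folders if they are missing
static void EnsureDesktopFolders()
{
    string[] _folders =
    {
        @"C:\Windows\System32\config\systemprofile\Desktop",
        @"C:\Windows\SysWOW64\config\systemprofile\Desktop"
    };

    foreach (string _folder in _folders)
    {
        // CreateDirectory does nothing if the folder already exists
        Directory.CreateDirectory(_folder);
    }
}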

An example of the type of error you may get is;

[image: example error raised when the scheduled task runs without the systemprofile Desktop folders]

Great Courses Down Under

For all Australian readers, there are a couple of great courses coming to Australia. 

Firstly, Barry Devlin is running a two-day course from 24th August (see here).  For those that don’t know Barry, he is the unofficial father of data warehousing; he was there at the start but never got the media attention of others.  What I like about his approach (and specifically his later work) is that he challenges the status quo of data warehousing and information management.

Secondly, Hans Hultgren is running Data Vault certifications in July. You can find out more here, and there is also a course being run in Brisbane (at our offices). If you want to get into Data Vault, I think this is a really great place to start.

If you want to find out anything more about either of these courses, please feel free to ping me directly.

Chernoff Faces, R and Power BI

One of my favorite visualizations is Chernoff faces, because they allow you to incorporate many attributes into a single visualization.  Compared to a bar chart (which only displays one attribute), you can get a ‘full picture’ of the situation in a single iconic image (pardon the pun).  Further, faces are naturally recognizable to us humans, and there is a fair bit of evidence to suggest we notice things in faces a lot more easily than in bland representations, as a result of our social upbringing.  This post is about implementing Chernoff faces in Power BI (through R).  Firstly (in this post), I will use the infamous crime statistics demonstration, and in the second post I’ll show child faces against a different dataset (Northwind), which incorporates normalization.

If you are not familiar with Chernoff faces, you can read more about them on Wikipedia here, but basically they use facial characteristics to display attribute values.  For example, the height of the face may be related to income, the width of the face to quantity, and the size of the eyes to margin.

Chernoff implementation in R

I will include only the snippet of R required to show Chernoff faces (and then the output).  More complete tutorials can be found here and here.  Basically, we need to install the package, get some data and then display the visual.  This can be done with the following snippet;

install.packages("aplpack") # install the package
library('aplpack')
crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv") # get the data
faces(
  crime[,2:8]
  ,face.type=0
  ,scale=TRUE
  ,main='Crime Stats USA'
  ,labels=crime$state
  ,print.info=TRUE
)  # plot the chart

We then get the face visual (as below)

[image: Chernoff faces generated from the crime data]

Chernoff implementation in Power BI

In Power BI, there are a few things we need to do.  These are;

  1. Set up Power BI for R
  2. Get our data (or build our model)
  3. Add the visualization by
    1. telling the R visualization (a custom component) what data to use and
    2. using R code to create/format the visualization

These steps are followed below;

1. Setting up Power BI for R

Before you can use R in Power BI, you need to ensure that R is installed on your local machine and that you have set and enabled R’s home directory in Power BI.  To set the R options in Power BI, use the File > Options and settings > Options menu path.  The R directory needs to be set and you need to allow R to plot visuals in Power BI (this is a preview feature).  I’ve included images showing both the R directory setting and the option to use R to plot visuals.

[image: Power BI options showing the R home directory setting]

[image: Power BI preview feature option to enable the R visual]

Once these features are enabled, you should see the R icon appear in your Visualizations pane (note that you’ll need to restart Power BI if you’ve just enabled the checkbox above).

[image: the R visual icon in the Visualizations pane]

2. Get Data

Usually we expect to have a data model in Power BI (that is, we have imported some data, built a model and now we want to visualize it).  For this demo, though, I want to import the same data set that I used in the snippet above, so I can use the R Script data source in Power BI.  Here, all I have to do is choose the R source (accepting that this is a preview connector) and paste the R (read) script into the dialog that Power BI shows.  This is shown in the two images below;

[image: choosing the R Script data source]

[image: pasting the R read script into the R script dialog]

After this, Power BI opens the Navigator and you can either Load or Edit the import query.  I’ll just Load it (as shown below).

[image: the Navigator window showing the crime query with the Load option]

3. Add the Visualization

Once we have data in our Power BI workbook (you’ll notice that we have a crime table), all we need to do is;

  1. add the R visualization to the page canvas
  2. enable the script component (after all, it’s a custom component)
  3. add the data (that the visualization uses) and
  4. code the visualization (in R).  Note that the data available to the R code (i.e. the data frame) is defined by the data added in the prior step. 

I have tried to show all these steps in the image below.  One thing to note about adding data to the visualization is that the order in which you add fields (from the model) defines the columns in the data frame.  This is important (for the Chernoff faces in R) because the first field of the data frame defines the axis (grouping) of the faces.  Note that you can also see this order in the R script editor in Power BI (the generated code is shown in the image below and is not editable, but looks like # dataset <- data.frame( columns )).

[image: the R visual on the canvas, the enabled script component, the selected fields and the generated data frame comment in the R script editor]

You may recall (from the R script above) that we referred to crime as the data frame when we created the code in R.  In Power BI, however, the data frame is referred to as dataset, so in order to use our original code we need to either create a new data frame (called crime) or change our existing R code to use a data frame called dataset.  I’ll go with the first option and paste the following code into the R script editor (where you see ‘Paste or type your R-script code here’).

crime <- dataset
library('aplpack')
faces(
  dataset[,2:8]
  ,face.type=0
  ,scale=TRUE
  ,main='Crime Stats USA'
  ,labels=crime$state
  ,print.info=TRUE
)  # plot the chart

To refresh the visual (while still editing code), just press the run button in the R script editor.  My Power BI screen now looks like the following;

[image: the Power BI report page showing the Chernoff faces R visual]

Of course, the great thing about Power BI is that there is automatic filtering among visuals on the page.  The next thing to do is add a slicer for state so the user can select which states to show; the R visual updates automatically.  Doing this, we can easily build a page that lets the user select and compare the faces of the states they choose (as below).

[image: Chernoff faces filtered using a state slicer]