## SQL Saturday Lisbon Portugal

I can only express my appreciation also – thanks guys!

Originally posted on Paul Turley's SQL Server BI Blog:

We’re on our way home from Lisbon, Portugal after speaking at SQL Saturday #267. Having had the pleasure to attend and speak at quite a few SQL Saturday events over the years, I’ve never seen a group of event organizers work so tirelessly with as much passion and dedication. Thanks to Niko Neugebauer, Vitor Pombeiro, Paulo Matos, Quilson Antunes, Murilo Miranda, André Batista and Paulo Borges for the late nights and long hours you all invested to make this a very special SQL Saturday. The event was a great success, as was the special day of sightseeing for all the speakers afterward. After recruiting an impressive list of speakers from all over the globe, these volunteers went well beyond the call of duty to chauffeur speakers from the airport to hotels, the event venues and around the city. It was quite a treat. Thank you!

## Creating Tables in Excel with VBA and External Data – Part II

In Part I, we looked at adding a data table to a Workbook which uses an external data source (we used OLAP but the process can be applied to any source). This post looks at manipulating an existing table.

Without VBA, we can manage the table manually – both its properties and the underlying query. Simply right click on the table and select **Edit Query…** or **External Data Properties** from the popup menu. Changes to the table data are then applied automatically.

If we choose to edit the query, we can simply overwrite the Command Text (as seen in the image below). These changes will be applied automatically (including resizing the table by rows or columns for a different sized data set) once the OK button is clicked.

For **External Data Properties**, we can configure how the table reacts to new data. For example, you may notice that the table accommodates additional rows and columns; however, when the query returns a table with fewer rows and columns, the table retains its old sizing (number of columns) and includes blank columns (for data that previously existed). You can resize this manually (by dragging the bounds of the table’s border) or set the properties of the table to overwrite existing data. If you want to ensure that new sizes are automatically incorporated into the table, make sure that the **Overwrite** check box is marked in External Data Properties.

**VBA**

Now to the VBA – as Commandant Lassard would say, “There are many, many, many, many fine reasons to use VBA”. We have so much flexibility, but let’s keep it simple; here’s what I’ve set up.

Cell B1 is data validated based on the cell range D1:D2 – nice and simple. When we change that cell, the table updates for the new country.

In order to determine whether there is a change in our data (the Country selected), we have to create a worksheet event to capture and test the change. I have gone into this in some detail here and the code is below. Note that this needs to be added to the sheet code (not in a separate .bas module). All we do is check that only cell B1 is updated and then call the refresh subroutine.

```vba
Private Sub Worksheet_Change(ByVal Target As Range)
    ' update the table if cell B1 is changed
    If Target.Cells.Count = 1 And Target.Cells.Row = 1 And Target.Cells.Column = 2 Then UpdateMyTable
End Sub
```

Now for the updating component – the bit that’s called when cell B1 is changed. I think this is pretty straightforward, but I’ll walk through it anyway. First, the code:

```vba
Public Sub UpdateMyTable()
    ' ensure that any new changes are reflected in the table dimensions
    Sheet1.ListObjects("Table_abax_sql3").QueryTable.RefreshStyle = xlOverwriteCells
    ' set the command text
    Sheet1.ListObjects("Table_abax_sql3").QueryTable.CommandText = NewQuery(Sheet1.Cells(1, 2))
    Sheet1.ListObjects("Table_abax_sql3").Refresh
End Sub

Private Function NewQuery(Country As String) As String
    NewQuery = "select {[Measures].[Reseller Sales Amount]} on 0, " & _
        "[Product].[Category].[Category] on 1 " & _
        "from [Adventure Works] " & _
        "where [Geography].[Geography].[Country].&[" & Country & "]"
End Function
```

I’ve kept the same format as in the original post. The function NewQuery determines what the MDX should be, based on the provided country. All we do is set the table’s command to the new MDX (in QueryTable.CommandText) and refresh it.

I’ve also set the refresh style so that any changes in the command (grid size) are automatically reflected in the worksheet table.

That’s about the size of it! – I hope you find it useful.

## Creating Tables in Excel with VBA and External Data – Part I

This post looks at how we can add a table to an Excel sheet which uses an MDX query as its source. This is a very handy feature to use for a couple of reasons:

- The table retains the connection to the OLAP source (hence it can be updated by a user at will)
- We can use it to extract data from MOLAP or tabular sources (i.e. run MDX or DAX)
- We can define complex queries to return a result set that cannot be obtained with a pivot table

*Note that most workarounds for creating a table from OLAP sources rely on creating a pivot table, formatting it in a tabular layout and copying and pasting the values. Hardly an attractive option!*

- We can use the table!! – (This is really important for certain activities like data mining table analysis)

**How to Do It**

We’ll look at a simple query from Adventure Works:

```
select [Measures].[Reseller Sales Amount] on 0,
[Product].[Category].[Category] on 1
from [Adventure Works]
where [Geography].[Geography].[Country].&[Australia]
```

and an OLEDB connection string (note the OLEDB specifier at the start of the string):

```
OLEDB;Provider=MSOLAP;Data Source=@server;Initial Catalog=Adventure Works DW 2008R2;
```

I have incorporated those two strings into two functions (MyQuery and MyConnectionString) – *this just removes some of the clutter from the code.*

Now we just need to use the ListObjects.Add method. The code (now with all Subs and Functions) is pretty much the bare bones you need to add the table. In other posts, I’ll look into a higher level of control over the output.

**The CODE**

The complete code is shown below. I’ve included everything so it can simply be pasted into a new VBA module.

```vba
Sub CreateTable()
    With Sheet1.ListObjects.Add(SourceType:=0 _
        , Source:=MyConnectionString() _
        , Destination:=Range("$A$1") _
        ).QueryTable
        .CommandType = xlCmdDefault
        .CommandText = MyQuery()
        .ListObject.DisplayName = "MyMDXQueryTable"
        .Refresh BackgroundQuery:=False
        .PreserveColumnInfo = False
    End With
End Sub

Private Function MyQuery() As String
    MyQuery = "select [Measures].[Reseller Sales Amount] on 0, " & _
        "[Product].[Category].[Category] on 1 " & _
        "from [Adventure Works] " & _
        "where [Geography].[Geography].[Country].&[Australia]"
End Function

Private Function MyConnectionString() As String
    MyConnectionString = "OLEDB;Provider=MSOLAP;Data Source=@server;Initial Catalog=Adventure Works DW 2008R2;"
End Function
```

**Walk through**

This is pretty much the bare bones approach. Walking through the code (see Sub CreateTable), we add the list object, specifying its connection string and destination, then set the command and refresh info. The only statement that is not strictly necessary is naming the table (see .ListObject.DisplayName), but I tend to think it is a good idea because we will want to refer to the table by name at a later stage.

**Outcome**

The code will add a table like the one in the following image. The field names are fully qualified, which is not that nice, and we will look at how this can be changed in another post. For now, our purpose is to get a table into the workbook (the purpose of this post) so that it can be used as a table and refreshed.

*PS – the code above adds the ListObject by reference to the sheet within VBA (see Sheet1.ListObjects). It’s probably worthwhile to point out that this is the sheet’s code reference within VBA and not the (tab) name of the sheet.*

*One more thing – when the query refers to attributes in a hierarchy, the OLEDB result set (table) will include the parent attributes of the hierarchy as columns. This is nothing to worry about for the moment!*

Next – changing the table’s query.

## Microsoft Tabular Modeling Cookbook

I am pleased to announce that my tabular modelling book is finished. The title is succinct – ‘Microsoft Tabular Modeling Cookbook’ – and it’s available from Packt (and others) at http://www.packtpub.com/microsoft-tabular-modeling-cookbook/book

There is (naturally) a lot in the book – but it is designed to get you up and running fast. We look at all flavours of modelling – both Power Pivot and SSAS – and pretty much everything in between.

Of course, any comments and feedback are welcome!

## Excel New Feature – INQUIRE

Excel 2013 included some great new features for BI. However, one really cool feature that flew under the radar was Inquire. For all intents and purposes, this is formula tracing at the gym, on steroids and the only runner in the race. It’s a great inclusion for understanding the structure of the workbook, including dependencies between books (including formula references), recording the formulas and documenting just about anything in the book.

So let’s start with a quick recap of formula auditing. This is still available and shows how a cell is derived. You typically activate the cell and then press Trace Precedents (if you want to see what cells are used in the cell’s formula) or Trace Dependents (if you want to see what other cells have a formula which depends on the selected cell). These are in the Formulas ribbon. We can see how this works in the following image, where the precedents of cell B7 are shown as B5 and B6 (note the blue line connecting the cells).

When the formula is a linked value from another book, a grid shows to indicate an external reference. In the prior image, the cells A1 and A2 refer to cells in another workbook (called Book 2), and so tracing the precedents would show the following.

**Now let’s compare this to Inquire**

*Firstly, Inquire must be activated as an Excel add-in. Go to File → Options, then choose Add-Ins from the options pane, then select Excel Add-Ins in the Manage dropdown and click Go.*

*Ensure that the Inquire add-in is checked. Then an Inquire ribbon tab should be present.*

The ribbon gives us many options, but let’s focus on the structure of the workbook. If we click the Workbook Analysis button, a new window opens which allows us to choose what we want to look at. For example, we could list formulas by selecting **All formulas** from the **Formulas** node in the tree (as shown). Note that all formulas are listed, including those that reference an external book.

Don’t like the window? We can even export the results for any number of selected nodes (to a new workbook) by hitting the ‘Excel Export’ button (in this window).

We can investigate the relationships (both precedents and dependents) of a particular cell (just click the Cell Relationship button in the ribbon). A new window opens with our choices; we can choose the direction and the number of expansion levels for the investigation.

This post has just scratched the surface – there are a few other interesting options to explore (e.g. file comparisons and passwords). However, this feature should be very useful for tracing (and perhaps more importantly documenting) formulas in books.

## Understanding the LIFT CHART

The lift chart is synonymous with evaluating data mining model performance and the predictive power of one model against another. Often, in presentations and training sessions, it is suggested that the chart is indicative of the model’s ability to accurately predict within a training population. For example, the following explanation is provided:

*“the lift chart shows that this model is good because it only needs to evaluate 30% of data in order to correctly predict all the target outcomes”*

This type of statement is simply not true – it is INCORRECT, WRONG, MISLEADING and shows a lack of understanding about what the chart represents. This post explains the chart by examining how it is created, seeking to remove some of the misconceptions about its use.

Consider the following example. In this chart it would be argued that an ideal model (red line) would only need ~55% of the population in order to predict all the target states. Without a model, we would need to use the entire population, and so our model (being somewhat useful) lies between the best model and a random guess. These values can be determined by the intercept of each model with the X axis (note that at 55% of the population, the best model achieves 100% accuracy).

Another common question arising from the interpretation of this chart occurs when we know that the target (predicted) state is found in only 55% of the population. The question is: “why does the chart show 100% accuracy when only 55% of the population can exist in the predicted state, and therefore the Y axis should have a maximum of 55%?”

For my own analysis, I shall ask a question of the reader so that the construction of the chart can better be understood. The question is simple.

**If my model does not predict the correct result 100% of the time, how could my accuracy ever achieve 100%? Let’s be realistic, it would have to be a pretty impressive model to never be wrong – and yet this is what the chart always shows: 100% accuracy!**

**Now let’s look at construction**

In order to create a lift chart (also referred to as a cumulative gain chart), the data mining model needs to be able to predict the probability of its prediction. For example, we predict a state and the probability of that state. Within SSAS, this is achieved with the PredictProbability function.

Now, since we can include the probability of our predictions, we would naturally order the data by the predicted probability, since it suggests the likelihood of a test case being the predicted target state. Or, put another way, if I had only 10 choices for picking which test cases (that is, observations from the testing data) would be the predicted value, I would choose the top 10 testing cases based on their predicted probability – after all, the model is suggesting that these cases have the highest probability of being the predicted state.
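To make that ordering concrete, here is a minimal sketch in Python (the post’s code is VBA, but the idea is language-agnostic); all of the cases and probabilities below are made-up, illustrative values:

```python
# Each case pairs the model's predicted probability with whether the
# prediction turned out to be correct (illustrative values only).
cases = [
    (0.95, True), (0.40, False), (0.90, True), (0.85, True),
    (0.60, False), (0.75, True), (0.20, False), (0.55, True),
]

# Order descending by probability: the most confident predictions come first.
ranked = sorted(cases, key=lambda c: c[0], reverse=True)

# If we may only pick 3 cases, we pick the 3 the model is most confident about.
top_3 = ranked[:3]
print([p for p, _ in top_3])  # [0.95, 0.9, 0.85]
```

The bins discussed next are simply consecutive slices of this ranked list.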

As we move through the testing data (and the predicted probability decreases), it is natural to expect the model to become less accurate – it will make more false predictions. So let’s summarise this (training) data. For convenience, I have grouped my training data into 10 bins, and each bin has ~320 cases (the red line below). Working with the assumption that the predictive power of my model decreases with probability, the number of correct predictions also decreases as we move through more of the training data. This is clearly visible in the chart and data below – the first bin has a high predictive power (275 correct predictions) while the last bin has only 96 correct predictions.

If I focus on the model’s ability to correctly predict values, I will notice that it can predict 1,764 correct results – but now let’s turn to the cumulative power of the model. If, from my sample data, I could only choose 322 cases (coincidentally, this is the number of cases in bin 1), I would choose all cases from bin 1 and get 275 correct (or 16% of the possible correct values). If I had to choose 645 cases, I would choose the cases from bins 1 and 2 and get 531 correct (30% of the correct possibilities). This continues as I make more predictions and is summarised in the following table.

This data is transposed onto the lift chart – the bin on the X axis (representing the % of population) and the Percent Running Correct on the Y axis (representing the proportion of correct predictions). As we can see, the data is indicative of the model’s ability to quickly make accurate predictions rather than its overall predictive ability.

**Best and Random Cases**

The chart also includes best and random cases as guidelines for prediction – let’s focus on these. These lines are theoretical – really ‘what if’ types of scenarios.

Suppose that I had an ideal model. If this were the case, my model would predict 322 correct in bin 1, 323 in bin 2 and so on – it must, because we have ordered the data by PredictProbability and in a perfect world we would get them all correct! However, the model can only predict 1,764 correct values – we know this from the actual results. Because of this, we would only need up to bin 6 to get all our correct values (see the column ‘Running Correct Best Case’ in the following table). Just as we did for the model prediction, we can convert this to a percent of the total correct (the population) and chart it with the model.

Now for the random guess – again, this is theoretical. I know that I can only predict 1,764 correct values, so, if these were evenly distributed amongst my bins, I would have ~176 correct predictions in each bin. This is then added to the chart.
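Pulling the three lines together, the construction above can be sketched as follows. This is an illustrative Python sketch, not code from the post: only bins 1, 2 and 10 (275, 256 and 96 correct) and the total of 1,764 come from the worked example, while the middle bin counts are assumed values chosen to be consistent with those figures.

```python
# Bin sizes (~320 cases each) and correct predictions per bin.
# Bins 1, 2 and 10 match the worked example; the rest are assumed.
bin_sizes = [322, 323, 322, 322, 322, 322, 322, 322, 322, 322]
correct_per_bin = [275, 256, 230, 210, 185, 160, 140, 115, 97, 96]
total_correct = sum(correct_per_bin)  # 1764 correct predictions overall

def running_percent(counts, total):
    """Cumulative share (%) of all correct predictions captured so far."""
    out, running = [], 0.0
    for c in counts:
        running += c
        out.append(round(100 * running / total, 1))
    return out

# Model line: cumulative correct predictions, bin by bin.
model_line = running_percent(correct_per_bin, total_correct)

# Best case: every case in a bin is correct until the 1,764 are exhausted.
best_counts, remaining = [], total_correct
for size in bin_sizes:
    take = min(size, remaining)
    best_counts.append(take)
    remaining -= take
best_line = running_percent(best_counts, total_correct)

# Random guess: the 1,764 correct cases spread evenly across the 10 bins.
random_line = running_percent([total_correct / 10] * 10, total_correct)

print(model_line[:2])  # [15.6, 30.1] -> ~16% after bin 1, ~30% after bins 1-2
print(best_line[5])    # 100.0 -> the ideal model needs only ~6 bins
```

Note that every line ends at 100% of *its own* correct predictions – which is exactly why the chart always reaches 100% regardless of the model’s overall accuracy.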

**What is the Problem?**

Now we can see that the chart is essentially just a view of how quickly the model makes accurate predictions. Perhaps there is nothing wrong with that, but what happens when we compare models? Well, in this case, the comparison is relative. These steps are reproduced for each chart, and what you essentially see is relative comparative performance. Thus, the comparison of two models in the charts gives NO indication of predictive accuracy – after all, how could it, since each model plots the relative percent accuracy of its own predictions.

For this reason, relying on this chart as a sole measure of accuracy is dangerous and really shows very little about the total accuracy of the model.

**Conclusion**

Understanding how the lift chart has been constructed can help in understanding how to interpret it. We can see that it indicates the cumulative power of the model to give predictions – or, perhaps more correctly, the cumulative power of the model to give **its** correct predictions.