Creating Tables in Excel with VBA and External Data – Part II

In Part I, we looked at adding a data table to a Workbook which uses an external data source (we used OLAP but the process can be applied to any source). This post looks at manipulating an existing table.

Without VBA, we can manually manage the table – both the its properties and the underlying query. Simply right click on the table and select the Edit Query.. or External Data Properties from the popup menu. Changes are made to the table data are made automatically.

If we chose to edit the query, we can simply overwrite the Command Text (as seen in the image below). These changes well be automatically applied (including resizing the table by rows or columns for a different sized data set) once the OK button is clicked.

For External Data Properties, we can configure how the table reacts with new data. For example, you may notice that, the table accommodates additional rows and columns however, when the query returns a table with fewer rows and columns, the table retains its old sizing (number of columns) and includes a blank columns (for data that previously existed). You can manually resize this (dragging the bounds of the tables border) or set the properties of the table to overwrite existing data. If you want to ensure that this option exists and that new sizes are automatically incorporated into the table – make sure that the check box for Overwrite is marked in External Data Properties.


VBA

Now to the VBA – As commandant Lassard would say “There are many, many, many, many fine reasons to use VBA”. We have so much flexibility but let’s keep it simple, here’s what we I’ve set up.

Cell B1 is data validated based on the cell range D1:D2 – nice and simple. When we change that cell, the table updates for the new country.

In order to determine if the there is a change in or data (the Country selected) we have to create a worksheet event to capture and test the change. I have gone into this in some detail here and the code is below. Note that this needs to be added to the sheet code (not in a separate bas module). All we do is check that our only B1 is updated and then call the refresh subroutine.

Private Sub Worksheet_Change(ByVal Target As Range)

  ‘ update table if cell 1,B is changed
If Target.Cells.Count = 1 And Target.Cells.Row = 1 And Target.Cells.Column = 2 Then UpdateMyTable

End Sub

Now for the updating component – the bit that’s called when cell(B1) is changed. I think this is pretty straight forward but I’ll walk through it anyway. First, the code;

Public Sub UpdateMyTable()

  ‘ ensure that any new changes are reflected in the table dimensions
Sheet1.ListObjects(“Table_abax_sql3″).QueryTable.RefreshStyle = xlOverwriteCells

  ‘ set the comand text
Sheet1.ListObjects(“Table_abax_sql3″).QueryTable.CommandText = NewQuery(Sheet1.Cells(1, 2))
Sheet1.ListObjects(“Table_abax_sql3″).Refresh

End Sub

Private Function NewQuery(Country As String) As String

NewQuery = “select {[Measures].[Reseller Sales Amount] } on 0, ” & _
“[Product].[Category].[Category] on 1 ” & _
“from [Adventure Works] ” & _
“where [Geography].[Geography].[Country].&[” & Country & “]”


End Function

I’ve kept the same format as in the original post. The function NewQuery determines what the MDX should be – based on the provided country. All we is set the tables command to the new mdx (in (QueryTable.CommandText)) and refresh it.

I’ve also set the refresh style so that any changes in the command (grid size) are automatically reflected in the worksheet table.

That’s about the size of it! – I hope you find it useful.

Creating Tables in Excel with VBA and External Data – Part I

This post looks at how we can add a table to an Excel sheet which uses a MDX query as its source. This is a very handy feature to use for a couple reasons;

    1. The table retains the connection OLAP source (hence can be updated by a user at will)
    2. We can use it to extract data from MOLAP or tabular sources (i.e. run MDX or DAX)
    3. We can define complex queries to return a result set that cannot be obtained with a pivot table

Note that most workarounds for creating a table from OLAP sources rely on the creation of the pivot table, its formatting is a tabular source and a copy and paste the values. Hardly an attractive option!

  1. We can use the table!! – (This is really important for certain activities like data mining table analysis)

How to Do It

We’ll look at a simple query from adventure works;

select [Measures].[Reseller Sales Amount] on 0,
[Product].[Category].[Category] on 1
from [Adventure Works]
where [Geography].[Geography].[Country].&[Australia]

and an OLEDB connection string (note the OLEDB specification at the start of the string)

OLEDB;Provider=MSOLAP;Data Source=@server;Initial Catalog=Adventure Works DW 2008R2;

I have incorporated those to strings into 2 functions (MyQuery and MyConnectionString) – this just removes some of the clutter from the code.

Now we just need to use the ListObjects.Add method. The code (now in with all Sub’s and Functions) is pretty much the bare bones you need to add the table. In other posts, I’ll look into higher level of control for the output.

The CODE

The complete code is shown below. Ive included everything so it can simply be pasted into a new VB module

Sub CreateTable()

  With Sheet1.ListObjects.Add(SourceType:=0 _
, Source:=MyConnectionString() _
, Destination:=Range(“$A$1″) _
                            ).QueryTable
.CommandType = xlCmdDefault
.CommandText = MyQuery()
.ListObject.DisplayName = “MyMDXQueryTable”
.Refresh BackgroundQuery:=False
.PreserveColumnInfo = False

  End With

End Sub

Private Function MyQuery() As String

     MyQuery = “select [Measures].[Reseller Sales Amount] on 0, ” & _
“[Product].[Category].[Category] on 1 ” & _
“from [Adventure Works] ” & _
“where [Geography].[Geography].[Country].&[Australia]”

End Function

Private Function MyConnectionString() As String

     MyConnectionString = “OLEDB;Provider=MSOLAP;Data Source=@server;Initial Catalog=Adventure Works DW 2008R2;”

End Function

Walk through

This is pretty much the bare bones approach. As code walk through (see Sub CreateTable), we add the list object specifying its connection string and destination, set the command and refresh info. The only statement that is not entirely necessary is naming the table (see .ListObject.DisplayName) but I tend to think is a good idea because we will want to refer to it by name at a later stage.

Out Come

The code will add a table like the one in the following image. The field names are fully qualified which is not that nice and we will look at how this can be changed in another post. For now, our purpose is to get a table is in the workbook (the purpose of this post) so that it can be used as a table and refreshed.


PS – the code above adds the listobject by reference to the sheet within VBA (see Sheet1.ListObjects). Its probably worthwhile to point out that this is the sheet reference (ie the number of the sheet in the book) and not the name of the sheet.

One more thing – when the query uses refers to a attributes in a hierarchy the OLEDB result set (table) will include parent attributes of the hierarchy as a column. This is nothing to worry about for the moment!

Next - changing the tables query.

DAX III – Filtering Queries with CALCULATETABLE and FILTER

So far the queries that we have created in DAX have focused on the entire result set (with the exception of the start at clause in evaluate). Naturally, we would want to restrict the rowset in some way and this is the focus of this post – how to filter row-set tables in DAX queries. There are two ways to do this, firstly, we can use the CACULCUATETABLE syntax and secondly we can use a FILTER that covers the evaluate syntax.

BOL defines the function of CALCULATETABLE as “Evaluates a table expression in a context modified by the given filters” taking the form;

 

CALCULATETABLE(<expression>,<filter1>,<filter2>,…)

 

There are many applications of how we can use this, however, let’s look at the first of reducing the rows returned, say for the first six months of 2000. We can evaluate this query like this.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates'[year]
                , ‘dates'[Month]
            )
        
            , “sum_amount”calculate(sum(‘Sales'[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
    )
)

 

One of the restrictions of using CALCULATETABLE to restrict rows is that the filter cannot be applied to calculated measures. So if we extended the filter arguments to include a condition on ‘sales_amount’, we would get an error.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates'[year]
                , ‘dates'[Month]
            )
        
            , “sum_amount”calculate(sum(‘Sales'[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
    , calculate(sum(‘sales'[amount])) > 10000
    )
)

A function ‘CALCULATE’ has been used in a True/False expression that is used as a table filter expression. This is not allowed.

 

Note that the filter arguments in calculate table are applied to the underlying table regardless of the fields returned (projected or selected). So, if we wanted to sum for the first 6 months sales for the year 2000, we could simply remove the ‘dates'[month] field from the summarize and the calculation is applied only for Jan to Jun.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates'[year]
            )
        
            , “sum_amount”calculate(sum(‘Sales'[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
   
    )
)

 

We could also include this restriction in the definition of the calculation and achieve the same results (see sum_amount2 below).

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates'[year]
                
            )
        
            , “sum_amount”calculate(sum(‘Sales'[amount]))
            , “sum_amount2″calculate(sum(‘sales'[amount])
                                        , all(dates)
                                        , ‘dates'[Year]=2000
                                        , ‘dates'[Month] <6
                                        , ‘dates'[Month] >=1
                                        )
        )
        
        
    , dates[year]=2000
    , ‘dates'[Month] <6
    , ‘dates'[Month] >=1

    )
)

 

From the rowset point of view, we are still faced with the problem of how to restrict rows based on a calculated value. In order to do this, we can apply the FILTER function against the table. Filter returns a subset of the table and takes the form;

 

FILTER(<table>,<filter>)

 

Consider yearly sales for the first six months of the year as below;

evaluate 
(
    calculatetable
    (
         addcolumns
        (
            summarize
            (
                ‘dates’
                , ‘dates'[year]       
            )
            , “sum_amount”calculate(sum(‘Sales'[amount]))
        )
    , ‘dates'[Month] <6
    , ‘dates'[Month] >=1
    )
)

 

A filter can be applied to this rowset so that only sales over 50,000 are returned. Note that it is wrapped around the CALCUALTETABLE.

evaluate 
(

    filter
    (
    
            calculatetable
            (
                 addcolumns
                (
                    summarize
                    (
                        ‘dates’
                        , ‘dates'[year]
                    )
                    , “sum_amount”calculate(sum(‘Sales'[amount]))
                )
            , ‘dates'[Month] <6
            , ‘dates'[Month] >=1
            )
    
            , calculate(sum(‘Sales'[amount])) > 50000 
    )

)

 

In these situations, it may be best to define the measure explicitly;

define measure ‘dates'[sum_amount] = calculate(sum(‘Sales'[amount]))

evaluate 
(

    filter
    (
    
            calculatetable
            (
                 addcolumns
                (
                    summarize
                    (
                        ‘dates’
                        , ‘dates'[year]
                    )
                    , “sum_amount”‘dates'[sum_amount]
                )
            , ‘dates'[Month] <6
            , ‘dates'[Month] >=1
            )
    
            , ‘dates'[sum_amount] > 50000 
    )

)

 

DAX II – Extending the use of Evaluate

In a previous post, I looked at evaluate as the method to return a table in a row-set form. However, the query syntax allows us to more precisely define the result set and this is the focus of this post.

Books on line specifies three optional arguments that can be used in query syntax to control, there are DEFINE, ORDER BY and START AT. The syntax follows the format;

 

[DEFINE { MEASURE <tableName>[<name>] = <expression> }

BISM : Competing & Continuous Store Sales using DAX

Retail analysis models often include a competing outlet indicator to improve the quality of the model. The indicator does this by allowing the user to manage the effect that is associated with partial trading periods. For example, the true / false indicator shows whether the store has traded for all available weeks in a month. This post looks at how the competing store indicator can be implemented using powerpivot and DAX.

The Competing Store Definition

There are a few definitions of competitive stores (and perhaps another post to follow). In this post, the definition of competitive is a store that trades for every week in the month. If the store has not traded for all weeks, it is not considered competitive and this permits the identification of sales that occurred when;

  1. The store was open or closed for the month in question or
  2. The store didn’t trade for some other reason (eg refurbishment)

The Model

The underlying model considers a store dimension, date dimension, and sales data. The grain for sales is { store x day } à sales amount. Although, this implementation treats the sales relation {store x day} as unique, there is no requirement to do so. The dates table follows standard structure of a date dimension table (grain of day) with the fiscal calendar attributes aligned along a 445 Calender. The fields fiscal_fullweek_key has the format YYYYWW and fiscal_fullmonth_key has the format YYYYMM. These are standard implementations where YYYY refers to the century, WW refers to the week of year and MM refers to the month of year.


The store table is a type 1 dimension (showing the ‘current’ state of the store). There is a strong argument to show the [competing store] as an attribute of the store dimension however, this would require a type 2 store dimension and intensive ETL. This is outside the management of the end user who consumes available data as it would require more intensive resources and relational constructs.

Method

The approach to determine whether the sale is competing or not is based on the comparison of the stores actual trading weeks in a month compared to the available trading weeks. When the two values are the same the sale is competitive, otherwise it is not. Additionally, note that this formula is applicable to each row of the sales fact because the sales fact is the only table that combines both store and date data and can therefore be used as the auto-exists cross-join of stores and dates.

For each row in the sales fact, we must determine;

  1. weeks_in_month as the number of weeks in the current month, where current refers to the row context of date (ie the number of weeks in the month of the sales date).
  2. sales_in_month as the number of weeks in the current month that the store traded. Again, current refers to the row context of sale date.

In order to improve readability of the formula the sales table has been extended to include a field full_month_key (formula = RELATED(fiscal_fullmonth_key)) and full_week_key (formula = RELATED(fiscal_fullweek_key))

Formula 1 – Available Trading Weeks in Current Month (weeks_in_month)

The available weeks in the current month are the distinct count of weeks for the related month. We can achieve this by removing the filter context of the dates table and reapplying it so that the row sales month equals the dates month (below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , filter(all(Dates), Sales[fullmonth_key]=Dates[fiscal_fullmonth_key])

        )

If the sales table did not include the fullmonth_key field, we could use the related function directly in the formula.

CALCULATE(

            DISTINCTCOUNT(Dates[fiscal_weeks_key])

            , filter(all(dates), RELATED(Dates[fiscal_fullmonth_key])=Dates[fiscal_fullmonth_key])

        )

Alternatively, we can remove the row filter context of the dates table to the extent that it only includes the current month. This is done with the ALLEXCEPT function so that the dates table filters are removed except for the listed column restrictions (as below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , ALLEXCEPT(Dates,Dates[fiscal_fullmonth_key])

        )

 

Formula 2 – Weeks Traded in the Current Month (weeks_sales_in_month)

To determine the number of weeks that a store traded for (in the current month), we use a similar restriction but apply it to the sales table. That is, count distinct weeks after we remove the current row filter context and reapply it based on the current month of the rows sale date.

CALCULATE(

        DISTINCTCOUNT(sales[week_key])

        , ALLEXCEPT(sales,sales[store_key], sales[fullmonth_key])

        )

 

Formula 3 – Continuing Sales

The final (and only visible formula) is a simple if function that compares the trading weeks to those available for a true/false output.

=if([weeks_in_month]=[weeks_sales_in_month], “Yes”, “No”)

Final Model

The final schema for the model is shown below. The measure [Sales Amount] is simply the sum of the sales_amount field.


Follow

Get every new post delivered to your Inbox.

Join 277 other followers