Archive

Archive for the ‘DAX’ Category

The Last Non-Empty Dilemma TABULAR Style

March 21, 2012 7 comments

In a previous post, I addressed the function of the last non empty aggregation in SSAS (OLAP). Specifically, I looked at what happens when there are ‘missing’ chucks of data in the fact table and how the OLAP engine natively carries forward the balance to an aggregated time member without regard for the what is the ‘true’ last date. That is, we expect the last date of stock to be applied for all aggregated date values regardless of whether data exists or not. This post looks at the LNE calculation in the tabular model.

Generally, we expect the stock schema to take the form of the fact (balances) surrounded by the dimension table. The fact table (Stock) holds the [Quantity on Hand] (on_hand) for a given date with the Dates table showing a natural hierarchy of Year, Month and Date.

Summing the column on_hand will allow the calculation of daily balances (we will be able to show correct balances when a date is selected) however, if we use any aggregated Date field, the [Stock on Hand] value will be incorrect because the formula adds the amount for all days filtered by the date restriction.

For example, if we consider a very small subset of data (shown as ‘Source Data’) and [Stock on Hand] defined as

Stock On Hand:=Sum([on_hand])

We can see that Jan 2011 for Product A (20)=15+4+1 and for product B (2011)=10+8

Source Data [Stock On Hand] using Sum

Clearly, the addition for aggregate time members does not give us what we want.

The next thought is to change the stock filter context to the max of time. That is, return the sum of stock for the last date in the selected time period. For example;

Stock On Hand:=CALCULATE(sum([on_hand]), FILTER(ALL(Dates[date]), Dates[date]=max(Dates[date])))

The problem with this approach is that the aggregate values will only show for the last date in the month (or year). This is not a problem when we have full year worth of data in the fact and the maximum of the date filter is the last date in the period. But in this case we don’t and usually the most recent data will not be complete for the year. For Product A in the picture below, the total for Jan-2011 is correct (1) because the maximum date for Jan is 31-Jan and has data (see source). However, in the year context (ie 2011) the maximum date is 31-Dec-2011 and there is no stock for this and therefore the sum is empty.

Source Data [Stock On Hand] using max(date.date)

To solve this problem, it is important to remember that the filter context applies a filter to the fact. So, for the filter context of Jan-2011 (say when Jan-2011 is on a row), the data in the fact is filtered for all the dates in Jan. What we are really interested in therefore is the last date in the fact for the current filter context.

Stock On Hand:=CALCULATE(sum([on_hand]), filter(Stock, Stock[date_key]=Max(Stock[date_key])))

In this formula, we recognise that the filter context has been applied to the fact table and use this to derive the maximum date for the current filter period and then use this to further filter the fact.

Source Data [Stock On Hand] using max(Stock.date)

Alternate Approach using Transactions

In addition to using the quantity of stock as a field in the fact, we may also consider using the transactional amount. In this case the use of summing against the Date table filter context works because we are interested in summing all data less than the max date of the filter context. In this situation, our formula would be;

Stock On Hand:=CALCULATE(sum([qty_mvt]), FILTER(Stock, Stock[date_key]<=max(Dates[date])))

Tabular verse Dimensional Design Concepts

February 8, 2012 1 comment

The multidimensional model appears to provide a richer environment for model design. However, for the case of snapshot fact tables, the tabular design may offer a much faster design and refreshing options. This post looks at the snapshot design and how business requirements are modelled in both multidimensional and tabular platforms.

The Snapshot Fact

Snapshot data (or accumulating snapshot fact tables) are usually used to record process driven data where records change over time. Consider, for example, the tables FactResellerSales and FactInternetSales tables in AdventureWorks. Here, the record shows Order Date, Due Date and a Ship Date. When the sale is made, the Order Date is known and the Due date is probably known, however, the ship date can change and is only truly known after the order has shipped. For this type of fact table, data should be added when the sale occurs and updated as information about the sale comes to light (that is, when it is shipped).

Classic Star Modelling

The standard way to model this relationship is through the reuse of the date table. That is, each fact field relates to the same Date.DateKey field in the dates table. This is shown for the tabular and multidimensional design models below

Tabular

Multidimensional

 

Multidimensional Interpretation

When a cube is built from this design, the Dates dimension becomes a role playing dimension and is joined to the fact for each relationship identified in the data source view. Thus, the Dates dimension is reused by the cube and with the dimension names appearing as the name of the fact field (after camel case conversion). Notice that there is a single dimension in the solution however, there appears to be three date dimensions (Order Date, Due Date and Ship Date).

Solution View

Cube View

 

While this approach may provide answers to simple business questions (eg, What is that value / quantity of product shipped on date xyz), the modelling technique fails when the query becomes complicated across restrictive dates. For example, it is not straight forward to determine the quantity ordered, shipped and due on date xyz.

Tabular Interpretation

In contrast to the multidimensional model, the tabular model employees an active relationship as the default join between fact and dimensions and each table appears only once in the ‘dimensional representation’ of the model. For example, there is only one date table in the pivot view.

By default, aggregation functions will use this relationship. In this situation, the active relationship (solid line) is between the OrderDate and the Date table. A sum measure ( sum([OrderQuantity])
) defined without context will therefore show the quantity ordered on each date.

The tabular model also allows functions to specify which relationship will be used as joins between tables. Therefore, the quantity of products shipped on a date can be determined by specifying the relationship between ResellerSales.ShipDateKey and Dates.DateKey. For example,

Ship Quantity:=CALCULATE(sum([OrderQuantity]),USERELATIONSHIP(‘ResellerSales’[ShipDateKey],’Dates’[DateKey]))

This allows the determination of measures that relate to more generic dimensions. For example, we can easily define [Order Quantity], [Ship Quantity], [Due Quantity] which specifies these values by date. This is in direct contrast to the default multidimensional behaviour and allows for more native browsing. For example, the date x value pivot below quickly identifies the sparse nature and trend of adventure works data.

I had never looked at adventure works data like this before. Here we can easily see that products are ordered on the 1st of the month, shipped on the 8th and due on the 13th. There are very few exceptions to this in the fact data.

 

Compromise?

The UDM can be designed to produce this outcome; however, it is not part of the ‘default’ behaviour. One way to achieve this would be to conform all dates into a single field (for example through a union) and specify separate measures for each union join (ie; add the Order Data, then the Ship Data and finally the Due Date data). However, this would require longer load times (since we are effectively staking facts) and increase the measure group size. The tabular approach (in my opinion) is a much nicer compromise.

NB It is also easy to mimic the multidimensional behaviour in tabular. Mulitple date tables are added to the model (one for each fact date) and labelled ‘Ship Date’ , ‘Due Date’, …

 

 

Categories: BISM, DAX, MDX, SSAS Tags: , ,

PowerPivot and Linked Table Names

January 28, 2012 Leave a comment

When PowerPivot creates a linked table, it appears to assign an arbitrary name to the import. The table name in powerpivot (Table1, Table2 etc) is then usually renamed as part of the design process. However, this practice is sloppy because the table name in excel is different to that of powerpivot and the name in excel is poorly defined (which may confuse anyone updating data at a later stage). This post looks at methods of managing linked tables in powerpivot and excel in order to provide more robust models.

Import (without name)

The standard way of creating a linked table in powerpivot is simply to select the data range, click on the powerpivot ribbon and select ‘Create Linked Table’. Once this is done the ‘Create Table’ dialog displays (usually the user indicates the table has headers) and the table is imported into powerpivot.

The imported table is then renamed in powerpivot with a double click (or right click à rename) so that table name has meaning in the pivot table.

In excel, we can see that this operation has also created an excel table by selecting the ‘Name Manager’ button from the Formulas ribbon. A dialog showing all tables and ranges in the workbook is displayed. Note that Table1 has been created.

The creation of the linked table in powerpivot has defined a table in excel and then created a connection for that table in powerpivot. We can see this in powerpivot by the definition of the linked table.

While this may achieve the outcome of creating a usable powerpivot model, it can become frustrating for the user (or anyone updating excel data) because the definitions in excel (ie Table1) are not the same as the table names in powerpivot (eg Dates).

Importing a Named Table

One way around this is to define the excel table and its name before the excel data is imported into powerpivot. This is easily achieved by selecting a formatting style from the ‘Format as Table’ button in the Home ribbon. The name of the table can be edited using the name manager (Formulas à Name Manager à Edit (after the appropriate table has been selected)).

Now, when the powerpivot linked table is created, it is automatically imported with the same name as the excel table.

Rename the Table and Manage the Import

Where the powerpivot table is created first and then the name of the table is changed in excel, the definition of the table in powerpivot will be in error. This does not present as a problem until the data in powerpivot is refreshed. When the data is refreshed, an ‘Errors in Linked Tables’ dialog is shown (as below) so that the connection can be managed.

The options for management (as below) are rather self-explanatory and allow for;

  • The selection of another excel table. Note that this is any excel table (defined table) that is not linked to a power pivot table.
  • Materialising the table (that is, removing the link to excel)
  • Deleting the powerpivot table from the model

The selection of an existing excel table completely redefines the import so that the refresh will alter the columns and data in the table so that the powerpivot table will be exactly the same as the excel table. While this may seem intuitive, the implication for the model is enormous because the linked table in powerpivot can change structure (adding columns etc).

Other Linked Table Management

Alternatively, a linked table can be managed by the ‘table definition’ button in the ‘Linked Table’ ribbon in powerpivot (as highlighted below). When selected, the dropdown shows all tables that have been defined in the excel workbook (regardless of their usage as linked tables), and, upon table change, imports the new definition and data.

If the excel table is already used as a linked table in the powerpivot model, the user is warned that the table is already used in a link and continuing will break the link (as below). If the user continues, the ‘old linked table’ is materialised and cannot be updated.

Conclusion

The use of linked tables in powerpivot provides a flexible way of importing, managing and using data in the powerpivot model. However, the automatic creation of linked tables without good naming conventions between powerpivot and excel may be confusing as the model is further developed, changed and updated. In-fact, every excel workbook model regardless of the use of powerpivot should implement good naming conventions. However, this can be managed with excel and powerpivot in various ways at various stages of model development.

Categories: BISM, DAX, PowerPivot Tags: ,

Whitepaper and Samples Released: Data Analysis Expressions (DAX) In the Tabular BI Semantic Model

January 4, 2012 Leave a comment

If your looking to get into DAX and the tabular model but not sure where to start or where dax fits in, theres a new whitepaper released.  It gives a good overview and will get you started in no time.

http://www.microsoft.com/download/en/details.aspx?id=28572

Categories: BISM, DAX Tags: ,

BISM : Competing & Continuous Store Sales using DAX

December 29, 2011 Leave a comment

Retail analysis models often include a competing outlet indicator to improve the quality of the model. The indicator does this by allowing the user to manage the effect that is associated with partial trading periods. For example, the true / false indicator shows whether the store has traded for all available weeks in a month. This post looks at how the competing store indicator can be implemented using powerpivot and DAX.

The Competing Store Definition

There are a few definitions of competitive stores (and perhaps another post to follow). In this post, the definition of competitive is a store that trades for every week in the month. If the store has not traded for all weeks, it is not considered competitive and this permits the identification of sales that occurred when;

  1. The store was open or closed for the month in question or
  2. The store didn’t trade for some other reason (eg refurbishment)

The Model

The underlying model considers a store dimension, date dimension, and sales data. The grain for sales is { store x day } à sales amount. Although, this implementation treats the sales relation {store x day} as unique, there is no requirement to do so. The dates table follows standard structure of a date dimension table (grain of day) with the fiscal calendar attributes aligned along a 445 Calender. The fields fiscal_fullweek_key has the format YYYYWW and fiscal_fullmonth_key has the format YYYYMM. These are standard implementations where YYYY refers to the century, WW refers to the week of year and MM refers to the month of year.


The store table is a type 1 dimension (showing the ‘current’ state of the store). There is a strong argument to show the [competing store] as an attribute of the store dimension however, this would require a type 2 store dimension and intensive ETL. This is outside the management of the end user who consumes available data as it would require more intensive resources and relational constructs.

Method

The approach to determine whether the sale is competing or not is based on the comparison of the stores actual trading weeks in a month compared to the available trading weeks. When the two values are the same the sale is competitive, otherwise it is not. Additionally, note that this formula is applicable to each row of the sales fact because the sales fact is the only table that combines both store and date data and can therefore be used as the auto-exists cross-join of stores and dates.

For each row in the sales fact, we must determine;

  1. weeks_in_month as the number of weeks in the current month, where current refers to the row context of date (ie the number of weeks in the month of the sales date).
  2. sales_in_month as the number of weeks in the current month that the store traded. Again, current refers to the row context of sale date.

In order to improve readability of the formula the sales table has been extended to include a field full_month_key (formula = RELATED(fiscal_fullmonth_key)) and full_week_key (formula = RELATED(fiscal_fullweek_key))

Formula 1 – Available Trading Weeks in Current Month (weeks_in_month)

The available weeks in the current month are the distinct count of weeks for the related month. We can achieve this by removing the filter context of the dates table and reapplying it so that the row sales month equals the dates month (below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , filter(all(Dates), Sales[fullmonth_key]=Dates[fiscal_fullmonth_key])

        )

If the sales table did not include the fullmonth_key field, we could use the related function directly in the formula.

CALCULATE(

            DISTINCTCOUNT(Dates[fiscal_weeks_key])

            , filter(all(dates), RELATED(Dates[fiscal_fullmonth_key])=Dates[fiscal_fullmonth_key])

        )

Alternatively, we can remove the row filter context of the dates table to the extent that it only includes the current month. This is done with the ALLEXCEPT function so that the dates table filters are removed except for the listed column restrictions (as below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , ALLEXCEPT(Dates,Dates[fiscal_fullmonth_key])

        )

 

Formula 2 – Weeks Traded in the Current Month (weeks_sales_in_month)

To determine the number of weeks that a store traded for (in the current month), we use a similar restriction but apply it to the sales table. That is, count distinct weeks after we remove the current row filter context and reapply it based on the current month of the rows sale date.

CALCULATE(

        DISTINCTCOUNT(sales[week_key])

        , ALLEXCEPT(sales,sales[store_key], sales[fullmonth_key])

        )

 

Formula 3 – Continuing Sales

The final (and only visible formula) is a simple if function that compares the trading weeks to those available for a true/false output.

=if([weeks_in_month]=[weeks_sales_in_month], “Yes”, “No”)

Final Model

The final schema for the model is shown below. The measure [Sales Amount] is simply the sum of the sales_amount field.


Categories: BISM, DAX Tags: ,

DAX : YTD on 445 Calender and Irregular End Dates

December 16, 2011 Leave a comment

In a previous post I looked at the built in DAX Total to date functions (TotalYTD, TotalMTD, TotalQTD) which sum calendar periods to an end date. The YTD function is somewhat flexible because it allows a year end date to be passed as an optional argument in the function. However, the end date is static in the formula, that is, the same year end date is applied must be applied to all years (also indicating the start of the next year is the day after the prior year ends). The 445 calendar contravenes this requirement because the yearend date (and year start date) is not consistent between years.

The 445 Calendar

The 445 calendar (and variants) are a commonly used to standardise months so that there are an even number of weeks in each month.One of the side effects of this is that the year does not start and end on the same day each year (note 2010 starting on 3-Jan and 2-Jan in 2011). This means that functions that rely on dates cannot be used to aggregate yearly totals.

The Formula

The calculation works by simply removing the current date filter context and then reapplying it so that the year is restricted to the current fiscal year and the date is less than the maximum date for any filter (ie row, column, filter axis) that is applied. The use of max for the fiscal_year_key is simply to return the current year as a scalar.


Using the totalytd function, the YTD value for December 2010 is effectively calculated as the YTD value for a date which ends on for 1-Jan-2011 (1 day).This does not occur when the filter context is reapplied because the calculation includes all data for the current year that is below the last date of the current filtered year (that is, all dates up to and including 28-Nov-2010).

An alternative approach that deals with text running balances can be found at Javier Gullen‘s blog.

Categories: BISM, DAX Tags: ,
Follow

Get every new post delivered to your Inbox.

Join 40 other followers