BISM : Competing & Continuous Store Sales using DAX

Retail analysis models often include a competing outlet indicator to improve the quality of the model. The indicator does this by allowing the user to manage the effect that is associated with partial trading periods. For example, the true / false indicator shows whether the store has traded for all available weeks in a month. This post looks at how the competing store indicator can be implemented using powerpivot and DAX.

The Competing Store Definition

There are a few definitions of competitive stores (and perhaps another post to follow). In this post, the definition of competitive is a store that trades for every week in the month. If the store has not traded for all weeks, it is not considered competitive and this permits the identification of sales that occurred when;

  1. The store was open or closed for the month in question or
  2. The store didn’t trade for some other reason (eg refurbishment)

The Model

The underlying model considers a store dimension, date dimension, and sales data. The grain for sales is { store x day } à sales amount. Although, this implementation treats the sales relation {store x day} as unique, there is no requirement to do so. The dates table follows standard structure of a date dimension table (grain of day) with the fiscal calendar attributes aligned along a 445 Calender. The fields fiscal_fullweek_key has the format YYYYWW and fiscal_fullmonth_key has the format YYYYMM. These are standard implementations where YYYY refers to the century, WW refers to the week of year and MM refers to the month of year.


The store table is a type 1 dimension (showing the ‘current’ state of the store). There is a strong argument to show the [competing store] as an attribute of the store dimension however, this would require a type 2 store dimension and intensive ETL. This is outside the management of the end user who consumes available data as it would require more intensive resources and relational constructs.

Method

The approach to determine whether the sale is competing or not is based on the comparison of the stores actual trading weeks in a month compared to the available trading weeks. When the two values are the same the sale is competitive, otherwise it is not. Additionally, note that this formula is applicable to each row of the sales fact because the sales fact is the only table that combines both store and date data and can therefore be used as the auto-exists cross-join of stores and dates.

For each row in the sales fact, we must determine;

  1. weeks_in_month as the number of weeks in the current month, where current refers to the row context of date (ie the number of weeks in the month of the sales date).
  2. sales_in_month as the number of weeks in the current month that the store traded. Again, current refers to the row context of sale date.

In order to improve readability of the formula the sales table has been extended to include a field full_month_key (formula = RELATED(fiscal_fullmonth_key)) and full_week_key (formula = RELATED(fiscal_fullweek_key))

Formula 1 – Available Trading Weeks in Current Month (weeks_in_month)

The available weeks in the current month are the distinct count of weeks for the related month. We can achieve this by removing the filter context of the dates table and reapplying it so that the row sales month equals the dates month (below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , filter(all(Dates), Sales[fullmonth_key]=Dates[fiscal_fullmonth_key])

        )

If the sales table did not include the fullmonth_key field, we could use the related function directly in the formula.

CALCULATE(

            DISTINCTCOUNT(Dates[fiscal_weeks_key])

            , filter(all(dates), RELATED(Dates[fiscal_fullmonth_key])=Dates[fiscal_fullmonth_key])

        )

Alternatively, we can remove the row filter context of the dates table to the extent that it only includes the current month. This is done with the ALLEXCEPT function so that the dates table filters are removed except for the listed column restrictions (as below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , ALLEXCEPT(Dates,Dates[fiscal_fullmonth_key])

        )

 

Formula 2 – Weeks Traded in the Current Month (weeks_sales_in_month)

To determine the number of weeks that a store traded for (in the current month), we use a similar restriction but apply it to the sales table. That is, count distinct weeks after we remove the current row filter context and reapply it based on the current month of the rows sale date.

CALCULATE(

        DISTINCTCOUNT(sales[week_key])

        , ALLEXCEPT(sales,sales[store_key], sales[fullmonth_key])

        )

 

Formula 3 – Continuing Sales

The final (and only visible formula) is a simple if function that compares the trading weeks to those available for a true/false output.

=if([weeks_in_month]=[weeks_sales_in_month], “Yes”, “No”)

Final Model

The final schema for the model is shown below. The measure [Sales Amount] is simply the sum of the sales_amount field.


DAX : YTD on 445 Calender and Irregular End Dates

In a previous post I looked at the built in DAX Total to date functions (TotalYTD, TotalMTD, TotalQTD) which sum calendar periods to an end date. The YTD function is somewhat flexible because it allows a year end date to be passed as an optional argument in the function. However, the end date is static in the formula, that is, the same year end date is applied must be applied to all years (also indicating the start of the next year is the day after the prior year ends). The 445 calendar contravenes this requirement because the yearend date (and year start date) is not consistent between years.

The 445 Calendar

The 445 calendar (and variants) are a commonly used to standardise months so that there are an even number of weeks in each month.One of the side effects of this is that the year does not start and end on the same day each year (note 2010 starting on 3-Jan and 2-Jan in 2011). This means that functions that rely on dates cannot be used to aggregate yearly totals.

The Formula

The calculation works by simply removing the current date filter context and then reapplying it so that the year is restricted to the current fiscal year and the date is less than the maximum date for any filter (ie row, column, filter axis) that is applied. The use of max for the fiscal_year_key is simply to return the current year as a scalar.


Using the totalytd function, the YTD value for December 2010 is effectively calculated as the YTD value for a date which ends on for 1-Jan-2011 (1 day).This does not occur when the filter context is reapplied because the calculation includes all data for the current year that is below the last date of the current filtered year (that is, all dates up to and including 28-Nov-2010).

An alternative approach that deals with text running balances can be found at Javier Gullen‘s blog.

SSAS Dimension Error Processing & Duplicate Attributes

SSAS error configurations allow for some flexible arrangements in object processing. Naively, these error configurations may be changed so that a dimension can be successfully processed however, the error configurations can have a dramatic effect on how the dimension and fact data is processed. This post examines the error configuration for duplicate attributes. In doing so, I address dimension design, its processing and the effect that the configuration has on fact and dimension data.

In this post, we use a simple geography model for illustration, with a single measure (value) against each leaf node. Note that the city Berlin appears in the states of Hessen and Saarland.

Structure

Data

 

The data for the model is also very simple (with both fact data and dimension in a single table). The row_key field has been added to the table to improve the dimension appearance (design appearance). Additionally, some will note that the row_key is an excellent candidate for the [city] attribute key in this scenario, however, to keep things simple, we use attribute names as keys.

Dimension Design

Default Design (no relationships)

When the dimension is created it has three attributes of [key], [State] and [City]. The default behaviour of these attributes is to relate each (non-key) attribute directly to the dimensions key as below;

Structure

Attribute Relationships

 

If a hierarchy is created so that state drills into city, the relationship view changes slightly and separates the attributes into distinctive nodes (as below). In this situation, the relationships have not changed as both [State] and [City] relate directly to the key. Note that the hierarchy provides a warning triangle () which states that attribute relationships do not exist between levels in the hierarchy and this may decrease performance.

Structure

Attribute Relationships

 

The importance of relationships is well understood and, is arguably the most important performance consideration in dimension design. Because, we expect cities to roll up to states, the relationship should be created as part of the design. This (second design) choice is for performance.

Performance Design (relationships)

When the dimension is designed with relationships (below), the warning indicator is removed from the hierarchy.

Structure

Attribute Relationships

 

 

Dimension Processing

Dimension processing involves many stages. For our purpose, we are primarily interested with the load of data (that is loading member data) and the effect that relationships have on processing.

The default design sends a query to the data source for each non key attribute in the dimension. These are highlighted in the first two queries in the trace below where the data for [city] and [state] is loaded. The final query holds the data for the dimension key and relates the key to the [state] and [city] attributes. Note that in this design, each attribute is directly related to the dimension key and therefore, the members for these attributes have a direct relation with the key, hence the third query must relate the key to both [state] and [city] attributes.

When the performance design dimension is processed (note the dimension fails processing), the queries sent for data mimic the attribute structure in the relationship design. That is;

  1. A query is sent to retrieve [State] data. This is the same as in the default design.
  2. A query is sent to retrieve the [city] and [state] data. Unlike the default design, this includes [state] because state has a relationship with city.
  3. Finally, a query is sent for [key] data and the [city] relationship. In the default design, this query also included the [state] attribute however, since [state] is related to city (and this relation is exposed in the prior query), there is no requirement for [state] data and it is excluded.

Dimension Processing Error Configuration

Under the default error configuration, the performance design dimension fails during processing with a ‘duplicate key’ error is displayed (as below).


 

This error relates to the expectation that a [city] will relate to one (and only one) [state]. The data shows that the city Berlin is found in both Hessen and Saarland.

 

In order to process the dimension, the KeyDuplicate error configuration may be changed to IgnoreError or ReportAndContinue from the default value of ReportAndStop. However, while this may permit the dimension to process, the change in error configuration can have adverse impacts on both dimension data and fact data.

 

Data Outcomes

When the performance dimension is processed by ignoring duplicate attribute errors, members which raise the error are dropped from the dimension. Not all members are dropped.

 

Berlin is not found under Hessen and the fact values do not match the expected [state] totals. Note that Hessen should sum to 50 and Saarland should sum to 70. The 20 difference relates to the 20 for Berlin which should reside in Hessen (but now resides in Saarland).

 

Correcting the Outcome

A standard way of correcting this type of error is to apply a composite key to the attribute that causes the issue. In this situation, the key of the city attribute now uses both the state and city (see below) and shows the expected structure.

 

Conclusion

While the use of error configurations may permit the processing of dimensions, the outcomes of the processing operation can have unexpected results. A more certain way of managing processing is to address the underlying issue of the error at the source or through other factors in dimension design.