Archive

Posts Tagged ‘DAX Calculations’

DAX III – Filtering Queries with CALCULATETABLE and FILTER

August 1, 2012 1 comment

So far the queries that we have created in DAX have focused on the entire result set (with the exception of the start at clause in evaluate). Naturally, we would want to restrict the rowset in some way and this is the focus of this post – how to filter row-set tables in DAX queries. There are two ways to do this, firstly, we can use the CACULCUATETABLE syntax and secondly we can use a FILTER that covers the evaluate syntax.

BOL defines the function of CALCULATETABLE as “Evaluates a table expression in a context modified by the given filters” taking the form;

 

CALCULATETABLE(<expression>,<filter1>,<filter2>,…)

 

There are many applications of how we can use this, however, let’s look at the first of reducing the rows returned, say for the first six months of 2000. We can evaluate this query like this.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
                , ‘dates’[Month]
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
    )
)

 

One of the restrictions of using CALCULATETABLE to restrict rows is that the filter cannot be applied to calculated measures. So if we extended the filter arguments to include a condition on ‘sales_amount’, we would get an error.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
                , ‘dates’[Month]
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
    , calculate(sum(‘sales’[amount])) > 10000
    )
)

A function ‘CALCULATE’ has been used in a True/False expression that is used as a table filter expression. This is not allowed.

 

Note that the filter arguments in calculate table are applied to the underlying table regardless of the fields returned (projected or selected). So, if we wanted to sum for the first 6 months sales for the year 2000, we could simply remove the ‘dates’[month] field from the summarize and the calculation is applied only for Jan to Jun.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
   
    )
)

 

We could also include this restriction in the definition of the calculation and achieve the same results (see sum_amount2 below).

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
                
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
            , “sum_amount2″calculate(sum(‘sales’[amount])
                                        , all(dates)
                                        , ‘dates’[Year]=2000
                                        , ‘dates’[Month] <6
                                        , ‘dates’[Month] >=1
                                        )
        )
        
        
    , dates[year]=2000
    , ‘dates’[Month] <6
    , ‘dates’[Month] >=1

    )
)

 

From the rowset point of view, we are still faced with the problem of how to restrict rows based on a calculated value. In order to do this, we can apply the FILTER function against the table. Filter returns a subset of the table and takes the form;

 

FILTER(<table>,<filter>)

 

Consider yearly sales for the first six months of the year as below;

evaluate 
(
    calculatetable
    (
         addcolumns
        (
            summarize
            (
                ‘dates’
                , ‘dates’[year]       
            )
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
    , ‘dates’[Month] <6
    , ‘dates’[Month] >=1
    )
)

 

A filter can be applied to this rowset so that only sales over 50,000 are returned. Note that it is wrapped around the CALCUALTETABLE.

evaluate 
(

    filter
    (
    
            calculatetable
            (
                 addcolumns
                (
                    summarize
                    (
                        ‘dates’
                        , ‘dates’[year]
                    )
                    , “sum_amount”calculate(sum(‘Sales’[amount]))
                )
            , ‘dates’[Month] <6
            , ‘dates’[Month] >=1
            )
    
            , calculate(sum(‘Sales’[amount])) > 50000 
    )

)

 

In these situations, it may be best to define the measure explicitly;

define measure ‘dates’[sum_amount] = calculate(sum(‘Sales’[amount]))

evaluate 
(

    filter
    (
    
            calculatetable
            (
                 addcolumns
                (
                    summarize
                    (
                        ‘dates’
                        , ‘dates’[year]
                    )
                    , “sum_amount”‘dates’[sum_amount]
                )
            , ‘dates’[Month] <6
            , ‘dates’[Month] >=1
            )
    
            , ‘dates’[sum_amount] > 50000 
    )

)

 

DAX II – Extending the use of Evaluate

July 24, 2012 1 comment

In a previous post, I looked at evaluate as the method to return a table in a row-set form. However, the query syntax allows us to more precisely define the result set and this is the focus of this post.

Books on line specifies three optional arguments that can be used in query syntax to control, there are DEFINE, ORDER BY and START AT. The syntax follows the format;

 

[DEFINE { MEASURE <tableName>[<name>] = <expression> }

Categories: BISM, DAX, SSAS Tags: , ,

DAX Querying Part I

July 23, 2012 4 comments

This post looks the basics for querying tabular models using a DAX query. While Tabular models can be queried with MDX, DAX is the native language for tabular databases and may provide better results than MDX. This post looks how to building a simple table in DAX so that a row set can be exposed.

The basis of record-set queries in DAX is the evaluate statement which returns the table that is defined in the functions argument. The argument can be any statement that forms a table. For example, we can simply pass a table to the statement and return all the rows and columns from it;

evaluate 
(  
    ‘dates’ 
)

 

More likely though, we will want to define table which is defined by our query, that is, we define the table which will return the results needed. As more and more DAX queries are used and the community develops its standards, it is likely that there will be a query pattern which forms. For now though, consider that the argument of evaluate can be anything that is returned as a table.

Consider selecting distinct years. Since the values function returns a table (of a single column), we can be use the values function as the argument for an evaluate.

evaluate 
(  
    values(‘dates’[year])
)

 

Similarly, summarize returns a table and produce the same results.

evaluate 
(  
    summarize
    ( 
        ‘dates’
        , ‘dates’[year]
    )
)

 

The key point here is that we need to define a table as part of the evaluate statement.

Extending the Defined Table with ADDCOLUMNS

Usually, we want our defined table to combine data from another table. There are a few ways we can go about this and one method is to use the ADDCOLUMNS function. As the name suggests, ADDCOLUMNS adds a calculated column to a table. Note that a new column is defined both its name and expression.

ADDCOLUMNS(<table>, <name>, <expression>[, <name>, <expression>]…)

 

It is interesting to note the evaluation context of the expression. While we may expect a calculation to be evaluated in the context of the row it is being added to, the context applies to the entire model. For example, consider summing the sales amount by year. Extending our yearly values table by adding a column to sum sales amount is evaluated in the context of the entire sales table (as below). This is akin to the query context in MDX for measures that are defined with the WITH MEMBER MEASURES.[Measure Name].

 

evaluate
(

    addcolumns
    (
        values(‘dates’[Year])
        , “yearly sales”sum(‘Sales’[amount])
    )

)

 

In order to change the evaluate context of the calculation to the row, we must use the calculate statement.

evaluate
(
    addcolumns
    (
        values(‘dates’[Year])
        , “yearly sales”calculate(sum(‘Sales’[amount]))
    )
)

 

Any filters that are applied with the calculation context are evaluated with the calculations context. Therefore (as expected), to retrieve a single store sales, we would include it in the calculate statement.

evaluate
(
    addcolumns
    (
        values(‘dates’[Year])
        , “yearly sales”calculate(sum(‘Sales’[amount]))
        
        , “store 1 sales”calculate(sum(‘Sales’[amount])
                                    , ‘Stores’[store_name]=“Store 1″
                                    )
    )

)

 

The Last Non-Empty Dilemma TABULAR Style

March 21, 2012 8 comments

In a previous post, I addressed the function of the last non empty aggregation in SSAS (OLAP). Specifically, I looked at what happens when there are ‘missing’ chucks of data in the fact table and how the OLAP engine natively carries forward the balance to an aggregated time member without regard for the what is the ‘true’ last date. That is, we expect the last date of stock to be applied for all aggregated date values regardless of whether data exists or not. This post looks at the LNE calculation in the tabular model.

Generally, we expect the stock schema to take the form of the fact (balances) surrounded by the dimension table. The fact table (Stock) holds the [Quantity on Hand] (on_hand) for a given date with the Dates table showing a natural hierarchy of Year, Month and Date.

Summing the column on_hand will allow the calculation of daily balances (we will be able to show correct balances when a date is selected) however, if we use any aggregated Date field, the [Stock on Hand] value will be incorrect because the formula adds the amount for all days filtered by the date restriction.

For example, if we consider a very small subset of data (shown as ‘Source Data’) and [Stock on Hand] defined as

Stock On Hand:=Sum([on_hand])

We can see that Jan 2011 for Product A (20)=15+4+1 and for product B (2011)=10+8

Source Data [Stock On Hand] using Sum

Clearly, the addition for aggregate time members does not give us what we want.

The next thought is to change the stock filter context to the max of time. That is, return the sum of stock for the last date in the selected time period. For example;

Stock On Hand:=CALCULATE(sum([on_hand]), FILTER(ALL(Dates[date]), Dates[date]=max(Dates[date])))

The problem with this approach is that the aggregate values will only show for the last date in the month (or year). This is not a problem when we have full year worth of data in the fact and the maximum of the date filter is the last date in the period. But in this case we don’t and usually the most recent data will not be complete for the year. For Product A in the picture below, the total for Jan-2011 is correct (1) because the maximum date for Jan is 31-Jan and has data (see source). However, in the year context (ie 2011) the maximum date is 31-Dec-2011 and there is no stock for this and therefore the sum is empty.

Source Data [Stock On Hand] using max(date.date)

To solve this problem, it is important to remember that the filter context applies a filter to the fact. So, for the filter context of Jan-2011 (say when Jan-2011 is on a row), the data in the fact is filtered for all the dates in Jan. What we are really interested in therefore is the last date in the fact for the current filter context.

Stock On Hand:=CALCULATE(sum([on_hand]), filter(Stock, Stock[date_key]=Max(Stock[date_key])))

In this formula, we recognise that the filter context has been applied to the fact table and use this to derive the maximum date for the current filter period and then use this to further filter the fact.

Source Data [Stock On Hand] using max(Stock.date)

Alternate Approach using Transactions

In addition to using the quantity of stock as a field in the fact, we may also consider using the transactional amount. In this case the use of summing against the Date table filter context works because we are interested in summing all data less than the max date of the filter context. In this situation, our formula would be;

Stock On Hand:=CALCULATE(sum([qty_mvt]), FILTER(Stock, Stock[date_key]<=max(Dates[date])))

BISM : Competing & Continuous Store Sales using DAX

December 29, 2011 Leave a comment

Retail analysis models often include a competing outlet indicator to improve the quality of the model. The indicator does this by allowing the user to manage the effect that is associated with partial trading periods. For example, the true / false indicator shows whether the store has traded for all available weeks in a month. This post looks at how the competing store indicator can be implemented using powerpivot and DAX.

The Competing Store Definition

There are a few definitions of competitive stores (and perhaps another post to follow). In this post, the definition of competitive is a store that trades for every week in the month. If the store has not traded for all weeks, it is not considered competitive and this permits the identification of sales that occurred when;

  1. The store was open or closed for the month in question or
  2. The store didn’t trade for some other reason (eg refurbishment)

The Model

The underlying model considers a store dimension, date dimension, and sales data. The grain for sales is { store x day } à sales amount. Although, this implementation treats the sales relation {store x day} as unique, there is no requirement to do so. The dates table follows standard structure of a date dimension table (grain of day) with the fiscal calendar attributes aligned along a 445 Calender. The fields fiscal_fullweek_key has the format YYYYWW and fiscal_fullmonth_key has the format YYYYMM. These are standard implementations where YYYY refers to the century, WW refers to the week of year and MM refers to the month of year.


The store table is a type 1 dimension (showing the ‘current’ state of the store). There is a strong argument to show the [competing store] as an attribute of the store dimension however, this would require a type 2 store dimension and intensive ETL. This is outside the management of the end user who consumes available data as it would require more intensive resources and relational constructs.

Method

The approach to determine whether the sale is competing or not is based on the comparison of the stores actual trading weeks in a month compared to the available trading weeks. When the two values are the same the sale is competitive, otherwise it is not. Additionally, note that this formula is applicable to each row of the sales fact because the sales fact is the only table that combines both store and date data and can therefore be used as the auto-exists cross-join of stores and dates.

For each row in the sales fact, we must determine;

  1. weeks_in_month as the number of weeks in the current month, where current refers to the row context of date (ie the number of weeks in the month of the sales date).
  2. sales_in_month as the number of weeks in the current month that the store traded. Again, current refers to the row context of sale date.

In order to improve readability of the formula the sales table has been extended to include a field full_month_key (formula = RELATED(fiscal_fullmonth_key)) and full_week_key (formula = RELATED(fiscal_fullweek_key))

Formula 1 – Available Trading Weeks in Current Month (weeks_in_month)

The available weeks in the current month are the distinct count of weeks for the related month. We can achieve this by removing the filter context of the dates table and reapplying it so that the row sales month equals the dates month (below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , filter(all(Dates), Sales[fullmonth_key]=Dates[fiscal_fullmonth_key])

        )

If the sales table did not include the fullmonth_key field, we could use the related function directly in the formula.

CALCULATE(

            DISTINCTCOUNT(Dates[fiscal_weeks_key])

            , filter(all(dates), RELATED(Dates[fiscal_fullmonth_key])=Dates[fiscal_fullmonth_key])

        )

Alternatively, we can remove the row filter context of the dates table to the extent that it only includes the current month. This is done with the ALLEXCEPT function so that the dates table filters are removed except for the listed column restrictions (as below)

CALCULATE(

        DISTINCTCOUNT(Dates[fiscal_fullweek_key])

        , ALLEXCEPT(Dates,Dates[fiscal_fullmonth_key])

        )

 

Formula 2 – Weeks Traded in the Current Month (weeks_sales_in_month)

To determine the number of weeks that a store traded for (in the current month), we use a similar restriction but apply it to the sales table. That is, count distinct weeks after we remove the current row filter context and reapply it based on the current month of the rows sale date.

CALCULATE(

        DISTINCTCOUNT(sales[week_key])

        , ALLEXCEPT(sales,sales[store_key], sales[fullmonth_key])

        )

 

Formula 3 – Continuing Sales

The final (and only visible formula) is a simple if function that compares the trading weeks to those available for a true/false output.

=if([weeks_in_month]=[weeks_sales_in_month], “Yes”, “No”)

Final Model

The final schema for the model is shown below. The measure [Sales Amount] is simply the sum of the sales_amount field.


Categories: BISM, DAX Tags: ,
Follow

Get every new post delivered to your Inbox.

Join 228 other followers