Archive

Archive for the ‘DAX’ Category

Managing Connections In Power Pivot

January 15, 2013 5 comments

Demonstrations that show importing data into Power Pivot do this from database icon in the home window (as identified below). However, once data is in your model and you wish to add new tables it is wiser to reuse the existing connection rather than create a new connection for each import. Each time that table(s) are added by this dialogue, a new connection is created within the model with multiple connections possible referring to the same source. For model simplicity, it is better to have a single connection referring to a single data source.

Standards for Import

The standard method importing data into Power Pivot is to use the import from database icon as shown in the diagram below (when a database is being used). This utility has features that allow you to detect relationships between tables and may suffice from two models. That is, when there are no existing tables and the model.

What this utility does is create a new connection to the data source each time the Wizard is invoked. Consider for example, an existing model that has the product table from Adventure Works imported. The initial import did not include the ProductCategory and ProductSubCateogory table. Naturally, we would want to include this in the model.

If we use this Wizard to reimport the tables, a new connection is set up. This connection is associated with the two new tables. We can see this by selecting the “Existing Connections” icon in the design.

Note that there are two connections in the model (as below) which was caused by the “re-import”of data once the initial population of the model had occurred.

Altering an Existing Connection

We can add data to the existing model and reuse an existing connection at the same time. If we wish to do this you simply reopen the existing connections properties and append tables (or add a query). Click the ‘Open’ button from the dialogue and select either the ‘Select from a list of tables…’ or ‘Write a query that will….’.

 

This will append any new tables to the model utilising existing connection.

Perhaps the main reason for reusing the connection is that it keeps the model succinct and manageable. Once a table has been imported through a connection the connection properties for that table cannot be changed without first deleting all the tables using the connection. Naturally this would increase the amount of rework required to the model should consolidation of connections be required at a later stage.

Finally, the connections are retained within the model even after all the tables using connection have been deleted. The connection must manually be deleted from the ‘Existing Connections’ dialogue.

 

 

 

Categories: BISM, DAX, PowerPivot Tags: ,

OLAP Rigidity wins against Tabular?

January 7, 2013 1 comment

There are many reasons why you might choose a Tabular Model over Multidimensional one. Marco Russo discusses some of the pros of tabular in his blog here which general relate to the flexibility that the tabular model gives you. However, one reason surprising reason for choosing multidimensional may be its rigid dimensional structure and the certainty that this structure gives in changing data environments.

A Multidimensional dimension provides many ways to record information against an attribute. Most importantly these are the attributes KEY, NAME and VALUE which are set under the dimension source properties. For example we can see the [State-Province] attribute from the Geography dimension in Adventure Works as below. The Key is the only property that must be set for an attribute so if no name is specified, the key will automatically be applied as the name.

Note that in the tabular engine, there is no concept of the KEY or Name. These are not specified as native DAX refers to column value(s).

Referring to Attribute Members

When an MDX query refers to members of the attribute it can do so by using the member name or the members unique name. Thus, the two MDX statements are equivalent;

– query using members unique name

select

[Measures].[Internet Sales Amount] on 0,

[Product].[Product Categories].[Category].&[4] on 1

from [Direct Sales]

 

– query using members name

select

[Measures].[Internet Sales Amount] on 0,

[Product].[Product Categories].[Category].[Accessories] on 1

from [Direct Sales]

 

There are many reasons why you would choose the first query over the second, namely, the second can give unexpected values when there is more than one member with the name being sought. That is, if there were two product categories (with different keys) and the same name (as Accessories), then the query would return only the first member. One might expect that the query would aggregate the results but this does not occur.

Once the query has been written, the members unique name is confusing and many people complain that it lacks reference to the underlying data (after all its impossible to know what does .&[4] means anyway). However, it is still the best way to refer to a member and this is the way most client tools generate MDX (using the unique_member_name).

Note that since the tabular engine has no concept of a key for an attribute (you don’t uniquely specify the key or name for an attribute) the MDX equivalent passed to the tabular engine uses what we would otherwise consider the name as the key. Thus, as in the example above, the tabular equivalent for this member is [Product].[Category].[Category].&[Accessories]. To an OLAP (multidimensional) developer, this type of key (ie the name as the key) is generally considered a real NO NO and contravenes best practice.

 

Client Queries and the interesting stuff.

For tabular models, the unique name for a member is generated by the name. Furthermore, most client tools refer to the key in their script. I have even seen some tools hard code the published member for parameters. So, what happens when the underlying attribute name changes? The change could break your report.

We can easily demonstrate how this error would occur using Excel as a client. Suppose I have a pivot based on a multidimensional cube and I convert the simple pivot to formulas (as below). The Accessories member in Cell A2 is hardcoded (the formula is in D2 and uses the key [4] as above). If the name of this member changes, the spread sheet still works fine, and the new name is displayed.

However, for the tabular example (as below), when the hardcoded name is changed, the MDX returns a #N/A error for the CUBEMEMBER function.

In the second picture, the name of the member has changed to [Accessories & Other] however the ‘report’ still relies on the hard coded value .&[Accessories] and the error occurs.

Conclusion

The use of a ‘key’ for each member would offer a way to solve the problem as addressed above and this is only available in multidimensional dimensions. To be fair though, the problem is as much a function of design as it is of structure. A multidimensional dimension would still raise the issue if the attributes name had been used as a key (but no-one does that do they?).

DAX III – Filtering Queries with CALCULATETABLE and FILTER

August 1, 2012 1 comment

So far the queries that we have created in DAX have focused on the entire result set (with the exception of the start at clause in evaluate). Naturally, we would want to restrict the rowset in some way and this is the focus of this post – how to filter row-set tables in DAX queries. There are two ways to do this, firstly, we can use the CACULCUATETABLE syntax and secondly we can use a FILTER that covers the evaluate syntax.

BOL defines the function of CALCULATETABLE as “Evaluates a table expression in a context modified by the given filters” taking the form;

 

CALCULATETABLE(<expression>,<filter1>,<filter2>,…)

 

There are many applications of how we can use this, however, let’s look at the first of reducing the rows returned, say for the first six months of 2000. We can evaluate this query like this.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
                , ‘dates’[Month]
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
    )
)

 

One of the restrictions of using CALCULATETABLE to restrict rows is that the filter cannot be applied to calculated measures. So if we extended the filter arguments to include a condition on ‘sales_amount’, we would get an error.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
                , ‘dates’[Month]
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
    , calculate(sum(‘sales’[amount])) > 10000
    )
)

A function ‘CALCULATE’ has been used in a True/False expression that is used as a table filter expression. This is not allowed.

 

Note that the filter arguments in calculate table are applied to the underlying table regardless of the fields returned (projected or selected). So, if we wanted to sum for the first 6 months sales for the year 2000, we could simply remove the ‘dates’[month] field from the summarize and the calculation is applied only for Jan to Jun.

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
        
        
    , dates[year]=2000
    , dates[month]>=1
    , dates[month]<=6
   
    )
)

 

We could also include this restriction in the definition of the calculation and achieve the same results (see sum_amount2 below).

evaluate 
(
    calculatetable
    (
 
        addcolumns
        (
    
            summarize
            (
                ‘dates’
                , ‘dates’[year]
                
            )
        
            , “sum_amount”calculate(sum(‘Sales’[amount]))
            , “sum_amount2″calculate(sum(‘sales’[amount])
                                        , all(dates)
                                        , ‘dates’[Year]=2000
                                        , ‘dates’[Month] <6
                                        , ‘dates’[Month] >=1
                                        )
        )
        
        
    , dates[year]=2000
    , ‘dates’[Month] <6
    , ‘dates’[Month] >=1

    )
)

 

From the rowset point of view, we are still faced with the problem of how to restrict rows based on a calculated value. In order to do this, we can apply the FILTER function against the table. Filter returns a subset of the table and takes the form;

 

FILTER(<table>,<filter>)

 

Consider yearly sales for the first six months of the year as below;

evaluate 
(
    calculatetable
    (
         addcolumns
        (
            summarize
            (
                ‘dates’
                , ‘dates’[year]       
            )
            , “sum_amount”calculate(sum(‘Sales’[amount]))
        )
    , ‘dates’[Month] <6
    , ‘dates’[Month] >=1
    )
)

 

A filter can be applied to this rowset so that only sales over 50,000 are returned. Note that it is wrapped around the CALCUALTETABLE.

evaluate 
(

    filter
    (
    
            calculatetable
            (
                 addcolumns
                (
                    summarize
                    (
                        ‘dates’
                        , ‘dates’[year]
                    )
                    , “sum_amount”calculate(sum(‘Sales’[amount]))
                )
            , ‘dates’[Month] <6
            , ‘dates’[Month] >=1
            )
    
            , calculate(sum(‘Sales’[amount])) > 50000 
    )

)

 

In these situations, it may be best to define the measure explicitly;

define measure ‘dates’[sum_amount] = calculate(sum(‘Sales’[amount]))

evaluate 
(

    filter
    (
    
            calculatetable
            (
                 addcolumns
                (
                    summarize
                    (
                        ‘dates’
                        , ‘dates’[year]
                    )
                    , “sum_amount”‘dates’[sum_amount]
                )
            , ‘dates’[Month] <6
            , ‘dates’[Month] >=1
            )
    
            , ‘dates’[sum_amount] > 50000 
    )

)

 

DAX Studio (Beta) Released

The first installers for DAX Studio are released.

You can get them here

(x64) – http://daxstudio.codeplex.com/downloads/get/463653

(x32) – http://daxstudio.codeplex.com/downloads/get/463654

Categories: BISM, DAX, SSAS Tags: , , , ,

DAX II – Extending the use of Evaluate

July 24, 2012 1 comment

In a previous post, I looked at evaluate as the method to return a table in a row-set form. However, the query syntax allows us to more precisely define the result set and this is the focus of this post.

Books on line specifies three optional arguments that can be used in query syntax to control, there are DEFINE, ORDER BY and START AT. The syntax follows the format;

 

[DEFINE { MEASURE <tableName>[<name>] = <expression> }

Categories: BISM, DAX, SSAS Tags: , ,

DAX Querying Part I

July 23, 2012 4 comments

This post looks the basics for querying tabular models using a DAX query. While Tabular models can be queried with MDX, DAX is the native language for tabular databases and may provide better results than MDX. This post looks how to building a simple table in DAX so that a row set can be exposed.

The basis of record-set queries in DAX is the evaluate statement which returns the table that is defined in the functions argument. The argument can be any statement that forms a table. For example, we can simply pass a table to the statement and return all the rows and columns from it;

evaluate 
(  
    ‘dates’ 
)

 

More likely though, we will want to define table which is defined by our query, that is, we define the table which will return the results needed. As more and more DAX queries are used and the community develops its standards, it is likely that there will be a query pattern which forms. For now though, consider that the argument of evaluate can be anything that is returned as a table.

Consider selecting distinct years. Since the values function returns a table (of a single column), we can be use the values function as the argument for an evaluate.

evaluate 
(  
    values(‘dates’[year])
)

 

Similarly, summarize returns a table and produce the same results.

evaluate 
(  
    summarize
    ( 
        ‘dates’
        , ‘dates’[year]
    )
)

 

The key point here is that we need to define a table as part of the evaluate statement.

Extending the Defined Table with ADDCOLUMNS

Usually, we want our defined table to combine data from another table. There are a few ways we can go about this and one method is to use the ADDCOLUMNS function. As the name suggests, ADDCOLUMNS adds a calculated column to a table. Note that a new column is defined both its name and expression.

ADDCOLUMNS(<table>, <name>, <expression>[, <name>, <expression>]…)

 

It is interesting to note the evaluation context of the expression. While we may expect a calculation to be evaluated in the context of the row it is being added to, the context applies to the entire model. For example, consider summing the sales amount by year. Extending our yearly values table by adding a column to sum sales amount is evaluated in the context of the entire sales table (as below). This is akin to the query context in MDX for measures that are defined with the WITH MEMBER MEASURES.[Measure Name].

 

evaluate
(

    addcolumns
    (
        values(‘dates’[Year])
        , “yearly sales”sum(‘Sales’[amount])
    )

)

 

In order to change the evaluate context of the calculation to the row, we must use the calculate statement.

evaluate
(
    addcolumns
    (
        values(‘dates’[Year])
        , “yearly sales”calculate(sum(‘Sales’[amount]))
    )
)

 

Any filters that are applied with the calculation context are evaluated with the calculations context. Therefore (as expected), to retrieve a single store sales, we would include it in the calculate statement.

evaluate
(
    addcolumns
    (
        values(‘dates’[Year])
        , “yearly sales”calculate(sum(‘Sales’[amount]))
        
        , “store 1 sales”calculate(sum(‘Sales’[amount])
                                    , ‘Stores’[store_name]=“Store 1″
                                    )
    )

)

 

Introducing DAX Studio

July 9, 2012 7 comments

If you have tried to query a tabular model using DAX, you will soon realise that it is not an easy undertaking. Why is it hard? Well, for one, SSMS isn’t currently much of a friend for tabular models. When you connect to a tabular instance with SSMS the model is displayed in a multidimensional format. The only tabular meta-data which you can easily derive are some table names (dimension names), some field names based on attribute hierarchies and the names of measures (without context for the table that they are built in). This post looks at DAX Studio which is an Excel 2010 Addin available on CodePlex. It is designed for DAX and tabular models. When released, the installer can be down loaded from here.

What’s the Issue with SSMS?

SSMS does not show useful metadata for building DAX queries. Further, its functionality is suited to MDX queries with the drag and drop name placement showing the multidimensional naming standard. In this post, I have created a really simple tabular model with three tables as shown below. It has a sales fact and dimensions of Stores and dates and you can see that all the fields from the Sales table are hidden from client tools except for calculated values (‘Sales Amount’ and ‘Store Ratio’).

When I connect to the model in SSMS (as below), the model presents itself as a multidimensional model.

Some of the implications of using SSMS are that;

  1. The object explorer in SSMS shows only tables and you can’t easily determine fields.

     

  2. Visible table fields are shown as attribute hierarchies so (for example) we can surmise that the dates table has fields named Month and Year.

     

  3. When all the fields for a table are hidden (including hierarchies), the table is not shown as a dimension in the multidimensional explorer. This behaviour may be expected but it doesn’t allow you to see the structure for the table. In the object explorer, you can see that there are three tables (dates, Sales and Stores) which are the same tables as in the model. In the query window, tables in the model are exposed as dimensions to the extent that at least one table field (or user hierarchy) is visible. Note that the Sales table is not visible is the query browser because the table has no visible fields in the model.

     

  4. Measures are shown under a display folder with the same name as the table in which the measures were created. The measures defined in the Sales table appear in the Sales display folder. Again, this may be expected however, as a user, I cannot do not see a tabular view of the objects in the model (fields and calculations).

     

  5. Hidden fields and measures are not visible. Again while you may expect this, the nature of DAX may require you to query a hidden field from a table.

     

In general, the main concern for using SSMS is that you cannot see the model structure and metadata. You can derive some of it (for example, through attribute hierarchies), but SSMS you does not allow you convenient access to the model structure. Additionally, when you drag an object which you would expect to be a field (for example store_name in the above model), the field appears with the multidimensional naming convention ([Stores].[store_name]).

What’s the Solution

Dax Studio is an Excel 2010 Addin available on CodePlex that allows you to connect to any tabular model, examine the metadata and query it. Excel is a good launch application because it allows the PowerPivot pro’s to also query their models without the requirement of a tabular server. Once installed, you will notice that the Add-Ins tab on the ribbon appears (if you and DAX Studio is added as an icon). DAX Studio does not form part of the PowerPivot tab.

The tabular model used above is shown through DAX Studio and the model structure is immediately visible from the model explorer (metadata tab as shown below).


Query Results

As an Excel Addin, output are generally targeted at an excel sheet. You can set both the output method and target destination through the run button drop down and output results drop down. These are discussed below.

The Output Method (found with the Run button drop down) allows DAX table results to be outputted as a table (that is an excel table), Static Output (that is a Simple Sheet) or no result output (Validate which validates the command). These sample outputs are shown below.


Table Output

Static Output

Grid Output

 

The Output Target (identified below) specifies what sheet the results will be sent to (where an Excel sheet is the target of the output). By default results are sent to a sheet titled ‘DAXResults’. This is added if the sheet does not exist in your workbook.

Connecting to a Tabular Server

The current server connection is managed through the server icon and the active Model identified through the model drop down. These are both highlighted in the captures below.

When you connect to a server (dialog show below), allowed models are populated.

Server Connection Dialog

 

Connecting to PowerPivot Model

DAX Studio allows you to connect to a tabular model which is contained in a workbook that has a PowerPivot Model. In order to do this, you must launch DAX Studio from where the active cell is a PowerPivot Pivot Table. That is, select a cell in a Pivot Table that is derived from a PowerPivot model and then launch DAX Studio. You will notice that the model is titled Microsoft_SQLServer_AnalysisServer. You can see the outcome of these steps (1, 2 and 3) in the diagram below.

  1. Ensure that a PowerPivot Pivot Table is selected
  2. Launch DAX Studio (this icon is in the Addins Ribbon Tab)
  3. Note the Model name ‘Microsoft_SQLServer_AnalysisServer’

If you have launched DAX Studio from a PowerPivot Pivot Table, you can still connect to a tabular server and then reconnect to the Excel PowerPivot model at a later stage. You may have noticed that the connection dialog (above) specified no PowerPivot models in the Excel Workbook. If launched from a PowerPivot Pivot, this option is available and you can re-query the workbook model without reopening DAX Studio. This is shown in the two screen shots below

When DAX Studio is launched from a PowerPivot Pivot, the connection dialog identifies the workbook as a tabular model. You can still connect to a tabular server and then reconnect back to the workbook. Note the PowerPivot Model option in the connection dialog.

If you do not open DAX studio from a PowerPivot Pivot, a tabular model that is in the workbook is not available for use as a connection.

 

Other Cool Features

One other noteworthy feature of DAX Studio is the inbuilt support for SSAS Dynamic Management Views which can query both to Tabular and OLAP servers. The DMV tab exposes these schemas which (when dragged across) build the full select syntax (as shown below). I reiterate, the DMV queries can be run against existing OLAP servers (which you can connect to with the connection manager).

 

Categories: BISM, DAX, MDX, PowerPivot, SSAS Tags:

The Last Non-Empty Dilemma TABULAR Style

March 21, 2012 7 comments

In a previous post, I addressed the function of the last non empty aggregation in SSAS (OLAP). Specifically, I looked at what happens when there are ‘missing’ chucks of data in the fact table and how the OLAP engine natively carries forward the balance to an aggregated time member without regard for the what is the ‘true’ last date. That is, we expect the last date of stock to be applied for all aggregated date values regardless of whether data exists or not. This post looks at the LNE calculation in the tabular model.

Generally, we expect the stock schema to take the form of the fact (balances) surrounded by the dimension table. The fact table (Stock) holds the [Quantity on Hand] (on_hand) for a given date with the Dates table showing a natural hierarchy of Year, Month and Date.

Summing the column on_hand will allow the calculation of daily balances (we will be able to show correct balances when a date is selected) however, if we use any aggregated Date field, the [Stock on Hand] value will be incorrect because the formula adds the amount for all days filtered by the date restriction.

For example, if we consider a very small subset of data (shown as ‘Source Data’) and [Stock on Hand] defined as

Stock On Hand:=Sum([on_hand])

We can see that Jan 2011 for Product A (20)=15+4+1 and for product B (2011)=10+8

Source Data [Stock On Hand] using Sum

Clearly, the addition for aggregate time members does not give us what we want.

The next thought is to change the stock filter context to the max of time. That is, return the sum of stock for the last date in the selected time period. For example;

Stock On Hand:=CALCULATE(sum([on_hand]), FILTER(ALL(Dates[date]), Dates[date]=max(Dates[date])))

The problem with this approach is that the aggregate values will only show for the last date in the month (or year). This is not a problem when we have full year worth of data in the fact and the maximum of the date filter is the last date in the period. But in this case we don’t and usually the most recent data will not be complete for the year. For Product A in the picture below, the total for Jan-2011 is correct (1) because the maximum date for Jan is 31-Jan and has data (see source). However, in the year context (ie 2011) the maximum date is 31-Dec-2011 and there is no stock for this and therefore the sum is empty.

Source Data [Stock On Hand] using max(date.date)

To solve this problem, it is important to remember that the filter context applies a filter to the fact. So, for the filter context of Jan-2011 (say when Jan-2011 is on a row), the data in the fact is filtered for all the dates in Jan. What we are really interested in therefore is the last date in the fact for the current filter context.

Stock On Hand:=CALCULATE(sum([on_hand]), filter(Stock, Stock[date_key]=Max(Stock[date_key])))

In this formula, we recognise that the filter context has been applied to the fact table and use this to derive the maximum date for the current filter period and then use this to further filter the fact.

Source Data [Stock On Hand] using max(Stock.date)

Alternate Approach using Transactions

In addition to using the quantity of stock as a field in the fact, we may also consider using the transactional amount. In this case the use of summing against the Date table filter context works because we are interested in summing all data less than the max date of the filter context. In this situation, our formula would be;

Stock On Hand:=CALCULATE(sum([qty_mvt]), FILTER(Stock, Stock[date_key]<=max(Dates[date])))

Tabular verse Dimensional Design Concepts

February 8, 2012 1 comment

The multidimensional model appears to provide a richer environment for model design. However, for the case of snapshot fact tables, the tabular design may offer a much faster design and refreshing options. This post looks at the snapshot design and how business requirements are modelled in both multidimensional and tabular platforms.

The Snapshot Fact

Snapshot data (or accumulating snapshot fact tables) are usually used to record process driven data where records change over time. Consider, for example, the tables FactResellerSales and FactInternetSales tables in AdventureWorks. Here, the record shows Order Date, Due Date and a Ship Date. When the sale is made, the Order Date is known and the Due date is probably known, however, the ship date can change and is only truly known after the order has shipped. For this type of fact table, data should be added when the sale occurs and updated as information about the sale comes to light (that is, when it is shipped).

Classic Star Modelling

The standard way to model this relationship is through the reuse of the date table. That is, each fact field relates to the same Date.DateKey field in the dates table. This is shown for the tabular and multidimensional design models below

Tabular

Multidimensional

 

Multidimensional Interpretation

When a cube is built from this design, the Dates dimension becomes a role playing dimension and is joined to the fact for each relationship identified in the data source view. Thus, the Dates dimension is reused by the cube and with the dimension names appearing as the name of the fact field (after camel case conversion). Notice that there is a single dimension in the solution however, there appears to be three date dimensions (Order Date, Due Date and Ship Date).

Solution View

Cube View

 

While this approach may provide answers to simple business questions (eg, What is that value / quantity of product shipped on date xyz), the modelling technique fails when the query becomes complicated across restrictive dates. For example, it is not straight forward to determine the quantity ordered, shipped and due on date xyz.

Tabular Interpretation

In contrast to the multidimensional model, the tabular model employees an active relationship as the default join between fact and dimensions and each table appears only once in the ‘dimensional representation’ of the model. For example, there is only one date table in the pivot view.

By default, aggregation functions will use this relationship. In this situation, the active relationship (solid line) is between the OrderDate and the Date table. A sum measure ( sum([OrderQuantity])
) defined without context will therefore show the quantity ordered on each date.

The tabular model also allows functions to specify which relationship will be used as joins between tables. Therefore, the quantity of products shipped on a date can be determined by specifying the relationship between ResellerSales.ShipDateKey and Dates.DateKey. For example,

Ship Quantity:=CALCULATE(sum([OrderQuantity]),USERELATIONSHIP(‘ResellerSales’[ShipDateKey],’Dates’[DateKey]))

This allows the determination of measures that relate to more generic dimensions. For example, we can easily define [Order Quantity], [Ship Quantity], [Due Quantity] which specifies these values by date. This is in direct contrast to the default multidimensional behaviour and allows for more native browsing. For example, the date x value pivot below quickly identifies the sparse nature and trend of adventure works data.

I had never looked at adventure works data like this before. Here we can easily see that products are ordered on the 1st of the month, shipped on the 8th and due on the 13th. There are very few exceptions to this in the fact data.

 

Compromise?

The UDM can be designed to produce this outcome; however, it is not part of the ‘default’ behaviour. One way to achieve this would be to conform all dates into a single field (for example through a union) and specify separate measures for each union join (ie; add the Order Data, then the Ship Data and finally the Due Date data). However, this would require longer load times (since we are effectively staking facts) and increase the measure group size. The tabular approach (in my opinion) is a much nicer compromise.

NB It is also easy to mimic the multidimensional behaviour in tabular. Mulitple date tables are added to the model (one for each fact date) and labelled ‘Ship Date’ , ‘Due Date’, …

 

 

Categories: BISM, DAX, MDX, SSAS Tags: , ,

PowerPivot and Linked Table Names

January 28, 2012 Leave a comment

When PowerPivot creates a linked table, it appears to assign an arbitrary name to the import. The table name in powerpivot (Table1, Table2 etc) is then usually renamed as part of the design process. However, this practice is sloppy because the table name in excel is different to that of powerpivot and the name in excel is poorly defined (which may confuse anyone updating data at a later stage). This post looks at methods of managing linked tables in powerpivot and excel in order to provide more robust models.

Import (without name)

The standard way of creating a linked table in powerpivot is simply to select the data range, click on the powerpivot ribbon and select ‘Create Linked Table’. Once this is done the ‘Create Table’ dialog displays (usually the user indicates the table has headers) and the table is imported into powerpivot.

The imported table is then renamed in powerpivot with a double click (or right click à rename) so that table name has meaning in the pivot table.

In excel, we can see that this operation has also created an excel table by selecting the ‘Name Manager’ button from the Formulas ribbon. A dialog showing all tables and ranges in the workbook is displayed. Note that Table1 has been created.

The creation of the linked table in powerpivot has defined a table in excel and then created a connection for that table in powerpivot. We can see this in powerpivot by the definition of the linked table.

While this may achieve the outcome of creating a usable powerpivot model, it can become frustrating for the user (or anyone updating excel data) because the definitions in excel (ie Table1) are not the same as the table names in powerpivot (eg Dates).

Importing a Named Table

One way around this is to define the excel table and its name before the excel data is imported into powerpivot. This is easily achieved by selecting a formatting style from the ‘Format as Table’ button in the Home ribbon. The name of the table can be edited using the name manager (Formulas à Name Manager à Edit (after the appropriate table has been selected)).

Now, when the powerpivot linked table is created, it is automatically imported with the same name as the excel table.

Rename the Table and Manage the Import

Where the powerpivot table is created first and then the name of the table is changed in excel, the definition of the table in powerpivot will be in error. This does not present as a problem until the data in powerpivot is refreshed. When the data is refreshed, an ‘Errors in Linked Tables’ dialog is shown (as below) so that the connection can be managed.

The options for management (as below) are rather self-explanatory and allow for;

  • The selection of another excel table. Note that this is any excel table (defined table) that is not linked to a power pivot table.
  • Materialising the table (that is, removing the link to excel)
  • Deleting the powerpivot table from the model

The selection of an existing excel table completely redefines the import so that the refresh will alter the columns and data in the table so that the powerpivot table will be exactly the same as the excel table. While this may seem intuitive, the implication for the model is enormous because the linked table in powerpivot can change structure (adding columns etc).

Other Linked Table Management

Alternatively, a linked table can be managed by the ‘table definition’ button in the ‘Linked Table’ ribbon in powerpivot (as highlighted below). When selected, the dropdown shows all tables that have been defined in the excel workbook (regardless of their usage as linked tables), and, upon table change, imports the new definition and data.

If the excel table is already used as a linked table in the powerpivot model, the user is warned that the table is already used in a link and continuing will break the link (as below). If the user continues, the ‘old linked table’ is materialised and cannot be updated.

Conclusion

The use of linked tables in powerpivot provides a flexible way of importing, managing and using data in the powerpivot model. However, the automatic creation of linked tables without good naming conventions between powerpivot and excel may be confusing as the model is further developed, changed and updated. In-fact, every excel workbook model regardless of the use of powerpivot should implement good naming conventions. However, this can be managed with excel and powerpivot in various ways at various stages of model development.

Categories: BISM, DAX, PowerPivot Tags: ,
Follow

Get every new post delivered to your Inbox.

Join 114 other followers