Archive
Managing Connections In Power Pivot
Demonstrations that show importing data into Power Pivot do this from database icon in the home window (as identified below). However, once data is in your model and you wish to add new tables it is wiser to reuse the existing connection rather than create a new connection for each import. Each time that table(s) are added by this dialogue, a new connection is created within the model with multiple connections possible referring to the same source. For model simplicity, it is better to have a single connection referring to a single data source.
Standards for Import
The standard method importing data into Power Pivot is to use the import from database icon as shown in the diagram below (when a database is being used). This utility has features that allow you to detect relationships between tables and may suffice from two models. That is, when there are no existing tables and the model.
What this utility does is create a new connection to the data source each time the Wizard is invoked. Consider for example, an existing model that has the product table from Adventure Works imported. The initial import did not include the ProductCategory and ProductSubCateogory table. Naturally, we would want to include this in the model.
If we use this Wizard to reimport the tables, a new connection is set up. This connection is associated with the two new tables. We can see this by selecting the “Existing Connections” icon in the design.
Note that there are two connections in the model (as below) which was caused by the “re-import”of data once the initial population of the model had occurred.
Altering an Existing Connection
We can add data to the existing model and reuse an existing connection at the same time. If we wish to do this you simply reopen the existing connections properties and append tables (or add a query). Click the ‘Open’ button from the dialogue and select either the ‘Select from a list of tables…’ or ‘Write a query that will….’.
|
|
|
This will append any new tables to the model utilising existing connection.
Perhaps the main reason for reusing the connection is that it keeps the model succinct and manageable. Once a table has been imported through a connection the connection properties for that table cannot be changed without first deleting all the tables using the connection. Naturally this would increase the amount of rework required to the model should consolidation of connections be required at a later stage.
Finally, the connections are retained within the model even after all the tables using connection have been deleted. The connection must manually be deleted from the ‘Existing Connections’ dialogue.
OLAP Rigidity wins against Tabular?
There are many reasons why you might choose a Tabular Model over Multidimensional one. Marco Russo discusses some of the pros of tabular in his blog here which general relate to the flexibility that the tabular model gives you. However, one reason surprising reason for choosing multidimensional may be its rigid dimensional structure and the certainty that this structure gives in changing data environments.
A Multidimensional dimension provides many ways to record information against an attribute. Most importantly these are the attributes KEY, NAME and VALUE which are set under the dimension source properties. For example we can see the [State-Province] attribute from the Geography dimension in Adventure Works as below. The Key is the only property that must be set for an attribute so if no name is specified, the key will automatically be applied as the name.
Note that in the tabular engine, there is no concept of the KEY or Name. These are not specified as native DAX refers to column value(s).
Referring to Attribute Members
When an MDX query refers to members of the attribute it can do so by using the member name or the members unique name. Thus, the two MDX statements are equivalent;
– query using members unique name
select
[Measures].[Internet Sales Amount] on 0,
[Product].[Product Categories].[Category].&[4] on 1
from [Direct Sales]
– query using members name
select
[Measures].[Internet Sales Amount] on 0,
[Product].[Product Categories].[Category].[Accessories] on 1
from [Direct Sales]
There are many reasons why you would choose the first query over the second, namely, the second can give unexpected values when there is more than one member with the name being sought. That is, if there were two product categories (with different keys) and the same name (as Accessories), then the query would return only the first member. One might expect that the query would aggregate the results but this does not occur.
Once the query has been written, the members unique name is confusing and many people complain that it lacks reference to the underlying data (after all its impossible to know what does .&[4] means anyway). However, it is still the best way to refer to a member and this is the way most client tools generate MDX (using the unique_member_name).
Note that since the tabular engine has no concept of a key for an attribute (you don’t uniquely specify the key or name for an attribute) the MDX equivalent passed to the tabular engine uses what we would otherwise consider the name as the key. Thus, as in the example above, the tabular equivalent for this member is [Product].[Category].[Category].&[Accessories]. To an OLAP (multidimensional) developer, this type of key (ie the name as the key) is generally considered a real NO NO and contravenes best practice.
Client Queries and the interesting stuff.
For tabular models, the unique name for a member is generated by the name. Furthermore, most client tools refer to the key in their script. I have even seen some tools hard code the published member for parameters. So, what happens when the underlying attribute name changes? The change could break your report.
We can easily demonstrate how this error would occur using Excel as a client. Suppose I have a pivot based on a multidimensional cube and I convert the simple pivot to formulas (as below). The Accessories member in Cell A2 is hardcoded (the formula is in D2 and uses the key [4] as above). If the name of this member changes, the spread sheet still works fine, and the new name is displayed.
However, for the tabular example (as below), when the hardcoded name is changed, the MDX returns a #N/A error for the CUBEMEMBER function.
In the second picture, the name of the member has changed to [Accessories & Other] however the ‘report’ still relies on the hard coded value .&[Accessories] and the error occurs.
Conclusion
The use of a ‘key’ for each member would offer a way to solve the problem as addressed above and this is only available in multidimensional dimensions. To be fair though, the problem is as much a function of design as it is of structure. A multidimensional dimension would still raise the issue if the attributes name had been used as a key (but no-one does that do they?).
DAX III – Filtering Queries with CALCULATETABLE and FILTER
So far the queries that we have created in DAX have focused on the entire result set (with the exception of the start at clause in evaluate). Naturally, we would want to restrict the rowset in some way and this is the focus of this post – how to filter row-set tables in DAX queries. There are two ways to do this, firstly, we can use the CACULCUATETABLE syntax and secondly we can use a FILTER that covers the evaluate syntax.
BOL defines the function of CALCULATETABLE as “Evaluates a table expression in a context modified by the given filters” taking the form;
|
CALCULATETABLE(<expression>,<filter1>,<filter2>,…) |
There are many applications of how we can use this, however, let’s look at the first of reducing the rows returned, say for the first six months of 2000. We can evaluate this query like this.
|
evaluate |
|
|
One of the restrictions of using CALCULATETABLE to restrict rows is that the filter cannot be applied to calculated measures. So if we extended the filter arguments to include a condition on ‘sales_amount’, we would get an error.
|
evaluate |
|
A function ‘CALCULATE’ has been used in a True/False expression that is used as a table filter expression. This is not allowed. |
Note that the filter arguments in calculate table are applied to the underlying table regardless of the fields returned (projected or selected). So, if we wanted to sum for the first 6 months sales for the year 2000, we could simply remove the ‘dates’[month] field from the summarize and the calculation is applied only for Jan to Jun.
|
evaluate |
|
|
We could also include this restriction in the definition of the calculation and achieve the same results (see sum_amount2 below).
|
evaluate ) |
|
|
From the rowset point of view, we are still faced with the problem of how to restrict rows based on a calculated value. In order to do this, we can apply the FILTER function against the table. Filter returns a subset of the table and takes the form;
|
|
Consider yearly sales for the first six months of the year as below;
|
evaluate |
|
|
A filter can be applied to this rowset so that only sales over 50,000 are returned. Note that it is wrapped around the CALCUALTETABLE.
|
evaluate filter ) |
|
|
In these situations, it may be best to define the measure explicitly;
|
define measure ‘dates’[sum_amount] = calculate(sum(‘Sales’[amount])) evaluate filter ) |
DAX Studio (Beta) Released
The first installers for DAX Studio are released.
You can get them here
DAX II – Extending the use of Evaluate
In a previous post, I looked at evaluate as the method to return a table in a row-set form. However, the query syntax allows us to more precisely define the result set and this is the focus of this post.
Books on line specifies three optional arguments that can be used in query syntax to control, there are DEFINE, ORDER BY and START AT. The syntax follows the format;
|
[DEFINE { MEASURE <tableName>[<name>] = <expression> }
This post looks the basics for querying tabular models using a DAX query. While Tabular models can be queried with MDX, DAX is the native language for tabular databases and may provide better results than MDX. This post looks how to building a simple table in DAX so that a row set can be exposed.
The basis of record-set queries in DAX is the evaluate statement which returns the table that is defined in the functions argument. The argument can be any statement that forms a table. For example, we can simply pass a table to the statement and return all the rows and columns from it;
evaluate
More likely though, we will want to define table which is defined by our query, that is, we define the table which will return the results needed. As more and more DAX queries are used and the community develops its standards, it is likely that there will be a query pattern which forms. For now though, consider that the argument of evaluate can be anything that is returned as a table.
Consider selecting distinct years. Since the values function returns a table (of a single column), we can be use the values function as the argument for an evaluate.
evaluate
Similarly, summarize returns a table and produce the same results.
evaluate
The key point here is that we need to define a table as part of the evaluate statement.
Extending the Defined Table with ADDCOLUMNS Usually, we want our defined table to combine data from another table. There are a few ways we can go about this and one method is to use the ADDCOLUMNS function. As the name suggests, ADDCOLUMNS adds a calculated column to a table. Note that a new column is defined both its name and expression.
ADDCOLUMNS(<table>, <name>, <expression>[, <name>, <expression>]…)
It is interesting to note the evaluation context of the expression. While we may expect a calculation to be evaluated in the context of the row it is being added to, the context applies to the entire model. For example, consider summing the sales amount by year. Extending our yearly values table by adding a column to sum sales amount is evaluated in the context of the entire sales table (as below). This is akin to the query context in MDX for measures that are defined with the WITH MEMBER MEASURES.[Measure Name].
evaluate addcolumns )
In order to change the evaluate context of the calculation to the row, we must use the calculate statement.
evaluate
Any filters that are applied with the calculation context are evaluated with the calculations context. Therefore (as expected), to retrieve a single store sales, we would include it in the calculate statement.
evaluate )
If you have tried to query a tabular model using DAX, you will soon realise that it is not an easy undertaking. Why is it hard? Well, for one, SSMS isn’t currently much of a friend for tabular models. When you connect to a tabular instance with SSMS the model is displayed in a multidimensional format. The only tabular meta-data which you can easily derive are some table names (dimension names), some field names based on attribute hierarchies and the names of measures (without context for the table that they are built in). This post looks at DAX Studio which is an Excel 2010 Addin available on CodePlex. It is designed for DAX and tabular models. When released, the installer can be down loaded from here.
What’s the Issue with SSMS? SSMS does not show useful metadata for building DAX queries. Further, its functionality is suited to MDX queries with the drag and drop name placement showing the multidimensional naming standard. In this post, I have created a really simple tabular model with three tables as shown below. It has a sales fact and dimensions of Stores and dates and you can see that all the fields from the Sales table are hidden from client tools except for calculated values (‘Sales Amount’ and ‘Store Ratio’).
When I connect to the model in SSMS (as below), the model presents itself as a multidimensional model.
Some of the implications of using SSMS are that;
In general, the main concern for using SSMS is that you cannot see the model structure and metadata. You can derive some of it (for example, through attribute hierarchies), but SSMS you does not allow you convenient access to the model structure. Additionally, when you drag an object which you would expect to be a field (for example store_name in the above model), the field appears with the multidimensional naming convention ([Stores].[store_name]). What’s the Solution Dax Studio is an Excel 2010 Addin available on CodePlex that allows you to connect to any tabular model, examine the metadata and query it. Excel is a good launch application because it allows the PowerPivot pro’s to also query their models without the requirement of a tabular server. Once installed, you will notice that the Add-Ins tab on the ribbon appears (if you and DAX Studio is added as an icon). DAX Studio does not form part of the PowerPivot tab.
The tabular model used above is shown through DAX Studio and the model structure is immediately visible from the model explorer (metadata tab as shown below).
Query Results As an Excel Addin, output are generally targeted at an excel sheet. You can set both the output method and target destination through the run button drop down and output results drop down. These are discussed below.
The Output Method (found with the Run button drop down) allows DAX table results to be outputted as a table (that is an excel table), Static Output (that is a Simple Sheet) or no result output (Validate which validates the command). These sample outputs are shown below.
Table Output Static Output Grid Output
The Output Target (identified below) specifies what sheet the results will be sent to (where an Excel sheet is the target of the output). By default results are sent to a sheet titled ‘DAXResults’. This is added if the sheet does not exist in your workbook.
Connecting to a Tabular Server The current server connection is managed through the server icon and the active Model identified through the model drop down. These are both highlighted in the captures below.
When you connect to a server (dialog show below), allowed models are populated.
Server Connection Dialog
Connecting to PowerPivot Model DAX Studio allows you to connect to a tabular model which is contained in a workbook that has a PowerPivot Model. In order to do this, you must launch DAX Studio from where the active cell is a PowerPivot Pivot Table. That is, select a cell in a Pivot Table that is derived from a PowerPivot model and then launch DAX Studio. You will notice that the model is titled Microsoft_SQLServer_AnalysisServer. You can see the outcome of these steps (1, 2 and 3) in the diagram below.
If you have launched DAX Studio from a PowerPivot Pivot Table, you can still connect to a tabular server and then reconnect to the Excel PowerPivot model at a later stage. You may have noticed that the connection dialog (above) specified no PowerPivot models in the Excel Workbook. If launched from a PowerPivot Pivot, this option is available and you can re-query the workbook model without reopening DAX Studio. This is shown in the two screen shots below
When DAX Studio is launched from a PowerPivot Pivot, the connection dialog identifies the workbook as a tabular model. You can still connect to a tabular server and then reconnect back to the workbook. Note the PowerPivot Model option in the connection dialog. If you do not open DAX studio from a PowerPivot Pivot, a tabular model that is in the workbook is not available for use as a connection.
Other Cool Features One other noteworthy feature of DAX Studio is the inbuilt support for SSAS Dynamic Management Views which can query both to Tabular and OLAP servers. The DMV tab exposes these schemas which (when dragged across) build the full select syntax (as shown below). I reiterate, the DMV queries can be run against existing OLAP servers (which you can connect to with the connection manager).
This book is a must ready for anyone who is interested in using Power View or anyone who wants to make their analysis and reports interactive with Power View (well that was straight to the point, now lets get into some detail). A current trend in the business intelligence landscape is a shift in focus from an IT user to the business user. That is, the person asking questions needs to interact with the data rather than specify report requirements to a technical user. Power View is Microsoft’s offering in this area and empowers the end user through an interactive environment for visualising data (whether it is creation, alteration or manipulation). In-fact, many organisations can take advantage of this product without cost. “Visualizing Data with Microsoft Power View” is a step by step guide which not only shows the end user how to use the Power View product but also get the most benefit out of their visualizations. Written for the Business User “Visualizing Data with Microsoft Power View” is well targeted at the business user and written in a context that does not alienate a non-technical user. The structure of the book builds on prior learning and reinforces sound business scenarios with practical examples. Every scenario has a “Learn by Doing” exercise which gives a practical, easy to follow example of how to apply the discussed content. The examples are applicable, easy to follow and relevant. Included Data and Additional Media In addition to the book content, there is a range of downloadable content (or installable if the book is purchased as a hardcopy) which includes videos (over 4 hours) and the data that is used in the examples. The videos reinforce the lessons, highlighting applicable areas of the screen and the commands used. The appendices go into the steps required to install the data in your own environment. Book Structure Discarding the appendixes, the book is broken down into two parts (the chapter outlines for the book are below). Part I deals with Power View and how to use it in an existing environment. If you were only interested in creating, viewing and using visualizations, Part I would be enough (say for example, managers and sales people). Part II extends the offering by diving into model creation (the data that Power View uses). This is more applicable to a savvy business user, power user (and so on). Notwithstanding the likely audience for Part II, it is very easy to follow with plenty of practical examples and exercises. Chapter Outlines The structure of the book is outlined as; Part I – Power View Part II – Creating a BI Semantic Model (BISM) Part III – Appendixes You can get “Visualizing Data with Microsoft Power View” at http://mhprofessional.com/product.php?isbn=0071780823 In a previous post, I addressed the function of the last non empty aggregation in SSAS (OLAP). Specifically, I looked at what happens when there are ‘missing’ chucks of data in the fact table and how the OLAP engine natively carries forward the balance to an aggregated time member without regard for the what is the ‘true’ last date. That is, we expect the last date of stock to be applied for all aggregated date values regardless of whether data exists or not. This post looks at the LNE calculation in the tabular model. Generally, we expect the stock schema to take the form of the fact (balances) surrounded by the dimension table. The fact table (Stock) holds the [Quantity on Hand] (on_hand) for a given date with the Dates table showing a natural hierarchy of Year, Month and Date. Summing the column on_hand will allow the calculation of daily balances (we will be able to show correct balances when a date is selected) however, if we use any aggregated Date field, the [Stock on Hand] value will be incorrect because the formula adds the amount for all days filtered by the date restriction. For example, if we consider a very small subset of data (shown as ‘Source Data’) and [Stock on Hand] defined as Stock On Hand:=Sum([on_hand]) We can see that Jan 2011 for Product A (20)=15+4+1 and for product B (2011)=10+8 Clearly, the addition for aggregate time members does not give us what we want. The next thought is to change the stock filter context to the max of time. That is, return the sum of stock for the last date in the selected time period. For example; Stock On Hand:=CALCULATE(sum([on_hand]), FILTER(ALL(Dates[date]), Dates[date]=max(Dates[date]))) The problem with this approach is that the aggregate values will only show for the last date in the month (or year). This is not a problem when we have full year worth of data in the fact and the maximum of the date filter is the last date in the period. But in this case we don’t and usually the most recent data will not be complete for the year. For Product A in the picture below, the total for Jan-2011 is correct (1) because the maximum date for Jan is 31-Jan and has data (see source). However, in the year context (ie 2011) the maximum date is 31-Dec-2011 and there is no stock for this and therefore the sum is empty. To solve this problem, it is important to remember that the filter context applies a filter to the fact. So, for the filter context of Jan-2011 (say when Jan-2011 is on a row), the data in the fact is filtered for all the dates in Jan. What we are really interested in therefore is the last date in the fact for the current filter context. Stock On Hand:=CALCULATE(sum([on_hand]), filter(Stock, Stock[date_key]=Max(Stock[date_key]))) In this formula, we recognise that the filter context has been applied to the fact table and use this to derive the maximum date for the current filter period and then use this to further filter the fact. Alternate Approach using Transactions In addition to using the quantity of stock as a field in the fact, we may also consider using the transactional amount. In this case the use of summing against the Date table filter context works because we are interested in summing all data less than the max date of the filter context. In this situation, our formula would be; Stock On Hand:=CALCULATE(sum([qty_mvt]), FILTER(Stock, Stock[date_key]<=max(Dates[date]))) The multidimensional model appears to provide a richer environment for model design. However, for the case of snapshot fact tables, the tabular design may offer a much faster design and refreshing options. This post looks at the snapshot design and how business requirements are modelled in both multidimensional and tabular platforms.
The Snapshot Fact
Snapshot data (or accumulating snapshot fact tables) are usually used to record process driven data where records change over time. Consider, for example, the tables FactResellerSales and FactInternetSales tables in AdventureWorks. Here, the record shows Order Date, Due Date and a Ship Date. When the sale is made, the Order Date is known and the Due date is probably known, however, the ship date can change and is only truly known after the order has shipped. For this type of fact table, data should be added when the sale occurs and updated as information about the sale comes to light (that is, when it is shipped).
Classic Star Modelling The standard way to model this relationship is through the reuse of the date table. That is, each fact field relates to the same Date.DateKey field in the dates table. This is shown for the tabular and multidimensional design models below
Tabular Multidimensional
Multidimensional Interpretation When a cube is built from this design, the Dates dimension becomes a role playing dimension and is joined to the fact for each relationship identified in the data source view. Thus, the Dates dimension is reused by the cube and with the dimension names appearing as the name of the fact field (after camel case conversion). Notice that there is a single dimension in the solution however, there appears to be three date dimensions (Order Date, Due Date and Ship Date).
Solution View Cube View
While this approach may provide answers to simple business questions (eg, What is that value / quantity of product shipped on date xyz), the modelling technique fails when the query becomes complicated across restrictive dates. For example, it is not straight forward to determine the quantity ordered, shipped and due on date xyz.
Tabular Interpretation In contrast to the multidimensional model, the tabular model employees an active relationship as the default join between fact and dimensions and each table appears only once in the ‘dimensional representation’ of the model. For example, there is only one date table in the pivot view.
By default, aggregation functions will use this relationship. In this situation, the active relationship (solid line) is between the OrderDate and the Date table. A sum measure ( sum([OrderQuantity]) The tabular model also allows functions to specify which relationship will be used as joins between tables. Therefore, the quantity of products shipped on a date can be determined by specifying the relationship between ResellerSales.ShipDateKey and Dates.DateKey. For example,
Ship Quantity:=CALCULATE(sum([OrderQuantity]),USERELATIONSHIP(‘ResellerSales’[ShipDateKey],’Dates’[DateKey])) This allows the determination of measures that relate to more generic dimensions. For example, we can easily define [Order Quantity], [Ship Quantity], [Due Quantity] which specifies these values by date. This is in direct contrast to the default multidimensional behaviour and allows for more native browsing. For example, the date x value pivot below quickly identifies the sparse nature and trend of adventure works data. I had never looked at adventure works data like this before. Here we can easily see that products are ordered on the 1st of the month, shipped on the 8th and due on the 13th. There are very few exceptions to this in the fact data.
Compromise? The UDM can be designed to produce this outcome; however, it is not part of the ‘default’ behaviour. One way to achieve this would be to conform all dates into a single field (for example through a union) and specify separate measures for each union join (ie; add the Order Data, then the Ship Data and finally the Due Date data). However, this would require longer load times (since we are effectively staking facts) and increase the measure group size. The tabular approach (in my opinion) is a much nicer compromise. NB It is also easy to mimic the multidimensional behaviour in tabular. Mulitple date tables are added to the model (one for each fact date) and labelled ‘Ship Date’ , ‘Due Date’, …
|
































