X64 Excel Data Mining Add-in
The wait is over! Finally, the data mining Add-in for excel has been released as a x64 install. You can download it from
http://www.microsoft.com/download/en/details.aspx?id=29061
The wait is over! Finally, the data mining Add-in for excel has been released as a x64 install. You can download it from
http://www.microsoft.com/download/en/details.aspx?id=29061
This post looks at extending the time series prediction so that multiple values can be predicted. In the last post (DMX 101) only a single value was predicted and while this can be useful, the approach has obvious draw backs because there is often an expectation that different classes have different patterns. Consider seasonal fashion. We expect that seasonal products will have a distinct life-cycle and to group products (from multiple seasons) would distort a specific seasons sales.
In this post, we predict two sequences within the same model. For brevity, both series follow the function y=mx+b;
y1 = ½ X + 20 (and)
y2 = 2x

There are two ways of dealing with multiple predictors. Firstly, we can treat each as a function value of the same sequence. In this method, we identify a value for y1 and a corresponding value for y2. In this approach, the series key is simply the x (time) value. The second method considers that only one value is being predicted (y) which is a value derived from the series key and an additional identifier which specifies the function (as either y1 or y2).
| In this situation, our predicting data is represented as functions of (the time) series key (see right). This is represented by the SQL below. with xs(x) as ( select convert(money,1) union all select x+1 from xs where x<20 ) select x, x/2+20 as y1, 2*x as y2 from xs |
![]() |
The model is created and trained with the following commands.
– MODEL CREATION : create single sequence key with mulitple prediction values
create mining model ts_method1
( x long key time
, y1 double continuous predict
, y2 double continuous predict
) USING MICROSOFT_TIME_SERIES
– MODEL TRAINING : train the model
INSERT INTO ts_method1(x, y1, y2)
OPENROWSET(‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
, ‘ with xs(x) as (select convert(money,1) union all select x+1 from xs where x<20)
select x, x/2+20 as y1, 2*x as y2 from xs’
)
Note with this method, the predicted values (function) must be explicitly identified. That is, we can predict y1 or y2 but not y1+y2. For example, we can predict y1 with;
| – select flattned results for y1 select flattened predict([y1],5) from [ts_method1] |
![]() |
The second approach incorporates the function (or class type) into the model key. We can see that the data for the model shows a y_value that is dependent on both the x and y_function columns.
| with xs(x) as (select convert(money,1) union all select x+1 from xs where x<20) select x, ‘y1′ as y_function , x/2+20 as y_value from xs union all select x, ‘y2′ as y_function, 2*x as y2 from xs |
![]() |
The model can be created and trained with the following commands;
– MODEL CREATION : create complex sequence key with single prediction values
create mining model ts_method2
( x long key time
, y_function text key
, y_value double continuous predict
) USING MICROSOFT_TIME_SERIES
– MODEL TRAINING : Train the model
insert into ts_method2(x, y_function, y_value)
OPENROWSET (‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
, ‘ with xs(x) as ( select convert(money,1) union all select x+1 from xs where x<20)
select x, ”y1” as y_function, x/2+20 as y_value from xs
union all select x, ”y2” as y_function, 2*x as y2 from xs
order by 1
‘)
Additionally, it is important to note that training Time Series model requires ordered data.
| Prediction from a model which includes a composite key requires that the function (class) be specified in the query. The first query (as the results on the right show) does not show what class relates to what predictions. Note that two series of data are returned
– Q1 : Prediction without class specification
|
![]() |
| This can be overcome by showing the class in the result set (in which case all results are still returned-
– Q2 : Prediction showing class specification – Q2 : Prediction specifying class |
|
The addition of class tags into time series models improves their useability by disseminating an aggregate sequence into components. This post has looked at the two methods for including class tags and the prediction implicatios of each.
Life is ironic: After the short rant about company willingness to use data mining, I was asked to mentor a client on it! They wanted to investigate the predictive capability of time series modelling.
Regardless of whether mining is applied in a descriptive or predictive manner, Analysis Services makes it easy to include some facet of data mining into operations (or at least investigate it). The use of DMX (Data Mining Extensions) makes it simple to create the model and obtain values from it and is equivalent to SQL for data mining. Some are surprised to find that BIDS is not required to create, train and query the model.
In this example, we will predict a sinusoidal curve using the time series algorithm. The curve takes the form as seen in the picture below;

There are three basic steps to obtaining the predictions. Firstly, the model has to be created. For all intensive purposes, this is the implementation of one mining algorithm with a distinct set of parameters. The model relies on a structure which defines the meta-data for input but, just as in BIDS, the creation of a model will automatically create the structure for model use. Secondly, the model needs to be trained, and finally, the model can be queried.
A model can be created by simply using the CREATE MIING MODEL SYNATX (as below). Model columns must be defined by name, data type and content type. Content types define behaviour (eg role as key and time) and data behaviour (DISCRETE, CONTINUOUS, DISCRETIZED). Additionally, prediction columns (ones being predicted) require a prediction request (PREDICT OR PREDICT_ONLY).
CREATE MINING MODEL [SinPredict]
(
Degree LONG KEY TIME
, SinValue DOUBLE CONTINUOUS PREDICT
) USING
MICROSOFT_TIME_SERIES
WITH DRILLTHROUGH
As stated, the creation of the model will automatically create a structure. By default, the structure name will be the models name with a ‘_Structure’ suffix (eg SinPredict_Structure). Had we wished to specify the structure first (also perfectly valid) and append the model to it, we could use the create / alter-append syntax below.
– Create structure
CREATE MINING STRUCTURE [SinTimeSeries]
(
Degree LONG KEY TIME
, SinValue DOUBLE CONTINUOUS
)
– Append the model
ALTER MINING STRUCTURE [SinTimeSeries]
ADD MINING MODEL [SinPredict]
(
Degree
, SinValue PREDICT
) USING MICROSOFT_TIME_SERIES
WITH DRILLTHROUGH
Once the model has been created, it can be trained (or populated). Queries run against an unprocessed model will err as will trying to re-train the model without first deleting prior contents. The generic form of training is an insert statement. For example,
INSERT INTO Model(Columns)
<< The RowSet >>
Since the generation of sinusoidal data for training can be easily accomplished with a CTE, an OPENROWSET statement to an SQL server can be used to populate the model (note the cte provided the chart above). The DMX query to populate the model is;
INSERT INTO SinPredict(Degree, SinValue)
OPENROWSET(‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
, ‘ WITH Deg AS
(
SELECT CAST(1 AS MONEY) AS Deg
UNION ALL SELECT Deg + 1
FROM Deg
WHERE Deg < 180
)
SELECT *, CAST(SIN(Deg/9) AS MONEY) AS SinValue
FROM Deg OPTION (MAXRECURSION 180)’
)
Note that, by default, rowset queries are not enabled in SSAS and shouls be configured in server properties. These are basic (open rowset queries) and advanced properties (allowed providers).

Once the model has been trained, it can be queried. For time series, we can use the PREDICT or PREDICTTIMESERIES
functions (both achieve the same result) and forecasts the next n values. By default, the model returns results in the nested form which can be flattened using the keyword (as below).
SELECT PREDICTTIMESERIES(SinValue, 20)
FROM [SinPredict]
![]() |
![]() |
SELECT FLATTENED PREDICTTIMESERIES(SinValue, 20)
FROM [SinPredict]

One of the interesting observations about time series predictions is that the sequence identifier values are incremented from the last value in the training record set. These values are not predicted as part of the prediction function which becomes apparent when using smart keys. For example a date smart key in the form of YYYYMMDD with the last training value of 20110520 (20-May-2011) will predict incrementally from 21-May. When 30 is reached, the next sequence number is 31, then 32 and so on.
The use of DMX offers a simple way of implementing SSAS data mining. This simple example demonstrates how to create a time series model, train it and then predict from it.
It has always amazed me, how un-eager people have been to adopt or at least entertain some aspect of data mining as opposed to the more traditional report and investigate style of analysis. This is especially so when it’s essentially freely available (as most SSAS instances already exist) and easy to use thanks to the data mining plugin for excel.
For an overview of the excel plugin, you can check out Add-ins Launch or download it from Add-ins Download and start playing.
The excel add-in requires a connection to a SSAS instance and if that’s not available you may want to check out Microsoft’s (very simplified) cloud version of the add-in at http://clouddm.msftlabs.com/ which allows for csv upload (and maintains some of the bike buyers sample data). I am sure that this will increase functionality as time goes on.
Finally, I would be remiss if I didn’t mention Predixion Software (https://www.predixionsoftware.com/predixion/) which offers both an excel add-in and cloud solution. Predixion’s principle architect was a driving force behind the MS excel add-in.