Archive

Posts Tagged ‘Data mining’

X64 Excel Data Mining Add-in

The wait is over!  Finally,  the data mining Add-in for excel has been released as a x64 install.  You can download it from

http://www.microsoft.com/download/en/details.aspx?id=29061

 

Categories: Data Mining, SSAS Tags: ,

DMX 102 – SSAS Time Series with Multiple Attributes

August 7, 2011 1 comment

This post looks at extending the time series prediction so that multiple values can be predicted. In the last post (DMX 101) only a single value was predicted and while this can be useful, the approach has obvious draw backs because there is often an expectation that different classes have different patterns. Consider seasonal fashion. We expect that seasonal products will have a distinct life-cycle and to group products (from multiple seasons) would distort a specific seasons sales.

In this post, we predict two sequences within the same model. For brevity, both series follow the function y=mx+b;

    y1 = ½ X + 20 (and)

y2 = 2x

There are two ways of dealing with multiple predictors. Firstly, we can treat each as a function value of the same sequence. In this method, we identify a value for y1 and a corresponding value for y2. In this approach, the series key is simply the x (time) value. The second method considers that only one value is being predicted (y) which is a value derived from the series key and an additional identifier which specifies the function (as either y1 or y2).

Method 1 – Multiple Y values

In this situation, our predicting data is represented as functions of (the time) series key (see right). This is represented by the SQL below.
with xs(x) as
( select convert(money,1) union  all
select x+1 from xs where x<20
)
select x, x/2+20 as y1, 2*x as y2
from xs

The model is created and trained with the following commands.

– MODEL CREATION : create single sequence key with mulitple prediction values
create mining model ts_method1
(   x long key time
, y1 double continuous predict
, y2 double continuous predict
) USING MICROSOFT_TIME_SERIES

– MODEL TRAINING : train the model
INSERT INTO ts_method1(x, y1, y2)
OPENROWSET(‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
 , ‘        with xs(x) as (select convert(money,1) union all select x+1 from xs where x<20)
select x, x/2+20 as y1, 2*x as y2 from xs’
)

Note with this method, the predicted values (function) must be explicitly identified. That is, we can predict y1 or y2 but not y1+y2. For example, we can predict y1 with;

– select flattned results for y1
select flattened predict([y1],5)
from [ts_method1]

Method 2 – Single Y Value and a Composite Key

The second approach incorporates the function (or class type) into the model key. We can see that the data for the model shows a y_value that is dependent on both the x and y_function columns.

with xs(x) as (select convert(money,1)
union all select x+1 from xs where x<20)
select x ‘y1′ as y_function , x/2+20 as y_value
from xs
union all
select x
, ‘y2′ as y_function, 2*x as y2
from xs

The model can be created and trained with the following commands;

– MODEL CREATION : create complex sequence key with single prediction values
create mining model ts_method2
 x long key time
, y_function text key
, y_value double continuous predict
) USING MICROSOFT_TIME_SERIES

– MODEL TRAINING : Train the model
insert into ts_method2(x, y_function, y_value)
OPENROWSET (‘SQLOLEDB.1′
, ‘Provider=SQLOLEDB.1;Integrated Security=SSPI;Initial Catalog=master;Data Source=.\SQL2008R2′
 , ‘        with xs(x) as ( select convert(money,1) union all select x+1 from xs where x<20)
select x, ”y1” as y_function, x/2+20 as y_value from xs
union all select x, ”y2” as y_function, 2*x as y2 from xs
order by 1
)

Additionally, it is important to note that training Time Series model requires ordered data.

Prediction from a model which includes a composite key requires that the function (class) be specified in the query. The first query (as the results on the right show) does not show what class relates to what predictions. Note that two series of data are returned

– Q1 : Prediction without class specification
select flattened Predict([y_value], 5)
from [ts_method2]


This can be overcome by showing the class in the result set (in which case all results are still returned-

– Q2 : Prediction showing class specification
select flattened [y_function], Predict([y_value], 5)
from [ts_method2]
or, by specifying the class that will be predicted.

– Q2 : Prediction specifying class
select flattened Predict([y_value], 5)
from [ts_method2]
where [y_function]=’y1′

The addition of class tags into time series models improves their useability by disseminating an aggregate sequence into components. This post has looked at the two methods for including class tags and the prediction implicatios of each.

Categories: Data Mining Tags: ,

Data Mining in the Cloud

It has always amazed me, how un-eager people have been to adopt or at least entertain some aspect of data mining as opposed to the more traditional report and investigate style of analysis. This is especially so when it’s essentially freely available (as most SSAS instances already exist) and easy to use thanks to the data mining plugin for excel.

For an overview of the excel plugin, you can check out Add-ins Launch or download it from Add-ins Download and start playing.

The excel add-in requires a connection to a SSAS instance and if that’s not available you may want to check out Microsoft’s (very simplified) cloud version of the add-in at http://clouddm.msftlabs.com/ which allows for csv upload (and maintains some of the bike buyers sample data). I am sure that this will increase functionality as time goes on.

Finally, I would be remiss if I didn’t mention Predixion Software (https://www.predixionsoftware.com/predixion/) which offers both an excel add-in and cloud solution. Predixion’s principle architect was a driving force behind the MS excel add-in.

Categories: Data Mining Tags:
Follow

Get every new post delivered to your Inbox.

Join 40 other followers