Data Mining in the Cloud

It has always amazed me, how un-eager people have been to adopt or at least entertain some aspect of data mining as opposed to the more traditional report and investigate style of analysis. This is especially so when it’s essentially freely available (as most SSAS instances already exist) and easy to use thanks to the data mining plugin for excel.

For an overview of the excel plugin, you can check out Add-ins Launch or download it from Add-ins Download and start playing.

The excel add-in requires a connection to a SSAS instance and if that’s not available you may want to check out Microsoft’s (very simplified) cloud version of the add-in at http://clouddm.msftlabs.com/ which allows for csv upload (and maintains some of the bike buyers sample data). I am sure that this will increase functionality as time goes on.

Finally, I would be remiss if I didn’t mention Predixion Software (https://www.predixionsoftware.com/predixion/) which offers both an excel add-in and cloud solution. Predixion’s principle architect was a driving force behind the MS excel add-in.

Advertisements

Dynamic Package Configurations (Part 2)

This article continue discussion about a method of dynamically assigning variable values at runtime Apply the Record Set to Applicable Variables

Initial State

The initial states of the string variables are set to C:\ with variables being used for the record set container.

The code in the Show Values script tasks is simple and only there to demonstrate a variable values. This is the same code for all three ‘Show Value’ scripts. Note that the variables are passed to the script via the default ReadOnlyVaraibles inputs for the script task.

Assign the Record Sets to the variables (Object Container)

The Execute SQL tasks (‘Get Params (1)’ and ‘Get Params (2)’) are used to pass the query to a container. The queries are not complicated (below) and the standard methods for assigning a record set to a variable apply. These are to ensure that the result set is set to ‘Full Result Set’

Query for Each Result Set

Assign Result Set to Variable

There are two requirements to assign the result set to a variable. Firstly ensure that the result set is set to ‘Full Result Set’ and secondly, that the variable is mapped to the result set.

The two script tasks (‘Assign Params (1)’ and ‘Assign Params (2)’) both contain the same code and only expose the record set variable. One of the main benefits of this technique is that the assignment script does not need to know ahead of time what variables are going to be assigned. This is deemed by the field ‘variable_name’ in the record set result.

There are two main pieces of code in this script task. Firstly, we assign the record set variable (variables) to a data table (in order to iterate over it) and secondly, we iterate over the table and assign variable a value.

Assign the Record Set to a Data Table

The code to assign the record set to the data table is below.

Assigning the Dataset To Variables

The code to assign the data table to variables is below. The iteration of data table (rows) shouldn’t be too much of an issue here or should the checking for package variable name but the key takeaways are that we

  1. Check to see that the variable exists (using VariableDespenser.Contains)
  2. Lock the variable (VariableDespenser.LockForOneWrite)
  3. and then write the value.

Benefits

There are other methods of variable assignment that can be used to change the value of variables at run time, for example, we could assign values in a execute SQL task based on a single row. However, I think that limitation with an approach like this is that task has to be configured ahead of time and the variable names must be known. This method allows variable assignment based solely on the passed record set and as such is very reusable.

I should also point out that this example assumes a sting data type. The code can be extended to permits user defined data types so the record set specifies the data type however this was not included for brevity.

Dynamic Package Configurations (Part 1)

Introduction

In the following two posts I show a method of dynamically assigning variable values at runtime with minimal user intervention. I consider this method of assignment dynamic because the package (and tasks) do not require specific configuration for to the variable that is being altered. Variable values are changed based on the variable names in a record set, and this offers an extremely high degree of flexibility in packages.

The use of SSIS configurations and variable assignment are convenient way to change the value of package variables at runtime.
Configuration files can be created through the menu path <SSIS><Package Configurations> or by selecting the ‘Package Configuration’ option after a right click in the control flow and variables can be assigned a runtime value by using the /set switch in dtexec (either command line or job execution text (as an agent job)). Coincidently, the package configurations can also be set a runtime using the /Conf switch.

However, a possible downside of these utilities is that variable values are set at runtime for the entire package and cannot be changed throughout package execution. In this blog, we will show a method assigning indeterminate variables at runtime without user intervention. This is achieved by

  1. Obtaining a record set for the variables (and values) that should be changed (and)
  2. Assigning the package variables values based on the record set parameter names.

This is a two set process that retrieves a list of variables to change (and their values) and then iterates the list, changing the variable identified on each iteration. Also, although the code presented here uses a query to retrieve these values, I find a stored procedure (with param) is a much cleaner way of retrieving the list.

Overview

To demonstrate this works, we will create a simple package that will dynamically change value of variables ouput_path and input_path at run time. We show the start values, first assignment values then final assignment values. This is shown in the mark up below;