SSIS – Creating an OLEDB Connection to an ODBC Source and SSIS parameters

This post was originally going to be about using the advanced features of a SSIS OLEDB command to set parameters.  I’ve been using Oracle sources lately and the OLEDB command did not recognise the parameters for my query. 

For example, my in my work environment, the statement (below) just did not recognise the key_field parameter and so I had to manually add it through the components advanced editor.

Update some_table 
set some_field = new_value 
where key_field = ?

We’ll look into the advanced features a little later, however, when I tried to mimic the behaviour (using a MYSQL connection), the only issue I had was creating an OLEDB connection in SSIS.  I have ODBC connections on my machine however, (naturally enough given the components name), the OLEDB command requires an OLEDB connection.

So, to create an OLEDB command (and have it reference an ODBC source), simply use the Native OLE DB\MSDataShape provider and use the ODBC name as the server (as in the image below).  The UserId, Password and catalogue/schema can all be set through the connection managers interface.

image

 

Parameter Management under OLEDB

I can now use that OLEDB connection to execute an OLEDB Command.  As discussed above, my original intent was to show how to add a parameter when SSIS did not recognise the parameters.  In my situation, this was against an Oracle source.  In this demo, I’ll use a simple command

update adventure_works_2012.customer 
set territory_id = ? 
where account_number = ? 

and the column mappings tab of the OLEDB command allows me to map fields in my source (input data flow) to my command (ie parameters in the destination).  We can see this in the mapping screen shot below.

image

You will notice that the names are generic (Param_0 …. ) and refer the the placement of the ? symbols in the source query.  That’s not really user friendly and if i want to give these meaningful names, you would think that I can set them in the Input and Output Properties (using the Name Property as below).  However, this is not the way this UI works and the name defaults to the placement of the ? in the query.  Long story short, you can’t control the names of the parameters.

image

However, this part of the (advanced UI) does give us a little indication into how to add parameters.  If no parameters are recognised, my can add them through the ‘Add Column’ button on the External Columns node of the Inputs and Outputs tab.  This is what i had to do with my Oracle command.

Biml XI – Conditional Splits in Data Flows

At first thought, I don’t see a huge need for conditional splits in SSIS, separating the stream (as it were) is something that I usually try and do at the source. That is, we should only bring in as much data as is required by the transform. However, in thinking about it, the transform is an absolute necessity when you create your own SCD (Slowly Changing Dimension) logic. Rather than the native SSIS SCD component which really doesn’t offer too much flexibility, building your own can offer huge performance improvements and complete control over the package execution.

The basics of a role your own SCD component work like this;

  1. the dimension table includes hash column which stores the hash for changing attributes
  2. the ETL package builds a view of the dimension and in doing so calculates the hash for the changing attributes
  3. the package looks the hash from the dimension table and then splits the data into three conditional streams;
    1. If there is no hash, the record doesn’t exist and must be added.
    2. If there is a hash and it is not the same as the ETL hash, the changing attributes have changed (from the dimension table) and the new record must be inserted (with the previous being expired).
    3. Finally, if the two hash values match, the changing attributes have not changed and no changes required to the dimension data.

It’s easy to see here why the split component is needed.

The Split in Biml

To demonstrate how the split works in Biml, we’ll continue with using customer names from adventure works. Our purpose is simply to create different outputs based on the customer last name. That is, if the customer’s last name begins with A they should be sent to an output, B to another and the remaining going to the default (or catch all) output. The executed package would appear as below. Note counts coming into the buffer (18,484) are split between A records (842), B records (1,061) and everything else (16,581).

The Biml for this package is shown below. Note that the outputs for the split (see the ConditionalSplit
tag) are defined by an OutputPath
(which is of course contained in the OutputPaths
root node of the transform). The condition for the split is defined in the Expression
nodes text. There is no requirement to define the default output.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="src_adventure_works" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2008R2"/>
    <OleDbConnection Name="dest_stage" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=stage"/>
  </Connections>
  <Packages>
    <Package Name="split_transform" ConstraintMode="Linear">
      <Tasks>
        <Dataflow Name="split_transform_01">
          <Transformations>
            <OleDbSource Name="src_names" ConnectionName="src_adventure_works">
              <DirectInput>
                select
                [CustomerAlternateKey]  as customer_code
                , [FirstName] as first_name
                , [LastName] as last_name
                from dbo.DimCustomer
              </DirectInput>
            </OleDbSource>

            <ConditionalSplit Name="split_name">
              <OutputPaths>
                <OutputPath Name="Name_A" IsErrorOutput="false">
                  <Expression>LEFT(last_name,1)=="A"</Expression>
                </OutputPath>
                <OutputPath Name="Name_B" IsErrorOutput="false">
                  <Expression>LEFT(last_name,1)=="B"</Expression>
                </OutputPath>
              </OutputPaths>
            </ConditionalSplit>

            <OleDbDestination Name="Customers_Default" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Default" />
              <ExternalTableOutput Table="[dbo].[customers]" />
            </OleDbDestination>

            <OleDbDestination Name="Customers_A" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Name_A" />
              <ExternalTableOutput Table="[dbo].[customers_A]" />
            </OleDbDestination>

            <OleDbDestination Name="Customers_B" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Name_B" />
              <ExternalTableOutput Table="[dbo].[customers_B]" />
            </OleDbDestination>
            
            
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

The XML Gotcha

If you’ve used SSIS for a while, you’ll probably have forgotten the pain that you had learning that expressions are more C# based rather than formula. For example, to use an equal in a formula you’d use == rather than =. When we work with XML (html or any markup type language), the nature of the syntax may cause some issues with special characters are required. For example, XML is heavily reliant on the characters <> for tags, so how can we use those characters in a formula without the engines interpreter mistaking the symbol for a tag?

For example what would happen if we wanted to have an output like last name being from A <= M. In this case we have to substitute the special character with a code (as is exactly what happens in HTML). The snippet below shows how would achieve this by using the character codes for the symbols >, & and <. Our goal here is to use split names A-K and M-Z.

<ConditionalSplit Name="split_name">
  <OutputPaths>
	<OutputPath Name="Name_A" IsErrorOutput="false">
	  <Expression>LEFT(last_name,1)&gt;="A" &amp;&amp; LEFT(last_name,1)&lt;"M" </Expression>
	</OutputPath>
	<OutputPath Name="Name_M" IsErrorOutput="false">
	  <Expression>LEFT(last_name,1)&gt;="M"</Expression>
	</OutputPath>
  </OutputPaths>
</ConditionalSplit>


 

BIML I – Getting Started with BIML

Traditional SSIS development requires the ETL developer to create a package and then manually add and configure tasks. While this method of visual development is fine, it suffers from reproducibility and maintenance overhead. For example, imagine (a very contrived scenario) where you create a package with a single ‘Execute SQL Task‘ and configure it to fire a procedure. Simple enough but now suppose you want to create a new package and fire a different proc – here you have to manually go in and follow all the same steps again. Perhaps not so bad but what if you create 10 packages and then decide that the Execute SQL Task should be prefixed EXSQL, you would have to manually go into each package and rename the task. This could be very cumbersome and it (by nature) carries a maintenance burden (or maintenance overhead).

BIML (Business Intelligence Markup Language) is an XML specification that (amongst other things) allows you to specify the creation SSIS packages. One of the great things about BIML is that it can also include code snippets that amplifies the ability to generate code. This first post will form part of a series that show how to use BIML to create packages and what type of outcome we can achieve. But (of course) first things first so let’s start with a simple goal, so let’s look at using BIML to generate a package that executes an SQL command.

Before we continue, a quick revisit my above comments about maintainability for a moment.

An SSIS package is really just an XML file right, so why can’t I just edit that?

If you’ve ever played around with an SSIS package, you’ll probably know that it is an XML file. You can see this by viewing the code of your packages as shown below (right click on the package and view code). So you may think that you can generate or alter this pretty easily.

However, I have found this XML to be very finicky. Suppose you wanted to redefine a connection by editing script (something which should be simple you’d think and I’ve discussed here). You would expect that you could just examine the code, find the connection string, alter it and save it. But I’ve never had any success in that approach. When you try to open the file again (using the visual designer) it is invalid so your pretty much helpless with the changes you tried to make (sorry but there’s no comeback). That’s only to change an existing package so the creation of a package is (IMO of course) is just not possible. You could use an API but that just seems out of reach for most.

Of course, newer versions of SSIS have project wide objects (like connections). Using these would naturally require less maintenance however were are still faced with task management (altering and creation) and package development. Further, if we want to define a group of packages by meta-data (say for example a list of tables to extract), there is no current way to do it in SSIS. A much simpler approach is BIML with give us a generic specification to development.

What do I Need to get running with BIML?

BIML is available when BIDS helper is installed. BIDS Helper is a community addin for SQL Server Data Tools (SSDT and BIDS) and is available on code plex. If you haven’t heard of BIDS helper before, take some time to look around the features. It really is a must have for business intelligence development in the Microsoft space.

Once BIDS helper is installed, you can add a new BIML file by right clicking on the Project (or SSIS Packages) node and selecting ‘Add New Biml File‘. The file is created in the Miscellaneous Folder (as shown below).

 

A single BIML file can contain the code that creates many packages (as will become apparent) but for now would we will just work with the basics and use it to create a package that fires an SQL statement.

Adding Code

Opening the BimlScript file shows the XML in the file (we should probably rename it to give it a decent name but we will not worry about it this time). When you start, the file just contains the root node (as below).

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
</Biml>

What BIML now allows us to do is create objects within the root (and later other nodes). For example, since we want to create a package, we naturally expect that we need to add a package child. Naming is pretty intuitive since it follows the SSIS object names. I’ve already hinted that we can use the same file to create multiple packages so it seems natural that a package tag should be embedded within a packages tag. Something like the following.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Packages>
    <Package></Package>
  </Packages>
</Biml>

Actually, I did cheat a little (and so can you). The intellisense functionality lets you know what tags are allowed at the position you are. Remember in XML that a tag is just an identifier for an object so since I want to do something in within by BIML tag (where else would I put it), I can just open a tag and see what options are available. I want to create a package (so a search for package shows I can add packages and of course within that, I can add a package).

Validation and Creation

Back to our package snippet. We’ve got the packages declaration (as above) to specify a package within a packages node. You’d expect there is a way to validate the XML to be sure it has the required information to create packages? Well, just right click on the (explorer solution) and select ‘Check Biml for Errors‘. A dialog opens with a list of errors.

 

If we think about it, it makes sense doesn’t it? Of course we need to give our packages a name! In contrast, the error for ConstraintMode is something that we take for granted when we generate packages visually. We define the order of tasks in a package through constraints (or connectors). However Biml has no indication and so the layout of tasks other than the order in which they appear in XML. How they relate to each other must be defined in the package declaration and this is the purpose of the ConstraintMode property. There are two allowable values for this property (either parallel or linear) which (as the name suggests) specifies that tasks will not be connected or connected sequentially.

So in order to create a package, we update the Biml code as (note that since we are only having one task in this package the constraint mode is irrelevant);

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Packages>
  <Package ConstraintMode="Linear" Name="package_01"></Package>
</Packages>
</Biml>

We now have everything that we need to generate packages from the script. Simply right click on the Biml file and select the Generate SSIS Packages (as above). You’ll see a new packages create under the SIS packages node in solution explorer. Of course there are no tasks in our package.

Adding the Execute SQL task to the Package.

We already seen that the XML nodes in naming conventions are very similar to that of SSIS (both in structure and definition). If we think about adding an Execute SQL task to the package, then, there would be a few points we could make (or at least assume);

  1. a task would be embedded within a tasks node
  2. the task would probably be defined by name (or a task type)
  3. an execute SQL task requires a connection
  4. and execute SQL task requires a command
  5. the connection would be required before the command.

Turns out that the Biml to add this task to our package is pretty straightforward. Of course you can use the IntelliSense to suggest what nodes are available and, when you look at the code and think about what you’re trying to do it seems to make sense (that is, create a connection, create a package, at a task to the package uses a connection and has some syntax).

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <Connection Name="SQL01" ConnectionString="Provider=SQLNCLI10.1;Data Source=.;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW"/>
  </Connections>
  <Packages>
    <Package ConstraintMode="Linear" Name="package_01">
      <Tasks>
        <ExecuteSQL Name="fire_proc_01" ConnectionName="SQL01" >
          <DirectInput>Select 1</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>

Perhaps the most interesting thing about this snippet is the connection object. We’d expect it to be embedded within connections however, the location as a root node gives us an advantage of reuse, so adding another task with the package or additional packages and tasks can reuse that connection. For example, to create 2 packages with 2 Execute SQL commands in each (which require the first to complete and then the next), we could write;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <Connection Name="SQL01" ConnectionString="Provider=SQLNCLI10.1;… (as above) "/>
  </Connections>
  <Packages>
    <Package ConstraintMode="Linear" Name="package_01">
      <Tasks>
        <ExecuteSQL Name="SQL_CMD_1" ConnectionName="SQL01" >
          <DirectInput>Select 1</DirectInput>
        </ExecuteSQL>
        <ExecuteSQL Name="SQL_CMD_2" ConnectionName="SQL01" >
          <DirectInput>Select 1</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
    <Package ConstraintMode="Parallel" Name="package_02">
      <Tasks>
        <ExecuteSQL Name="SQL_CMD_1" ConnectionName="SQL01" >
          <DirectInput>Select 1</DirectInput>
        </ExecuteSQL>
        <ExecuteSQL Name="SQL_CMD_2" ConnectionName="SQL01" >
          <DirectInput>Select 1</DirectInput>
        </ExecuteSQL>
      </Tasks>
    </Package>
  </Packages>
</Biml>
This produces package_01 which looks like and we can see how the specification of ConstraintMode has caused SQL_CMD_1 to link to SQL_CMD_2.

 

Note the output from the second package definition (where ConstratinMode is Parallel) does not introduce precedence constraints between the 2 tasks.

Conclusion

This post has looked at using Biml to create packages in SSIS. It’s a pretty simple example, but we can see how the XML biml can be used to automatically create (and recreate) packages. If we wanted to change a property (say for example the SQL syntax), we could just change it in the code and regenerate the packages.

In later posts, we’ll extend this functional knowledge for more complicated scenarios.

 

 

SSIS 2012 Execution Reporting

SSIS 2012 has (arguably) two main improvements over prior versions. The first is a project deployment model where an entire project can be deployed and managed with a common structure. The second is a more favourable logging mechanism. This post looks at the latter and examines reporting on package and job execution in SSIS 2012.

There are standard inbuilt reports that show the execution of the project (and individual packages within the project). These can be seen in SQL Server Management Studio (SSMS) by right clicking on the SSIS catalogue (or node) and selecting the “All Executions” report from the reports menu (as below). Note that although you can get to the report through the path Reports > All Executions or Reports > Standard Reports > All Executions, the reports are the same (and shown below).

Accessing the Execution Reports through SSMS
Sample Report Output

One of the really great features of this report is that you can click on the Overview hyperlink and see what packages have executed as part of the (node) execution. So, for example, the packages that have executed under my ‘Catalog Testing’ (project) execution are listed since this is the node I selected (when I selected overview from the ‘All Executions’ report above. An example of this output is seen below. You may note that the overview includes the tasks that have executed within the package.

Overview Report (accessed by the Overview link in the ‘All Executions’ report)

But Wait – There’s More!

While these reports are good, they require SSMS to view the reports, so you might want to interrogate the SSIS database directly. Using this you can create some custom reports and expose these as normal SSRS reports.

Two views that are invaluable for examining package executions (and task executions) are catalog.executions
and catalog.executable_statistics. From MSDN the executions view shows “instances package execution in the Integration Services Catalogue”, therefore, we can use this to show (and identify the execution_id) of packages when execute. Consider the “All Executions” report above which shows the execution_id of 10275. This can be easily identified in the executions view with the query;

select  *
from catalog.executions
where execution_id = 10275

More importantly we can easily identify executions of packages (or groups of packages) by referring to the package_name field in this table.

Once we have the execution_id, we can then use this in the statistics view to see the detailed execution of the package. This is the same data that is shown in the ‘Overview’ report above.

select  *
from catalog.executable_statistics
where execution_id = 10275

Most of my projects utilise a parent/child architecture, so when I consider the natural drill path for package executions as the master_package à child_package à tasks. If I want to only show the packages that have executed as part of the Master package (note Master_1.dtsx package name in the executions above) I can simply use the following query;

select e.*
, CONVERT(datetimees.start_timeAS start_time
CONVERT(datetimees.end_timeAS end_time
es.execution_duration es.statistics_id
es.execution_result
case es.execution_result
when 0 then ‘Success’
when 1 then ‘Failure’
     when 2 then ‘Completion’
    when 3 then ‘Cancelled’
  end  as execution_result_description
from catalog.executables e
join catalog.executable_statistics es
on  e.executable_id = es.executable_id
 and e.execution_id = es.execution_id
where package_path  ‘\Package’
and  e.execution_id = 10275

A Word of Caution

while they SSISDB database can be used to provide some nice reporting, the executions view contain some inbuilt security which implements row security. If we examine the definition of the view we can see that the where clause includes the following conditions.

WHERE  opers.[operation_id] in  (SELECT id  FROM  [internal].current_user_readable_operations])
OR  (IS_MEMBER(‘ssis_admin’ = 1)
OR  (IS_SRVROLEMEMBER(‘sysadmin’= 1)

The credential for accessing the database should be a member of ssis_amdin or sysadmin role. If not, no rows are returned.

I will leave it to your own devices to come up with alternate methods for bypassing this requirement.


SSIS – Dynamically Adding Package Configurations

For my ETL design, most activity (or real work) is conducted through child packages that are controlled through master packages. This gives us a good amount of flexibility to pass variables (like audit keys and configuration file names) to the child from the master.

However, creating and maintaining the configurations in the child can be laborious. The UI is ok for a small number of changes but for large numbers, a programmatic approach is better. This post looks at managing configurations through code.

Assemblies and Constructs

The configuration class belongs to the package. This is automatically included in SSIS scripting tasks. If the code is simply a C# program the assembly the Dts.Runtime assembly is Microsoft.SQLServer.ManagedDTS.dll

And the package can be referenced in the following manner

Application _a = new
Application();

Package _p = _a.LoadPackage(@”Your Package File Name”, null);

Configuration Class

The package must be enabled for configurations. Simply set the EnableConfigurations property to true

_p.EnableConfigurations = true;

Adding a Configuration

A new configuration can be added with the .Add method. The new configuration is added at the end (becomes the last) and therefore we can refer to it by its ordinal. All new properties can be maintained in this manner.

To add a new configuration (config_file) to the package which assigns the package variable (config_file) the value of the parent packages config_file variable, we would use the following code;

_p.Configurations.Add();
_p.Configurations[_p.Configurations.Count – 1].Name = “config_file”;
_p.Configurations[_p.Configurations.Count – 1].ConfigurationType = DTSConfigurationType.ParentVariable;
_p.Configurations[_p.Configurations.Count – 1].ConfigurationString = “config_file”;
_p.Configurations[_p.Configurations.Count – 1].PackagePath = “\\Package.Variables[User::config_file].Properties[Value]”;
_p.Configurations[_p.Configurations.Count – 1].Description = “Inherit config_file from master”;

Iteration and Deletion

The configuration object is innumerable. Therefore we can iterate over existing package configurations using an iterator (as below). Similarly, we could iterate by ordinal (using its index position). To remove the configuration, simply use the remove method.

foreach (Configuration _c in _p.Configurations)
{
_p.Configurations.Remove(_c);
}

Save the Package

The package must be saved before any changes made through code are reflected in bids.

_a.SaveToXml(@”Your package file name”, _p, null);

Bringing it Together

This code can be used in a package to append a variable to a package and add its configuration. I simply used a foreach container with a script task and the following code (note the variable file_name is the fully qualified file name passed from the container);

string _file_name = Dts.Variables[“file_name”].Value.ToString();
string _file = System.IO.Path.GetFileName(_file_name);
_file = _file.Substring(0, _file.Length – 5);

Microsoft.SqlServer.Dts.Runtime.Application _a = new Microsoft.SqlServer.Dts.Runtime.Application();

Package _p = _a.LoadPackage(_file_name, null);
_p.Variables.Add(“config_file”, false, _file, “”);
_p.EnableConfigurations = true;
_p.Configurations.Add();
_p.Configurations[_p.Configurations.Count – 1].Name = “config_file”;
_p.Configurations[_p.Configurations.Count – 1].ConfigurationType = DTSConfigurationType.ParentVariable;
_p.Configurations[_p.Configurations.Count – 1].ConfigurationString = “config_file”;
_p.Configurations[_p.Configurations.Count – 1].PackagePath = \\Package.Variables[User::config_file].Properties[Value];
_p.Configurations[_p.Configurations.Count – 1].Description = “Inherit config_file from master”;
_a.SaveToXml(_file_name, _p, null);
Dts.TaskResult = (int)ScriptResults.Success;


Real Time ETL with SSIS and WMI

Often when we design ETL packages we assume that there is a hard start point, or a specific part of the day when the ETL Process can begin. However, in a distributed environment, remote machines may exchange data with a host at random intervals and it may be this exchange that should dictate the start of an ETL process. For example we may want to start our ETL when files are copied to an FTP directory. This posts looks at how to monitor a directory location using WMI (Window Management Instrumentation) and start ETL for this type of scenario.

Overview

Remote systems are often disconnected from their host. That is, the remote site retains its own data independently of a continuous connection to the host. In order to sync data between the two sites, the remote systems extract master data and transfer files back to the host. These transfers occur when the remote site needs to operate independently of the host.

Data transferred back to the host may be triggered by some event at the remote site (for example closing the daily register at the remote). The implication of this is that data arrives at the host at different and unknown times. If we wish to provide a low latency ETL at the host, we need to load this data as soon as it becomes available on the host (that is, as soon as the files are created).

A Brute Force Approach

One method of managing this is to cycle our ETL between our expected load times. For example, we can schedule a job to run at 5 minute intervals while we expect the files to be deposited. This will achieve low latency but is restricted to a latency determined by the execution interval. For example, if we run the job every five minutes, we can expect a latency of at least 5+ minutes.

One consideration of using this approach lies in the execution history for the job. Because each job executes in its own right, the history for the job shows each execution (that is, one for every 5 minutes as below).

Reviewing the log for the job may be troublesome where we expect or desire to see a single job for each day.

A WMI Approach

Another approach is to execute the package and have it listen for changes in the target directory (for want of a better word), then the package commences the lifting component of the ETL after those changes have occurred (that is the files are created). In this situation, the package is running but effectively dormant until a file is created. Furthermore, we can get a much finer polling interval than would be achieved with the Brute Force approach. An example of this type of approach can be seen in the control flow below.

The key consideration for this package is that it executes and continuously and polls the source directory until the WMI query is satisfied (and files have been created). Once this occurs, the package can then iterate over the created files and conduct the expected ETL (which in this case is just moves the file and deletes it).

The WMI Task

The msdn reference for the WMI Task can be found here. For our purposes, we are really only concerned with the ActionAtEvent and WqlQuery components. A zero time out dictates that the task will not time out however we could use this to control how long we listen for and choose an appropriate time out action and event (log the event or log and fire the package).

The Connection Manger simply specifies the server and namespace;

With the query defined as;

SELECT * FROM __InstanceCreationEvent WITHIN 10

WHERE TargetInstance ISA “CIM_DirectoryContainsFile”

and TargetInstance.GroupComponent= “Win32_Directory.Name=\”E:\\\\Blog\\\\Process Folder\””

 

Also note that the escape characters (ie the ‘\’ preceding a special character) are required in the directory name. The WMI reference can be found here.

Query polling is defined by the WITHIN clause and indicates the number of seconds to wait before the query is run again (10 seconds in this case).

 

Extracting & Managing SSIS Connection Information

I have recently been involved in a large project that created a large number of SSIS packages which in turn lead to an extremely large number of connection managers. In BIDS this is unmanageable! True, SSIS does allows the creation of a shared data sources which can in turn be added to a package (essentially creating a project data source), however, the use of these data sources does have some drawbacks. Firstly, when the package is saved; it saves the connection string associated with the data source. When the data source is changed, the package retains the old connection until the package is reopened in BIDS. Secondly, the connection is lost when the package is published (to the file system). Furthermore, there is no convenient method of obtaining information about the connections amongst the packages. To manage the connections, I created a small app that has the following features.

The APP

The lists all packages I in a directory (listed in the right pane). In doing so it builds a library of all connections in the packages. So (for example), if we chose a connection type, only the connections that have that type (and the packages that have those connections) are retained in the connection and package list box.

If we then chose a connection, only the packages that use that connection are highlighted. Because I was looking for consistency in the connection strings, I show the first package connection string and identify if there are any different connection strings amongst the other packages (by highlighted by a salmon colour).

Since the goal is to manage the connections I’ve also included a ‘New Connection String’ change mechanism, a check to evaluate expressions (as connection strings) and the ability to output a csv report for all connections and packages in the library.

Default Load
Identifying packages using connection by type and name

Code Basics

The principle part of the code revolves around the ManagedDTS library which can be found in the following location / dll ‘C:\Program Files (x86)\Microsoft SQL Server\100\SDK\Assemblies\Microsoft.SQLServer.ManagedDTS.dll’

This assembly allows all SSIS (or dts) management including the creation, alteration and control flow (see http://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dts.runtime.aspx ).

In order to alter an existing package we need to create an Application instance and use it to load the package. The following code achieves this;

Microsoft.SqlServer.Dts.Runtime.Application _app = new Microsoft.SqlServer.Dts.Runtime.Application();
Package _p = _app.LoadPackage(this_is_my_file_name, null);

The connection managers within the package can then be managed with the connectionmanager class (for example iterating over them by)’

foreach (ConnectionManager _c in _p.Connections)
{}

or by referring to a specific connection through the connectionmanager class and specifying its index, identity or name;

ConnectionManager _c = _p.Connections[“other_oledb_1”];
ConnectionManager _c = _p.Connections[0];

Finally, we can use the properties / methods to determine the connections type, connection string, expression (connection string expression);

_c.ConnectionString; // connection string
_c.CreationName; // type
_c.GetExpression(“ConnectionString”);     //expression

Setting properties is relatively easy (just by setting the property value). The only exception to this is setting the connection string expression (which required a call of the .SetExpression method);

_c.ConnectionString = connection_string;
_c.SetExpression(“ConnectionString”, expression_value);

The interesting thing I found for this was that when the connection manager has a connection string expression attached to it, the connection string will not be changed unless the expression is changed. Then when the expresssion is changed, the connection string also changes. I thought this was a little weired.

Configuration Files

There may have been scope in the project to use configuration files however, the auditing requirement and the use of child package execution causes a failure within the child package before the configuration was applied. That is, there are some audit events that are fired before the connection can be changed and where these would otherwise err, the child package fails.

Code

I am happy to share this project with anyone that wants it. Just drop me a line and I’ll send it through (subject to the standard disclaimer!)