Biml XIII – Oledb Command Part II – Return Variables

In our prior post, we looked at the Oledb Command, a data flow transform that fires a statement for each row in the pipeline's dataset. In that post, we used generic parameters that were defined in our query. For example, our query was;

insert into [dbo].[customers]
(customer_code, first_name, last_name)
 values
  (?, ?, ?);


The transform allows us more flexibility and we will look at some of that now. Specifically, we will look at the execution of a stored procedure (and the variation in parameter specification) and the use of return parameters (so that values are appended to the dataset as it flows through the buffer).

This post is broken into two components. Firstly, we quickly examine the biml just for the transform – I'm assuming that you can follow it based on the prior post. Secondly, we present a more complete example so a full solution can be worked through.

The Task

Often I can delegate logic to a stored procedure – for example, the generation of audit keys. Very simply, we could take the example of inserting a record (see this post), use a stored procedure to insert the data and then return information about that process. Since we are working with customers, let's just use a procedure to insert a record and return the first letter of the customer's last name. Such a procedure would look like this;

Create proc [dbo].[customer_update]
(
	@customer_code nvarchar(15)
	, @first_name nvarchar(50)
	, @last_name nvarchar(50)
	, @last_name_first_char nvarchar(1) output
)
as 
begin

	set nocount on;

	insert into [dbo].[customers](customer_code, first_name, last_name)
	values(@customer_code, @first_name, @last_name);

	set @last_name_first_char = LEFT(@last_name,1); 

end 

If we were to execute that through an SQL script, we would use code like the following (with the output below);

declare @customer_code nvarchar(15)='AW00011000';
declare @first_name nvarchar(50)='Jon';
declare @last_name nvarchar(50)='Yang';
declare @last_name_first_char nvarchar(1);

exec [dbo].[customer_update]
	@customer_code = @customer_code
	, @first_name = @first_name 
	, @last_name = @last_name
	, @last_name_first_char = @last_name_first_char output ;

select @last_name_first_char as last_name_first_char ;
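For the record above, the final select simply returns the first letter of the last name ('Yang');

last_name_first_char
--------------------
Y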

This is simple for a single SQL query, but what about in SSIS? Remember that, for our situation, the oledb command runs for each row that enters it (each row in the buffer) and we want to append the output value to our dataset (so that it can be consumed later in the dataflow). So how does our biml for the oledb command look?

<OleDbCommand Name="insert_customer_dynamic" ConnectionName="dest_stage">
	<DirectInput>
		exec dbo.customer_update @customer_code=?, @first_name=?, @last_name=?, @last_name_first_char=? output;
	</DirectInput>
	<Parameters>
		<Parameter SourceColumn="customer_code" TargetColumn="@customer_code" DataType="String" Length="15" />
		<Parameter SourceColumn="first_name" TargetColumn="@first_name" DataType="String" Length="50" />
		<Parameter SourceColumn="last_name" TargetColumn="@last_name" DataType="String" Length="50" />
		<Parameter SourceColumn="last_name_first_char" TargetColumn="@last_name_first_char" DataType="String" Length="1" Direction="Output" />
	</Parameters>
</OleDbCommand>

Here we can see a few differences from our last implementation.

  • The query has changed and matches the SQL execution above (note the output clause for the return parameter)
  • TargetColumn values (for parameters) match the procedure's parameter names rather than the generic Param_n (as in the prior example)
  • The output parameter includes an additional Direction attribute (specifying Output rather than the default Input value)

It should also be apparent that the dataset entering the transform needs a column to hold the return value. That is, we need to append a column to the dataset to store the return value. This brings us to our 'full' implementation.
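A derived column can supply that placeholder. The snippet below (taken from the full implementation that follows) seeds the column with a typed NULL so the oledb command has somewhere to write the return value;

<DerivedColumns Name="add_first_name_place_holder">
	<Columns>
		<Column Name="last_name_first_char" DataType="String" Length="1">NULL(DT_WSTR, 1)</Column>
	</Columns>
</DerivedColumns>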

The Full Implementation

If we continue with the theme of the prior post, and the stored procedure shown in this post, we can derive a business case in which we want to achieve the following outcomes;

  1. source the customer records
  2. fire the procedure (to insert them into the staging table)
  3. record the procedure's return result (even if it is just a simple value)
  4. record the outcome of the entire load (that is, the results for each record loaded)

The finished package (dataflow) is shown in the image below. Note that we include a derived column that stores the return value from our procedure in the dataset as it flows to the destination.

The complete biml for this is below. There are no extraordinary considerations that need further explanation, other than perhaps the choice of posting the record into the same destination table twice (ie [dbo].[customers]). The only reason this was done was for simplicity and demonstration purposes: if we queried a single customer we could see their entry and audit record in the same table. It is certainly not a production solution though.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
	<Connections>
		<OleDbConnection Name="dest_stage" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=stage" />
		<OleDbConnection Name="src_adventure_works" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2008R2" />
  </Connections>
	
	<Packages>
    <Package Name="oledb_cmd_02" ConstraintMode="Linear">
			<Tasks>
			
				<Dataflow Name="oledb_cmd">
					<Transformations>
						<OleDbSource Name="src_customer" ConnectionName="src_adventure_works">
							<DirectInput>
								select 
								  CustomerAlternateKey as customer_code
								  , FirstName as first_name
								  , LastName as last_name
								from dbo.DimCustomer; 
                            </DirectInput>
                        </OleDbSource>
						
						
						<DerivedColumns Name="add_first_name_place_holder">
							<Columns>
								<Column Name="last_name_first_char" DataType="String" Length="1">NULL(DT_WSTR, 1)</Column>
              </Columns>
            </DerivedColumns>
				
						
						<OleDbCommand Name="insert_customer_dynamic" ConnectionName="dest_stage">
							<DirectInput>
								exec dbo.customer_update @customer_code=?, @first_name=?, @last_name=?, @last_name_first_char=? output;
                            </DirectInput>
							<Parameters>
								<Parameter SourceColumn="customer_code" TargetColumn="@customer_code" DataType="String" Length="15" />
								<Parameter SourceColumn="first_name" TargetColumn="@first_name" DataType="String" Length="50" />
								<Parameter SourceColumn="last_name" TargetColumn="@last_name" DataType="String" Length="50" />
								<Parameter SourceColumn="last_name_first_char" TargetColumn="@last_name_first_char" DataType="String" Length="1" Direction="Output" />
              </Parameters>
            </OleDbCommand>

						<OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
							<ExternalTableOutput Table="[dbo].[customers]" />
            </OleDbDestination>
						
          </Transformations>
        </Dataflow>
      </Tasks>
		 </Package>
    </Packages>
</Biml>

Biml XII – Oledb Command (Dataflow transformation)

The data flow's OLEDB Command executes an SQL statement for each row of the data flow. A good example of its use is the update of expired records' end dates for type 2 changing dimensions. I've briefly discussed the logic for roll-your-own type 2 (changing) attributes here, with the need to update expired records at point 3.3. In that post, we looked at the conditional split (the biml for its basic functionality) in a very simple example. The main point to reiterate for the OLEDB command is that a command executes for each row of the transform's input and (naturally) we have access to each column's value in that row.
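To make that concrete, the statement fired for each row in an expiry scenario might look like the following sketch (the table and column names here are hypothetical);

update dbo.dim_customer
set end_date = ?		-- mapped from the new record's effective date
	, is_current = 0
where customer_code = ?	-- mapped from the business key
	and is_current = 1;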

As a sidebar to the use of the OLEDB Command (and the overhead of executing a statement for every row in the transform's input), an alternative method of update is to dump the input into a temporary table and then execute a single update command against the dimension table using that temporary table.
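A sketch of that set-based alternative (again with hypothetical table and column names) is;

-- land the changed rows in a temporary table (via a dataflow destination)
create table #customer_changes
(
	customer_code nvarchar(15)
	, end_date date
);

-- then expire the matching dimension rows in a single statement
update d
set d.end_date = c.end_date
	, d.is_current = 0
from dbo.dim_customer d
inner join #customer_changes c
	on c.customer_code = d.customer_code
where d.is_current = 1;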

In this post we look at the biml configuration for using the oledb command to insert rows into a table. We'll keep the focus on customers (code, first name and last name). The completed dataflow will look like the following. Update syntax can easily be based on this example.

The Biml

The package only contains two transforms – the source and the oledb command (acting as a destination). As we would expect, the oledb command requires three basic settings. These are;

  • Values for Name and Connection. All tasks and transforms require a name and, since we're executing an SQL statement, we need an OLEDB connection. Because they apply to the entire transformation, these are attributes of the task's root definition (ie attributes of the OleDbCommand node).

  • The statement to execute. This is managed by the <DirectInput> tag's text. Note that parameters are introduced to the statement using the ? symbol.

  • Parameter specifications that map the input columns to command parameters. These are managed by the transform's Parameters node (which groups each parameter). Each parameter is defined by its source column, its target parameter position and a data type. Note that the parameter position (the order in which it appears in the SQL statement) is a zero-based value (its index) suffixed to the Param_ target column. In defining the data type, only the type is required – for example, we could define a parameter as only being a String. Where precision (that is, the length) is not specified, the package will still build and execute, however it will warn that columns are out of synchronisation.


<Biml xmlns="http://schemas.varigence.com/biml.xsd">

  <Connections>
		<OleDbConnection Name="src_adventure_works" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2008R2" />
		<OleDbConnection Name="dest_stage" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=stage" />
    </Connections>
	
    <Packages>
      <Package Name="oledb_cmd" ConstraintMode="Linear">
			<Tasks>
				<Dataflow Name="oledb_cmd">
					<Transformations>
						<OleDbSource Name="src_customer" ConnectionName="src_adventure_works">
							<DirectInput>
								select 
								  CustomerAlternateKey as customer_code
								  , FirstName as first_name
								  , LastName as last_name
								from dbo.DimCustomer; 
              </DirectInput>
            </OleDbSource>
						
						<OleDbCommand Name="insert_customer_dynamic" ConnectionName="dest_stage">
							<DirectInput>
								insert into [dbo].[customers]
								(customer_code, first_name, last_name)
								values
								(?, ?, ?);
              </DirectInput>
							<Parameters>
								<!--
									note that you do not have to specify the string length;
									if you don't, the package will warn about column synchronisation
								-->
								<Parameter SourceColumn="customer_code" TargetColumn="Param_0" DataType="String" Length="15" />
								<Parameter SourceColumn="first_name" TargetColumn="Param_1" DataType="String" Length="50" />
								<Parameter SourceColumn="last_name" TargetColumn="Param_2" DataType="String" Length="50" />
               </Parameters>
              </OleDbCommand>
						
             </Transformations>
            </Dataflow>
      </Tasks>
		</Package>
		 
    </Packages>
</Biml>

Biml XI – Conditional Splits in Data Flows

At first thought, I didn't see a huge need for conditional splits in SSIS – separating the stream (as it were) is something that I usually try to do at the source. That is, we should only bring in as much data as is required by the transform. However, on reflection, the transform is an absolute necessity when you create your own SCD (Slowly Changing Dimension) logic. Compared to the native SSIS SCD component, which really doesn't offer much flexibility, building your own can offer huge performance improvements and complete control over the package execution.

The basics of a roll-your-own SCD component work like this;

  1. the dimension table includes a hash column which stores the hash of the changing attributes
  2. the ETL package builds a view of the dimension and, in doing so, calculates the hash for the changing attributes (a sketch of this calculation is shown below)
  3. the package looks up the hash from the dimension table and then splits the data into three conditional streams;
    1. If there is no hash, the record doesn't exist and must be added.
    2. If there is a hash and it is not the same as the ETL hash, the changing attributes have changed (from the dimension table's perspective) and the new record must be inserted (with the previous one being expired).
    3. Finally, if the two hash values match, the changing attributes have not changed and no changes are required to the dimension data.

It’s easy to see here why the split component is needed.
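For reference, the hash in step 2 can be calculated in the dimension view with something like the snippet below; the choice of algorithm and the column list are assumptions for illustration;

select
	CustomerAlternateKey as customer_code
	, hashbytes('SHA1', isnull(FirstName, '') + '|' + isnull(LastName, '')) as changing_hash
from dbo.DimCustomer;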

The Split in Biml

To demonstrate how the split works in Biml, we'll continue using customer names from adventure works. Our purpose is simply to create different outputs based on the customer's last name. That is, if the customer's last name begins with A they should be sent to one output, B to another, and the remainder to the default (or catch-all) output. The executed package would appear as below. Note the counts coming into the buffer (18,484) are split between A records (842), B records (1,061) and everything else (16,581).

The Biml for this package is shown below. Note that the outputs for the split (see the ConditionalSplit tag) are defined by an OutputPath (which is of course contained in the OutputPaths root node of the transform). The condition for the split is defined in the Expression node's text. There is no requirement to define the default output.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="src_adventure_works" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2008R2"/>
    <OleDbConnection Name="dest_stage" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=stage"/>
  </Connections>
  <Packages>
    <Package Name="split_transform" ConstraintMode="Linear">
      <Tasks>
        <Dataflow Name="split_transform_01">
          <Transformations>
            <OleDbSource Name="src_names" ConnectionName="src_adventure_works">
              <DirectInput>
                select
                [CustomerAlternateKey]  as customer_code
                , [FirstName] as first_name
                , [LastName] as last_name
                from dbo.DimCustomer
              </DirectInput>
            </OleDbSource>

            <ConditionalSplit Name="split_name">
              <OutputPaths>
                <OutputPath Name="Name_A" IsErrorOutput="false">
                  <Expression>LEFT(last_name,1)=="A"</Expression>
                </OutputPath>
                <OutputPath Name="Name_B" IsErrorOutput="false">
                  <Expression>LEFT(last_name,1)=="B"</Expression>
                </OutputPath>
              </OutputPaths>
            </ConditionalSplit>

            <OleDbDestination Name="Customers_Default" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Default" />
              <ExternalTableOutput Table="[dbo].[customers]" />
            </OleDbDestination>

            <OleDbDestination Name="Customers_A" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Name_A" />
              <ExternalTableOutput Table="[dbo].[customers_A]" />
            </OleDbDestination>

            <OleDbDestination Name="Customers_B" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Name_B" />
              <ExternalTableOutput Table="[dbo].[customers_B]" />
            </OleDbDestination>
            
            
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

The XML Gotcha

If you've used SSIS for a while, you'll probably have forgotten the pain of learning that expressions are more C#-based than formula-based. For example, to test equality in an expression you'd use == rather than =. When we work with XML (html or any markup-type language), the nature of the syntax may cause some issues when special characters are required. For example, XML is heavily reliant on the characters <> for tags, so how can we use those characters in a formula without the engine's interpreter mistaking the symbol for a tag?
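The escapes in question are the standard XML character entities;

  • < becomes &lt;
  • > becomes &gt;
  • & becomes &amp;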

For example, what would happen if we wanted to have an output for last names from A up to (but not including) M? In this case we have to substitute the special characters with codes (exactly as happens in HTML). The snippet below shows how we would achieve this by using the character codes for the symbols >, & and <. Our goal here is to split names A–L and M–Z.

<ConditionalSplit Name="split_name">
  <OutputPaths>
	<OutputPath Name="Name_A" IsErrorOutput="false">
	  <Expression>LEFT(last_name,1)&gt;="A" &amp;&amp; LEFT(last_name,1)&lt;"M" </Expression>
	</OutputPath>
	<OutputPath Name="Name_M" IsErrorOutput="false">
	  <Expression>LEFT(last_name,1)&gt;="M"</Expression>
	</OutputPath>
  </OutputPaths>
</ConditionalSplit>


 

Biml X – Derived Columns

This post looks at the biml requirements and functionality of derived columns – how to apply logic to a column (replace a value) and add new columns. The scenario is pretty simple and builds on our other concepts, but now includes a little more 'business logic' (albeit simple) so we can look at a more typical package.

In this scenario, we start with some territory keys. Of course we could include more data, but we want to keep it simple. Based on those keys, we have to look up the territory name. As luck would have it, not all territories have names, and if there's no valid name (that is, no lookup value found), we would like to assign a default value ("Unknown"). Also, we would like to keep a copy of the territory's original name, so we will store that as well. We will be using AdventureWorks2014 as our source database and a staging database (called stage) as the destination.

Setup

The only setup requirement is the destination table (dbo.Territory). This can be added to the staging database with the following snippet;

create table dbo.Territory
(
	TerritoryID Int not null 
	, TerritoryName nvarchar(50) not null 
	, TerritoryNameUnadjusted nvarchar(50)
)


Also, we will show the completed package first – this gives an indication of the logic that we are trying to achieve with biml.

The Biml

The full biml is given in the snippet below. Since we have covered the general structure, sources, destinations and lookups in other posts, we will concentrate only on the biml for the derived columns transform.

There are a few things to note about the tags. Firstly, a column change (to the buffer) is managed through the Column tag. This is a child of a Columns parent, which is in turn a child of the DerivedColumns node (sounds obvious but it's worth a call-out). Perhaps this is best explained by examining the configuration for a derived column as in the image below. As we add (or change) more columns in the transform, we add more rows in the grid and (therefore) additional Column tags would be added to our biml. Since XML requires containers, the Columns tag is used to group these in biml.

Secondly, derivation as a new column or a replacement column is handled by the ReplaceExisting property of the Column tag. By default, the column is assumed to be a new column. Additionally (and related to column replacement), the DataType and Length of the column are required attributes, but they have no effect when the column is being used to replace an existing column – here, the original type is maintained.

Finally, the formula is applied as text between the Column tags (opening and closing tag).

The Package Biml

The Biml for this package is;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="src_oledb" ConnectionString="Provider=SQLOLEDB.1;Data Source=.\SQL2012;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;"/>
    <OleDbConnection Name="dest_oledb" ConnectionString="Provider=SQLOLEDB.1;Data Source=.\SQL2012;Initial Catalog=stage;Integrated Security=SSPI;"/>
  </Connections>

    
  <Packages>
    <Package Name ="data_conversion" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" >
      <Tasks>
        <Dataflow Name="data_conversion_01">
          <Transformations>
             
            <OleDbSource Name="src_region" ConnectionName="src_oledb" >
              <DirectInput>
                SELECT [TerritoryID] FROM [Sales].[SalesTerritory]
              </DirectInput>
            </OleDbSource>

            <Lookup Name="lookup_region_name" OleDbConnectionName="src_oledb" NoMatchBehavior="IgnoreFailure">
              <DirectInput>SELECT top 5 TerritoryID, Name  FROM [Sales].[SalesTerritory] Order By TerritoryId</DirectInput>
              <Inputs>
                <Column SourceColumn="TerritoryID"/>
              </Inputs>
              <Outputs>
                <Column SourceColumn="Name" TargetColumn="TerritoryNameUnadjusted"/>
              </Outputs>
            </Lookup>

            <DerivedColumns Name="duplicate_territory_name">
              <Columns>
                <Column Name="TerritoryName" DataType="String" Length="50">
                  TerritoryNameUnadjusted
                </Column>
              </Columns>
            </DerivedColumns>
            
            
            <DerivedColumns Name="validate_territory_name">
              <Columns>
                <!-- the DataType and Length are required however they have no effect for ReplaceExisting=true -->
                <Column ReplaceExisting="true" Name="TerritoryName" DataType="String" Length="100"  >
                  (ISNULL(TerritoryName)) ? "Unknown" : TerritoryName
                </Column>
              </Columns>
            </DerivedColumns>
   
            <OleDbDestination Name="dest_territory" ConnectionName ="dest_oledb">
            <ExternalTableOutput Table="dbo.Territory" />
            </OleDbDestination>
             
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>  
  </Packages>
</Biml>

The output (below) shows the derived column has replaced TerritoryName with Unknown where it was previously null.

Biml IX – The Lookup

Our previous posts' biml data flows have only included extractions and destinations. We have looked at moving data (the relational way) here and error configuration here. These posts have set the framework for discussing other dataflow transformations that are commonly used. Perhaps the most common of these is the lookup, which is the focus of this post. We look at the biml requirements for three configurations;

  • a bare bones implementation
  • customisation for differing input column names (ie looking up based on different columns) and
  • finally handling lookup errors

The Environment

As with the previous example, we'll focus on something simple and continue with the customer theme. Our goal is this: provided with a list of customer codes (the field CustomerAlternateKey in DimCustomer), we will use biml to create the transformations that look up each customer's first and last name and then populate our staging table (dbo.customer). That staging table can be created with the following sql; unlike our prior posts, we will not enforce any constraints on it.

create table dbo.customer
(
    CustomerAlternateKey nvarchar(25) NOT NULL,
    FirstName nvarchar(50),
    LastName nvarchar(50)
);

Our proforma script looks like the one below. This includes connections, the package (container), and the dataflow, source and destination components within the package and dataflow. You'll note that we have also left a placeholder for the lookup.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">

  <Connections>
   <OleDbConnection Name="src_adventure_works_oledb" ConnectionString="…" />
    <OleDbConnection Name="dest_stage" ConnectionString="…" />
  </Connections>

  <Packages>
    <Package Name="lookup" ConstraintMode="Linear">
      <Tasks>

        <Dataflow Name="dft_customer_01">
          <Transformations>

            <OleDbSource Name="src_customer" ConnectionName="src_adventure_works_oledb">
              <DirectInput>select CustomerAlternateKey from dbo.DimCustomer;</DirectInput>
            </OleDbSource>
			
           .... some lookup ....
            
            <OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
              <ExternalTableOutput Table="[dbo].[customer]" />
            </OleDbDestination>

          </Transformations>
        </Dataflow>
	   
      </Tasks>
    </Package>
  </Packages>
</Biml>

Part 1 – Bare Bones

If we were to think about how the lookup transformation works, we can determine the minimum requirements for the lookup. There is an input to the component as a base; then we would specify the;

  • source for our lookup (by data source and query)
  • way the two datasets relate to each other (the matching field)
  • columns (from the lookup dataset) that are to be appended to the original dataset.

This is not complex and we would also expect the column matching (and outputs) to look like the image below. Note the match is on CustomerAlternateKey and the FirstName and LastName fields are returned.

The SQL query to look up values (that is, to provide the lookup table) is defined as;

Select
CustomerAlternateKey
, FirstName
, LastName
from dbo.DimCustomer;

When we look at the biml required for this, we can see this minimal set of options defined as child nodes of the Lookup task. That is;

  • The Lookup tag specifies the data source to use as an attribute and initialises the transform.
  • The source query is specified via the DirectInput tag.
  • The Inputs tag specifies which fields to match on.
  • The Outputs tag specifies which columns are returned from the lookup table (or lookup query).

<Lookup Name="lookup_names" OleDbConnectionName="src_adventure_works_oledb" >
  <DirectInput>
	Select 	CustomerAlternateKey , FirstName, LastName	from dbo.DimCustomer;
  </DirectInput>
  <Inputs>
	<Column SourceColumn="CustomerAlternateKey" />
  </Inputs>
  <Outputs>
	<Column SourceColumn="FirstName" />
	<Column SourceColumn="LastName" />
  </Outputs>
</Lookup>

Now may also be a good segue to remember how the lookup works, and to note that it is not a join. The data buffer flows into the transform and columns are appended based on the lookup. By default, the transform will fail if there is no match found; the component is still not a join.

Part 2 – Matching Column Names

In Part 1 (above), data is looked up via an auto-match based on column names (note the Column node of the Inputs tag). However, commonly, the base column names and lookup column names are not the same. For example, consider our input to the transform specifying CustomerAlternateKey as code_lhs and the query defining the lookup code as code_rhs.

If this is the case, we must explicitly specify the Source and Target column names for the Inputs tag. The OledbSource and Lookup for this situation would be defined as in the biml below;

<OleDbSource  Name="src_customer" ConnectionName="src_adventure_works_oledb">
	<DirectInput>select CustomerAlternateKey as code_lhs from dbo.DimCustomer;</DirectInput>
</OleDbSource>            

<Lookup Name="lookup_names" OleDbConnectionName="src_adventure_works_oledb">
    <DirectInput>
		select
			CustomerAlternateKey as code_rhs
            , FirstName
            , LastName
        from dbo.DimCustomer;
    </DirectInput>
    <!-- input columns are always included in the output, if the match does not occur on name, we specify the columns -->
    <Inputs>
        <Column SourceColumn="code_lhs" TargetColumn="code_rhs" />
    </Inputs>
    <Outputs>
        <Column SourceColumn="FirstName" />
        <Column SourceColumn="LastName" />
    </Outputs>
</Lookup>

<OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
	<ExternalTableOutput Table="[dbo].[customer]" />
    <Columns>
		<Column SourceColumn="code_lhs" TargetColumn="CustomerAlternateKey" />
    </Columns>
</OleDbDestination>

Note also that, since the buffer (or base table) defines the CustomerAlternateKey as code_lhs, there is no auto-mapping for the data destination and therefore we must explicitly specify the mapping.

It should also be apparent that, if the lookup is based on a composite key (that is, more than one column), each matching column should be expressed as its own Column node, as in the sketch below.
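For example, a two-column match might look like this (the region columns are hypothetical);

<Inputs>
	<Column SourceColumn="code_lhs" TargetColumn="code_rhs" />
	<Column SourceColumn="region_lhs" TargetColumn="region_rhs" />
</Inputs>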

Part 3 – Configuring Match Behaviour

The most common configuration for the lookup component is what to do in the case of lookup failures (that is, what happens to the component's output and operation when a match is not found for an input row). The default behaviour is to fail the task; however, there are several other options, which include;

  • Redirect the unmatched records to a new output (NoMatch)
  • Ignore the issue and pass null values as the values for the lookup fields (of course these only apply to the unmatched rows)
  • Redirect the unmatched records to the Error output

These configurations are handled in a very similar manner and rely on the NoMatchBehavior property of the Lookup tag. What is important is how the output buffers are used. That is, if you specify that match failures be sent to the Error output, you should specify a destination that consumes it (note the InputPath specification for each destination). I've used the word should because such an oversight will not throw an error in the package (creation or execution), but rows sent to unconsumed outputs will be lost.

The Biml for this data flow (redirecting unmatched rows to the NoMatch output) is shown below. Also note that we will force rows to that buffer since our lookup table is restricted to 100 rows.

<Dataflow Name="dft_customer_03">
  <Transformations>
	<OleDbSource  Name="src_customer" ConnectionName="src_adventure_works_oledb">
	  <DirectInput>select CustomerAlternateKey from dbo.DimCustomer;</DirectInput>
	</OleDbSource>
	<Lookup Name="lookup_names" OleDbConnectionName="src_adventure_works_oledb" NoMatchBehavior="RedirectRowsToNoMatchOutput">
	  <!-- other NoMatch Options {IgnoreFailure, RedirectRowsToErrorOutput, FailComponent, RedirectRowsToNoMatchOutput} -->
	  <DirectInput>
		select top 100 
		CustomerAlternateKey
		, FirstName
		, LastName
		from dbo.DimCustomer;
	  </DirectInput>
	  <Inputs>
		<Column SourceColumn="CustomerAlternateKey" />
	  </Inputs>
	  <Outputs>
		<Column SourceColumn="FirstName" />
		<Column SourceColumn="LastName" />
	  </Outputs>
	</Lookup>
	
	<OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
	  <InputPath OutputPathName="lookup_names.Match" />
	  <ExternalTableOutput Table="[dbo].[customer]" />
	</OleDbDestination>

	<OleDbDestination Name="dest_customer_errors" ConnectionName="dest_stage">
	  <InputPath OutputPathName="lookup_names.NoMatch" />
	  <ExternalTableOutput Table="[dbo].[customer]" />
	</OleDbDestination>

  </Transformations>
</Dataflow>

You’ll also note that the error destination is physically the same as the match (success) destination. This is for illustration purposes only (and some succinct database design). The execution for this is shown below;


Conclusion

This post has looked at various biml snippets for the lookup task.


BIML VIII – Managing Errors in Data Destinations (ADO Destination)

In a previous post we looked at the biml required to send data to an ODBC destination. Relational destinations are the bread and butter of SSIS (well, the work I do anyway) and one of the most common ways to manage insert errors is to redirect the failing rows of the dataflow to a more forgiving destination. This post will look at how to achieve that.

Environment

The example that we're going to use is pretty contrived – we will send some sample data through to a table with a primary key. Initially the table is empty and we expect all the data to load. A second data flow will then try to add all records again, and what we expect to find is that the records originally inserted will be rejected into an errors table.

It's a pretty simple example where I am relying on the primary key (and duplicates) to raise the error on the second load. Since my tables are initially empty, I know the number of errors that I am expecting (ie the number of records loaded in the first load). The SQL to create the two tables is;

create table dbo.customer 
(
	CustomerAlternateKey nvarchar(25) not null primary key 
	, FirstName nvarchar(50)
	, LastName nvarchar(50)

);
create table dbo.customer_errors 
(
	CustomerAlternateKey nvarchar(25)   
	, FirstName nvarchar(50)
	, LastName nvarchar(50)

);

The Biml

We'll start off with the biml we left off with in the previous post. There are a few small changes. You'll notice that we're relying on auto-mapping of column names, are using an ado.net destination, are selecting 100 customer records (source) and have changed a few names.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OdbcConnection Name="src_adventure_works" ConnectionString="Driver={SQL Server Native Client 11.0};Server=.;Database=AdventureWorksDW2012;Trusted_Connection=yes;" />
    <AdoNetConnection Name="dest_stage_ado" ConnectionString="Data Source=.;Initial Catalog=stage;Integrated Security=true;" Provider="System.Data.SqlClient" />
  </Connections>
    <Packages>
      <Package Name="copy_customer_with_error" ConstraintMode="Linear">
        <Tasks>
          <Dataflow Name="dft_customer_01">
            <Transformations>
              <OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
                <DirectInput>
                  select top 100 CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
                </DirectInput>
              </OdbcSource>
              <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer" >
                <ExternalTableOutput Table="[dbo].[customer]" />
              </AdoNetDestination>
            </Transformations>
          </Dataflow>
        </Tasks>
      </Package>
    </Packages>
</Biml>

Next, we’ll add a second dataflow (called dft_customer_02). The purpose of this flow is to reinsert (all) records into the customer table. At the moment, this dataflow is exactly the same as the first except that our source doesn’t restrict the number of records.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OdbcConnection Name="src_adventure_works" ConnectionString="Driver={SQL Server Native Client 11.0};Server=.;Database=AdventureWorksDW2012;Trusted_Connection=yes;" />
    <AdoNetConnection Name="dest_stage_ado" ConnectionString="Data Source=.;Initial Catalog=stage;Integrated Security=true;" Provider="System.Data.SqlClient" />
  </Connections>
    <Packages>
      <Package Name="copy_customer_with_error" ConstraintMode="Linear">
        <Tasks>
          
          <Dataflow Name="dft_customer_01">
            <Transformations>
              <OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
                <DirectInput>
                  select top 100 CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
                </DirectInput>
              </OdbcSource>
              <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer" >
                <ExternalTableOutput Table="[dbo].[customer]" />
              </AdoNetDestination>
            </Transformations>
          </Dataflow>

          <Dataflow Name="dft_customer_02">
            <Transformations>
              <OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
                <DirectInput>
                  select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
                </DirectInput>
              </OdbcSource>
              <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer" >
                <ExternalTableOutput Table="[dbo].[customer]" />
              </AdoNetDestination>
            </Transformations>
          </Dataflow>          
          
        </Tasks>
      </Package>
    </Packages>
</Biml>

Now, we'll add the error configuration. If we think about what we are trying to achieve, we need to alter our dft_customer_02 dataflow to add another destination (to capture the errors) and configure the first destination, dest_customer, to redirect errors to it. The snippet for dft_customer_02 becomes;

<Dataflow Name="dft_customer_02">
<Transformations>
  <OdbcSource Name="src_adventure_works" Connection="src_adventure_works" >
	<DirectInput>
	  select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
	</DirectInput>
  </OdbcSource>
  
  <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer" >
	<ExternalTableOutput Table="[dbo].[customer]" />
	<ErrorHandling ErrorRowDisposition="RedirectRow" TruncationRowDisposition ="RedirectRow" />
  </AdoNetDestination>
  
  <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer_errors"  >
	<InputPath OutputPathName="dest_customer.Error" />
	<ExternalTableOutput Table="[dbo].[customer_errors]" />
  </AdoNetDestination>
</Transformations>

</Dataflow> 

The key components to include for error configuration are the definition of error handling (in dest_customer) and the InputPath for the errors destination. For ErrorHandling, both attributes ErrorRowDisposition and TruncationRowDisposition are required. Also note that the input path for the error table is the output path of the prior component (the component name suffixed with ".Error").

The Result Package

When we execute this biml and run the package, our output resembles the following. Note the 100 rows are correctly inserted in dft_customer_01, as we expect. Unfortunately, the second data flow (dft_customer_02) pushes all records to the error output. Since we only have 100 records in our customer table, we would expect only 100 rows to be sent to the error output, because these are the records that violate the primary key (constraint).

The reason for this is the default configuration for our dest_customer destination (in dft_customer_02) – by default it tries a bulk load.

When a data destination that supports bulk inserts is used, and any insert in the batch fails, the entire buffer is rejected. We can see this setting in the SSIS configuration for bulk inserts in the screenshot below. The same applies to fast loads on ODBC destinations, which can use a batch processing (bulk) mode or a row-by-row insert. Naturally, the row-by-row method will only reject the rows that fail.

This is a property of the destination, and our completed code (below) includes the fast load configuration (it's only applicable to the dest_customer destination in dft_customer_02). When we run our package, the second data flow then rejects the 100 records that were previously inserted (remember to initialise the environment by deleting records from the tables in our staging database).
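For reference, the relevant destination (extracted from the complete code below) turns fast load off via the UseFastLoadIfAvailable attribute;

<AdoNetDestination ConnectionName="dest_stage_ado" Name="dest_customer" UseFastLoadIfAvailable="false">
	<ExternalTableOutput Table="[dbo].[customer]" />
	<ErrorHandling ErrorRowDisposition="RedirectRow" TruncationRowDisposition="RedirectRow" />
</AdoNetDestination>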

Complete Code

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OdbcConnection Name="src_adventure_works" ConnectionString="Driver={SQL Server Native Client 11.0};Server=.;Database=AdventureWorksDW2012;Trusted_Connection=yes;" />
    <AdoNetConnection Name="dest_stage_ado" ConnectionString="Data Source=.;Initial Catalog=stage;Integrated Security=true;" Provider="System.Data.SqlClient" />
  </Connections>
    <Packages>
      <Package Name="copy_customer_with_error" ConstraintMode="Linear">
        <Tasks>
          
          <Dataflow Name="dft_customer_01">
            <Transformations>
              <OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
                <DirectInput>
                  select top 100 CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
                </DirectInput>
              </OdbcSource>
              <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer" >
                <ExternalTableOutput Table="[dbo].[customer]" />
              </AdoNetDestination>
            </Transformations>
          </Dataflow>

          <Dataflow Name="dft_customer_02">
            <Transformations>
              <OdbcSource Name="src_adventure_works" Connection="src_adventure_works" >
                <DirectInput>
                  select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
                </DirectInput>
              </OdbcSource>
              
              <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer" UseFastLoadIfAvailable="false">
                <ExternalTableOutput Table="[dbo].[customer]" />
                <ErrorHandling ErrorRowDisposition="RedirectRow" TruncationRowDisposition ="RedirectRow" />
              </AdoNetDestination>
              
              <AdoNetDestination ConnectionName ="dest_stage_ado" Name="dest_customer_errors"  >
                <InputPath OutputPathName="dest_customer.Error" />
                <ExternalTableOutput Table="[dbo].[customer_errors]" />
              </AdoNetDestination>
            </Transformations>
            
          </Dataflow>          
          
        </Tasks>
      </Package>
    </Packages>
</Biml>


BIML VII – Extracting Data Using ODBC Destinations

In previous posts we looked at using Biml to extract data from our database tables to flat files and raw files. While these destinations can be handy, the majority of extracts that I use are to staging tables in a relational engine. This post looks at the Biml required to send our data to ODBC destinations.

The Environment

For the purpose of demonstration, I will use the DimCustomer table from adventure works. There’s nothing too complicated about that table, all I want to do is extract some data from it so we’ll just use a simple query (with code, first name and last name);

select 
CustomerAlternateKey
, FirstName
, LastName 
from dbo.DimCustomer;

For the destination, we'll look at two options. Firstly, we will rely on an auto-mapping feature (so that we do not specify column mappings between the source and destination). Secondly, we will override that feature and specify column-to-column mappings. For this I'll create two tables in a staging database, called customer_automapping (with the same field names as the source) and customer_manualmapping (with different names). The SQL for these tables is given below;

create table dbo.customer_automapping
(
	CustomerAlternateKey nvarchar(25)
	, FirstName nvarchar(50)
	, LastName nvarchar(50)
);

create table dbo.customer_manualmapping
(
	customer_cd nvarchar(25)
	, first_name nvarchar(50)
	, last_name nvarchar(50)
);

Proforma Template

We’ll use Biml to create two parallel dataflows in a package (the package will be called copy_customer) and the data flows will be called customer_automapping and customer_manualmapping. Our template package looks like the one below;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OdbcConnection Name="dest_stage" ConnectionString="Driver={SQL Server Native Client 11.0};...." />
    <OdbcConnection Name="src_adventure_works" ConnectionString="Driver={SQL Server Native Client 11.0};..." />
  </Connections>
  <Packages>
    <Package Name="copy_customer" ConstraintMode="Parallel">
      <Tasks>
        <Dataflow Name="customer_manualmapping">
			<Transformations>
			</Transformations>
        </Dataflow>
        <Dataflow Name="customer_automapping">
			<Transformations>
			</Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

Note we have already defined placeholders for each dataflow; all we have to do is populate those (transformation) nodes. We also define our source and destination connections.

The ODBC Source

We have previously addressed an ODBC source so, for simplicity, we will just add the code. This needs to be added to each dataflow's transformation node;

<OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
  <DirectInput>
	select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
  </DirectInput>
</OdbcSource>

The ODBC Destination

We'd expect the ODBC destination to require a name (since every item requires one), a connection and a table (name). These are the only requirements for the auto-mapping feature (where the source fields are automatically mapped to the destination fields on the basis of name). The snippet for the automapping dataflow is therefore;

<Dataflow Name="customer_automapping">
  <Transformations>
	<OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
	  <DirectInput>
		select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
	  </DirectInput>
	</OdbcSource>
	<OdbcDestination Name="dest_customer" Connection="dest_stage" >
	  <ExternalTableOutput Table="[dbo].[customer_automapping]" />
	</OdbcDestination>
  </Transformations>
</Dataflow>

Note that the table is defined by the ExternalTableOutput tag. There is no need to fully qualify the table as I have done (by inclusion of the schema); <ExternalTableOutput Table="customer_automapping" /> would suffice, provided the default schema for the connection was the same as the table's. As expected, our dataflow creates the source and destination components and automatically maps fields from the source to the destination based on name.

To explicitly set the source-to-target mappings, we need to include a Columns tag (with its child Column tags) as a child node of the destination. The Biml for this is below, but it is worth noting that a mapping will only be overwritten if it exists in this tag. For example, if two columns have the same name and there is no mapping column specification, the two fields will still be auto-matched. Therefore, to fully specify the mapping in the ODBC destination we would use the following snippet;

<OdbcDestination Name="dest_customer" Connection="dest_stage" >
  <ExternalTableOutput Table="[dbo].[customer_manualmapping]" />
  <Columns>
	<Column SourceColumn="CustomerAlternateKey" TargetColumn="customer_cd"  />
	<Column SourceColumn="FirstName" TargetColumn="first_name"  />
	<Column SourceColumn="LastName" TargetColumn="last_name"  />
  </Columns>
</OdbcDestination>

It's easy to see that the mapping occurs between the SourceColumn and TargetColumn attributes of each Column node.

The Complete Biml File

For completeness, we’ll add the complete Biml Code;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OdbcConnection Name="dest_stage" ConnectionString="Driver={SQL Server Native Client 11.0};…." />
    <OdbcConnection Name="src_adventure_works" ConnectionString="Driver={SQL Server Native Client 11.0};…;" />
  </Connections>
  <Packages>
    <Package Name="copy_customer" ConstraintMode="Parallel">
      <Tasks>
        <Dataflow Name="customer_manualmapping">
          <Transformations>
            <OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
              <DirectInput>
                select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
              </DirectInput>
            </OdbcSource>
            <OdbcDestination Name="dest_customer" Connection="dest_stage" >
              <ExternalTableOutput Table="[dbo].[customer_manualmapping]" />
              <Columns>
                <Column SourceColumn="CustomerAlternateKey" TargetColumn="customer_cd"  />
                <Column SourceColumn="FirstName" TargetColumn="first_name"  />
                <Column SourceColumn="LastName" TargetColumn="last_name" />
              </Columns>
            </OdbcDestination>
          </Transformations>
        </Dataflow>
        <Dataflow Name="customer_automapping">
          <Transformations>
            <OdbcSource Name="src_adventure_works" Connection="src_adventure_works">
              <DirectInput>
                select CustomerAlternateKey, FirstName, LastName from dbo.DimCustomer;
              </DirectInput>
            </OdbcSource>
            <OdbcDestination Name="dest_customer" Connection="dest_stage" >
              <ExternalTableOutput Table="[dbo].[customer_automapping]" />
            </OdbcDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

Overwriting the Automap

There may be a situation where you want to auto-map some fields but not others (say, for example, we did not want to populate the CustomerAlternateKey field). In this situation we can rely on auto-mapping and mark the column as not being used through the IsUsed attribute. For example, we could use;

<OdbcDestination Name="dest_customer" Connection="dest_stage" >
  <ExternalTableOutput Table="[dbo].[customer_automapping]" />
  <Columns>
	<Column SourceColumn="CustomerAlternateKey"  IsUsed="false" />
  </Columns>
</OdbcDestination>