Biml X – Derived Columns


This post looks at the biml requirements and functionality of derived columns – how to apply logic to a column (replace a value) and add new columns. The scenario is pretty simple and builds on our other concepts of but now includes a little more ‘business logic’ (albeit simple) so we can look at a more typical package.

In this scenario, we start with some territory keys. Of course we could include more data but we want to keep it simple. Based on those keys, we have to look up the territory name. As luck would have it, not all territories have names and if there’s no valid name (that is, no lookup value found), we would like to assign a default value (“Unknown”). Also, we would like to keep a copy of the territories original name so we will store that as well. We will be using AdventureWorks2014 as our source database and staging as the destination (called stage).

Setup

The only setup requirements are the destination table (dbo.Territory). This can be added to the staging database with the following snippet;

create table dbo.Territory
(
	TerritoryID Int not null 
	, TerritoryName nvarchar(50) not null 
	, TerritoryNameUnadjusted nvarchar(50)
)

 

Also, we will show the completed package first – this gives an indication of the logic that we are trying to achieve with biml.

The Biml

The full biml is given in the snippet below. Since we have covered the general structure, sources, destinations and lookups in other posts, we will concentrate only on the biml for the derived columns transform.

There are a few things about the tag to note. Firstly, a column change (to the buffer) is managed through the column tag. This is a child of a columns parent which is a child of the derivedcolumns node (sounds obvious but it’s worth a call out). Perhaps this is best explained by examining the configuration for a derived column as in the image below. As we add (or change) more columns in the transform, we add more rows in the grid and (therefore) additional column tags would be added to our biml. Since XML requires containers, the columns tag is used to group these in biml.

Secondly, derivation as a new column or replacement column is handled by the ReplaceExisting property of the column tag. By default the new column is assumed to be a new column. Additionally (and related to column replacement), the DataType and Length of the column are required attributes but have no effect when the column is being used to replace an existing column. Here, the original type is maintained.

Finally, the formula is applied as text between the column tags (opening and closing tag)

The Package Biml

The Biml for this package is;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="src_oledb" ConnectionString="Provider=SQLOLEDB.1;Data Source=.\SQL2012;Initial Catalog=AdventureWorks2014;Integrated Security=SSPI;"/>
    <OleDbConnection Name="dest_oledb" ConnectionString="Provider=SQLOLEDB.1;Data Source=.\SQL2012;Initial Catalog=stage;Integrated Security=SSPI;"/>
  </Connections>

    
  <Packages>
    <Package Name ="data_conversion" ConstraintMode="Linear" ProtectionLevel="DontSaveSensitive" >
      <Tasks>
        <Dataflow Name="data_conversion_01">
          <Transformations>
             
            <OleDbSource Name="src_region" ConnectionName="src_oledb" >
              <DirectInput>
                SELECT [TerritoryID] FROM [Sales].[SalesTerritory]
              </DirectInput>
            </OleDbSource>

            <Lookup Name="lookup_region_name" OleDbConnectionName="src_oledb" NoMatchBehavior="IgnoreFailure">
              <DirectInput>SELECT top 5 TerritoryID, Name  FROM [Sales].[SalesTerritory] Order By TerritoryId</DirectInput>
              <Inputs>
                <Column SourceColumn="TerritoryID"/>
              </Inputs>
              <Outputs>
                <Column SourceColumn="Name" TargetColumn="TerritoryNameUnadjusted"/>
              </Outputs>
            </Lookup>

            <DerivedColumns Name="duplicate_territory_name">
              <Columns>
                <Column Name="TerritoryName" DataType="String" Length="50">
                  TerritoryNameUnadjusted
                </Column>
              </Columns>
            </DerivedColumns>
            
            
            <DerivedColumns Name="validate_territory_name">
              <Columns>
                <!-- the DataType and Length are required however they have no effect for ReplaceExisting=true -->
                <Column ReplaceExisting="true" Name="TerritoryName" DataType="String" Length="100"  >
                  (ISNULL(TerritoryName)) ? "Unknown" : TerritoryName
                </Column>
              </Columns>
            </DerivedColumns>
   
            <OleDbDestination Name="dest_territory" ConnectionName ="dest_oledb">
            <ExternalTableOutput Table="dbo.Territory" />
            </OleDbDestination>
             
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>  
  </Packages>
</Biml>

The output (below) shows the derived column has replaced the TerritoryName with Unknown where it has been previously null.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s