Biml XI – Conditional Splits in Data Flows


At first thought, I don’t see a huge need for conditional splits in SSIS, separating the stream (as it were) is something that I usually try and do at the source. That is, we should only bring in as much data as is required by the transform. However, in thinking about it, the transform is an absolute necessity when you create your own SCD (Slowly Changing Dimension) logic. Rather than the native SSIS SCD component which really doesn’t offer too much flexibility, building your own can offer huge performance improvements and complete control over the package execution.

The basics of a role your own SCD component work like this;

  1. the dimension table includes hash column which stores the hash for changing attributes
  2. the ETL package builds a view of the dimension and in doing so calculates the hash for the changing attributes
  3. the package looks the hash from the dimension table and then splits the data into three conditional streams;
    1. If there is no hash, the record doesn’t exist and must be added.
    2. If there is a hash and it is not the same as the ETL hash, the changing attributes have changed (from the dimension table) and the new record must be inserted (with the previous being expired).
    3. Finally, if the two hash values match, the changing attributes have not changed and no changes required to the dimension data.

It’s easy to see here why the split component is needed.

The Split in Biml

To demonstrate how the split works in Biml, we’ll continue with using customer names from adventure works. Our purpose is simply to create different outputs based on the customer last name. That is, if the customer’s last name begins with A they should be sent to an output, B to another and the remaining going to the default (or catch all) output. The executed package would appear as below. Note counts coming into the buffer (18,484) are split between A records (842), B records (1,061) and everything else (16,581).

The Biml for this package is shown below. Note that the outputs for the split (see the ConditionalSplit
tag) are defined by an OutputPath
(which is of course contained in the OutputPaths
root node of the transform). The condition for the split is defined in the Expression
nodes text. There is no requirement to define the default output.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="src_adventure_works" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2008R2"/>
    <OleDbConnection Name="dest_stage" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2008R2;Integrated Security=SSPI;Initial Catalog=stage"/>
  </Connections>
  <Packages>
    <Package Name="split_transform" ConstraintMode="Linear">
      <Tasks>
        <Dataflow Name="split_transform_01">
          <Transformations>
            <OleDbSource Name="src_names" ConnectionName="src_adventure_works">
              <DirectInput>
                select
                [CustomerAlternateKey]  as customer_code
                , [FirstName] as first_name
                , [LastName] as last_name
                from dbo.DimCustomer
              </DirectInput>
            </OleDbSource>

            <ConditionalSplit Name="split_name">
              <OutputPaths>
                <OutputPath Name="Name_A" IsErrorOutput="false">
                  <Expression>LEFT(last_name,1)=="A"</Expression>
                </OutputPath>
                <OutputPath Name="Name_B" IsErrorOutput="false">
                  <Expression>LEFT(last_name,1)=="B"</Expression>
                </OutputPath>
              </OutputPaths>
            </ConditionalSplit>

            <OleDbDestination Name="Customers_Default" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Default" />
              <ExternalTableOutput Table="[dbo].[customers]" />
            </OleDbDestination>

            <OleDbDestination Name="Customers_A" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Name_A" />
              <ExternalTableOutput Table="[dbo].[customers_A]" />
            </OleDbDestination>

            <OleDbDestination Name="Customers_B" ConnectionName="dest_stage">
              <InputPath OutputPathName="split_name.Name_B" />
              <ExternalTableOutput Table="[dbo].[customers_B]" />
            </OleDbDestination>
            
            
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

The XML Gotcha

If you’ve used SSIS for a while, you’ll probably have forgotten the pain that you had learning that expressions are more C# based rather than formula. For example, to use an equal in a formula you’d use == rather than =. When we work with XML (html or any markup type language), the nature of the syntax may cause some issues with special characters are required. For example, XML is heavily reliant on the characters <> for tags, so how can we use those characters in a formula without the engines interpreter mistaking the symbol for a tag?

For example what would happen if we wanted to have an output like last name being from A <= M. In this case we have to substitute the special character with a code (as is exactly what happens in HTML). The snippet below shows how would achieve this by using the character codes for the symbols >, & and <. Our goal here is to use split names A-K and M-Z.

<ConditionalSplit Name="split_name">
  <OutputPaths>
	<OutputPath Name="Name_A" IsErrorOutput="false">
	  <Expression>LEFT(last_name,1)&gt;="A" &amp;&amp; LEFT(last_name,1)&lt;"M" </Expression>
	</OutputPath>
	<OutputPath Name="Name_M" IsErrorOutput="false">
	  <Expression>LEFT(last_name,1)&gt;="M"</Expression>
	</OutputPath>
  </OutputPaths>
</ConditionalSplit>


 

Advertisements

7 thoughts on “Biml XI – Conditional Splits in Data Flows

  1. Pingback: Biml XII – Oledb Command (Dataflow transformation) | Paul te Braak

  2. Hi Paul,

    Running as is gives following error, “The number of input columns for Customers_B.Inputs[split_name_Name_B_Customers_B] cannot be zero.”, removing alias for [CustomerAlternateKey] in the first select enables succesfull execution.

    Regards,
    Tim.

    • Hi Tim,

      I’d first check the naming of the ConditionalSplit output matches the InputPath for each destination (probably a typo and remember to qualify the transforms name). Failing that, you could send me your code and i could have a quick look.

      Regards,

      Paul

  3. Thanks for your article!

    Even though there is no requirement to define the default output for the conditional split, what would be the xml tags for doing so if one wants to change the name of that default output?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s