Biml IX – The Lookup


Our previous posts biml data flows have only included extractions and destinations. We have looked at moving data (the relation way) here and error configuration here. These posts have set the framework for discussing other dataflow transformations that are commonly used. Perhaps the most common of these is the lookup that is the focus of this post. We look at biml requirements for three configurations;

  • a bare bones implementation
  • customisation for differing input column names (ie looking up based on different columns) and
  • finally handling lookup errors

The Environment

As with the previous example, well focus on something simple and continue with the customer theme. Our goal is therefore, provided with a list of customer codes (the field CustomerAlternateKey in DimCustomer), we will use biml to create the transformations look up their first and last name and then populate our staging table (dbo.customer). That staging table can be created with the following sql and unlike our prior posts, we will not enforce any constraints on it.

create table dbo.customer
(
    CustomerAlternateKey nvarchar(25) NOT NULL,
    FirstName nvarchar(50),
    LastName nvarchar(50)
);

Our proforma script to looks like the one below. This includes connections, the package (container), dataflow, source and destination components with the package and dataflow. You’ll also note that we have also left a place holder for the lookup.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">

  <Connections>
   <OleDbConnection Name="src_adventure_works_oledb" ConnectionString="…" />
    <OleDbConnection Name="dest_stage" ConnectionString="…" />
  </Connections>

  <Packages>
    <Package Name="lookup" ConstraintMode="Linear">
      <Tasks>

        <Dataflow Name="dft_customer_01">
          <Transformations>

            <OdbcSource Name="src_customer" Connection="src_adventure_works_odbc">
              <DirectInput>select CustomerAlternateKey from dbo.DimCustomer;</DirectInput>
            </OdbcSource>
			
           .... some lookup ....
            
        <OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
	  <ExternalTableOutput Table="[dbo].[customer]" />
	</OleDbDestination>

          
</Transformations>
        </Dataflow>
	   
      </Tasks>
    </Package>
  </Packages>
</Biml>

Part 1 – Bare Bones

If we were to think about how the lookup transformation works, we can determine the minimum requirements for the lookup. There is an input to the component as a base, then we would specify the

  • source for our lookup (by data source and query)
  • how the two dataset relate to each other (matching field)
  • the columns (from the lookup data set) that are to be appended to the original dataset.

This is not complex and we would also expect the column matching (and outputs) to look like the image below. Note the match is on CustomerAlternateKey and the FirstName and LastName fields are returned.

The SQL query to lookup values (that is provide the lookup table) is defined as;

Select
CustomerAlternateKey
, FirstName
, LastName
from dbo.DimCustomer;

When we look at the biml required for this, we can see these minimal set of options are defined as child nodes of the lookup task. That is

  • The Lookup tag specifies the data source to use as an attribute and initialises the transform.
  • The source query is specified via the DirectInput tag
  • The Inputs tag specifies what fields to match on.
  • The Outputs tag specifies what column are returned from the lookup table (or lookup query)

<Lookup Name="lookup_names" OleDbConnectionName="src_adventure_works_oledb" >
  <DirectInput>
	Select 	CustomerAlternateKey , FirstName, LastName	from dbo.DimCustomer;
  </DirectInput>
  <Inputs>
	<Column SourceColumn="CustomerAlternateKey" />
  </Inputs>
  <Outputs>
	<Column SourceColumn="FirstName" />
	<Column SourceColumn="LastName" />
  </Outputs>
</Lookup>

Now may also be a good sedge way to remember how the lookup works and note that it is not a join. The data buffer flows into the transform and appends columns based on the lookup. Now, by default, the transform will fail if there is no match found however, the component is no a join.

Part 2 – Matching Column Names

In Part 1 (above), data is looked up on an auto match based on column names (Note the Columns node of the Inputs tag). However, commonly, the base column names and lookup column names are not named the same. For example, consider our input to the transform specified CustomerAlternateKey as code_lhs and the query defining the lookup code as code_rhs.

If this is the case, we must explicitly specify the Source and Target column names for the Inputs tag. The OledbSource and Lookup for this situation would be defined as in the biml below;

<OleDbSource  Name="src_customer" ConnectionName="src_adventure_works_oledb">
	<DirectInput>select CustomerAlternateKey as code_lhs from dbo.DimCustomer;</DirectInput>
</OleDbSource>            

<Lookup Name="lookup_names" OleDbConnectionName="src_adventure_works_oledb">
    <DirectInput>
		select
			CustomerAlternateKey as code_rhs
            , FirstName
            , LastName
        from dbo.DimCustomer;
    </DirectInput>
    <!-- input columns are always included in the output, if the match does not occur on name, we specify the columns -->
    <Inputs>
		<Column SourceColumn="code_lhs" TargetColumn="code_rhs" />
	</Inputs>
		<Outputs>
			<Column SourceColumn="FirstName" />
            		<Column SourceColumn="LastName" />
		</Outputs>
</Lookup>

<OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
	<ExternalTableOutput Table="[dbo].[customer]" />
    <Columns>
		<Column SourceColumn="code_lhs" TargetColumn="CustomerAlternateKey" />
    </Columns>
</OleDbDestination>

Note also that (since the buffer or base table) defines the CustomerAlternateKey as code_lhs, there is no auto mapping for the data destination and therefore we must explicitly specify the mapping.

It should also be apparent that, if the lookup is based on a composite key (or more than one column), each column should be expressed as a column.

Part 3 – Configuring Match Behaviour

The most common configuration for the lookup component is what to do in the case of lookup failures (that is, what happens to the components output and operation when a match is not found for an input row). The default behaviour is to fail the task however, there are several other options which include;

  • Redirect the unmatched records to a new output (NoMatch)
  • Ignore the issue and pass null values as the values for lookup fields (of course there only apply to the unmatched rows)
  • Redirect the unmatched records to the Error output

These configurations are handled in a very similar manner and rely on the NoMatchBehaviour property for the Lookup tag. What is important is how the output buffers are used. That is, if you specify that match failures be sent to the Error, you should specify a destination that consumes it (note the InputPath specification for each destination). I’ve used the word should because such an oversight will not through an error in the package (creation or execution) but rows sent to matched buffers will be lost.

The Biml for this data flow (redirecting the Error to a no match buffer is shown below). Also note that we will force rows to that buffer since our lookup table is restricted to 100 rows.

<Dataflow Name="dft_customer_03">
  <Transformations>
	<OleDbSource  Name="src_customer" ConnectionName="src_adventure_works_oledb">
	  <DirectInput>select CustomerAlternateKey from dbo.DimCustomer;</DirectInput>
	</OleDbSource>
	<Lookup Name="lookup_names" OleDbConnectionName="src_adventure_works_oledb" NoMatchBehavior="RedirectRowsToNoMatchOutput">
	  <!-- other NoMatch Options {IgnoreFailure, RedirectRowsToErrorOutput, FailComponent, RedirectRowsToNoMatchOutput} -->
	  <DirectInput>
		select top 100 
		CustomerAlternateKey
		, FirstName
		, LastName
		from dbo.DimCustomer;
	  </DirectInput>
	  <Inputs>
		<Column SourceColumn="CustomerAlternateKey" />
	  </Inputs>
	  <Outputs>
		<Column SourceColumn="FirstName" />
		<Column SourceColumn="LastName" />
	  </Outputs>
	</Lookup>
	
<OleDbDestination Name="dest_customer" ConnectionName="dest_stage">
	  <InputPath OutputPathName="lookup_names.Match" />
	  <ExternalTableOutput Table="[dbo].[customer]" />
	</OleDbDestination>

	<OleDbDestination Name="dest_customer_errors" ConnectionName="dest_stage">
	  <InputPath OutputPathName="lookup_names.NoMatch" />
	  <ExternalTableOutput Table="[dbo].[customer]" />
	</OleDbDestination>

  </Transformations>
</Dataflow>

You’ll also note that the error destination is physically the same as the match (success) destination. This is for illustration purposes only (and some succinct database design). The execution for this is shown below;


Conclusion

This post has looked at various biml snippets for the lookup task.

 

Advertisements

2 thoughts on “Biml IX – The Lookup

  1. Pingback: Biml XX – Automating Lookups (a fundamental dimensional process) | Paul te Braak

  2. Great article. This is very thorough. Thanks for addressing different source and target column names for Ole DB Destinations.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s