BIML XIX – Creating a Staging Area using Biml


Our prior posts (Part I and Part II) looked at using biml to extract data from a source system into staging tables. In those posts, we focused on using defined content to create the extracts and override the default column mapping that is available in biml. That approach relied on the existence of the target environment (which was a given). But how do we create the staging environment in the first place? This post looks at how to do that using biml.

The goal of this post, therefore, is to create a staging environment using biml based on the system specification of a source system. Using system tables is handy for several reasons. Firstly, every database has them, so we can create a generic framework to define a staging environment. Secondly, we can use the information in them to alter our output (for example, an automatic conversion from an Oracle varchar2 type to a SQL Server nvarchar type). Thirdly, we can use biml script (more specifically, code functions in our script) to alter the values that are provided (for example, naming standards – the conversion of a column name ColumnName to column_name). Fourthly, we can override metadata (for example, substituting schemas). Finally, we can use biml to append fields to our target environment that do not exist in our source (for example, adding audit fields).
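To make the naming-standard and type-conversion points concrete, helpers along these lines could sit in a BimlScript code block. This is a sketch only: the names ToSnakeCase and MapSourceType, and the specific type mappings, are our own convention, not something biml provides.

```csharp
// Hypothetical helpers for a BimlScript <#+ ... #> block.
// ToSnakeCase applies a naming standard (ColumnName -> column_name);
// MapSourceType translates a source type name to a SQL Server type.
public string ToSnakeCase(string name)
{
    var sb = new System.Text.StringBuilder();
    for (int i = 0; i < name.Length; i++)
    {
        if (char.IsUpper(name[i]) && i > 0)
            sb.Append('_');
        sb.Append(char.ToLower(name[i]));
    }
    return sb.ToString();
}

public string MapSourceType(string sourceType)
{
    switch (sourceType.ToUpper())
    {
        case "VARCHAR2": return "nvarchar";   // Oracle -> SQL Server
        case "NUMBER":   return "numeric";
        default:         return sourceType.ToLower();
    }
}
```

With helpers like these, ToSnakeCase("ColumnName") returns "column_name", and a column spec read from an Oracle source could be emitted with a SQL Server type.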

Table Iteration

The basis of our package is the ExecuteSql task. Our output will create a package with an ExecuteSql task for each table that we want to create. Therefore our proforma biml would look like the following.

<Packages>
	<Package Name="001_Create_Staging" ConstraintMode="Linear">
		<Tasks>
			<ExecuteSQL .... some create table statement … />
		</Tasks>
	</Package>
</Packages>

As a first step, we want to create a task for each table in our source system. Here, we don’t worry about defining columns – we just create the ExecuteSql placeholder. In order to do that, we retrieve a table showing the source schema and table names (in SQL Server, we can use the INFORMATION_SCHEMA.TABLES view for this). Then we iterate over each row, creating an ExecuteSql task (as below).

<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="2" #>
<#
	string _stage_con_string = @"Data Source=.\SQL2012;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=AdventureWorks2012";
	string _table_name_sql = "select  TABLE_SCHEMA , table_name from INFORMATION_SCHEMA.TABLES  where TABLE_TYPE='BASE TABLE'"; 
	
	DataTable _table_names = new DataTable();
	SqlDataAdapter _table_name_da = new SqlDataAdapter(_table_name_sql, _stage_con_string);
    _table_name_da.Fill(_table_names);	
	
#>


<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="staging_db" ConnectionString="Provider=SQLNCLI11;Data Source=.\SQL2012;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=stage" />
  </Connections>
	<Packages>
		<Package Name="001_Create_Staging" ConstraintMode="Linear">
			<Tasks>
				<# foreach(DataRow _table in _table_names.Rows) {	#>			
				<ExecuteSQL Name="create_import_<#= _table["TABLE_SCHEMA"] #>_<#= _table["TABLE_NAME"] #>" ConnectionName="staging_db">
					<DirectInput>						 
create table import.[<#= _table["TABLE_SCHEMA"] + "_" + _table["TABLE_NAME"] #>] ()
        				</DirectInput>
				</ExecuteSQL>
				<# } #>
            </Tasks>
        </Package>
    </Packages>
</Biml>

Here, a task (named create_import_schema_table) is created for each table in our source. The SQL we are planning to generate (see the DirectInput tag) automatically moves each table in our source to an import schema in our target environment, with the target name driven by the source table’s schema and name. For example, the table Production.ScrapReason would be created as import.Production_ScrapReason in our staging environment.

Column Iteration

Naturally, our next requirement is to retrieve a list of columns for each table that we are iterating through and append it to our create table syntax. This is a similar concept to the prior post: we use an inner loop that relies on the current table to produce a column list. We can do this by adding the biml script below. Note that at this stage we only retrieve the column specifications for the current table and do not yet do anything with them.

<DirectInput>
	create table import.[<#= _table["TABLE_SCHEMA"] + "_" + _table["TABLE_NAME"] #>]
	(
	<#
		string _col_name_sql = "select COLUMN_NAME, DATA_TYPE,  CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA='" + _table["TABLE_SCHEMA"] + "' and TABLE_NAME='"+ _table["TABLE_NAME"] + "' order by ORDINAL_POSITION ";
		
		DataTable _col_names = new DataTable();
		SqlDataAdapter _col_names_da = new SqlDataAdapter(_col_name_sql, _stage_con_string);
		_col_names_da.Fill(_col_names);

		// iterate over each column and get the definition 
		
	#>
	)
</DirectInput>

We can see how we can use the column specification to define the rest of the create table syntax. For example, consider the table ScrapReason. By iterating over each row of its column specification, we can append that table’s columns (including datatypes) to our create table syntax.

We should have an idea of how to do this iteration already – add some logic that looks at the values in each column to produce a string. For example, the above specification might look something like this:

Create table import.production_scrapreason
(
	ScrapReasonId smallint(5,0)
	, Name nvarchar(50)
	, ModifiedDate datetime
)

However, generating the syntax like this is not without issue, and the above create table will fail because of the smallint(5,0) specification. What we need to do is include some logic that takes a row and, based on the datatype, produces the appropriate column specification. This allows us to choose what is created (as a result of the source data type) and to translate between datatypes. For example, consider the following function, which does this. Here we output a string that uses the column name and datatype to determine what precision, scale, or length values to use. For example, a numeric type uses both the precision and scale, a float type uses only the precision, and character types use the character maximum length.

public string RowConversion(DataRow Row)
{
    string _ret = "[" + Row["COLUMN_NAME"] + "] " + Row["DATA_TYPE"];


    switch (Row["DATA_TYPE"].ToString().ToUpper())
    {
        case "NVARCHAR":
        case "VARCHAR":
        case "NCHAR":
        case "CHAR":
        case "BINARY":
        case "VARBINARY":
            if (Row["CHARACTER_MAXIMUM_LENGTH"].ToString() == "-1")
                _ret += "(max)";
            else
                _ret += "(" + Row["CHARACTER_MAXIMUM_LENGTH"] + ")";
            break;

        case "NUMERIC":
            _ret += "(" + Row["NUMERIC_PRECISION"] + "," + Row["NUMERIC_SCALE"] + ")";
            break;

        case "FLOAT":
            _ret += "(" + Row["NUMERIC_PRECISION"] + ")";
            break;
    }
  
    return _ret;
}
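As a quick sanity check, here is what RowConversion produces for a few representative rows. The table and values here are illustrative only – they mimic the shape of an INFORMATION_SCHEMA.COLUMNS result rather than coming from a real database.

```csharp
// Illustrative only: build rows shaped like INFORMATION_SCHEMA.COLUMNS
// and note the string RowConversion would return for each.
DataTable t = new DataTable();
t.Columns.Add("COLUMN_NAME");
t.Columns.Add("DATA_TYPE");
t.Columns.Add("CHARACTER_MAXIMUM_LENGTH");
t.Columns.Add("NUMERIC_PRECISION");
t.Columns.Add("NUMERIC_SCALE");

t.Rows.Add("Name", "nvarchar", "50", "", "");       // -> [Name] nvarchar(50)
t.Rows.Add("Comment", "nvarchar", "-1", "", "");    // -> [Comment] nvarchar(max)
t.Rows.Add("Price", "numeric", "", "19", "4");      // -> [Price] numeric(19,4)
t.Rows.Add("ModifiedDate", "datetime", "", "", ""); // -> [ModifiedDate] datetime
```

Note that any datatype not matched by the switch (datetime, int, smallint and so on) falls through and is emitted without a length or precision suffix, which is exactly what we want for the smallint(5,0) problem above.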

What we would like to do is include this function in our biml script, and we can do that by adding another code block (identified as <#+ …code… #>). Since the same function will be used regardless of where we are in code (the function is the same for the entire file), the snippet can be added to the top section of our biml file. Our create table syntax then becomes a matter of calling that function (as below).

<DirectInput>
	create table import.[<#= _table["TABLE_SCHEMA"] + "_" + _table["TABLE_NAME"] #>]
	(
	<#
		string _col_name_sql = "select COLUMN_NAME, DATA_TYPE,  CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA='" + _table["TABLE_SCHEMA"] + "' and TABLE_NAME='"+ _table["TABLE_NAME"] + "' order by ORDINAL_POSITION ";
		
		DataTable _col_names = new DataTable();
		SqlDataAdapter _col_names_da = new SqlDataAdapter(_col_name_sql, _stage_con_string);
		_col_names_da.Fill(_col_names);

		for (int _i=0; _i<_col_names.Rows.Count ; _i++  )
		{
			DataRow _r = _col_names.Rows[_i];
			
			if (_i == 0)
				WriteLine(RowConversion(_r));
			else
				WriteLine(", " + RowConversion(_r));								
		}
		
	#>
	)
</DirectInput>

Here I have used a slightly different technique for iteration – but the key snippet of code lies in the function call RowConversion(_r). Since I need to include commas between my column definitions, I can use the row’s position to determine whether a comma is required: the first row has no comma and all subsequent rows have a leading comma. This will produce the following biml snippet.

<DirectInput>
	create table import.[Production_ScrapReason]
	(
	[ScrapReasonID] smallint
	, [Name] nvarchar(50)
	, [ModifiedDate] datetime
	)
</DirectInput>

Custom Columns

If I want to add additional columns to each table (for example, an append_dt column), we can simply add those columns as static biml in our definition (notice that the complete solution below includes an append_dt field in the DirectInput tag). Naturally, column names must be unique within each table, so the command will fail should a column with the same name already exist in the source table. We can solve this with script, and that may be the subject of the next post.
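One way to guard against that collision is to test the column list before emitting the extra field. This is a sketch only: it assumes it sits inside the column-iteration code block (where _col_names is in scope), and the append_dt name is our own convention.

```csharp
// Sketch: only emit the audit column if the source table does not
// already define a column with that name (case-insensitive match).
bool _has_append_dt = false;
foreach (DataRow _r in _col_names.Rows)
{
    if (_r["COLUMN_NAME"].ToString().Equals("append_dt", StringComparison.OrdinalIgnoreCase))
    {
        _has_append_dt = true;
        break;
    }
}
if (!_has_append_dt)
    WriteLine(", append_dt datetime");
```

This would replace the static `, append_dt datetime` line in the DirectInput tag, so the generated create table statement stays valid even when a source table happens to carry a column of the same name.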

Final Solution

For completeness, I have included the complete solution below.

<#@ import namespace="System.Data" #>
<#@ import namespace="System.Data.SqlClient" #>
<#@ template language="C#" tier="2" #>
<#
	string _stage_con_string = @"Data Source=.\SQL2012;Persist Security Info=true;Integrated Security=SSPI;Initial Catalog=AdventureWorks2012";
	string _table_name_sql = "select  TABLE_SCHEMA , table_name from INFORMATION_SCHEMA.TABLES  where TABLE_TYPE='BASE TABLE'"; 
	
	DataTable _table_names = new DataTable();
	SqlDataAdapter _table_name_da = new SqlDataAdapter(_table_name_sql, _stage_con_string);
    _table_name_da.Fill(_table_names);	
	
#>
<#+ 
public string RowConversion(DataRow Row)
{
    string _ret = "[" + Row["COLUMN_NAME"] + "] " + Row["DATA_TYPE"];


    switch (Row["DATA_TYPE"].ToString().ToUpper())
    {
        case "NVARCHAR":
        case "VARCHAR":
        case "NCHAR":
        case "CHAR":
        case "BINARY":
        case "VARBINARY":
            if (Row["CHARACTER_MAXIMUM_LENGTH"].ToString() == "-1")
                _ret += "(max)";
            else
                _ret += "(" + Row["CHARACTER_MAXIMUM_LENGTH"] + ")";
            break;

        case "NUMERIC":
            _ret += "(" + Row["NUMERIC_PRECISION"] + "," + Row["NUMERIC_SCALE"] + ")";
            break;

        case "FLOAT":
            _ret += "(" + Row["NUMERIC_PRECISION"] + ")";
            break;
    }
  
    return _ret;
}
#> 

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="staging_db" ConnectionString="Provider=SQLNCLI11;Data Source=.\SQL2012;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=stage" />
  </Connections>
	<Packages>
		<Package Name="001_Create_Staging" ConstraintMode="Linear">
			<Tasks>
				<# foreach(DataRow _table in _table_names.Rows) {	#>			
				<ExecuteSQL Name="create_import_<#= _table["TABLE_SCHEMA"] #>_<#= _table["TABLE_NAME"] #>" ConnectionName="staging_db">
					<DirectInput>
						create table import.[<#= _table["TABLE_SCHEMA"] + "_" + _table["TABLE_NAME"] #>]
						(
						<#
							
	        				string _col_name_sql = "select COLUMN_NAME, DATA_TYPE,  CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE from INFORMATION_SCHEMA.COLUMNS where TABLE_SCHEMA='" + _table["TABLE_SCHEMA"] + "' and TABLE_NAME='"+ _table["TABLE_NAME"] + "' order by ORDINAL_POSITION ";

				            DataTable _col_names = new DataTable();
				            SqlDataAdapter _col_names_da = new SqlDataAdapter(_col_name_sql, _stage_con_string);
				            _col_names_da.Fill(_col_names);

				            for (int _i=0; _i<_col_names.Rows.Count ; _i++  )
							{
								DataRow _r = _col_names.Rows[_i];
								
								if (_i == 0)
									WriteLine(RowConversion(_r));
	                			else
	                    			WriteLine(", " + RowConversion(_r));								
                            }
							
						#>
						, append_dt datetime
						)
          </DirectInput>
				</ExecuteSQL>
				<# } #>
      </Tasks>
		</Package>
	</Packages>
</Biml>

Conclusion

This post has looked at using biml to create a staging area. A package is created that includes an ExecuteSql statement for each table in a source database’s information_schema. This approach is flexible and offers many advantages around table alteration (a different target schema, additional columns, datatype conversion, and so on).
