Biml XV – ForEach Record in the DataSet (Control Flow)

Our prior posts showed techniques using the OLEDB Command to iterate over each row in the dataflow (see Biml XII and Biml XIII). Using that task we fire an SQL statement for each row in the dataflow. Column (field) values can be passed to (and returned from) the command so the technique may be handy when a single (and arguably simple) statement is required.

If we wish to embed more logic, tasks and more precise management into the iterative process, a different approach may be required. We could use a for loop task as an iterator. This operates in the control flow (as opposed to the OLEDB Command being a data flow task) and imbeds iteration in its own object container. Of course, we can still access field values from the dataset and obtain return values however, the use of container may improve usability (for SSIS control) because it can hold multiple tasks, implement precedence constraints and execute in entirety for each row that in a dataset.

This post looks at how to implement this solution in Biml.

The Package

The basic implementation of this technique of iteration is shown in the package below.

We don’t go deeply into the package design (since we’ll be creating it in Biml) however, there are a few key takeaways that we should note about the package and the variables used in the technique;

The first Execute SQL task (read_dim) assigns the results of a query (the recordset) into a variable called sub_cat_results. The type of the variable is a System.Object and that variable is in the scope of the package.

The recordset is simple (based on adventure works) and just returns ID’s and Names for product subcategories.

select ProductSubcategoryID  as sub_cat_id
, Name as sub_cat_name
from Production.ProductSubcategory;
The object of the next task (a for loop container titled for_each_record) is to iterate over each record in the dataset. As each ‘new’ row is encountered, field values for the row are assigned to the variables sub_cat_id and sub_cat_name. This assignment occurs in the outer part of the task so that those values can be accessed by other tasks within the bounds of the loop.

Additionally the variables (sub_cat_id and sub_cat_name) are scoped so that they only exist within the loop container. There is really no requirement to do this (and they could exist in the package scope) however such a design policy may make the package more concise and modular. You can read more about variable scoping here.

The loop container contains its own tasks which (of course) can access variables in the package and the loop container. We’ve used Execute SQL tasks in this example however, we are not limited to only these tasks.

In our simple example, our select_id task queries task is shown as;

select *
from [Production].[ProductSubcategory]
where ProductSubcategoryID = ?;

Naturally the parameter value passed is held in the sub_cat_id variable.

We have not implemented any precedence constraints in the for loop container – it is kept simple for the purposes of demonstrations.

The biml

The entire biml for this package is shown below. The key notes on the biml implementation are;

The variable sub_cat_results is defined as a child of the package. Since we want to limit the scope of the sub_cat_id and sub_cat_name variables to the loop container, they are defined as children of that task (ForEachAdoLoop).
The first SQL task (read_dim) needs to return a recordset and assign it to the variable sub_cat_results. In order to do this, we need to set the ResultSet property of the task (to Full) and configure the Results tag of the task.
Our iterator (see the tag ForEachAdoLoop) has properties which specifies the variable that holds a recordset (sub_cat_results) and the precedence constraint that is applicable to the task. Of course that constraint can be overwritten on a task by task basis.
As stated above, the variables that are to be used with the loop container are defined with the container (see the Variables child tag). Had these variables been defined as children of the package, they would not need to be defined at the task (loop task) level.

However, note the naming convention where the variables are used within the package (ie where called). The variable mapping within the loop refers to the sub_cat_id variable in a fully qualified manner (iteration_01.for_each_record.User.sub_cat_id). We could also create the package without fully qualification (ie User.sub_cat_id) if we wish.

The tasks within the loop container are held within a tasks tag. This is exactly the same as tasks are defined within the package and so we may liken the containers definition to that of the package.

The Biml

<Biml xmlns="">
        <OleDbConnection Name="adventure_works" ConnectionString="Provider=SQLOLEDB;Data Source=.\SQL2012;Integrated Security=SSPI;Initial Catalog=AdventureWorks2014" />
        <Package Name="iteration_01" ConstraintMode="Linear">
				<Variable Name="sub_cat_results" DataType="Object" />
				<!-- Get the DataSet Object -->
                <ExecuteSQL Name="read_dim" ConnectionName="adventure_works" ResultSet="Full">
					<DirectInput>select ProductSubcategoryID  as sub_cat_id, Name as sub_cat_name from Production.ProductSubcategory; </DirectInput>
					    	<Result Name="0" VariableName="User.sub_cat_results" />
				<!-- iterate over each row in the data set -->
                <ForEachAdoLoop Name="for_each_record" SourceVariableName="User.sub_cat_results" ConstraintMode="Parallel">
						<Variable Name="sub_cat_id" DataType="Int32">0</Variable>
						<Variable Name="sub_cat_name" DataType="String" />
							<VariableMapping Name="0" VariableName="iteration_01.for_each_record.User.sub_cat_id" />
							<VariableMapping Name="1" VariableName="iteration_01.for_each_record.User.sub_cat_name" />
					<!-- execute some tasks using the variables passed from the dataset -->

						<ExecuteSQL Name="select_id" ConnectionName="adventure_works">
							<DirectInput>select * from [Production].[ProductSubcategory] where ProductSubcategoryID = ?;</DirectInput>
								<Parameter Name="sub_cat_id" DataType="Int32" Direction="Input" VariableName="iteration_01.for_each_record.User.sub_cat_id" />

						<ExecuteSQL Name="select_name" ConnectionName="adventure_works">
							<DirectInput>select * from [Production].[ProductSubcategory] where [Name] = ?;</DirectInput>
								<Parameter Name="sub_cat_name" DataType="String" Length="50" Direction="Input" VariableName="iteration_01.for_each_record.User.sub_cat_name" />

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s