Traditional SSIS development requires the ETL developer to create a package and then manually add and configure tasks. While this method of visual development is fine, it suffers from reproducibility and maintenance overhead. For example, imagine (a very contrived scenario) where you create a package with a single ‘Execute SQL Task‘ and configure it to fire a procedure. Simple enough but now suppose you want to create a new package and fire a different proc – here you have to manually go in and follow all the same steps again. Perhaps not so bad but what if you create 10 packages and then decide that the Execute SQL Task should be prefixed EXSQL, you would have to manually go into each package and rename the task. This could be very cumbersome and it (by nature) carries a maintenance burden (or maintenance overhead).
BIML (Business Intelligence Markup Language) is an XML specification that (amongst other things) allows you to specify the creation SSIS packages. One of the great things about BIML is that it can also include code snippets that amplifies the ability to generate code. This first post will form part of a series that show how to use BIML to create packages and what type of outcome we can achieve. But (of course) first things first so let’s start with a simple goal, so let’s look at using BIML to generate a package that executes an SQL command.
Before we continue, a quick revisit my above comments about maintainability for a moment.
An SSIS package is really just an XML file right, so why can’t I just edit that?
If you’ve ever played around with an SSIS package, you’ll probably know that it is an XML file. You can see this by viewing the code of your packages as shown below (right click on the package and view code). So you may think that you can generate or alter this pretty easily.
However, I have found this XML to be very finicky. Suppose you wanted to redefine a connection by editing script (something which should be simple you’d think and I’ve discussed here). You would expect that you could just examine the code, find the connection string, alter it and save it. But I’ve never had any success in that approach. When you try to open the file again (using the visual designer) it is invalid so your pretty much helpless with the changes you tried to make (sorry but there’s no comeback). That’s only to change an existing package so the creation of a package is (IMO of course) is just not possible. You could use an API but that just seems out of reach for most.
Of course, newer versions of SSIS have project wide objects (like connections). Using these would naturally require less maintenance however were are still faced with task management (altering and creation) and package development. Further, if we want to define a group of packages by meta-data (say for example a list of tables to extract), there is no current way to do it in SSIS. A much simpler approach is BIML with give us a generic specification to development.
What do I Need to get running with BIML?
BIML is available when BIDS helper is installed. BIDS Helper is a community addin for SQL Server Data Tools (SSDT and BIDS) and is available on code plex. If you haven’t heard of BIDS helper before, take some time to look around the features. It really is a must have for business intelligence development in the Microsoft space.
Once BIDS helper is installed, you can add a new BIML file by right clicking on the Project (or SSIS Packages) node and selecting ‘Add New Biml File‘. The file is created in the Miscellaneous Folder (as shown below).
A single BIML file can contain the code that creates many packages (as will become apparent) but for now would we will just work with the basics and use it to create a package that fires an SQL statement.
Opening the BimlScript file shows the XML in the file (we should probably rename it to give it a decent name but we will not worry about it this time). When you start, the file just contains the root node (as below).
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> </Biml>
What BIML now allows us to do is create objects within the root (and later other nodes). For example, since we want to create a package, we naturally expect that we need to add a package child. Naming is pretty intuitive since it follows the SSIS object names. I’ve already hinted that we can use the same file to create multiple packages so it seems natural that a package tag should be embedded within a packages tag. Something like the following.
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Packages> <Package></Package> </Packages> </Biml>
Actually, I did cheat a little (and so can you). The intellisense functionality lets you know what tags are allowed at the position you are. Remember in XML that a tag is just an identifier for an object so since I want to do something in within by BIML tag (where else would I put it), I can just open a tag and see what options are available. I want to create a package (so a search for package shows I can add packages and of course within that, I can add a package).
Validation and Creation
Back to our package snippet. We’ve got the packages declaration (as above) to specify a package within a packages node. You’d expect there is a way to validate the XML to be sure it has the required information to create packages? Well, just right click on the (explorer solution) and select ‘Check Biml for Errors‘. A dialog opens with a list of errors.
If we think about it, it makes sense doesn’t it? Of course we need to give our packages a name! In contrast, the error for ConstraintMode is something that we take for granted when we generate packages visually. We define the order of tasks in a package through constraints (or connectors). However Biml has no indication and so the layout of tasks other than the order in which they appear in XML. How they relate to each other must be defined in the package declaration and this is the purpose of the ConstraintMode property. There are two allowable values for this property (either parallel or linear) which (as the name suggests) specifies that tasks will not be connected or connected sequentially.
So in order to create a package, we update the Biml code as (note that since we are only having one task in this package the constraint mode is irrelevant);
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Packages> <Package ConstraintMode="Linear" Name="package_01"></Package> </Packages> </Biml>
We now have everything that we need to generate packages from the script. Simply right click on the Biml file and select the Generate SSIS Packages (as above). You’ll see a new packages create under the SIS packages node in solution explorer. Of course there are no tasks in our package.
Adding the Execute SQL task to the Package.
We already seen that the XML nodes in naming conventions are very similar to that of SSIS (both in structure and definition). If we think about adding an Execute SQL task to the package, then, there would be a few points we could make (or at least assume);
- a task would be embedded within a tasks node
- the task would probably be defined by name (or a task type)
- an execute SQL task requires a connection
- and execute SQL task requires a command
- the connection would be required before the command.
Turns out that the Biml to add this task to our package is pretty straightforward. Of course you can use the IntelliSense to suggest what nodes are available and, when you look at the code and think about what you’re trying to do it seems to make sense (that is, create a connection, create a package, at a task to the package uses a connection and has some syntax).
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <Connection Name="SQL01" ConnectionString="Provider=SQLNCLI10.1;Data Source=.;Persist Security Info=False;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW"/> </Connections> <Packages> <Package ConstraintMode="Linear" Name="package_01"> <Tasks> <ExecuteSQL Name="fire_proc_01" ConnectionName="SQL01" > <DirectInput>Select 1</DirectInput> </ExecuteSQL> </Tasks> </Package> </Packages> </Biml>
Perhaps the most interesting thing about this snippet is the connection object. We’d expect it to be embedded within connections however, the location as a root node gives us an advantage of reuse, so adding another task with the package or additional packages and tasks can reuse that connection. For example, to create 2 packages with 2 Execute SQL commands in each (which require the first to complete and then the next), we could write;
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> <Connections> <Connection Name="SQL01" ConnectionString="Provider=SQLNCLI10.1;… (as above) "/> </Connections> <Packages> <Package ConstraintMode="Linear" Name="package_01"> <Tasks> <ExecuteSQL Name="SQL_CMD_1" ConnectionName="SQL01" > <DirectInput>Select 1</DirectInput> </ExecuteSQL> <ExecuteSQL Name="SQL_CMD_2" ConnectionName="SQL01" > <DirectInput>Select 1</DirectInput> </ExecuteSQL> </Tasks> </Package> <Package ConstraintMode="Parallel" Name="package_02"> <Tasks> <ExecuteSQL Name="SQL_CMD_1" ConnectionName="SQL01" > <DirectInput>Select 1</DirectInput> </ExecuteSQL> <ExecuteSQL Name="SQL_CMD_2" ConnectionName="SQL01" > <DirectInput>Select 1</DirectInput> </ExecuteSQL> </Tasks> </Package> </Packages> </Biml>
|This produces package_01 which looks like and we can see how the specification of ConstraintMode has caused SQL_CMD_1 to link to SQL_CMD_2.
Note the output from the second package definition (where ConstratinMode is Parallel) does not introduce precedence constraints between the 2 tasks.
This post has looked at using Biml to create packages in SSIS. It’s a pretty simple example, but we can see how the XML biml can be used to automatically create (and recreate) packages. If we wanted to change a property (say for example the SQL syntax), we could just change it in the code and regenerate the packages.
In later posts, we’ll extend this functional knowledge for more complicated scenarios.