BIML III – Extracting Database Tables to Raw Files


In a previous post, we looked at extracting a table to a flat file. Personally, I am not a huge fan of flat file output because you lose so much metadata when a table with all its data types are dumped into text. To preserve this information, I would much rather use a raw file output. This post looks at the Biml requirements to output to a raw file.

Overview

If you haven’t read the post about output to test files, it may be worth a quick read. Basically we discuss the process that we use to extract a table as the following list of activities.

  1. Create a connection to SQL Server
  2. Add a dataflow task
  3. Specify a source (let’s just use an OLEDB source) and provide the query.
  4. Add a flat file as a destination output then join the source to destination and then configure the output by creating a new flat file connection manager.

Of course, we could replace flat file here with raw file and we would pretty much have the same method. That’s exactly what is required – the only real difference is that the raw file format does not require delimiter specifications. Our proforma template remains the same;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <!-- Placeholder for connection -->
  </Connections>
  <FileFormats>
    <!-- Placeholder for structures-->
  </FileFormats>
  <Packages>
    <Package Name="CopyAccount" ConstraintMode="Linear">
    <!-- Place Holder for a Package -->
    </Package>
  </Packages>
</Biml>

Connections

Our database connection remains the same. The output (raw file) is specified as a file connection.

<Connections>
  <OleDbConnection Name="Source" ConnectionString="Provider=SQLOLEDB;Data Source=localhost;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2012" />  
  <FileConnection Name="Target" FilePath="F:\\Output.raw"  />
</Connections>

 

FileFormat

The file format (remembering that it’s a child of the FileFormats and defines a table structure) only needs to specify the format name and column output.

  
  <FileFormats>
    <RawFileFormat Name="Output" >
      <Columns>
        <Column Name="AccountKey"   />
        <Column Name="AccountCodeAlternateKey"   />
        <Column Name="AccountDescription"   />
        <Column Name="AccountType"    />
        <Column Name="Operator"   />
      </Columns>
    </RawFileFormat>
  </FileFormats> 

 

As with text formats, the order is not important, only the Name. We will utilise auto mapping between the source and target.

The Package/ Data Flow

For the text file extract, our package contained a dataflow which simply had a source and destination. The source specified an SQL statement and the destination specified the type (FlatFile) which had an embedded file format. This is an important point… the flat file destination requires a file connection and the connection requires a format.

This is not the same for the raw file output. Here the raw file destination requires both a connection and the file format. If we have a quick look at the dataflow snippet, you’ll see what I mean.

<Dataflow Name="Copy_Dim_Account">
  <Transformations>
	<OleDbSource Name="Src_Dim_Account" ConnectionName="Source" >
	  <DirectInput>Select AccountKey, AccountCodeAlternateKey, AccountDescription, AccountType, Operator from DimAccount;</DirectInput>
	</OleDbSource>
	<RawFileDestination Name="Dest_Dim_Account" RawFileFormatName="Output"  WriteOption="CreateAlways" >
	  <FileOutput ConnectionName ="Target" />
	</RawFileDestination>
  </Transformations>
</Dataflow> 

Now compare this to the flat file and notice that the file format is not required (for text).

<Dataflow Name="Copy_Dim_Account">
  <Transformations>
	<OleDbSource Name="Src_Dim_Account" ConnectionName="Source" >
	  <DirectInput>Select AccountKey, AccountCodeAlternateKey, AccountDescription, AccountType, Operator from DimAccount;</DirectInput>
	</OleDbSource>
	<FlatFileDestination Name="Dest_Dim_Account" ConnectionName="Target" Overwrite="true">
	</FlatFileDestination>
  </Transformations>
</Dataflow> 

 

Complete Code

For completeness, the full code to extract an table (or more precisely a SQL query) to a raw file is;

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
  <Connections>
    <OleDbConnection Name="Source" ConnectionString="Provider=SQLOLEDB;Data Source=localhost\SQL2012;Integrated Security=SSPI;Initial Catalog=AdventureWorksDW2012" />
    <FileConnection Name="Target" FilePath="F:\\Output.raw"  />
  </Connections>
  
  <FileFormats>
    <RawFileFormat Name="Output" >
      <Columns>
        <Column Name="AccountKey"   />
        <Column Name="AccountCodeAlternateKey"   />
        <Column Name="AccountDescription"   />
        <Column Name="AccountType"    />
        <Column Name="Operator"   />
      </Columns>
    </RawFileFormat>
  </FileFormats>

  <Packages>
    <Package Name="02_Copy_Table_To Raw_File" ConstraintMode="Parallel" ProtectionLevel="DontSaveSensitive" >
      <Tasks>
        <Dataflow Name="Copy_Dim_Account">
          <Transformations>
            <OleDbSource Name="Src_Dim_Account" ConnectionName="Source" >
              <DirectInput>Select AccountKey, AccountCodeAlternateKey, AccountDescription, AccountType, Operator from DimAccount;</DirectInput>
            </OleDbSource>
            <RawFileDestination Name="Dest_Dim_Account" RawFileFormatName="Output"  WriteOption="CreateAlways" >
              <FileOutput ConnectionName ="Target" />
            </RawFileDestination>
          </Transformations>
        </Dataflow>
      </Tasks>
    </Package>
  </Packages>
</Biml>

 

Conclusion

There is not a great deal of variation here between a flat file output and a text file so you may ask why include it? In a future post well look at a simple solution to output an entire database to raw files. Our goal in that post is to mimic the staging steps of data warehouse creation. However, in order to do that, we need the basics of file output.

 

Advertisements

2 thoughts on “BIML III – Extracting Database Tables to Raw Files

  1. Pingback: BIML IV– Extracting All Tables to Raw Files | Paul te Braak

  2. Pingback: BIML VII – Extracting Data Using ODBC Destinations | Paul te Braak

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s