Schema Definitions from an ODBC Text Source


Recently, I’ve worked on disjointed migration project where things just didn’t work the way you expect.  For example, you usually like to access data direct from its source (direct from a relational database) but in this situation, this was just not allowed and we were given a load of text files (hundreds) to load.  Perhaps not so bad, you may say however, unfortunately, the documentation around those files was incomplete and incorrect.  It seems that we spent for too much time confirming valid formats and playing too and fro with vendors – Its not really value adding and lets face it, its not really a good use of time to transfer and manually configure text import.  So what could we do?  All files were deposited in a directory, so why not use the Schema collections from an ODBC source?

The idea is to setup an ODBC source that points to the directory with files in it, create a connection to that source and then use the meta data of that connection to iterate over each table in the source and get the column information from that table.  Note that for a text driver, each file represents a table and that’s why I refer to tables above.

1. Setting Up the Driver

I think this is relatively straight forward so I’ll just add the key points with a word of warning.  There is no x64 bit driver installed by default for windows, so you may want to download the Access one from here if you need to.  Access the ODBC drivers from your settings (just type ODBC in the search bar and then choose to setup a connection or access ODBC sources), then just add a new source and choose the text driver.

image

When you click Finish you’ll be presented with the window below where you can specify the directory you want to use and give the source a name (see below).  There’s no need to look at options, but if you want to, you can specify specific extensions but for now lets assume that everything in the folder is for import.

image

2. Data Access

As a side note, after you’ve set up the driver, you can now import the data as a table into Excel.  The menu path is (Data Tab) Get External Data > From Other Sources > Microsoft Query.  I know you could import the file directly (ie Get External Data > From Text) however, you want to use the text driver and text your ODBC source.  You’ll notice that this source is available as a database (see below), and each file in the directory is presented as a table.  The source that I previously created is shown below and you can see the following list of tables.

image image

3. Using the ODBC Shema Object

As a general rule, when we create a database connection in code, the connection object is accompanied by a schema and we can use this to interrogate  the source.  As you can see from the snippet, all we do is acquire the “tables” schema (ie list the tables), then iterate over each table to get the columns in that table.  The sample output this shown below and, as you can see, it defines the column names and types.

            OdbcConnection _con = new OdbcConnection("DSN=biml_text_files");
_con.Open();

DataTable _schema = _con.GetSchema();
DataTable _restrictions = _con.GetSchema("Restrictions");
DataTable _tables = _con.GetSchema("Tables");

foreach (DataRow _tablerow in _tables.Rows)
{
Console.WriteLine("{0}", _tablerow["TABLE_NAME"]);

string[] _table_restrictions = { null, null, _tablerow["TABLE_NAME"].ToString() };
DataTable _columns = _con.GetSchema("Columns", _table_restrictions);
foreach (DataRow _colrow in _columns.Rows)
{
Console.WriteLine("\t{0},{1},{2}", _colrow["COLUMN_NAME"], _colrow["TYPE_NAME"], _colrow["COLUMN_SIZE"]);

}

}


_con.Close();

image

Actually, you don’t even need to go to this trouble.  You could just get the columns schema without the restrictions which will give you a complete list of all columns for every table.  However, in our implementation, the data was sent back to a database so others could use it.

Conclusion

Working with the Schema Information objects allows us derive information about the source.  Its conceptually similar to the INFORMATION_SCHEMA tables in most databases however, when we apply this to text files, we can use that meta-data to build our ETL.

Advertisements

2 thoughts on “Schema Definitions from an ODBC Text Source

  1. What perfect timing! I need to do this very thing today and your blog popped up in my RSS reader. Thank you for posting this!

    • Great to hear:) Actually there is a little bit of redundant code in that but I’m sure you can pick it out.
      I’ll have to get onto the next posts about implementing it in BIML, bulk inserts and bcp.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s