Schema Definitions from an ODBC Text Source

Recently, I’ve worked on disjointed migration project where things just didn’t work the way you expect.  For example, you usually like to access data direct from its source (direct from a relational database) but in this situation, this was just not allowed and we were given a load of text files (hundreds) to load.  Perhaps not so bad, you may say however, unfortunately, the documentation around those files was incomplete and incorrect.  It seems that we spent for too much time confirming valid formats and playing too and fro with vendors – Its not really value adding and lets face it, its not really a good use of time to transfer and manually configure text import.  So what could we do?  All files were deposited in a directory, so why not use the Schema collections from an ODBC source?

The idea is to setup an ODBC source that points to the directory with files in it, create a connection to that source and then use the meta data of that connection to iterate over each table in the source and get the column information from that table.  Note that for a text driver, each file represents a table and that’s why I refer to tables above.

1. Setting Up the Driver

I think this is relatively straight forward so I’ll just add the key points with a word of warning.  There is no x64 bit driver installed by default for windows, so you may want to download the Access one from here if you need to.  Access the ODBC drivers from your settings (just type ODBC in the search bar and then choose to setup a connection or access ODBC sources), then just add a new source and choose the text driver.

image

When you click Finish you’ll be presented with the window below where you can specify the directory you want to use and give the source a name (see below).  There’s no need to look at options, but if you want to, you can specify specific extensions but for now lets assume that everything in the folder is for import.

image

2. Data Access

As a side note, after you’ve set up the driver, you can now import the data as a table into Excel.  The menu path is (Data Tab) Get External Data > From Other Sources > Microsoft Query.  I know you could import the file directly (ie Get External Data > From Text) however, you want to use the text driver and text your ODBC source.  You’ll notice that this source is available as a database (see below), and each file in the directory is presented as a table.  The source that I previously created is shown below and you can see the following list of tables.

image image

3. Using the ODBC Shema Object

As a general rule, when we create a database connection in code, the connection object is accompanied by a schema and we can use this to interrogate  the source.  As you can see from the snippet, all we do is acquire the “tables” schema (ie list the tables), then iterate over each table to get the columns in that table.  The sample output this shown below and, as you can see, it defines the column names and types.

            OdbcConnection _con = new OdbcConnection("DSN=biml_text_files");
_con.Open();

DataTable _schema = _con.GetSchema();
DataTable _restrictions = _con.GetSchema("Restrictions");
DataTable _tables = _con.GetSchema("Tables");

foreach (DataRow _tablerow in _tables.Rows)
{
Console.WriteLine("{0}", _tablerow["TABLE_NAME"]);

string[] _table_restrictions = { null, null, _tablerow["TABLE_NAME"].ToString() };
DataTable _columns = _con.GetSchema("Columns", _table_restrictions);
foreach (DataRow _colrow in _columns.Rows)
{
Console.WriteLine("\t{0},{1},{2}", _colrow["COLUMN_NAME"], _colrow["TYPE_NAME"], _colrow["COLUMN_SIZE"]);

}

}


_con.Close();

image

Actually, you don’t even need to go to this trouble.  You could just get the columns schema without the restrictions which will give you a complete list of all columns for every table.  However, in our implementation, the data was sent back to a database so others could use it.

Conclusion

Working with the Schema Information objects allows us derive information about the source.  Its conceptually similar to the INFORMATION_SCHEMA tables in most databases however, when we apply this to text files, we can use that meta-data to build our ETL.

Advertisements