BISM : Importing Text Files into Power Pivot


A recent post has shown an issue with the import of some text files into Power Pivot. It would appear that, for some fields, the default text file import assumes an integer data type regardless of the underlying type. Because the import utility cannot be configured to specify the data type of each field, non-conforming data in those columns is dropped from the import.

Text files are imported via the [From Text] button in the ‘Get External Data’ section of the Power Pivot ribbon. The import options are very limited and specify only the file name and generic file properties (the delimiter and whether the first row contains field headings). The advanced option specifies only the text encoding.

[Image: the From Text import wizard, showing the file name, delimiter and first-row heading options]

This works where the fields to be imported are numeric types (as above) or capable of numeric conversion, but not when the field contains a character type. For example, the following pictures show the outcome of a text import over a non-numeric field. Here, the first record (NA) is not imported and the other records are converted to numeric data.

[Image: the import result, with the first record (NA) dropped and the remaining values converted to numbers]

After the import, specifying the field as a text data type and reimporting converts the field but does not solve the dropping of non-numeric data.

To import the file with the expected type (and all data), we need to provide a data file with quoted data. For example, the following file will import correctly.
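For illustration (the original screenshots showed the actual file; the column names and values below are made up), quoting the non-numeric field might look like this:

    id,score
    1,"NA"
    2,"10"
    3,"25"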

[Image: the quoted file and its import, with all records retained and a text data type]


8 thoughts on “BISM : Importing Text Files into Power Pivot”

  1. OK, good feedback. I had a similar error message myself. But how do you add double quotes in a text file: at the start of each row, and just before and just after each comma?

  2. Personally, I don’t think it is a good situation to be in, and my preference would be to change the extract. Of course, you could import the data into Excel, alter it and export it, or (for large files) run some VBA over the file (a sketch of that idea follows below).

    Additionally, you could prepend a leading space to every row in the set. However, we are really in the same boat, aren’t we?
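    To sketch that script idea (in Python rather than VBA, and with placeholder file names), quoting every field looks something like:

        # Rewrite a CSV so every field is wrapped in double quotes,
        # forcing a text interpretation on import. File names are placeholders.
        import csv

        with open("source.csv", newline="") as src, \
             open("quoted.csv", "w", newline="") as dst:
            writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
            for row in csv.reader(src):
                writer.writerow(row)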

  3. Have we come up with any “real solution” for this issue?
    I have text data files (*.csv) with 48 fields in them, some of which have this “mixed-type” data. But, with millions of records, it’s not really practical to “double-quote” around mixed-type fields. And I’m not really even sure which “fields” those are?! What if the mixed-type instances occur, say, after record 2,367,123? How am I even going to know?

    All I know is that I’m seeing a lot of blanks inside of PowerPivot. I thought about trying to run some kind of script to change all of the instances of ,, (double-commas) to some string or set of crazy numbers, but then all I do is proliferate the problem… creating mixed-type fields out of what could have been all-numeric, or all-text, but truly missing data.

    This just gets uglier the more I think about it.
    I see it really as a “bug” in PowerPivot that needs to be addressed by Microsoft so that we have the tools to fix the issue during importing.

    • Hi Chris,

      Another way around this is to specify the structure of the file in a schema.ini file. This is just a text file which specifies the format (column names, data types) for any file that is imported from a given folder (the schema.ini resides in the same folder as the text file).

      Unfortunately, this can only be applied when the file is initially imported.
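      As a rough illustration (the file and column names below are made up), a schema.ini entry for a comma-delimited file with a header row might look like:

          [sales.csv]
          Format=CSVDelimited
          ColNameHeader=True
          Col1=Product Char Width 255
          Col2=Amount Float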

      • Thanks, Paul. Your comment was really helpful, and I’m pursuing that.
        I found a great article/blog post at http://brentpearce.wordpress.com/2013/12/04/powerpivot-csvs-the-sql-way/ that provides a fairly decent step-by-step approach (there are a few items that were confusing or need changing, but I am muddling my way through it).

        The biggest thing was that using any TYPE-designation besides CHAR wouldn’t import data. So my approach has been to make EVERY field “CHAR width 255” in my schema.ini file, and then manipulate the type inside of PowerPivot. Awkward, but effective.
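        So, for a hypothetical three-column file, every column line in my schema.ini ends up shaped like this:

            Col1=Field1 Char Width 255
            Col2=Field2 Char Width 255
            Col3=Field3 Char Width 255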

        Somehow we have to get the word out about this.

        Chris

      • Thanks Chris,

        Personally, I have not had any issues that cannot be overcome using the schema.ini, and there has been no need to convert data types in Power Pivot. The use of Power Query (the basis of Brent’s post) is a good starting point, but (to me anyway) it’s really double handling the data (having said that, one would expect such a feature to be included in the import wizard).

        Now for a bit of shameless self-promotion: this topic is covered in depth (with samples, of course) in my tabular model cookbook http://www.packtpub.com/sql-server-and-power-pivot-tabular-modeling-cookbook/book

      • Well, then count me as a new customer of your book!

        I also was given some .csv files that had been saved in (I think) Unicode, vs. regular ASCII. PowerPivot was just returning errors during import, but re-opening them either in Excel or a Unicode-capable text editor, then re-saving the file as comma-delimited (*.txt, *.csv) or ASCII text from the editor, made it so that it would import.
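        That manual re-save can also be scripted; a rough sketch in Python, assuming the source is UTF-16 and using placeholder file names:

            # Re-save a UTF-16 ("Unicode") export as plain ASCII text so the
            # import stops erroring. Characters outside ASCII are replaced.
            with open("export_unicode.csv", encoding="utf-16") as src:
                text = src.read()
            with open("export_ascii.csv", "w", encoding="ascii", errors="replace") as dst:
                dst.write(text)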

        Lots of strange goings-on in dealing with these files.

        Chris

      • I find Excel has some pretty handy features for managing data, but it does the conversions implicitly. When it works it’s great, but when it doesn’t it is a real pain. Take, for example, the stripping of leading 0s when it thinks the column is a number!
