Sunday, February 08, 2015

Invalid character value for cast specification ssis

Mixed data types in Excel column

My column in Excel contains mixed datatypes (strings and numbers). The string values are shown as NULL. How can I get all values?

The Excel driver reads the first 8 rows to determine the datatype. If the majority of the rows are numbers it will choose a numeric datatype instead of a string data type and visa versa.

So if your Excelsheet looks like this. The first and second column get DT_R8 and the third column gets DT_WSTR.
The standard result in SSIS

You can change the default of checking 8 rows (1 to16) by changing the windows registry, but that doesn't change the majority rule! If you want to change that number... Start the Windows Registry-editor and search for "TypeGuessRows". The place of that key depends on the version of Windows and Office. Examples:

For the ACE provider (Excel 2007, 2010 & 2013):
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\12.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows
HKEY_LOCAL_MACHINE\Software\Microsoft\Office\15.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows

Note: Setting the TypeGuessRows value to zero (0) will result in checking the first 16384 rows in excel. 
Change at your own risk!

Import Mode
If you want to get the data of all rows, regardless the datatypes in Excel, you should add the string ;IMEX=1 to the Excel Connection String. Changing to Import Mode (0 is Export mode, 1 is Import mode and 2 is Linked mode) will result in getting everything (including numbers) as a string. Goto the properties of the Excel Connection manager and change the ConnectionString property:
Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\MyExcelFile1.xls;Extended Properties="EXCEL 8.0;HDR=YES;IMEX=1";

Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\MyExcelFile.xlsx;Extended Properties="Excel 12.0 XML;HDR=YES;IMEX=1";

The preview of the same Excelsheet looks very different now.
The result in SSIS with IMEX=1

There is one big but.... if the first 8 rows contain a number the datatype will be DT_R8 again (despite of IMEX=1). This could be a problem if the ninth row contains a string.


No comments:

Post a Comment