For this nugget I want to talk about SSIS's powerful abilities to deal with partially structured or unstructured data files.
Sébastien Nunes has posted on the SSIS forum asking how he can load a data file that looks like this (column headings in first row):
Col 1;Col 2;Col 3;Col 4;Col 5
AAAA;BBB;CCC;;
AAA1;BBB1;CCC1;;
AAA2;BBB2;CCC2
AAA3;BBB3;CCC3
into the SSIS pipeline. As you can see there five columns defined in the header row (delimited by a semi-colon) but not all rows contain those five columns.
We can argue all day about whether this is well-structured, partially-structured or unstructured but the main point here is that if you point the SSISFlat File Connection Manager at this file then it will not be able to parse out the columns if you define the delimiter as ";". The screenshots here illustrates how SSIS interpretes it:


This is because SSIS looks for all the expected column delimiters before it looks for the end of line character (CRLF). If it doesn't see them then it will not terminate the row.
So, how to deal with this instead? Well, its quite simple really. We just need to import the entire contents of the file as a single column and then parse out the required columns one the data is in the pipeline.
In this case I used a script component to do the parsing - in most cases the Derived Column component would be sufficient. Here is that code:
Imports System
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
Inherits UserComponent
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
Row.Column1 = Tokenise(Row.Column0, ";", 1)
Row.Column2 = Tokenise(Row.Column0, ";", 2)
Row.Column3 = Tokenise(Row.Column0, ";", 3)
Row.Column4 = Tokenise(Row.Column0, ";", 4)
Row.Column5 = Tokenise(Row.Column0, ";", 5)
End Sub
'Private function that parses out the columns of the whole row
Private Function Tokenise(ByVal input As String, ByVal delimiter As String, ByVal token As Integer) As String
Dim tokenArray As String()
tokenArray = input.Split(";".ToCharArray) 'Split the string by the delimiter
If tokenArray.Length < token Then 'Protect against a request for a token that doesn't exist
Return ""
Else
Return tokenArray(token - 1)
End If
End Function
End Class
Not that difficult at all I think you will agree. If you want to have a go at this yourself then download the demo package from
here. There is one pre-requisite to running it. You have to put the source file (which is provided in the zip) into "c:\temp".
I hope this demonstrates just how easy it is to handle unstructured data within the SSIS pipeline. Sure, there's a bit of hand-coding to be done but it really isn't that difficult and I think its legitimate to expect that a bit of hand-cracking is necassary when the format is unstructured as this is.
Any questions, let me know!
-Jamie