Scene Setting
Its a common scenario when doing data integration that one has to process datasets containing redundant data. In many cases this is when data is received in flat files and all data is put into a single file for ease of transport. Perhaps this is better explained by way of illustration; below is a fragment of a file containing order details for a retail establishment:
| SalesOrderNumber | OrderDate | Product | Quantity |
| SO49169 | 01/02/2003 | Road-250 Black, 44 | 4 |
| SO49169 | 01/02/2003 | Road-650 Black, 60 | 1 |
| SO49169 | 01/02/2003 | Road-550-W Yellow, 38 | 2 |
| SO49170 | 01/02/2003 | Mountain-200 Black, 46 | 1 |
| SO49170 | 01/02/2003 | Men's Bib-Shorts, L | 2 |
Each order is uniquely identified by SalesOrderNumber and has OrderDate as an attribute. In the 5 rows above there are only 2 different orders, SalesOrderNumber SO49169 contains orders for three different products and SalesOrderNumber SO49170 contains orders for two products. Notice though that there is redundant data in this dataset; any data modeler worth their salt would much rather normalise the dataset so that it looked something like this:
| OrderID | SalesOrderNumber | OrderDate |
| 1 | SO49169 | 01/02/2003 |
| 2 | SO49170 | 01/02/2003 |
| OrderID | Product | Quantity |
| 1 | Road-250 Black, 44 | 4 |
| 1 | Road-650 Black, 60 | 1 |
| 1 | Road-550-W Yellow, 38 | 2 |
| 2 | Mountain-200 Black, 46 | 1 |
| 2 | Men's Bib-Shorts, L | 2 |
Notice that I have introduced a meaningless identifier [OrderID] that links the two normalised datasets together.
Introducing the normaliser component
Are you OK with that theory? OK good. So allow me now to introduce a new component that I've built called the Normaliser Component that does exactly the same as what I did to the dataset above, except that it does it within the SSIS dataflow. You can download the component from here:
Very simply this transformation component takes a single input and splits it into two outputs. The first output (called the "master" output) is a distinct list of all the previously redundant data (SalesOrderNumber & OrderDate in the example above) and the second (called the "detail" output) is all the non-redundant data (Product & Quantity in the example above). Both outputs will contain a column of type DT_I4 called [LinkID] that links together data in the two outputs.
Here is a screenshot of a dataflow that I have built that uses the Normaliser Component to split the exact input data seen above into the same two outputs as shown above. Note that I have used a SORT component to sort the data by SalesOrderNumber & OrderDate.

This component is very easy to use because the only configuration it requires is for you to hook up a single input and it will automatically derive its two outputs. The only requirement is that the input is sorted on the redundant data columns; if the input is not sorted then the component will fail to validate at design-time.
[N.B. It is possible to "trick" the Normaliser Component into thinking the input is sorted by manually setting the IsSorted property of the input and the SortKeyPosition property of at least one of the columns in the input. If you do this then the Normaliser Component will still succeed but more fool you because you will not get the results that you require]
I should point out that there is an obscure bug in SSIS that might make the component appear as if it is not valid when in fact it is. Read about that bug here. Don't worry about it though, the workaround is simply to detach and reattach the input which takes all of about 4 seconds.
Demo
I've recorded a demo so that you can see this working for yourself. I deliberately included installation of the component so you can see how to do that (its very easy). I have embedded the video below but if I were you I'd go and download it from here:
because its a much better viewing experience than in a small embedded player. Up to you of course.
Also, if you want to get hold of the package that I built in the demo then you can get hold of that as well:
Source Code
If you want to see how this is built then you'll be glad to know that I've uploaded the source code to Codeplex. You can get everything you need here: http://www.codeplex.com/NormaliserComponent. If you want to know how to build asynchronous components for SSIS then this should be a big help to you.
The Codeplex project includes the source code for the installer as well which I built using WiX.
Wrap-up
That's about it. I hope the Normaliser Component is useful to you. If you use it then please please please give me feedback. Is it good or bad? Can it be improved? Anything.
Thanks
Jamie
Update
Turns out there's a bug in the version linked to in this blog entry. Get version 1.1 here: http://blogs.conchango.com/jamiethomson/archive/2007/09/25/SSIS_3A00_-Normaliser-Component-bug-fix.aspx