Greg Foder of Parature emailed me recently with a method for processing different data sources using just one data-flow. In his very specific scenario there were common columns across all of the source tables that he wanted to process however they were all still different and as you may know, data-flows can only be shared on sources with identical metadata.
Greg's solution was to build a stored procedure which, when run, would build a view UNIONing some source tables. The view would contain all common columns plus all additional columns from all of the tables. Where a column is not part of a source table a NULL value would be used instead. That's my vague interpretation - here's how Greg described it:
===============================================================
SSIS really is quite an
amazing tool. One thing you have pointed out several times, however, is
its inability to reuse Data Flows -- this is a roadblock that I soon too
found myself banging my head against in my own projects. I think I have
come up with a simple solution, which may have been obvious but I have
not seen circulated throughout the blogosphere.
Basically the problem with being unable to reuse data flows is that you
are stuck binding them to a specific set of metadata. This is the case
even though you can supply a dynamic table name -- if the metadata
doesn't match up exactly over iterations, it breaks. For example, I have
one data flow that I would like to use on a Products table with a
certain set of columns, and reuse the logic in another table, say
Customers, with a different set of columns. In my scenario, the reason
the data flow itself is useful is because the two tables I am working on
(several dozen in fact) have common operations that are due to them
having common columns, though they also have columns which differentiate
them from one another. (For example, both Products and Customers have a
Status.)
What I did to solve this was wrote a custom C# stored procedure that
takes three parameters. First, the name of a View to generate(which I
call an Interface View.) Then, a comma delimeted list of tables.
Finally, a table to bind to the view. What the procedure does is uses
the metadata in the database to construct a view that has a union of all
the columns of the specified tables, and then binds the specific table
to that view. If the specific table being binded does not have a certain
column, then a NULL value is supplied casted to the appropriate data
type so the metadata in SSIS is consistent for all the tables being
binded. In a sense, this abstracts all the tables into a common (albeit
sometimes ugly) interface.
Back in SSIS land, I set up a Control Flow that has a foreach loop on an
XML file or whatever, that makes a call to my stored procedure on each
iteration binding the Interface View to a different table (passing in a
comma delimited collection of all the tables I wish to run on.) My Data
Flow is then bound to the Interface View; the stored procedure ensures
that the metadata for this view is consistent across all calls to the
data flow. This seems to work pretty well, and I am able to write one
data flow for all my dimensions tables, for example. Since I have access
to all the columns, I can even do branching and such if certain columns
have values or are NULL to isolate cases for specific subsets of tables.
===============================================================
Its a very specific solution for a very specific problem. This only works because there is much commonality between the tables. Greg hasn't just theorised this, he has built it, and it works pretty elegantly by the sounds of it!
Its interesting to see how people out in the community are changing the basic functionality of SSIS to make it fit their requirements. That's encouraging for two reasons. One - The off-the-shelf product can be manipulated to suit different scenarios and ... Two - People like the product so much that they are willing to spend time making it work for them!
Feel free to comment on this or if you want to contact Greg directly let me know and I can pass on your comments.
-Jamie