Welcome to blogs.conchango.com Sign in | Join | Help

Welcome to blogs.conchango.com

SSIS Junkie

SSIS: Utilising a data-flow for different sources

Greg Foder of Parature emailed me recently with a method for processing different data sources using just one data-flow. In his very specific scenario there were common columns across all of the source tables that he wanted to process however they were all still different and as you may know, data-flows can only be shared on sources with identical metadata.

Greg's solution was to build a stored procedure which, when run, would build a view UNIONing some source tables. The view would contain all common columns plus all additional columns from all of the tables. Where a column is not part of a source table a NULL value would be used instead. That's my vague interpretation - here's how Greg described it:

===============================================================

SSIS really is quite an
amazing tool. One thing you have pointed out several times, however, is
its inability to reuse Data Flows -- this is a roadblock that I soon too
found myself banging my head against in my own projects. I think I have
come up with a simple solution, which may have been obvious but I have
not seen circulated throughout the blogosphere.

Basically the problem with being unable to reuse data flows is that you
are stuck binding them to a specific set of metadata. This is the case
even though you can supply a dynamic table name -- if the metadata
doesn't match up exactly over iterations, it breaks. For example, I have
one data flow that I would like to use on a Products table with a
certain set of columns, and reuse the logic in another table, say
Customers, with a different set of columns. In my scenario, the reason
the data flow itself is useful is because the two tables I am working on
(several dozen in fact) have common operations that are due to them
having common columns, though they also have columns which differentiate
them from one another. (For example, both Products and Customers have a
Status.)

What I did to solve this was wrote a custom C# stored procedure that
takes three parameters. First, the name of a View to generate(which I
call an Interface View.) Then, a comma delimeted list of tables.
Finally, a table to bind to the view. What the procedure does is uses
the metadata in the database to construct a view that has a union of all
the columns of the specified tables, and then binds the specific table
to that view. If the specific table being binded does not have a certain
column, then a NULL value is supplied casted to the appropriate data
type so the metadata in SSIS is consistent for all the tables being
binded. In a sense, this abstracts all the tables into a common (albeit
sometimes ugly) interface.

Back in SSIS land, I set up a Control Flow that has a foreach loop on an
XML file or whatever, that makes a call to my stored procedure on each
iteration binding the Interface View to a different table (passing in a
comma delimited collection of all the tables I wish to run on.) My Data
Flow is then bound to the Interface View; the stored procedure ensures
that the metadata for this view is consistent across all calls to the
data flow. This seems to work pretty well, and I am able to write one
data flow for all my dimensions tables, for example. Since I have access
to all the columns, I can even do branching and such if certain columns
have values or are NULL to isolate cases for specific subsets of tables.

===============================================================

Its a very specific solution for a very specific problem. This only works because there is much commonality between the tables. Greg hasn't just theorised this, he has built it, and it works pretty elegantly by the sounds of it!

Its interesting to see how people out in the community are changing the basic functionality of SSIS to make it fit their requirements. That's encouraging for two reasons. One - The off-the-shelf product can be manipulated to suit different scenarios and ... Two - People like the product so much that they are willing to spend time making it work for them!

Feel free to comment on this or if you want to contact Greg directly let me know and I can pass on your comments.

-Jamie

Published 15 September 2005 09:04 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Helmut Knappe said:

Hi Jamie,
sounds like this could be a viable solution for companies offering services to various customers, with variations in their data but the same intentions for a data warehouse. Would you agree?
I would be interested to learn more about Gregs idea and see, if this might be a good way to get rid of some batch files currently in use. Please ask Greg, if he would be interested to discuss further.
Thanks, Helmut
September 19, 2005 07:18
 

Greg Fodor said:

Hey Helmut,

Yes, it sounds like this technique would probably be of some use for you. I'll give you a little more detail on my specific use case. I have several classes in an OOP system that are persisted to various tables, but they all share a common base class with common properties. (For example, they all have a "Last Modified Time")

This technique allows me to write one common data flow for all 8-10 tables which are used to persist the data for these classes. The data flow itself only performs transformations on the commmon data (such as "Last Modified Time" -> Time Dimension Surrogate Keys) but in theory the data flow could contain custom transforms for the various tables themselves by looking for NULL values within certain columns.

The main piece of work behind this was constructing a C# stored procedure to build the Interface View -- which takes in a comma delimeted list of tables to derive the view for, the name of the view to create, and the table to 'bind' the view to. I'd be happy to send you code for this stored procedure (although it will probably need some additions for your own purposes.)

My Control Flow essentially iterates over a list of table names (pulled from an XML file) and calls my stored procedure, passing a list of all the tables to run on and then the name of the individual table being run. This will result in it creating a view on top of my table, while not causing any metadata breakage on the data flow. Some trickery is involved in dealing with missing column data, changes in varchar lengths, etc, but in its current form my stored procedure handles the cases sufficiently for my needs. One of the main drawbacks is that in most cases the views will not have writeback capability except on the columns common across all tables (which really is the main use case, anyway.)

Hope this helps.
September 19, 2005 14:26
 

jamie.thomson said:

Hi Helmut,
If you want Greg to contact you direct you'll have to contact me with your email address via the "Contact Me" link on my homepage: http://blogs.conchango.com/jamiethomson

-Jamie
September 19, 2005 15:37
 

Great blog said:

Very useful blog. Thank you.
February 23, 2006 00:11
 

yassir said:

i came up with an easier way, but for flat files having different formats, different layouts, and i created a package that creates dynamically a destination table in SQL, and i use bulk insert with all the transformations created also dynamically.

if you're interested in more details, just email me, and i'll be glad to respond, i just have to accomplish my SSIS package.

email: yassirusa@gmail.com

June 27, 2007 02:48
 

Varughese said:

Hi ,

i would like to get claryfy some doubts abt ssis , as i ma new to this tool

Could you please give me an e mail id so that i can contact

u can reach me at pmvarughese@gmail.com

March 4, 2008 17:02
 

Mukul said:

Hi

Not sure whether my question is directly relevant to this post or not.

I need to trasnfer data for 2 tables from one SQL Server to another using a DataFlow Task in SSIS.

Is there any way, I can create a single dataflow task with source and destination specified and then create 2 different transformations from the same source and destination servers, or do I need to create 2 dataflow tasks for the purpose.

January 20, 2009 18:26

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems