I have just been watching Joy Mundy's webcast "Loading a Kimball method data warehouse using SQL Server Integration Services". I highly recommend it - Joy has done great job in outlining the basic skeleton for loading a data warehouse and covers some valuable concepts such as:
- Using a parent-child package structure (or master-child as she refers to it)
- Parent package configurations
- The value of proper Audit techniques (something of which I'm a huge proponent)
- The usefulness of variables especially when combined with expressions
- Using ROWCOUNT component
- Surrogate key lookup failures
- The use of error flows
If I think about it there hasn't been, up until this, a webcast that covers the end-to-end build of a data warehouse solution like Joy's does. For that reason I hope any newcomer to SSIS goes and watches it.
There were a few remarks made by Joy that I'd just like to elaborate on.
1. SQL Server Destination
Joy stated that the SQL Server Destination adapter should never be used because it tries to commit all rows in a single transaction whereas the OLE DB Destination does not. That's a valuable point to make however I would warn against a carte blanche approach to your choice of destination adapter - the SQL Server Destination adapter has its place and in a heads-down race between the two adapters, SQL Server Destination would win every time. My advice would be to test the two approaches yourself and see what works best for you.
2. Communication between a child package and a parent package
Joy states that it is not possible to pass information from a child package back up to the package that calls it. Instead, Joy describes a useful technique for storing information such as this in an external table that can be accessed by both packages.
Unfortunately Joy's assertion that the child package cannot pass data back to the parent package is simply not true. I have previously briefly described a technique for doing exactly this right here. To put it more succinctly, a script task can "see" any variables scoped to ancenstral containers in a parent package and therefore can write values to them (and of course read from them as well).
It is important that people understand this because it can be a very very useful tecunique when building SSIS solutions. At Conchango we use it in two places:
- To pass values written by the ROWCOUNT component back up to a custom logging mechanism in the parent package.
- To concatenate the name of the child package to a variable called PackageStack in the ultimate ancestral package - again for logging purposes.
3. Parent Package Configuration
Joy ambiguously suggests that to use a parent package configuration you have to have identically named variables in your parent and child packages. She didn't say that explicitly but anyone unfamilar with SSIS may wrongly conclude that that is what she is alluding to. I just want to clarify that this is not the case, parent package configurations don't even need ANY variables to be in the child package, let alone identically named ones.
4. Constructing dynamic SQL
Joy suggests that a script task can be used to construct a SQL statement dynamically by saving it to a variable. That is true but I would argue for the case of a different method - use an expression. To do this, set property EvaluateAsExpression=TRUE and build your dynamic SQL statement using in the Expression property.
Using an expression has the advantage of less executables (i.e. tasks) appearing in your package. On the other hand, it does raise a slight risk of SQL Injection. Its your choice which method to use.
Also bear in mind that the current (in SSIS2005) maximum length of an expression is 4000 characters so if your expression exceeds that you will have to use Joy's suggested method.
-Jamie