Project REAL is a Microsoft led initiative to build an end-to-end demonstrable BI solution on SQL Server 2005 BI tools (i.e. SSIS, SSAS, SSRS). A paper has been released documenting the experiences gained whilst building the ETL portion of this project using SSIS.
Its 61 printed pages of text and screenshots which is hard going in anyone's book so I've summarised some of the more pertinent points (as I see them) here:
1) One of the source file types is in a proprietary format. The implementors have built a custom transformation that brings the parsing of these files directly into the SSIS pipeline and eliminates the storage overhead of parsing the files and then putting the parsed output somewhere. Previously, with DTS, this had to be done external to the ETL tool.
2) Manually rebuilding DTS packages is preferable to using the migration wizard provided with SSIS. Part of the reason for this is that the migration wizard doesn't take advantage of some of the new features of SSIS. Packages should, really, build upon the powerful functionality inherent in SSIS and the migration wizard does not do this.
3) The DTS data-pump (the fundamental task of DTS) does not migrate using the migration wizard.
4) Adding partitions to an Analysis Services cube can be done from within SSIS using XMLA combined with the new Analysis Services Execute DDL Task (excellent).
5) The SQL Server Destination can be up to 25% quicker than the OLE DB Destination. This is because it runs in process with the SQL Server engine and therefore doesn't have the overhead of a connection. [Note: Because of this the SQL Server Destination can only be used when the package is being run on the server as the destination.]
6) Remove unnecassary columns from the pipeline. I can verify based on my own testing that this does result in a throughput enhancement.
7) The EngineThreads property can be altered to improve performance. There is no formula for optimal performance...experimentation is the key.
8) Tasks can be executed in parallel. Again there is no formula to determine the optimal number of tasks to execute in parallel but a rule of thumb is that the number of parallel tasks should be equal to the number of processors. Start from this and experiment out...see what the optimal configuration is. Its laborious but it will be worth it in the long run.
Other than that it basically talked about about basic features such as looping, configurations, derived column transform, logging, precedence contraints, property expressions & checkpoints...most of which I have covered on this blog, be it in detail or not.
As a beginner's guide to SSIS its a very useful article but anybody that has been using SSIS in beta for a while isn't going to get too much out of it that they didn't know already. I would recommend using this article as a reference as you experiment with SSIS yourself. As ever, the best way of learning something is getting on and doing it for yourself.
-Jamie