UPDATE: Darren has provided a synchronous example that does pretty much the same thing. As described here: http://blogs.conchango.com/jamiethomson/archive/2005/07/27/1877.aspx#1934.
The asynchronous solution would only output N rows whereas the synchronous solution would output as many rows as are input. For this reason it would be interesting to see if performance of the two solutions converged (and indeed perhaps the asynchronous solution could become become quicker) if the value of N was suitably large. Maybe I'll look into this later!!!
Someone on the SSIS forum asked recently how to do the equivalent of the T-SQL Top operator in a data-flow. That is, only work on the first N rows that come through the buffer and drop the rest of them.
Well as with most things in SSIS there's more than 1 way to go about this but probably the simplest way is to use an asynchronous script transform. I have put a together data-flow to demonstrate this. The data-flow has 2 columns:
- col1 - an integer
- col2 - a string
(although the metadata of the data-flow isn't really what's important here).
I wanted my script component to drop everything but the first N rows to come through the transform where the value of N is specified in a variable called TopN.
Here's the code I wrote in my script component that implements this:
|
Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain Inherits UserComponent
Dim i As Integer = 0 'Declare a variable to count the number of rows passing through
'Input0_ProcessInputRow is called once for each row in the input buffer Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer) If i < Me.Variables.TopN Then 'If the number of rows has not exceeded the specified amount With Output0Buffer 'Pass the values to the output buffer .AddRow() .col1 = Row.col1 .col2 = Row.col2 End With End If i += 1 'Increment the number of rows processed End Sub
End Class
|
And sure enough, only TopN rows were outputted from the component. All with just 9 lines of code - SSIS just makes it so easy for us.
Here's a link to the package I built to demo this: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20050727TopN.zip. It generates 100 rows of nonsensical data in the first instance so that we have something to carry out our Top N operation on.. You will need Konesans' Trash Destination Adapter installed to make it work.
Here's a screenshot of the data-flow along with the variable:

There's one important thing to note if you are building this. This is an asynchronous component so make sure that the SynchronousInputID of the component equals 0.
-Jamie