Welcome to blogs.conchango.com Sign in | Join | Help

Welcome to blogs.conchango.com

SSIS Junkie

SSIS Nugget: Select Top N in a data-flow

UPDATE: Darren has provided a synchronous example that does pretty much the same thing. As described here: http://blogs.conchango.com/jamiethomson/archive/2005/07/27/1877.aspx#1934.

The asynchronous solution would only output N rows whereas the synchronous solution would output as many rows as are input. For this reason it would be interesting to see if  performance of the two solutions converged (and indeed perhaps the asynchronous solution could become become quicker) if the value of N was suitably large. Maybe I'll look into this later!!!

Someone on the SSIS forum asked recently how to do the equivalent of the T-SQL Top operator in a data-flow. That is, only work on the first N rows that come through the buffer and drop the rest of them.

Well as with most things in SSIS there's more than 1 way to go about this but probably the simplest way is to use an asynchronous script transform. I have put a together data-flow to demonstrate this. The data-flow has 2 columns:

  • col1 - an integer
  • col2 - a string

(although the metadata of the data-flow isn't really what's important here).

I wanted my script component to drop everything but the first N rows to come through the transform where the value of N is specified in a variable called TopN.

Here's the code I wrote in my script component that implements this:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper


Public Class ScriptMain
    Inherits UserComponent

    Dim i As Integer = 0                    'Declare a variable to count the number of rows passing through

    'Input0_ProcessInputRow is called once for each row in the input buffer
    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        If i < Me.Variables.TopN Then           'If the number of rows has not exceeded the specified amount
            With Output0Buffer                              'Pass the values to the output buffer
                .AddRow()
                .col1 = Row.col1
                .col2 = Row.col2
            End With
        End If
        i += 1                                                       'Increment the number of rows processed
    End Sub

End Class

And sure enough, only TopN rows were outputted from the component. All with just 9 lines of code - SSIS just makes it so easy for us.

Here's a link to the package I built to demo this: http://blogs.conchango.com/Admin/ImageGallery/blogs.conchango.com/jamie.thomson/20050727TopN.zip. It generates 100 rows of nonsensical data in the first instance so that we have something to carry out our Top N operation on.. You will need Konesans' Trash Destination Adapter installed to make it work.

Here's a screenshot of the data-flow along with the variable:

20050727DataFlow.JPG

There's one important thing to note if you are building this. This is an asynchronous component so make sure that the SynchronousInputID of the component equals 0.

-Jamie

 

Published 27 July 2005 13:15 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Darren said:

For a synchronous solution, which I'd expect to be faster, try this post-

http://forums.microsoft.com/msdn/ShowPost.aspx?PageIndex=2&PostID=62400#66329

August 5, 2005 13:55
 

Misho said:

Very nice
September 2, 2005 05:53
 

SSIS Junkie : SSIS Nugget: Select Top N in a data-flow said:

March 9, 2007 20:14

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

Powered by Community Server (Personal Edition), by Telligent Systems