blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Conchango are busy and need talented consultants in and around London. Interested? Email me or send me a message

SSIS: Rank Transform

With the advent of SQL Server 2005, Microsoft provided three new T-SQL windowing functions that will turn out to be highly beneficial to T-SQL developers. They are:

It occurred to me that the same functionality would be equally beneficial in the SSIS pipeline and to that end Conchango are now able to provide a new component for SSIS, the Rank Transform.

This component provides the same functionality as the three T-SQL functions, all it requires is a sorted input.

Here is a demo of it at work.

Firstly, I want to provide some data to the pipeline and I accomplished that using a trusty script component. Here is the code:

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
  Inherits UserComponent

  Public Overrides Sub CreateNewOutputRows()
    Dim i As Integer
    i = 0
    Do
      With Output0Buffer .AddRow()
        .groupcol = Convert.ToInt32(Floor(i / 10))
        .strcol = Right("00000000000000000000" + i.ToString, 20)
 

        'Add 2 identical columns so as to show the difference between RANK and DENSE_RANK .AddRow()
        .groupcol = Convert.ToInt32(Floor(i / 10))
        .strcol = Right("00000000000000000000" + i.ToString, 20) End With
      End With
      i += 2
    Loop While i < 10

  End Sub
End Class

This component creates an output with two columns:

  • group_col - A column to partition by
  • str_col - A column to sort by and apply our ROW_NUMBER, RANK, DENSE_RANK functions

Second, let's create a data-flow that uses that data. The Rank transform requires a sorted input so I've got a SORT component in there (in this case the data is already sorted when it comes out of the script component but I've put the SORT component in there both for clarity and to set IsSorted=TRUE on the data path). Here's how I've configured the Rank transform:

 

You'll notice that I've checked all three of the functions. That means that we'll get three new output columns continaing appropriate numeric values.

And here's the data-flow:

The UNION ALL component is used to terminate the flow.

As you can see from the screenshot immediately above I've got a data viewer on there. Let's take a look at the data viewer when we execute this flow:

 

Here is what you're interested in:

  • ROW_NUMBER Each row has a sequential row number. Notice how easily this could be used for surrogate key generation.
  • RANK changes when the value in str_col changes. Duplicates get an equal value and a gap in the sequence is created to make up for the reuse of values.
  • DENSE_RANK changes when the value in str_col changes. Duplicates get an equal value and no gap in the sequence is created

That's about it for a demo. Hopefully by now you are chomping at the bit to get this downloaded and try it for yourself. Click here to get the installer for the Rank Transform and the demo package that I have created for this blog post.

 

The Rank Transform was created in partnership with Konesans and I would like to especially extend my thanks Konesans' Darren Green who contributed heavily to this development.

-Jamie

UPDATE: Its worth noting that the 'ROW_NUMBER' functionality in this isn't quite the same as the ROW_NUMBER() function in T-SQL. Here the number will just increment forever ignoring the groups. In T-SQL, ROW_NUMBER() has a PARTITION BY clause which we haven't implmented here. This is a bug however for reasons of backward compatibility we don't currently intend to do something about it. If you want a new version that DOES replicate T-SQL ROW_NUMBER() then let me know and we'll consider it ;)

Published 12 September 2006 13:36 by jamie.thomson
Attachment(s): RankTransform.zip

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Sam2005 said:

Hi Jamie,

I found this at exactly the right time, and it's worked like a dream for generating sequence numbers for a data migration I'm working on.

Thanks very much!

Sam

October 23, 2006 16:38
 

Niranjan said:

great work..

Previously I struggled with script component for generating surrogate keys...now it will become easy

January 11, 2007 20:31
 

Starter said:

Hi,

I'm new to SSIS.

I could not find any Rank transform in the data flow transformations in SSIS. I'm using sql server 2005. When written in sql statements, the Rank() function is identified.

Can somebody let me know where this tranform is present in SSIS designer?I want to rank the data in SSIS.

Thanks

SSIS Learner

February 26, 2008 23:34

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems