blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Conchango are busy and need talented consultants in and around London. Interested? Email me or send me a message

SSIS: OnPipelineRowsSent event

An interesting discussion today on the SSIS Forum made me realise that perhaps not all SSIS developers are aware of the OnPipelineRowsSent event in SSIS and how valuable it can be, certainly when debugging data-flows. This isn't all that surprising because Books Online is very sparse in this area. Hence I thought it would make a useful blog topic.

OnPipelineRowsSent is a type of event in SSIS which means the information that it provides can be logged by whatever log provider you happen to be using. In order to understand the information that OnPipelineRowsSent provides it is critical to understand the buffer architecture of the SSIS data-flow.

To explain that VERY simply, a buffer is an area of memory that contains rows of data as they pass through the pipeline. Each data-path in the data-flow will contain one of more buffers and each of those buffers (except for the last one) will contain the same number of rows. If you want to understand more about the buffer architecture of the data-flow then Kirk Haselden's book has a chapter devoted wholly to it.

Here is what Books Online DOES say about OnPipelineRowsSent:

OnPipelineRowsSent

Reports the number of rows provided to a component input by a call to the ProcessInput method. The log entry includes the component name.

In slightly plainer english that means that for every buffer of data that is consumed by a component, you get a log entry. So, if you have (e.g.) a Derived Column Component that receives 105000 rows and the size of the buffer is 10000 rows, you'll get 11 OnPipelineRowsSent log entries for that component (there will only be 5000 rows in the last buffer). Basically it enables you to know how many rows are passing through each data-path in the data-flow.

The available information is the same as you get for any eventhandler.  i.e.

  • event
  • computer
  • operator
  • source
  • sourceid
  • executionid
  • starttime
  • endtime
  • datacode
  • databytes
  • message

and here is a portion of a logfile containing those records:


OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1228 : Merge Join Output : 1225 : Union All 1 : 1226 : Union All Input 1 : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1475 : Union All Output 1 : 1470 : Sort 3 : 1471 : Sort Input : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1228 : Merge Join Output : 1225 : Union All 1 : 1226 : Union All Input 1 : 2085
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1475 : Union All Output 1 : 1470 : Sort 3 : 1471 : Sort Input : 9936
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 1475 : Union All Output 1 : 1470 : Sort 3 : 1471 : Sort Input : 2085
OnPipelineRowsSent,CGOJATHOMSON,INT\Jamie.Thomson,Data Flow Task,{EB25C41A-EFE2-4902-9610-99C0309A3B59},{33B4926F-54A9-4E00-9B74-92720A2786E0},08/03/2007 20:49:56,08/03/2007 20:49:56,0,0x,Rows were provided to a data flow component as input. :  : 2121 : Sort Output : 2113 : Merge Join 2 : 2114 : Merge Join Left Input : 9936

The important stuff is what comes in the last field, the message so let's break that down. Here is an example message:

Rows were provided to a data flow component as input. :  : 1030 : OLE DB Source Output : 1025 : Sort 2 : 1026 : Sort Input : 9972

And here's what it is constituted of: 

  • Rows were provided to a data flow component as input. - That's consistent in each message, so to be honest they could have left it out
  • 1030 - ID of the data-path providing the buffer
  • OLE DB Source Output - Name of the data-path providing the buffer
  • 1025 - ID of the component receiving the buffer
  • Sort 2 - Name of the component receiving the buffer
  • 1026 - ID of the input receiving the buffer
  • Sort Input - Name of the input receiving the buffer
  • 9972 - Number of rows in the buffer

As you can see, this information will enable you to determine exactly how many rows are output from each component. Thus, if you are not receiving the expected number of rows at a destination this event will help you to deduce where you are losing them from.

You may not know this but you actually encounter OnPipelineRowsSent every time you run a data-flow within the SSIS Designer in BIDS. Notice how the designer shows the number of rows passing through the pipeline and through each component:

How does the designer know how many rows are passing through the data-flow? Simple! Its because the SSIS Designer consumes the OnPipelineRowsSent event that the executing package "throws" up and then presents that information visually as the numbers that you see flashing in front of your eyes as the data-flow executes.

I think that just about covers the simple stuff around OnPipelineRowsSent. If you have any questions then post them here as a comment.

-Jamie

 

Published 08 March 2007 20:44 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Greg Hyzak said:

So could one optionally use this to increment a variable to keep track of rows affected vs. using the Row Count data flow transformation?  It would seem like it could be much more efficient (using variable vs. using Row Count transformation).

Thanks,

Greg Hyzak

March 19, 2007 21:26
 

jamie.thomson said:

Greg,

In theory you could but I wouldn't recommend it. Firstly it would be difficult compared to thw simplicity of putting a rowcount transform in your flow, secondly it would create more executables in your package (because you'd have to do it in an eventhandler) therefore it would be slower, thirdly the rowcount component stored its value in a variable anyway - so what's the difference?

-Jamie

March 19, 2007 22:15
 

KanisRufus said:

I have a listener that inherits from DefaultEvents (C# project). So, I created my own event to relay all SSIS event messages to my logger. I would really like to capture OnPipleneRowsSent but I dont see it (to override). In C#, how do i get to it?

November 30, 2007 05:06
 

KanisRufus said:

Here is the code to override the OnCustomEvent event and output the OnPipelineRowsSent data. This is in a class that inherits DefaultEvents (a listener). I then use a delegate to pass a reference to an external method that I use to log the information.

Only problem that I have now is finding the final count. I'm getting tired and stupid so it is probably obvious.

public override void OnCustomEvent(TaskHost taskHost, string eventName, string eventText, ref object[] arguments, string subComponent, ref bool fireAgain)

   {

     if (eventName.Equals("OnPipelineRowsSent") && arguments.Length > 6)

     {

       Report("Rows Sent: " + arguments[7].ToString());

     }    

   }

November 30, 2007 06:28

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems