I was recently contacted by Octoni Simbolon who asked me:
"[In a data flow] can we assign each row to a
variable before load to destination oledb, so we can manipulate each row
and field ?"
Strictly speaking I suppose you can do this by assigning to a variable within a script component but I was pretty sure that Octoni was talking about SSIS variables so I replied in the negative and asked why he would want to do this. He replied with:
I have a database table source, let say Table A.
Table A
------------------------------
UserID |MachineTime
------------------------------
1 | 04/05/06 08:00 AM
2 | 04/05/06 08:30 AM
3 | 04/05/06 07:40 AM
1 | 04/05/06 17:00 PM
2 | 04/05/06 19:45 AM
Then I want to upload all the data into destination table B
------------------------------------------------------------------------
UserID |TheDate | Attendance Time |Leaving Time
------------------------------------------------------------------------
1 | 04/05/06 | 08:00 AM | 17:00 PM
2 | 04/05/06 | 08:30 AM | 19:45 PM
3 | 04/05/06 | 07:40 AM |
So my logic is before I load the data into new table, I want to hold the value of TheDate and UserID so I can decide whether MachineTime to be Attendance Time or Leaving Time. The thing is I try to use ForEach loop container but it seems has the limitation of the number of rows can be processed.
So please advice what do you think the best practice to solve this case.
That seemed like a simple aggregation problem to me -take the min and the max timestamp values per user- so I set about building a package to demonstrate to Octoni how to achieve this. Rather than just send the solution straight to Octoni I thought I would share it up here too.
First of all I used a script source component to provide the same data that Octoni specified into the data-flow. The timestamp values I provided as type DT_DBTIMESTAMP.
I said this was an aggregation problem didn't I? And so it is - so we need an AGGREGATE component in our data-flow as well. We use the AGGREGATE to derive the minimum MachineTime per user, the maximum MachineTime per user and the number of records per user.

In a DERIVED COLUMN component we can use these values to derive the values that we require for the destination.
1) Getting the date was pretty easy. Its simply a case of casting a DT_DBTIMESTAMP as a DT_DBDATE. This will drop the time precision from the value.
2) Likewise to get the AttendanceTime we simply have to cast the minimum MachineTime as a DT_DBTIME.
3) Getting the LeavingTime was slightly more difficult because we have to allow for the case where there is only one record per UserID - i.e. where the User has not 'left' yet. That is why earlier in the AGGREGATE component I counted the number of records per user. So then, by using the SSIS Expression Language Conditional Operator, we check this value, passing out a NULL value if NumberOfRecordsPerUserID == 1.

Finally, for demo purposes we just pass the data into a ROWCOUNT component to terminate the flow.
Here's the completed data-flow:

All pretty easy really. Remember, its possible to do many many things in the data-flow and aggregating is one of the most obvious. And one of the easiest.
You'll find that most things you want to do can be achieved using the out-of-the-box components - the very specific functionality within each makes it easily possible to build a data-flow that is significantly greater than the sum of its parts.
You can download the demo package from here.
-Jamie