In January of this year I posted instructions and some code to show how one could use the Recordset destination in conjunction with a source script component to negate the need for a raw file to pass data between data-flows in the same package.
I didn't understand why anyone would want to do this given that raw files are much easier to use and mean that you aren't required to write code although it did occur to me that perhaps people didn't like dropping data to disk rather than having it in memory.
That excuse doesn't really cut any ice with me though so I figured that the only possible reason to use a recordset instead of a raw file would be for reasons of performance. I didn't know which would be the quicker solution so I set out to find out.
I built a package that did the following:
- Used a script component to load an arbitrary number of rows into a pipeline. Each row was 108 bytes wide. I copied this component into another data-flow.
- In one of those data-flows, pushed the data into a raw file
- In the other data-flow, push the data into a recordset
- Introduce two new data-flows. One which consumes data from the raw file, one from the in-memory recordset. Both pushed the data into a rowcount component
- Had two sequence containers. One to group the two data-flows that populate and extract from a recordset and another to group the two data-flows that populate and extract from a raw file.
Here's a screenshot of the package:

I executed each sequence seperately by disabling the other one. I ran the package from the command-line so as to get as accurate results as possible.
Here are those results
| Number of rows |
Raw file execution time (X) |
Recordset execution time (Y) |
X / Y |
| 100000 |
2.534 |
9.794 |
25.87% |
| 200000 |
4.446 |
19.548 |
22.74% |
| 300000 |
6.479 |
30.694 |
21.11% |
| 400000 |
8.582 |
38.065 |
22.55% |
| 500000 |
10.375 |
47.579 |
21.81% |
| 600000 |
12.377 |
56.912 |
21.75% |
| 700000 |
14.24 |
66.346 |
21.46% |
| 800000 |
16.374 |
76.4 |
21.43% |
| 900000 |
18.156 |
86.344 |
21.03% |
| 1000000 |
20.189 |
98.432 |
20.51% |
and here they are tabulated:

That's fairly conclusive proof, I think you'll agree, that using a raw file is way way quicker than using a recordset. The comparison in the table above shows that using a recordset takes approximately 5 times as long to execute compared to using a raw file.
What is also interesting to note is that there are significant spikes in memory usage when using a recordset destination. When using a raw file there is hardly a ripple.
So there you have it. There is no discernible reason (that I can think of) to EVER consider using a recordset in preference to a raw file. So, don't use them. Simple as that.
If you want to try this out for yourself, download the package from here.
-Jamie