OK, a couple of days on and it seems my T-SQL poser created a little bit more interest than I expected - 7 comments and some follow-up blog posts testify to that (yes, 7 is alot for me). If you're browsing here first then go and read the previous post to understand the background here.
I promised I would post my solution and Saivendra's, and so I will. I'll also comment on some of the solutions that other people posted. Just before I do though, one thing I should have said in the previous post was that the order of the rows in the resultset was not important. I never stated that the rows had to be returned in a certain order and nor did the list of items in the comma-delimited list. However, judging by most of the responses people assumed this was part of the requirement - it wasn't. The only reason I mention this because I don't want an ORDER BY clause to interfere with execution duration when I later do a performance comparison of the various methods.
So, onward with sharing the various solutions to this problem...
#1 - A couple of CTEs
This was my solution:
WITH rownumcte1 AS ( SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY id,name) AS rownum,id,name FROM t1 )
,rownumcte2 AS
(
SELECT rownum,id,name
FROM rownumcte1
WHERE rownum = 1
UNION ALL
SELECT r1.rownum,r1.id,r2.NAME + ',' + r1.name
FROM rownumcte1 r1
INNER JOIN rownumcte2 r2
ON r1.id = r2.id
AND r1.rownum = r2.rownum + 1
)
SELECT r2.id,r2.NAME
FROM (
SELECT id,MAX(rownum)maxrownum
FROM rownumcte1
GROUP BY id
) r1
INNER JOIN rownumcte2 r2
ON r1.id = r2.id
AND r1.maxrownum = r2.rownum
There's a couple of CTEs in there (one referencing the other, thus nested) plus the ROW_NUMBER() function. Both are new in SQL Server 2005 so this solution wouldn't have worked in SQL Server 2000. I won't go into the specifics of how it works -I'll let you do that for yourself if you're interested- but I'm hoping that this solution will scale pretty well given that it is wholly set-based.
Nick Barclay came up with a very similar solution that also used ROW_NUMBER() but made do with just one CTE - here is Nick's solution:
with ConcatNamesCTE (id, [name], rn)as(select id, [name], rnfrom( select id, [name], row_number() over(partition by id order by id) as rn from t1) awhere rn = 1union all select b.id, cn.[name] + ',' + b.[name], b.rn from ( select id, [name], row_number() over(partition by id order by id) as rn from t1 ) b inner join ConcatNamesCTE cn on cn.id = b.id and cn.rn + 1 = b.rn)select d.id, d.[name]from ( select max(rn) as rn, id from ConcatNamesCTE group by id) cinner join ConcatNamesCTE don d.id = c.idand d.rn = c.rn
Obviously Nick is a fan of using derived tables
.
#2 - FOR XML
As I said before, Saivendra came up with a much more concise solution. Here it is:
SELECT id, STUFF( ( SELECT ','+ name FROM t1 a WHERE b.id = a.id FOR XML PATH('')),1 ,1, '')
FROM t1 b GROUP BY id Pretty neat huh? The STUFF function is used to drop the opening comma, that much is easy. The trick is in FOR XML PATH('') which, prior to this little episode, I never knew about so I have definately learnt something here. This solution (or variations thereof) was also put forward by David Portas, Jamie Hunter and Rob Farley.
One thing that immediately jumps out at me is that this solution contains a correlated subquery that will get executed for every row in the table. Hence, I have some reservations about performance over large datasets but I'm going to investigate that later (I'm hanging myself out to dry here because I'm saying that before doing any performance testing).
David also points out another drawback of this technique; some values won't get presented properly. Try inserting the following row into the table
INSERT t1 values (3,'Contains a > delimiter')
Here's the output:

Not perfect I think you'll agree!
#3 - PATINDEX
A further solution was put forward by Adrian Downes. Here it is:
select res.id, max(res.name) as [name]from( select c.id, case when PATINDEX('%' + d.name + '%', c.name) = 0 and c.id = d.id then c.name + ', ' + d.name else c.name end as [name] from ( select a.id, min(a.name) as [name] from ( select y.id, case when PATINDEX('%' + z.name + '%', y.name) = 0 and y.id = z.id then y.name + ', ' + z.name end as [name] from t1 y inner join t1 z on y.id = z.id ) a group by a.id ) c inner join t1 d on c.id = d.id) resgroup by res.idUnfortunately Adrian, this one falls down a bit. It only works for the dataset that I supplied. For example I added the following extra row into the table:
INSERT t1 values (1,'Jim')
it no longer supplies the required result because it expects that you will have no more than 3 rows per ID value. Its not really the generic solution that I was looking for. My bad, I should have specified in the original post that it should work for all cases and any number of rows in the original table and I must apologise to Adrian for that who has dedicated some of his time to this. Thank you for the input though Adrian - its still valuable.
#4 - Recursive CTE
RickR has supplied another solution that leverages a CTE. On the surface this is actually my particular favourite because it solves the problem recursively even though there is no obvious way of recursing the table such as a self-referencing join. Its a very creative way of applying our T-SQL "bag of tricks". Here's the code
;WITH n1 AS ( SELECT id, CONVERT(varchar(500), MIN(name)) name FROM t1 GROUP BY idUNION ALL SELECT a.id, CONVERT(varchar(500), a.name + ',' + b.name) name FROM t1 a JOIN n1 b ON a.id = b.id AND a.name > b.name) SELECT id, max(name)FROM n1 GROUP BY idAs I just said, I really like RickR's solution however I upon digging into it further I have found some problems. Firstly, as the number of rows in the table per ID increases, the number of rows returned from the CTE increases exponentially. Even for the simple sample dataset that I posted containing 5 rows the CTE returns itself returns 6 rows which is why the GROUP BY is needed after the CTE. If I add the following row again:
INSERT t1 values (1,'Jack')
then the CTE returns 10 rows. Add another row with ID=1 (so there is now 7 rows in the table) and the CTE returns 18 rows. I think you can see the potential problem here - the performance of RickR's solution will decrease dramatically as the number of values per ID increases. I figured that this could be solved by using a GROUP BY in the recursive part of the CTE but when I tried that I got the following error message:
Msg 467, Level 16, State 1, Line 4GROUP BY, HAVING, or aggregate functions are not allowed in the recursive part of a recursive common table expression 'n1'. So, I thought that Rick's method would work with the caveat that it couldn't be used on large datasets. However, there is another problem with Rick's solution as well (sorry Rick). If a name value appears multiple times for the same ID then in the final comma seperated list it will only appear once. Try it, add the following row (which is a duplicate of an existing row) to the original table:
INSERT t1 values (1,'Joe')
The repeated value only appears once in the resultset from RickR's solution. Hence, this simply doesn't work I'm afraid. Its a shame because like I said, I really liked Rick's method of turning this into a recursive problem. Rick, if you can solve this problem of repeated values not appearing then I'll gladly re-assess it.
Thank you to everyone that replied, its great to see people coming up with diverse solutions to this problem.
#2 is definately the neatest however as pointed out it doesn't work for all textual values. I have high hopes that solution #1 containing CTEs will perform better as well
In my next post I will compare the performance of solutions #1 and #2 along with Nick Barclay's slightly modified version of #1.
-Jamie