blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

Fat man and his mixed bag of goodies

SQL generic split function

Last Friday I was discussing with Jamie one of the SQL function I was working on. At the end of it Jamie gave me some encouragement to blog it (“If I don’t see a blog on it by Monday, I will be having a word with you!” Jamie).

With that support here I am, finally writing my first blog.

My current assignment is with the world’s leading, specialist insurance market. The situation I came across was like this, I was receiving a comma delimited string in one of my stored procedures and it was meant to be used in a where clause.

e.q.

SELECT FROMWHERE ITEMID IN (Comma delimited string)

The first solution that sprang to my mind was to construct a dynamic SQL in stored procedure utilising the delimited string and executing it at the end of it. This would have taken away all the obvious benefits a stored procedure offers plus would have opened the doors to potential abuse of this stored proc. Another solution would have been to use OPENXML but I didn't have a choice of getting input string in XML format.

After some further thinking and search the better solution seemed to be a SQL function accepting comma delimited string and returning a resultset which then could be used in a WHERE clause or a JOIN.

The first function developed worked to the expectations and I was delighted with it. But after a few minutes, questions started popping up in my head about the inflexibility of this function. First it was assuming the string would always be delimited with a comma and hence the other implication was a delimiter would always be a single character. Secondly the function was returning a single column table of type int. So the input and output format were fixed (arrrrrrggh!!!).

So before letting those limitations haunt me in the middle of the night, I decided to make this function a bit more generic and what follows is the result of those efforts:



-- ===========================================================

-- Author:        Manoj Oswal (Conchango)

-- Create date:   26 Jan 2007

-- Description:   Generic Split function

--                Accepts List and separator as string

--                Returns single variant column TABLE of items

-- ===========================================================

CREATE FUNCTION [dbo].[GenericStringSplit]

(

      @List VARCHAR(8000),

      @Separator VARCHAR(10)

)

RETURNS

@ItemsList TABLE

(

      Item SQL_VARIANT

)

AS

BEGIN

      DECLARE @CurrItem SQL_VARIANT, @Pos INT

      SET @List = LTRIM(RTRIM(@List))+ @Separator

      SET @Pos = CHARINDEX(@Separator, @List, 1)

      IF REPLACE(@List, @Separator, '') <> ''

      BEGIN

            WHILE @Pos > 0

            BEGIN

                  SET @CurrItem = LTRIM(RTRIM(LEFT(@List, (@Pos  -1 ))))

                  IF @CurrItem <> ''

                  BEGIN

                        INSERT INTO @ItemsList (Item)

                        VALUES (@CurrItem)

                  END

                  SET @List = RIGHT(@List,(LEN(@List) - (@Pos + LEN(@Separator)-1)))

                  SET @Pos = CHARINDEX(@Separator, @List, 1)

            END

      END  

      RETURN

END

The above function now accepts a delimited string and a separator up to 10 char and returns a single column table of sql_variant. The consuming SQL can then cast the return of this function with the desired type.



Usage:


SELECT CAST(Item AS INT) FROM [GenericStringSplit]('1;-2;-3;-4;-5',';-')

SELECT CAST(Item AS DATETIME) FROM [GenericStringSplit]('12 feb 2005~~~10 Jan 2006~~~15 Jan 2006','~~~')


One of the limitations of this function is the input list size is limited to 8000. VARCHAR(MAX) is the possible solution for this problem but that also means some more changes in the function namely changing @CurrItem type to something like VARCHAR(100). This is due to the incompatibility of VARCHAR(MAX) and SQL_VARIANT.

Interestingly enough the same day I had to write something to do the opposite of the above mentioned function. I had to write something that returned a comma separated list of items from a table. The solution I used is as follows:


DECLARE @List VARCHAR(MAX)

SET @List = ''

SELECT @List = @LIST + ',' + COLNAME FROM TABLE

IF @List <>''

BEGIN

      SET @List = STUFF(@List,1,1,'')

END

SELECT @List


Please send any feedback/suggestions you have, to further improve this work.

Published 31 January 2007 11:02 by Manoj.Oswal

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Manoj said:

I just came across a very good article that mentions the correct way to use VARCHAR(MAX)

http://sqljunkies.com/WebLog/simons/archive/2006/09/15/Support_for_varchar_max__with_string_functions.aspx

Run the follwing two lines and see the results:

declare @v varchar(max)

select @v =  replicate('a', 9000)

select len(@v)

declare @b varchar(max)

select @b =  replicate(cast('a'as varchar(max)), 9000)

select len(@b)

February 1, 2007 10:57
 

Tim Watson said:

I had to do something very similar recently ->constructing a large string representing and MDX set from a SQL resultset. The platform was mssql 8.0 therefore varchar(max) wasn't available. Furthermore, we needed to ensure that the results came back in segments no more than 32k characters long (and analysis services 8.0 limitation).

Because you cannot create a variable of type ntext, I chose to build a temp' table (yes I know the drawbacks, but I find them pretty useful a lot of the time) with ntext storage and use a text pointer (sql server specific construct) to manipulate the field directly. This is actually quite a lot faster than straight text manipulation. I've copies some sql in below to demonstrate; it's not set up as a sproc, but I'm sure you can adapt it if you wish.

/*

Tim Watson (Sage UK Business Intelligence Division)

Sample t-sql sciprt that generates 32k batches of

text (represents mdx) using text pointers.

*/

--clear any existing tables for the session

IF ( OBJECT_ID( N'#T_Output' ) IS NOT NULL )

DROP TABLE #T_Output;

IF ( OBJECT_ID( N'#T_Results' ) IS NOT NULL )

DROP TABLE #T_Results;

--create tbl structure

CREATE TABLE #T_Output (

rowid INT IDENTITY( 1, 1 ),

txt NTEXT

);

CREATE TABLE #T_Results (

rowid INT IDENTITY( 1, 1 ),

recordLength INT,

result VARCHAR( 8000 )

);

--confirm identity insert set to off

SET IDENTITY_INSERT dbo.#T_Results OFF;

SET IDENTITY_INSERT dbo.#T_Output  ON;

--was setting text in row to on, so ptr manipulation hits the cache not the disk

--doesn't work with temp tables though, so turned it off again!

-- EXECUTE sp_tableoption '#T_Output', 'text in row', 'ON'

--get the result set from which to build the string

INSERT INTO #T_Results ( recordLength, result )

SELECT  

LEN('[Account].[Geography].[All].[' +

DWH_Account.Country + '].[' +

DWH_Account.StateProvince +'].[' +  

DWH_Account.City + '].[' +

DWH_Account.Account + '],') AS recordLength,

'[Account].[Geography].[All].[' +

DWH_Account.Country + '].[' +

DWH_Account.StateProvince +'].[' +  

DWH_Account.City + '].[' +

DWH_Account.Account + '],' AS result

FROM        

sx_eval.sysdba.USERINFO UI

INNER JOIN sx_eval.sysdba.SECRIGHTS SR

ON UI.USERID = SR.ACCESSID

INNER JOIN DWH_Account

ON SR.SECCODEID = DWH_Account.SeccodeID

WHERE    

UI.WINDOWSSID = @sid

ORDER BY DWH_Account.Country;

--declare locals

DECLARE

@row   INT,

--current row

@rowcount INT,

--max rows in source data

@segmentCount INT,

--number of segments (SX thing)

@currentSegment INT,

--current segment (for processing)

@currentRecordLength INT,

--length of the current record being inserted (in chars)

@currentOutputRowId INT,

--the row id of the record which the pointer is currently manipulating

@MAX_RECORD_LENGTH INT,

--constant value = 32750

@ptrIsValid INT,

--storage for a flag to indicate whether a pointer allocation succeeds!

@releasePointerSuccess INT,

--storage for a flag to indicate whether a pointer release operation succeeds.

@danglingChars INT,

--leftover chars after segmenting the result set...

@params   NVARCHAR( 4000 ),

--params to UPDATETEXT

@sql   NVARCHAR( 4000 ),

--storage for sql to execute UPDATETEXT

@pDestTxt BINARY( 16 );

--pointer to the text field

SET @MAX_RECORD_LENGTH = 32750;

SET @danglingChars = ( SELECT ( SUM( RecordLength ) % @MAX_RECORD_LENGTH ) FROM #T_Results );

SET @segmentCount = ( SELECT ( SUM( RecordLength ) / @MAX_RECORD_LENGTH ) AS segmentCount FROM #T_Results );

--deal with the remaining few chars, if any...

IF @danglingChars > 0

BEGIN

SET @segmentCount = @segmentCount + 1;

END

--loop through all the required segments, setting up

--an empty NTEXT value for the pointer to manipulate

SET @currentSegment = 1;

WHILE @currentSegment <= @segmentCount

BEGIN

INSERT INTO #T_Output ( rowid, txt )

               VALUES ( @currentSegment, N'' );

SET @currentSegment = @currentSegment + 1;

END

--turn off identity insert

SET IDENTITY_INSERT dbo.#T_Output OFF;

SET @params = N'@pDestTxt BINARY( 16 )';

--ptr param to UPDATETEXT

SET @sql = N'';

SET @row = ( SELECT MIN( rowid ) FROM #T_Results );

--start row

SET @rowcount = ( SELECT MAX( rowid ) FROM #T_Results );

--end row

SET @currentOutputRowId = ( SELECT MIN( rowid ) FROM #T_Output );

-- output row

--acquire the text pointer

SELECT @pDestTxt = TEXTPTR( txt )

FROM #T_Output

WHERE rowid = @currentOutputRowId;

SET @currentRecordLength = 0;

WHILE @row <= @rowcount

BEGIN

   IF @currentRecordLength >= @MAX_RECORD_LENGTH

       BEGIN

--release the pointer

EXECUTE @releasePointerSuccess = sp_invalidate_textptr @pDestTxt;

IF @releasePointerSuccess <> 0

   BEGIN

       RAISERROR( N'Failed to release textptr for NTEXT column    

                                 #T_Output.txt', 16, 1 );

                   END

--reallocate the pointer

SET @currentOutputRowId = @currentOutputRowId + 1;

SELECT @pDestTxt = TEXTPTR( txt ) FROM #T_Output

               WHERE rowid = @currentOutputRowId;

--validate pointer is ok

SET @ptrIsValid = TEXTVALID( '#T_Output.txt', @pDestTxt );

IF @ptrIsValid <> 1

   BEGIN

       RAISERROR( N'Failed to re-allocate textptr for NTEXT column #T_Output.txt on rowid %d', 16, 1, @currentOutputRowId );

   END

               SET @currentRecordLength = 0;

   END

   SELECT @sql = 'UPDATETEXT #T_Output.txt @pDestTxt 0 0 ' + 'N' + '''' +

               result + '''',

               @currentRecordLength = @currentRecordLength + recordLength

   FROM #T_Results

   WHERE rowid = @row;

   EXECUTE sp_executeSql @sql, @params, @pDestTxt

   SET @row = @row + 1;

END

EXECUTE @releasePointerSuccess = sp_invalidate_textptr @pDestTxt;

IF @releasePointerSuccess <> 0

BEGIN

RAISERROR( N'Failed to release textptr for NTEXT column #T_Output.txt', 16, 1 );

END

--truncate the target tbl

TRUNCATE TABLE Staging_Sec_Results;

INSERT INTO Staging_Sec_Results

SELECT rowid as SegmentNumber, txt as ResultString

FROM #T_Output;

DROP TABLE #T_Output;

DROP TABLE #T_Results;

/* EOF */

February 1, 2007 12:27
 

Tim Watson said:

Sorry, I don't know what happened to the formatting there!?

February 1, 2007 15:48
 

Manoj said:

No worries Tim :-)

Thanks for giving some good pointers.

I think it gave me renewed enthusiasm to make the function more efficient.

February 1, 2007 16:36
 

Binh said:

I found this really useful, thanks!

February 2, 2007 03:16
 

anon said:

thanks so much for this, really helped and saved me an hour or two :)

August 16, 2007 13:36

Leave a Comment

(required) 
(optional)
(required) 
Submit
Powered by Community Server (Personal Edition), by Telligent Systems