We recently used the lookup transform to retrieve foreign keys in a data warehouse project, and were plain lucky to notice the following issue:
When looking up padded varchar strings, the records the lookup transform considers equal differs depending on what CacheType has been selected.
As shown in the example below, using CacheType = None or Partial will match 4 out of 4 records, while using CacheType = Full will match only 2 out of 4 records. Now, it could be argued that both cases are correct, depending on your definition of string equality, but the insidious problem is of course that changing the CacheType changes what records get matched, which is quite unintuitive and could lead to hard to find bugs in your packages.
Thankfully, this only affects padded varchar strings (which are fairly uncommon), and doesn't affect padded char strings (which are much more common.)
Update 2: Microsoft says this behaviour is by design, and comes from SSIS (CacheType = Full) comparing strings differently from SQL Server (CacheType = None or Partial). Sounds like we're stuck with it; read on for details and workarounds.
Demonstration package
We'll demonstrate this by creating two tables, each with four records, corresponding to the four possible combinations:
- Unpadded lookup string & Unpadded reference string
- Unpadded lookup string & Padded reference string
- Padded lookup string & Unpadded reference string
- Padded lookup string & Padded reference string
The SQL table creation code contains:
CREATE TABLE Lookup (
ID INT IDENTITY(1,1),
String VARCHAR(10),
LookupDescription VARCHAR(200)
)
GO
INSERT Lookup (String, LookupDescription) VALUES ('string1', 'This lookup string was NOT padded')
INSERT Lookup (String, LookupDescription) VALUES ('string2', 'This lookup string was NOT padded')
INSERT Lookup (String, LookupDescription) VALUES ('string3 ', 'This lookup string WAS padded')
INSERT Lookup (String, LookupDescription) VALUES ('string4 ', 'This lookup string WAS padded')
CREATE TABLE Reference (
ID INT IDENTITY(1,1),
String VARCHAR(10),
ReferenceDescription VARCHAR(200)
)
GO
INSERT Reference (String, ReferenceDescription) VALUES ('string1', 'This reference string was NOT padded')
INSERT Reference (String, ReferenceDescription) VALUES ('string2 ', 'This reference string WAS padded')
INSERT Reference (String, ReferenceDescription) VALUES ('string3', 'This reference string was NOT padded')
INSERT Reference (String, ReferenceDescription) VALUES ('string4 ', 'This reference string WAS padded')
The data flow uses an OLE DB source connecting to a SQL Server 2005 to get the records from the Lookup table, and looks up the four strings in the Reference table with a lookup transform.
CacheType = None or Partial
When setting CacheType to None or Partial, all four records match, i.e. the lookup transform disregards any padding the strings have:

CacheType = Full
When setting CacheType to Full, padding is considered part of the string and only two records match:

At a guess, the two types of behaviour comes from Full CacheType using a different SQL query when getting all reference records up front, compared to None and Partial CacheType which get the reference records only when needed, one by one.
Workarounds
Simplistic
The (too) simple remedy is of course to just be aware of this behaviour, and use the appropriate CacheType depending on what string equality version is needed. This is however both error prone and restrictive, we really want to be able to choose the appropriate CacheType depending on number of records etc.
Remove padding on the fly
A fairly good alternative is to remove the padding inside the data flow, i.e.:
- Trim the incoming strings either at the source component with SQL RTRIM(fieldname), or in a Derived Column component
- Trim the Reference records by adding RTRIM(fieldname) to the Reference query inside the lookup component
This is more flexible than the first alternative, but requires you to remove the padding every time you do lookups on these fields, again somewhat error prone.
Remove padding ahead of time
In our project we instead made sure that all varchar string fields were consistently unpadded when imported into the data warehouse, which works great when you have the luxury of being able to change the lookup and reference records.
End game
Long term, I hope the lookup transform will be updated to remove this 'feature', I'll feedback this to MS.
Update: Reported to Microsoft.
Try out
To see this in action, download the package, configure the connection for your own database, run with debugging with different CacheType settings, and watch either 2 or 4 records get matched.
Conclusion
We've looked at an error prone feature of the lookup transform when matching padded varchar strings. Workarounds include removing padding from the strings before and as part of the lookup, or being very specific about which CacheType must be used for achieving correct behaviour in each instance the lookup transform is used.