In this blog entry I am going to outline the function of the Value Inclusion profile request in the new Data Profiling Task in SQL Server 2008. BOL states about this profile request:
| Computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables. This profile also helps you identify problems in your data, such as values that are not valid. For example, you profile the ProductID column of a Sales table and discover that the column contains values that are not found in the ProductID column of the Products table. |
Or to put it another way, it tells you if one set of values is a subset of another.
Again, I don't really have any good test data for this so I'm going to create some of my own. If you've read the other blog entries in this thread then you'll know that the [HumanResources].[Department].[GroupName] field has got 6 distinct values in it. I'm going to load 4 of them into a new table:
select top 4 GroupName
into ValueInclusionTest
from HumanResources.Department
group by GroupName
Here's how I configure the task:
I've highlighted the important bits. We're finding out which values in [HumanResources].[Department].[GroupName] do not exist in [dbo].[ValueInclusionTest].[GroupName]. Also note that I've set InclusionThresholdSetting='None' in order to ensure that we get some meaningful data back. You can fine tune that and other properties to make sure that the task works for you. Here's the results:
So, we know that not all of the values in [HumanResources].[Department].[GroupName] will exist in [dbo].[ValueInclusionTest].[GroupName] and hence that's why we've got an inclusion strength of 68.75%. We also get an 'Inclusion Violations' section showing us the offending values.
This concludes my brief summary of the Value Inclusion profile request in the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction
-Jamie
Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.