Back in February and March I wrote a series of blog entries explaining how to use the various profile requests in the new Data Profiling Task that is coming in SSIS 2008. All of those blog entries explained how to profile some data and analyse it offline using the Data Profile Viewer which is all well and good but quite often you need to use the results of the data profile later in the package, perhaps in a conditional precedence constraint. That is what this blog entry is intended to address.
Fundamentally you need to to extract some information from the data profiling output and given that that output is an XML Document we are going to require the use of the XML Task. Here is the profile output after running the Column Null Ratio profile request.
<?xml version="1.0"?>
<DataProfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">
<DataSources>
<DtsDataSource ID="LocalHost.AdventureWorks" Name="LocalHost.AdventureWorks">
<DtsConnectionManagerID>LocalHost.AdventureWorks</DtsConnectionManagerID>
</DtsDataSource>
<DtsDataSource ID="LocalHost.AdventureWorks1" Name="LocalHost.AdventureWorks1">
<DtsConnectionManagerID>LocalHost.AdventureWorks1</DtsConnectionManagerID>
</DtsDataSource>
<DtsDataSource ID="profile_output.xml" Name="profile_output.xml">
<DtsConnectionManagerID>profile_output.xml</DtsConnectionManagerID>
</DtsDataSource>
</DataSources>
<DataProfileInput>
<ProfileMode>Exact</ProfileMode>
<Timeout>0</Timeout>
<Requests>
<ColumnNullRatioProfileRequest ID="NullRatioReq">
<DataSourceID>LocalHost.AdventureWorks1</DataSourceID>
<Table Schema="Person" Table="Contact" />
<Column IsWildCard="false" ColumnName="MiddleName" />
</ColumnNullRatioProfileRequest>
</Requests>
</DataProfileInput>
<DataProfileOutput>
<Profiles>
<ColumnNullRatioProfile ProfileRequestID="NullRatioReq" IsExact="true">
<DataSourceID>LocalHost.AdventureWorks1</DataSourceID>
<Table DataSource="." Database="AdventureWorks" Schema="Person" Table="Contact" RowCount="19972" />
<Column Name="MiddleName" SqlDbType="NVarChar" MaxLength="50" Precision="255" Scale="255" LCID="1033" CodePage="0" IsNullable="true" StringCompareOptions="0" />
<NullCount>8499</NullCount>
</ColumnNullRatioProfile>
</Profiles>
</DataProfileOutput>
</DataProfile>
The important information here is this bit:
<NullCount>8499</NullCount>
That is the value that we need to extract out of the package and store in a variable within our package so that we can use it in an expression. So, how do we do it? As I said earlier, we need to use the XML Task which can use an XPath expression to extract the data that we require. It would be nice if the following XPath expression worked:
/DataProfile/DataProfileOutput/Profiles[1]/ColumnNullRatioProfile/NullCount
But unfortunately it does not. The reason is that the XML above contains a namespace (http://schemas.microsoft.com/sqlserver/2008/DataDebugger/) that cannot be declared in the XML Task - this is a limitation of the XML Task. The workaround is to edit the XPath expression like so:
/*[local-name()='DataProfile']/*[local-name()='DataProfileOutput']/*[local-name()='Profiles' and position()=1]/*[local-name()='ColumnNullRatioProfile']/*[local-name()='NullCount']
Here we use the local-name() XSD extension function to return the non-qualified element that we are interested in. Take a moment to look at our new XPath expression and notice the similarities and differences between it and our original XPath expression. [Thanks to Matt Masson and David Noor from the SSIS product team for helping me out with this bit]
So now we have the following tasks:
More importantly, let's try and understand how we have configured the XML Task:
That's really all there is to it. It is important to point out the pertinent property settings:
- OperationType = 'XPATH'
- SaveOperationResult = 'TRUE'
- SecondOperandtype = 'Direct input'
- SecondOperand = <XPath expression from above>
- PutResultInOneNode = 'False'
- XPathExpression = 'Values'
Also notice that we are storing the result in a variable called @[User::Result]
Here's a very short video of my running this (I have used a script task to output the parsed value):
Pretty easy really. if you have any questions let me know in the comments below.
This concludes my brief summary of parsing the output of the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction
-Jamie
Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.