The February CTP of SQL Server 2008 delivered a usable version of the Data Profiling Task that is being introduced into SQL Server Integration Services 2008.
I am putting together a series of blog entries that highlights the functionality available with this task and the blog entry that you are reading now is an introduction to that series.
Before I delve into the detail of each type of data profile request that can be run I want to quickly show a list of some high-level bullet points that you are going to need to know about if you are going to use this task.
| The first thing to note is that you can only profile data in a SQL Server 2000 database or a later version. I don't know why but nevertheless, that's the way it is. |
|
When you run a package that contains a Data Profiling Task you have the option to output that data to be viewed later with the Data Profile Viewer tool. This tool can be found at
- %programfiles%\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe
At the time of writing there is no start menu shortcut to the tool although I suspect that will be coming soon enough. I do not yet know whether this tool is installed with 'SQL Server Integration Services" or "Workstation Tools" from the SQL Server installer menu but I would assume it to be the latter. When I find out I'll update this blog entry accordingly. It is installed both along with BIDS and SSIS itself (see comment below from Matt Masson). |
| The Data Profile Viewer does not allow you the ability to perform your own data profiling from within the tool and that is disappointing. It would be great to see a stand-alone tool that allowed you to perform data profiling against a data source without having to go into Visual Studio. |
| In order to use the Data Profiling Task you have to use ADO.Net Connection Managers. I don't know why this is and I find it rather annoying because in SSIS 2005 I've used OLE DB Connection Managers almost exclusively. I'm endeavouring to find out why this is the case. You also cannot create new connection managers from within the task which is a minor irritation. |
| When opening a file in Data Profile Viewer it filters to only show .xml files. Therefore your connection manager used by the Data Profiling task for outputting results should probably create such a file. |
I aim to produce a blog entry dedicated to each type of profile request available with the Data Profiling Task. Here is a list of those entries thus far:
All of the examples make use of the AdventureWorks database that can be downloaded from Codeplex.
-Jamie
Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.