blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Shifting data in a northern accent

Interested in working at Conchango? Email me

IM me Get alerted when a new blog entry is posted Windows Live Alerts Translate this page

SSIS: Data Profiling Task: Part 1 - Introduction

The February CTP of SQL Server 2008 delivered a usable version of the Data Profiling Task that is being introduced into SQL Server Integration Services 2008.

image

I am putting together a series of blog entries that highlights the functionality available with this task and the blog entry that you are reading now is an introduction to that series.

Before I delve into the detail of each type of data profile request that can be run I want to quickly show a list of some high-level bullet points that you are going to need to know about if you are going to use this task.

The first thing to note is that you can only profile data in a SQL Server 2000 database or a later version. I don't know why but nevertheless, that's the way it is.

When you run a package that contains a Data Profiling Task you have the option to output that data to be viewed later with the Data Profile Viewer tool.  This tool can be found at

  • %programfiles%\Microsoft SQL Server\100\DTS\Binn\DataProfileViewer.exe

At the time of writing there is no start menu shortcut to the tool although I suspect that will be coming soon enough. I do not yet know whether this tool is installed with 'SQL Server Integration Services" or "Workstation Tools" from the SQL Server installer menu but I would assume it to be the latter. When I find out I'll update this blog entry accordingly. It is installed both along with BIDS and SSIS itself (see comment below from Matt Masson).

The Data Profile Viewer does not allow you the ability to perform your own data profiling from within the tool and that is disappointing. It would be great to see a stand-alone tool that allowed you to perform data profiling against a data source without having to go into Visual Studio.
In order to use the Data Profiling Task you have to use ADO.Net Connection Managers. I don't know why this is and I find it rather annoying because in SSIS 2005 I've used OLE DB Connection Managers almost exclusively. I'm endeavouring to find out why this is the case. You also cannot create new connection managers from within the task which is a minor irritation.
When opening a file in Data Profile Viewer it filters to only show .xml files. Therefore your connection manager used by the Data Profiling task for outputting results should probably create such a file.

 

I aim to produce a blog entry dedicated to each type of profile request available with the Data Profiling Task. Here is a list of those entries thus far:

All of the examples make use of the AdventureWorks database that can be downloaded from Codeplex.

-Jamie

 

Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Column Length Distribution profile request

March 2, 2008 20:18
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Column Null Ratio profile request in the

March 2, 2008 20:58
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Column Pattern profile request in the new

March 2, 2008 21:24
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Column Statistics profile request in the

March 2, 2008 22:49
 

jerome said:

Good article(s)!

I don't understand why the profiler generator is not a standalone tool(?)

creating a package, adding tasks and running it just to produce XML files... it's a long process for a task which is generally made 1 time...

and specially when the profile viewer is a standalone tool!!!

March 2, 2008 23:01
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Column Value Distribution profile request

March 3, 2008 02:19
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Functional Dependency profile request in

March 3, 2008 03:44
 

Jason Haley said:

March 3, 2008 14:35
 

Matt Masson said:

DataProfileViewer.exe comes with both the Integration Services install, and the Business Intelligence Design Studio (BIDS) install.

Note that "Workstation Tools" has been split into separate entities in Katmai. It's now possible to install BIDS by itself, for example.

March 3, 2008 17:42
 

jamie.thomson said:

AHA! Thanks Matt, I'll update the post accordingly.

-Jamie

March 3, 2008 17:56
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Candidate Key profile request in the new

March 4, 2008 04:13
 

SSIS Junkie said:

In this blog entry I am going to outline the function of the Value Inclusion profile request in the new

March 7, 2008 05:23
 

BI Thoughts and Theories said:

SQL Server 2008 introduces a new task that allows you to profile the data in a table. Jamie Thomson has

March 11, 2008 03:45
 

SSIS Stuff said:

There's been some great feedback coming in about the new data profiling task. Jamie Thompson has a set

March 11, 2008 16:21
 

Noticias externas said:

There's been some great feedback coming in about the new data profiling task. Jamie Thompson has

March 11, 2008 17:13
 

Tod means Fox | ETL Subsystem 1: Data Profiling said:

March 18, 2008 11:57
 

SSIS Junkie said:

Back in February and March I wrote a series of blog entries explaining how to use the various profile

April 16, 2008 07:42

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems