blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Shifting data in a northern accent

Subscribe to the main Conchango RSS feed at http://blogs.conchango.com/MainFeed.aspx

IM me Get alerted when a new blog entry is posted Windows Live Alerts Translate this page

  • MDX: Getting the current quarter

    I've been wrestling with MDX in PerformancePoint today and was attempting to do something that I thought would be pretty simple. Work out what is the current quarter. No amount of googling turned anything up either.

    It took me a while to figure out so I thought I'd post it up here in case I need it again or if anyone else ever needs it.

    member [Time].[Calendar YQM View].[Current Quarter] as 'StrToMember("[Time].[Calendar YQM View].[Month].[" + Format(Now(), "MMM") + " Year " + Format(Now(), "yyyy") + "].Parent")'

    There doesn't seem to be a format string that, when used with VBA's Format() function, will return the current quarter which seems, to me, to be a glaring omission. Hence I had to get the current month and ask for its parent.

    Obviously you'll need to mess about with some string manipulation to suit whatever format you are using. In my case the label for the current month looks like this: "May Year 2008" which (I think) is PerformancePoint's doing. It seems utterly ridiculous to me. Surely "May 08", "May 2008" or even "200805" would be better? Still, who am I to complain.

    Hope this helps. if anyone knows a better way feel free to let me know.

    -Jamie

  • SSIS: Absolute and relative paths

    SSIS uses absolute paths to point to objects on the file system. What does that mean? In short it means that in order to point to a file I would have to use the full absolute path, I couldn't give a path that is relative to where the packages resides.

    A common complaint about SSIS is that it does not allow the use of relative paths. The most useful application of this feature would be the ability to execute packages (using the Execute Package Task) where all the packages are (as is typical) stored in a single folder and there have been many people complaining about the absence of this feature over on the SSIS forum. I do not know why relative paths are not supported in SSIS but I suspect that it is because packages do not always reside in the file system, sometimes they reside in SQL Server's msdb database and in this instance there is no location for a file path to be relative to.

    Due to the reason just stated I have always agreed that stipulating the use of absolute paths within SSIS was the right thing to do (and indeed I have championed it) however of late I have changed my mind. Support for relative paths would greatly simplify package deployment and package management and for this reason I have made the following submission to Connect:

    Description

    A common complaint about SSIS is that file references cannot be relative to wherever the package resides. Relative paths would greatly simplify package deployment and management, especially when using the Execute Package Task.

    Proposed Solution

    Please introduce a new system variable called [System::PackageLocation] (or similar) that contains the location of the package on the file system. This variable could be used (in conjunction with BLOCKED EXPRESSION to provide support for relative paths.
    A developer would be able to deploy all of his/her packages to the same location and be confident that through the use of [System::PackageLocation] the Execute Package Task could always easily find the packages that it was executing.

    https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=341880

    [This is in addition to a number of existing submissions for similar functionality.]

     

    If you require similar functionality today then you can ape the [System::PackageLocation] behaviour that I just described by telling the package where it is when it executes. I describe a strategy for doing this in my blog entry Common Folder Structure from January 2006.

    What do you think? Should SSIS support relative paths? (That's probably a rhetorical question but I'll ask it anyway.)

    -Jamie

  • SSDS and Mesh

    SQL Server Data Services (SSDS) and Live Mesh are the two technologies that are uppermost in my mind at the moment. [My thoughts on SSDS are here and on Live Mesh here.] Hence then it didn't take me long to start wondering about the value of the two of them being integrated together. SSDS is a data storage platform and Live Mesh is a data synchronisation platform so put them together and you have the ability to synchronise data stored in SSDS to...well...anywhere that Live Mesh can run. Perhaps Mesh could even be used to synchronise data from one SSDS authority to another?

    In order for this to happen an adapter would need to be written that exposes an SSDS data store as a "virtual" device. I sent a message to the SSDS team via their blog site The Long Term Storecast and I got a reply basically saying they would consider it so here's hoping that this gets implemented. I suspect they have got plenty of time because I don't expect SSDS to get released out of beta until 2009.

    What sort of applications do you envisage being built on an SSDS-Mesh tag team? I have a couple of embryonic ideas that need fleshing out a wee bit (euphemistically speaking) but in the meantime I have no doubt there's going to be some killer apps built on these platforms.

    -Jamie

    [crossposted here]

  • Do you friendfeed?

    Is anyone out there using friendfeed? I got switched onto it about a month ago and have been "using" it ever since to publish my various blog, photo and random paraphernalia.

    image

    What I really like about it is that "using it" consists of setting up all your various feeds and then just leaving it. In fact, unless you want to go and find out what your friends are up to there's no need to visit the site at all - definitely a good thing.

    I'm also a big fan of the Facebook Friendfeed application.

    Are you on friendfeed? If so, feel free to follow me!

    -Jamie

  • SSIS: Deleting a group of files

    A question I quite often see on the SSIS forum is:

    Is there a way to delete a group of files within a directory? ie: I want to delete all files in directorty c:\logs\*.txt

    (http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3196521&SiteID=1)

    Well its actually quite easy, you just need  ForEach Loop container and the FileSystem Task. Here's how you set up the ForEach Loop container:

    In the 'Collection' tab you need to specify the enumerator, the folder from which to delete, and a fie mask to specify the files that you want to delete.

    image

    In the Variable Mappings tab you need to store the enumerated filename in a pre-created variable. The index will be 0 in this case because we are only interested in the file name which gets returned in the first column of the zero-based enumeration.

    image

     

    After that we have the simple task of wiring up the File System task that we do like so:

    image

    Note that we have used the variable "[User::FileName] that we populated using the ForEach Lloop container.

     

    And that is pretty much it. Very simple indeed when you know how. I have made a demo package available here:

    -Jamie

  • SSIS: Data Profiling Task: Part 10 - Parsing the output

    Back in February and March I wrote a series of blog entries explaining how to use the various profile requests in the new Data Profiling Task that is coming in SSIS 2008. All of those blog entries explained how to profile some data and analyse it offline using the Data Profile Viewer which is all well and good but quite often you need to use the results of the data profile later in the package, perhaps in a conditional precedence constraint. That is what this blog entry is intended to address.

    Fundamentally you need to to extract some information from the data profiling output and given that that output is an XML Document we are going to require the use of the XML Task. Here is the profile output after running the Column Null Ratio profile request.

     

    <?xml version="1.0"?>

    <DataProfile xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://schemas.microsoft.com/sqlserver/2008/DataDebugger/">

      <DataSources>

        <DtsDataSource ID="LocalHost.AdventureWorks" Name="LocalHost.AdventureWorks">

          <DtsConnectionManagerID>LocalHost.AdventureWorks</DtsConnectionManagerID>

        </DtsDataSource>

        <DtsDataSource ID="LocalHost.AdventureWorks1" Name="LocalHost.AdventureWorks1">

          <DtsConnectionManagerID>LocalHost.AdventureWorks1</DtsConnectionManagerID>

        </DtsDataSource>

        <DtsDataSource ID="profile_output.xml" Name="profile_output.xml">

          <DtsConnectionManagerID>profile_output.xml</DtsConnectionManagerID>

        </DtsDataSource>

      </DataSources>

      <DataProfileInput>

        <ProfileMode>Exact</ProfileMode>

        <Timeout>0</Timeout>

        <Requests>

          <ColumnNullRatioProfileRequest ID="NullRatioReq">

            <DataSourceID>LocalHost.AdventureWorks1</DataSourceID>

            <Table Schema="Person" Table="Contact" />

            <Column IsWildCard="false" ColumnName="MiddleName" />

          </ColumnNullRatioProfileRequest>

        </Requests>

      </DataProfileInput>

      <DataProfileOutput>

        <Profiles>

          <ColumnNullRatioProfile ProfileRequestID="NullRatioReq" IsExact="true">

            <DataSourceID>LocalHost.AdventureWorks1</DataSourceID>

            <Table DataSource="." Database="AdventureWorks" Schema="Person" Table="Contact" RowCount="19972" />

            <Column Name="MiddleName" SqlDbType="NVarChar" MaxLength="50" Precision="255" Scale="255" LCID="1033" CodePage="0" IsNullable="true" StringCompareOptions="0" />

            <NullCount>8499</NullCount>

          </ColumnNullRatioProfile>

        </Profiles>

      </DataProfileOutput>

    </DataProfile>

    The important information here is this bit:

    <NullCount>8499</NullCount>

    That is the value that we need to extract out of the package and store in a variable within our package so that we can use it in an expression. So, how do we do it? As I said earlier, we need to use the XML Task which can use an XPath expression to extract the data that we require. It would be nice if the following XPath expression worked:

    /DataProfile/DataProfileOutput/Profiles[1]/ColumnNullRatioProfile/NullCount

    But unfortunately it does not. The reason is that the XML above contains a namespace (http://schemas.microsoft.com/sqlserver/2008/DataDebugger/) that cannot be declared in the XML Task - this is a limitation of the XML Task. The workaround is to edit the XPath expression like so:

    /*[local-name()='DataProfile']/*[local-name()='DataProfileOutput']/*[local-name()='Profiles' and position()=1]/*[local-name()='ColumnNullRatioProfile']/*[local-name()='NullCount']

    Here we use the local-name() XSD extension function to return the non-qualified element that we are interested in. Take a moment to look at our new XPath expression and notice the similarities and differences between it and our original XPath expression.  [Thanks to Matt Masson and David Noor from the SSIS product team for helping me out with this bit]

     

    So now we have the following tasks:

    image

    More importantly, let's try and understand how we have configured the XML Task:

    image

     

    That's really all there is to it. It is important to point out the pertinent property settings:

    • OperationType = 'XPATH'
    • SaveOperationResult = 'TRUE'
    • SecondOperandtype = 'Direct input'
    • SecondOperand = <XPath expression from above>
    • PutResultInOneNode = 'False'
    • XPathExpression = 'Values'

    Also notice that we are storing the result in a variable called @[User::Result]

    Here's a very short video of my running this (I have used a script task to output the parsed value):

     

    Pretty easy really. if you have any questions let me know in the comments below.

     

    This concludes my brief summary of parsing the output of the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction

    -Jamie

    Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

  • SSIS: Package parts

    As SQL Server 2008 is practically now a done deal my thoughts have turned to what will come in future versions. One feature I would like is something I'm notionally calling "package parts". Let me explain.

    Today in SSIS we build, distribute and execute these things called packages. Packages are fairly large, verbose, non-descript BLOBs and that has some rather negative implications such as:

    • Changing anything in the package, even something as small as moving a container one pixel to the left, causes the package to get checked out (assuming you have your packages under source code control)
    • Support for multiple-developer teams is limited. If you have two people building dataflows then they have to be working on separate packages.
    • The only method of reusing tasks is copy-and-paste
    • Comparing two packages to check for differences and hence merging them together is not possible

    I'd really like these problems to be alleviated in a future version of SSIS and this could be done by breaking the package into separate parts. In other words, containers (and remember, tasks are a type of container) exist as separate files in solution explorer. This would allow:

    • Check in/out of containers rather than a whole package
    • A developer could be building a dataflow without preventing another developer from working on another dataflow
    • Containers could be instantiated in different places. Define once, use many times - isn't that the premise of code reuse?
    • Compare tools could be built that understand how a container gets defined once it is serialised as XML without having to worry about extraneous matter such as workflow and designer formatting


    Does that sound like a good idea? Let me know in the comments and if there is enough interest then I'll submit this to Connect.

    -Jamie 

    UPDATE: OK, 2 days on and there have been 18 comments saying "yes" and none saying "no". Hence, I've submitted this here.

     

  • SSIS: MVP Summit 2008

    Next Monday (14th April) brings with it the start of the Microsoft MVP Summit 2008. This is an opportunity for all of the (give or take a few) 4000 MVPs from around the world to travel to Seattle and engage with Microsoft product teams with a view to helping them improve their products. Someone somewhere has deigned that I have the honour of being an MVP bestowed upon me for 2008 and therefore I too shall be boarding a Seattle-bound plane in a couple of days time where I will be spending a lot of time with various folks from the SQL Server Integration Services product team.

    As I see it one of the assumed responsibilities of an MVP is to represent the views of the community as well as provide one's own opinions and with that in mind I'd like to invite readers of this blog to leave a comment here answering the questions "What should the SSIS product team be concentrating on for future versions?".

    I realise that I'm opening this blog up to swathes of abuse and derision so I would ask that if you have any opinions to proffer then they be constructive, well explained and concentrate on alleviating a particular problem that you have with the product today. For example, my big wish for future versions of SSIS is:

    "Other than modularising functionality over multiple packages SSIS doesn't have a very good story around reuse. I would like to see this improved in future versions by giving us the ability to instantiate pre-configured tasks/components in multiple packages."

    That's a lot better than saying:

    "Please allow more reuse in SSIS"

    You get the idea I'm sure. Be realistic as well, "Make it run on LINUX" isn't ever going to happen no matter how much you might want it.

     

    So, let me know what irks you about SSIS today. Logging? Reuse? Design-time experience? Deployment? Multi-developer support? Platform architecture? Metadata support? Better support for beginners? Community engagement? Those are all worthy areas for improvements along with many others - let me know what you think and hopefully there will be a common consensus.

    I am obligated to point out that all summit attendees are bound by a non-disclosure agreement so I'm afraid I won't be able to report back on what gets discussed next week. Sorry.

     

    -Jamie

  • SSDS: Amazon SimpleDB's "Eventual consistency" model will not be inherent in SSDS

  • SSIS: Some new small features for katmai

    In the past few days I have twice received some good news about small enhancements that are making it into katmai.


     Here's the first one:

    Output config info to the output window

    Request: I have recently inherited some packages from someone else. In trying to figure out what they actually do I was getting foxed because configurations were being applied when I opened the package up in BIDS.
    I could make changes to a ConnectionString, close the package, open it up again, and my changes would have disappeared. The ConnectionString would revert back to what it was before I changed it.

    This was obviously because configurations were being applied but it wasn't obvious in the UI and I would worry about any inexperienced SSIS developers coming along and being completely confused about what was going on.

     

    Reply: We've incorporated this and added a message to the "Messages" group in the task list when a configuration is being applied in the designer. (We chose not to place it i the output window, since that's more often used for build and execution time messages).

    That's great news. Small enhancements like this can sometimes make a world of difference, especially to inexperienced SSIS users. Just make sure you keep that task list panel open in Visual Studio or else you'll never see these messages. (N.B. I don't see a messages group in the VS task list so perhaps this should be the Error List panel. We'll see when it ships.)


    And the second:

    New @[System::ParentContainer] variable please

    Request: I want to be able to see WHY a task executes. What is the context in which that task executes? This is particularly valuable when a sub-package is executed multiple times within an ETL job - currently its very difficult -nay, imposible- to understand the order in which thigns execute, especially when things are executing in parallel.

    The current flat list of events that the log providers return is useful but not useful enough. Its not CONTEXTUAL. I would like to (for example) build a log provider that indented the name in the logging tabble X number of times where X is the number of ancestral containers that that container has. For example, something like this:

    \Package1 Starts
    \Package1\SequenceContainer1 Starts
         \Package1\SequenceContainer1\DataFlowTask1 Starts
         \Package1\SequenceContainer1\SequenceContainer2 Starts
             \Package1\SequenceContainer1\SequenceContainer2DataFlowTask2 Starts
         \Package1\SequenceContainer1\DataFlowTask1 Ends
             \Package1\SequenceContainer1\SequenceContainer2DataFlowTask2 Ends
         \Package1\SequenceContainer1\SequenceContainer2 Ends
    \Package1\SequenceContainer1 Ends
    \Package1 Ends

    Maybe this isn't exactly how I'd present the information but hopefully you get the idea. This is MUCH more useful, much more contextual, than a flat list

    I would like a new system variable, scoped to every container, called @[System::ParentContainerGUID]. It would contain, obviously, the GUID of the parent container (except in the case of the package container itself where this property would probably be NULL).

     

    Reply: We are going to add a new system variable "ParentContainerGUID" for each task, sequence container, For loop, Foreach loop, and Package in SQL Server 2008.

    Knowing a container's parent container (and thus ancestral containers) is really valuable information for anyone that is implementing their own custom logging solution so I'm delighted to see this put into the product.


    The turnaround time on these connect submissions has been less than 2 weeks (for the first one it was 3 days). See, using Connect really does work.

    Expect to see both of these features in an upcoming CTP.

    -Jamie

  • Conchango blogging on the upturn

    An email from Paul Dawson went around one of Conchango's internal distribution lists a few weeks ago with a call to action for people to contribute more to our blog site. Apparently our tag cloud wasn't fully representative of the things that we are good at:

    clip_image001

    I would agree with that. Whilst it is gratifying for me personally that the blogging I do features so heavily I know that I don't represent the work that we do and have always been keen for more people in the company to pick up the blogging baton. Of late that has been happening as Paul himself, Michelle Flynn, Richard Wand, James Saull, Anthony Steele, Paul Galvin, Tayler Cresswell, JRJ (that's what everyone calls him), Derek Dunlop and others (sorry to those I haven't mentioned) have all visibly increased their output. Most notable is the rise of our interactive media team in this friendly rivalry between the different teams within Conchango which is great to see because that is probably the area in which differentiate ourselves the most. Is it having an effect? Let's take a look at the tag cloud as it appears today on the brand new Conchango.com:

    image 

    At first glance it might not look too different but clearly Web 2.0 activity is increasing, as is recruitment. I would expect it to change more in the coming weeks - I hope it does.

    Of course, you won't want to subscribe to all of these blogs individually so in order that you don't miss the wit, knowledge and inane waffle of this highly-tuned machine make sure that you stick our blogs site's all-encompassing RSS feed onto your reader. Go on, you know you want to.


    In other related news I've been asked by one of our retail dynamos, Howard Pull, to publicise an event that we've got coming up in the next few weeks.

    clip_image001[5]

    Conchango is running a free invitation-only practical workshop for retailers to identify how to increase Customer value, Product profitability and business performance using Business Intelligence (BI). Many retailers have existing data assets that they are not using to full effect, and we will focus on how you can accelerate your ROI using investments you may already have made.

    The half-day workshop will be run by Conchango Retail and BI Consultants, along with our guest speaker, Michael Azoff, a leading BI Analyst from Butler Group. We will focus on the areas you identify via a 7 question survey as your most important.

    Conchango is working with many of the UK’s leading retail organisations such as Tesco, Comet and Marks and Spencer to help them realise the true benefits of BI. We work with them to identify how to exploit their data to support strategies for Supplier Management, Store Management, SKU rationalisation, Cost Optimisation, increased Customer Profitability or just business Performance Measurement.

    Let us help you by joining us in this hands-on workshop, where we’ll expect you to walk away with some practical solutions to your data issues. 

    What: Business Intelligence in Retail - half-day workshop

    Where:                Energy Clinic, 132 Commercial Street, London, E1 6NG

    When:                  09 April 2008, 8:30am – 12:30pm

    Register:             To register for a FREE place please email talktous@conchango.com

    Agenda:

    • A Welcome Breakfast
    • Introduction – Iyas AlQasem, Technology Director and Head of BI Consulting, Conchango
    • BI Market View – A market overview on the latest BI Trends from Michael Azoff a leading BI analyst for Butler Group
    • Workshop - Part 1 – Addressing your Top 3 BI concerns
    • Performance Measurement & Management - David Ellis, Head of Performance Management at Conchango
    • Workshop - Part 2 – Planning your BI strategy
    • Range Planning Demonstration using Microsoft PerformancePoint
    • Summary - Round up of the days findings & close

    We would like to hear what your BI and data issues are by asking you to complete a 3 minute online survey, consisting of only 7 questions. By way of thank you we will share the survey results with you (excluding the company names of course) so you can see what your industry peers are thinking.  

    To complete the survey click here: http://www.conchango.com/retailbisurvey

    It should be a great event. If nothing else you get to meet my boss, Iyas AlQasem, and ask him what a pain in the @rse I am to manage :)

     

    We also involve ourselves heavily in community events as well as corporate ones and Michelle is the main driver behind this. In addition to hosting the London .Net user group earlier this week and the London SQL Server/BI user group meetings last April we are also going to be hosting the inaugural UK Silverlight User group meeting at our office on the London southbank. Paul has more info.

    Finally we are, as ever, recruiting for more talented people to join our team so if you want to see your name up here let me know. Remember, all of our hiring is done through direct contact. We don't use recruitment consultants.

    -Jamie

  • SSIS: Connecting to Oracle just got a whole lot better

    There was some great news this week for SSIS developers that are building solutions involving databases other than SQL Server; Microsoft will, later this year, be releasing new connectors for Oracle, Teradata and SAP BW. Check out Darvey Lavender's announcement on the SSIS connectivity wiki or read Matt Masson's blog entry for more details. I should point out that you will need Enterprise Edition of SSIS in order to use them and I also don't know whether the connectors will work on SQL Server 2005 or whether they will solely be for SQL Server 2008. Hopefully by the time you read this someone in the know will have left a comment here filling in the gaps.

    For Oracle you now get the following:

    • Oracle connection manager
    • Bulk load destination

    This signals a marked change in policy from Microsoft. Precisely three years and one day ago Darvey's predecessor Donald Farmer posted a blog entry entitled Oracle and SQL Server Integration Services where he stated:

    I do not expect that Microsoft will write an Oracle fast loader - currently it comes in around number 999 in my list of 1000 features for next version, just slightly ahead of recompiling for Linux.

    Clearly this feature got raised up the priority list. I wonder if this was Darvey's influence that led to this change in direction? Or perhaps the european commission's?

     

    For those who either can't wait until the connectors are released or can't afford enterprise edition there are still options available to you. Rather than repeat myself you should head to my previous blog entry Persistent Oracle Connector.

    -Jamie

    P.S. For more information about Oracle and SSIS go here: http://blogs.conchango.com/jamiethomson/archive/tags/Oracle/SSIS/default.aspx

  • SSIS: Populate the Lookup component cache from a flat file

    Its a commonly held belief that the Lookup component cache has to be populated from a relational source.  Well, that's true, but there is a pretty handy workaround. SQL Server 2005 introduced the OPENROWSET(BULK) command which can return the contents of a file as a recordset hence something like the following will successfully populate the Lookup cache:

    • SELECT a.* FROM OPENROWSET( BULK 'c:\test\values.txt', FORMATFILE = 'c:\test\values.fmt') AS a;

    Thanks to Grant Dickinson for letting me know about this little tip. I've actually been meaning to blog about this since he told me about it way back in February 2007 but I never got round to writing a substantial blog post around it. So instead this evening I just thought, to hell with it, just throw SOMETHING up there and have done with it. So here you go.

    As I'm sure many of you aware by now, this workaround won't be necessary in SSIS2008 due to the enhanced Lookup functionality.

    Hope that helps!

    -Jamie

  • New SSDS presentation at VisitMix.com

    Have you noticed yet that I'm getting excited about SSDS?

    More information is trickling out about SSDS day by day. Yesterday there was a presentation on SSDS at MIX08 by Nigel Ellis which was more in-depth than anything I'd seen or heard before because it talks about the underlying architecture that will be supporting this. Take this typical architecture diagram for instance:

    image

    There's a plethora of useful information in the presentation so if you've got any interest at all in SSDS I recommend you go and take a look. I can't find a permalink to the video but just go to http://sessions.visitmix.com and search for "Introducing SQL Server Data Services" (they're in alphabetical order). UPDATE: Here's the permalink: http://visitmix.com/blogs/2008Sessions/BT05/

    There is also a blog at http://blogs.msdn.com/ssds including an entry that tries to convince us that this is something different to Amazon SimpleDB.

    -Jamie

  • SSIS: Data Profiling Task: Part 9 - Value Inclusion

    In this blog entry I am going to outline the function of the Value Inclusion profile request in the new Data Profiling Task in SQL Server 2008. BOL states about this profile request:

    Computes the overlap in the values between two columns or sets of columns. This profile can determine whether a column or set of columns is appropriate to serve as a foreign key between the selected tables.

    This profile also helps you identify problems in your data, such as values that are not valid. For example, you profile the ProductID column of a Sales table and discover that the column contains values that are not found in the ProductID column of the Products table.

    Or to put it another way, it tells you if one set of values is a subset of another.

    Again, I don't really have any good test data for this so I'm going to create some of my own. If you've read the other blog entries in this thread then you'll know that the [HumanResources].[Department].[GroupName] field has got 6 distinct values in it. I'm going to load 4 of them into a new table:

    select    top 4 GroupName
    into    ValueInclusionTest
    from    HumanResources.Department
    group    by GroupName

    Here's how I configure the task:

    image

    I've highlighted the important bits. We're finding out which values in [HumanResources].[Department].[GroupName] do not exist in [dbo].[ValueInclusionTest].[GroupName]. Also note that I've set InclusionThresholdSetting='None' in order to ensure that we get some meaningful data back. You can fine tune that and other properties to make sure that the task works for you. Here's the results:

    image

    So, we know that not all of the values in [HumanResources].[Department].[GroupName] will exist in [dbo].[ValueInclusionTest].[GroupName] and hence that's why we've got an inclusion strength of 68.75%. We also get an 'Inclusion Violations' section showing us the offending values.

     

    This concludes my brief summary of the Value Inclusion profile request in the Data Profiling Task that is being introduced with SQL Server Integration Services 2008. For more posts in this series please refer to SSIS: Data Profiling Task: Part 1 - Introduction

    -Jamie

     

    Disclaimer: The information in this blog post is correct at the time of publication. It may have changed before SQL Server 2008 is released.

More Posts Next page »

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems