blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Conchango are busy and need top level Technical Architects for Microsoft & Open Source platforms in and around London. Interested? Email me or send me a message

SSIS: The backlash

UPDATE: I have posted a response from Denise Draper, head of the SSIS team, here: http://blogs.conchango.com/jamiethomson/archive/2007/07/30/SSIS_3A00_-A-response-from-Microsoft-to-the-growing-criticism.aspx 

A colleague recently pointed out this thread to me from Oren Eini's blog. The post is entitled SSIS' 15 Faults and boy-oh-boy does this guy have it in for SSIS. So do alot of the people leaving comments.

I have to admit that Oren and his commenters raise some great points that I wholeheartedly agree with. A far-from-perfect design UI, overly verbose errors, painful deployment strategies, VERY poor expression builder, less-than-perfect support for other platforms, too many mouse clicks required to do many things, maintaining different packages that do basically the same thing save for differering metadata, no 'UNDO' functionality, packages are effectively binary objects in regard to source control, dead hyperlinks to help pages, awfully crappy SQL editor...the list is pretty long and doesn't make great reading for the SSIS team. I would suggest that fixing these pain points should be a priority for katmai and that the powers that be give the SSIS team the manpower they need to do that - unfortunately I have it on good authority that that is not happening.

There are other points on there though that I don't feel are warranted and I think it only fair to reply to them which I'm going to do below. Not all of them are Oren's points either - many of them are from his blog commenters.

  • I wish I had a dollar for every time that SSIS kept track of something it shouldn't. Be it the old configuration, hard coding the connection string inside the package and completely ignoring the configuration JT: I have never seen this happen in three years of using the product. If it seems as though configurations are not being used then they have been setup wrongly. That is not to say that the process of setting them up couldn't be improved.
  • Security? Who needs that JT: Is this a serious comment? the Trustworthy Computing Initiative was instigated in Microsoft to ensure products were MORE secure. As a result SQL Server is now considered to be the most secure database platform on the planet (http://news.com.com/2008-1002_3-6164785.html?part=rss%26tag=2547-1_3-0-20%26subj=news).
  • I should also mention that SSIS packages requires sysadmin rights to run when scheduled as a job JT: This is completely untrue. It is possible to setup proxy accounts that are not sysadmin in order to run packages.
  • Bad configuration scheme: This bring me to the configuration scheme itself, yes, you can put the configuration anywhere you want, from incomprehensible XML files with hard coded paths to SQL tables to which you will have to have an hard coded connection string, from environment variables (WTF?!) to stored inside the package. The problem with all those options is that SSIS seems to choose between those at random. At times it would choose the connection that it had assimilated inside the package, at times it would go to the wrong file, get the old configuration, or just complains that it is not having fun and that I should baby sit it again. JT: Back to my point above, if this is happening then the configurations have been setup wrongly. It NEVER chooses configurations at random. It would be good if the person making the point could make some suggestion as to how it could be improved because if people are experiencing this then there needs to be improvement somewhere. And what's the issue with environment variables?
  • Today I had a dynamically generated SQL inside an SSIS package that was causing an error. I had no way of finding out how to get this dynamic SQL (which was stored in a variable) JT: Use a watch window (http://blogs.conchango.com/jamiethomson/archive/2005/12/05/SSIS_3A00_-Using-variables-to-store-expressions.aspx).
  • I lost count of the many things that I consider a given that SSIS doesn't have [such as] the basic UPSERT support JT: Hmm...not sure about this one. UPSERT is an operation that would have to be supported by the database platform being inserted/updated wouldn't it? Not sure why this is SSIS's fault. Perhaps I'm misunderstanding in which case I'm happy to be put straight.
  • connection "managers" cannot cope with simulataneous multiple data flows JT: Shame this wasn't elaborated on so I'll answer the best I can. Connection Managers only maintain connections to external data sources so if the package is performing badly it could be because of those external data source - that's not the fault of SSIS. The other most likely cause is bad package design.
  • My biggest gripe though is how SLOOWWWW it is. If you have a solution with say eight complicated scripts in it it can take 5mins to load on my dual-core Pentium-D 930 with 3Gb RAM JT: There is room for improvement here in the bloated VS shell but mainly its important to understand WHY this is happening. When a package opens up it tries to validate all external connections. If this is taking a long time then the blame is on the external connections and the network inbetween, not on SSIS. It is possible to turn off this validation by selecting 'Work Offline' from the SSIS menu.
  • Setting 'sort order' on a column, does not actually sort the data JT: Its not supposed to. RTFM
  • No throttle control. It does allow you to set some thread settings, and write out files to a temp path as it does what it wants, but I had cases where it would absolutely swamp a well-built dev machine so much that the machine became absolutely unusable as it ran. JT: This can be solved by better package design and performance tuning. Its a fact of life that the more work a CPU has to do then the slower it will do stuff so design your package accordingly. I think the kind of throttling that the commenter was after can be achieved using raw files.
  • stupid proprietry datatypes JT: The guy that complained about this complained about it on the MSDN forum as well. He wanted all the SSIS data types to be the same as SQL Server. I disagreed with him then and I disagree with him now. SSIS is a tool for integrating data from heterogeneous places - something that, by and large, it does pretty well. Hence, the data types are generic enough to allow this, they are not tied to any one particular platform. If that means they are 'proprietary' then so be it.
  • There really should be a configurable store option for NO SECURITY, the protect sensitive and whatever else are fine but really don't lend themselves to working in a dev team where you need to share these files.  JT: See earlier comment about security. I feel the guy's pain on this but trust me, SSIS can easily be used in a multi-developer environemnt. I know this because I'm currently working in one.


That's all. I'm going to post a comment on Oren's blog with a link back to here so I'm fully expecting some angry comments. Watch this space.

-Jamie


UPDATE: Oren has posted a response here: http://ayende.com/Blog/archive/2007/07/27/SSIS-The-backlash.aspx

I tried to reply but for some reason his blog barfed when I hit submit (I guess cos its too big) so here it is below instead:

 

Hi Oren,
I'll take those in turn (bit difficult in unformatted text but here we go)

Oren: "Here is a simple story, I had canceled the package configuration, reconfigured a data source to point to the test database, and run the package. It executed itself against the production database!! Once I found that out, I managed to see it do it twice (while the data source point to the test database!), but I haven't been able to consistently reproduce it since"
JT: I really wish I could explain this but I'm non-plussed. As I alluded to in my post the whole interface to configurations could be improved but I still maintain that the behaviour is predictable. I wish I could come to your desk and see it happen for myself. I refer you to Sam Loud's comment on my blog entry where he states "Point I'm making is that it's easy for beginners  to make mistakes with configurations. Once you've know what you're doing, they work just fine." I'm not trying to tag you as a beginner Oren, just emphasize the point that the behaviour of configurations IS predictable, whilst accepting that grasping it isn't simple. If I had witnessed the same behaviour as you then I wouldn't be writing this now but I can honestly say that in 3 years of using this thing I've never seen it. Sorry. I guess we're not going to see eye-to-eye on this.

Oren: "the very next statement acknowledged the existence of proxy solutions, I still want to understand why this exists."
JT: But this is a SQL Server Agent thing, not SSIS. I feel your pain but if you're commenting on SSIS then its only fair to comment on things that are in the scope of SSIS right? If you're complaining about SQL Server in general and you have grievances then I suggest you address it to the correct people. I've no doubt there is an explanation for SQL Server Agent having to run as sysadmin - I just don't know what it is. SQL Server Agent aint my bag.

Oren: "The issue with environment variables is this is actually something that I would never consider for configuration. Putting a connection string in an environment variables is strange. "
JT: I would agree with that. I use env vars alot but definitely not to store connection strings. Is the fact that SSIS offers you an alternative that you are happy to disregard really grounds for complaint though? If you don't want to use it, don't use it. There are plenty of other options available (which I concede you are unhappy about also).

Oren: "I want to point to a configuration file that is in the same directory as the package, it doesn't let me handle it."
JT: SSIS doesn't allow relative paths that is true and there are good reasons for that (can be expanded on elsewhere if you wish). There are teqhniques to get around that and I'd be happy to share them with you. In fact, I've already blogged about them: http://blogs.conchango.com/jamiethomson/archive/2006/01/05/SSIS_3A00_-Common-folder-structure.aspx
I guess you will construe this as a bloated method and I realise trying to change your mind on that is futile. Again, we're just going to have to agree to disagree.

Oren: "It quite trivial to allow update / insert based on a given set of key fields, and it is certainly something that I would expect to see in an ETL product. Given the common need for this."
JT: Oh ok, I gotcha. This is eminently possible through use of the OLE DB Command component. No doubt you will complain about the crappy interface of the OLE DB Command and I am 100% with you on that one. As I said on my blog post and as you have observed (in slightly stronger tones Smile ) the design UI needs LOTS of work.

Oren: "Um, there is something that is called a background thread, and it is used to do work without freezing the UI"
JT: Fair enough - I don't have enough knowledge about the VS shell to comment here.

Oren: "Working offline is not a valid option, because then you get a whole lot of validation errors"
JT: So what? As long as you know that they're due to you working offline what's the problem? Just ignore them. I know I know...we won't agree on this.

Oren: "how do you handle two developers working on the same package? How do you handle branching and merging?"
JT: I hope to expand on this in a dedicated blog post soon (someone replied to my post asking me to do so as well) so I'll talk more about it there. Headline though is that we have a framework that doesn't require multiple developers to work on the same package because we split "work" over seperate packages and that approach works really well for us. If your complaint is around the lack of ability to work on a task in isolation then again I am 100% with you and I have been complaining about that since I first got my hands on the product.
Branching and merging? Never had a requirement to do that yet but if/when we do and if/when I run into problems I'll (try to) remember to let you know.
A key point here is that multi-user development is possible by seperating work over separate packages and I would view that approach as best practice anyway.


My colleague Howard Van Rooijen is keeping me abreast of your updates but if you respond and we miss it, let me know at jamieDOTthomsonATremovethisConchangoDOTcom or by hitting the 'contact me' link on this blog.

I would really like someone from the SSIS team to respond to these complaints because I feel like I'm doing their job for them. I know from past experience though that they respond alot better to constructive criticisms than flaming.

Although Oren, his co-complainants and myself are failing to agree on a lot of things I think this is a really good discussion and is bringing some important issues out into the public domain. I hope it proves beneficial.

-Jamie

UPDATE: David Reed from Microsoft (but not from the SSIS team) has posted his own reply here: http://blogs.msdn.com/reedme/archive/2007/07/29/re-ssis-15-faults.aspx

 

Published 27 July 2007 06:23 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Philmee95 said:

I have run into many of these issues and at least it is not just me. I can especially vouch for the ignored config files and hard coded connection strings. I can't seem to find a good config article that works, especially not on microsofts sites. It has been pointed out by redmond developer magazine that  MS documentation is getting worse and worse and I agree (they are not the only ones Adobe :). SQL 2005 is not in alpha or CTP status anymore. I like that they are trying to involve the community in documentation and examples, or are they really just outsourcing the documentation team to the user community.

We pay a small fortune for all these products, and so yes we do expect a lot.

I tried Talend Studio out, but saw that is was comparable to SSIS. So I stuck with the beast I kind of knew since the money was already spent.

July 27, 2007 07:52
 

Sam Loud said:

I find the complaints about configurations somewhat...lame.

I use configurations a heck of a lot - I can't think of a package where I don't use them. I use environment variables, xml files, dB tables, the whole 9 yards.

In every occasion they perform as expected. Saying "..The problem with all those options is that SSIS seems to choose between those at random.." is crazy. Believe me, I would have noticed if packages "randomly" wrote data to our production servers.

HOWEVER...when you first start using configurations, you make loads of mistakes.  The number of times I 've changed a connection manager in a package, then sat there cursing as it ignored my changes. Of course, I'd forgotton that the connection manager was being populated from a configuration.

Point I'm making is that it's easy for beginners  to make mistakes with configurations. Once you've know what you're doing, they work just fine.

July 27, 2007 09:01
 

adolf garlic said:

So first you say that SSIS isn't tied to the server:

SSIS is a tool for integrating data from heterogeneous places

And then you say it is:

UPSERT is an operation that would have to be supported by the database platform being inserted/updated wouldn't it?

So which is it?

That's right, it's a telephone toaster.

If two things are trying to write to the same flat file via a flat file connection manager, it breaks. Hardly "managed" is it?

Sort Order - this is just not intuitive.

I notice you only address 12 of the "complaints"

I counted at least 20 :)

Why bother defending something that is so obviously lame? Compare the amount of time the .net team had to develop their product and their UI. It far superior, it is a world class product that no-one speaks ill of, now in its 3rd IDE iteration. Compare this to SSIS. As you have previously stated, SSIS is the most popular MS forum. Why? Because no-one can figure out how to use it!

I am going to recommend to anyone I work with not to use SSIS until at least the 2008 version. It's just not ready yet.

It's the windows 98 of the database world.

July 27, 2007 10:56
 

jamie.thomson said:

Hello Adolf,

Thanks for taking the time to comment.

"

So first you say that SSIS isn't tied to the server:

SSIS is a tool for integrating data from heterogeneous places

And then you say it is:

UPSERT is an operation that would have to be supported by the database platform being inserted/updated wouldn't it?

So which is it?

"

I don't see how these two statements are contradictory. Whatever operation you happen to be performing on a destination then that destination has to support it. Simple but crude example - you wouldn't try and create a text file on an OLE DB Destination would you?

"If two things are trying to write to the same flat file via a flat file connection manager, it breaks. Hardly "managed" is it?"

Interesting point of view. You are still at the mercy of whatever you are inserting to and if the filesystem doesn't support 2 connections going after the same resource then there isn't much SSIS can do. I take your point though, if one connection has a hold of a file then the other connection should "wait" right? Is that what you mean?

"Sort Order - this is just not intuitive."

Neither is the train of thought that says "This property is called SortOrder therefore if I change it it will change the soring order". Its a property, not a method. [I presume you are talking about SortKey by the way, there is no SortOrder property] If you are making an assumption about the behaviour of something and that behaviour turns out to be false then more fool you I'm afraid.

"I notice you only address 12 of the "complaints""

Like I said at the top of this article I agree with a lot of them and I was addressing the ones that I felt were unwarranted.

"Why bother defending something that is so obviously lame?"

I'm defending the things that are unfair criticisms so that someone reading can form their own opinion. Is it wrong to present two points of view? Your assertion that it is 'lame' is your opinion. My opinion that it is NOT 'lame' is my opinion. I'm presenting the facts so that other people can judge for themselves.

"Compare the amount of time the .net team had to develop their product and their UI. It far superior, it is a world class product that no-one speaks ill of, now in its 3rd IDE iteration. Compare this to SSIS."

OK, I will. SSIS is on its first iteration, as you say the .Net products are on their third. Also, I can promise you that the budget the .Net group have will far outstrip (by orders of magnitude) the budget of the SSIS team and if you think that isn't significant then you're not living in the real world. The SSIS team has approx 20 people on it - i suspect the developer division under Scott Guthrie and S Somesagar (sp??) has hundreds.

"As you have previously stated, SSIS is the most popular MS forum. Why? Because no-one can figure out how to use it!"

I certainly won't disagree with that.

I want to reiterate the fact that there are lots and lots of good points in Oren's article and the SSIS team would do well to heed them. In this blog post I'm addressing those that I believe are unwarranted.

Thanks again. This is a healthy discussion.

-Jamie

July 27, 2007 12:42
 

jamie.thomson said:

P.S. Don't think that I am oblivious to the shortcomings of SSIS, I am only TOO aware of them. If you want proof then check out the 100+ suggestions and bug reports that I have made at connect.microsoft.com and the criticisms that I have made on this site in the past.

I beleive there is a right way and a wrong way to present complaints/feedback and venting your spleen isn't the right way. Constructive criticism is the most beneficial approach.

-Jamie

July 27, 2007 13:03
 

Kory said:

Jamie- you mentioned you've worked with SSIS in a team environment successfully.  If you could point me to a good list of resources on the best practices how to use SSIS in team environment, that would be great.  I usually develop ETL on my own in the places I've worked, but occasionally have to manage the same packages with a few other people and I haven't found a happy medium.  It seems we're always entering passwords for many packages or re-entering connection strings, or the package is stripping off credentials everytime a new user opens it.  I've tried to use configurations to store credentials for all the packages, but it still seems to want to validate the stored credentials first before reading the config file...

Just a simple "Best Practices for SSIS Team Development" is what i need :)

July 27, 2007 14:14
 

Luke Melia said:

We've found working with SSIS to be a relatively unpleasant experience and I agree with most of Oren's points.

A big part of the frustration for me is the file format. SSIS packages are complex XML documents which mix VS Designer information with ETL logic in a way that is nearly impossible to use with standard diff tools and understand via SCM logs what changes were made to an SSIS package in a particular revision. This affects how developers are able to collaborate, playing into the team development discussion.

Cheers from NYC,

Luke

July 27, 2007 18:45
 

jamie.thomson said:

Kory,

This is a good idea for a blog post. I'll try and put something together soon. No promises mind you :)

-Jamie

July 27, 2007 19:03
 

jwelch said:

I agree that the format of dtsx files can be a pain, particularly the way it can change the ordering of items abitrarily. However, I have found a lot of the collaboration issues can be worked around by keeping packages small and focused, and using multiple smaller packages rather than one big one.

July 27, 2007 19:06
 

Phil Brammer said:

To back up Jamie:

http://www.ssistalk.com/2007/07/27/ssis-15-faults-rebuttal/

To Luke's point, have you worked with Informatica?  I'm not sure they use text files that are readily "diff-able".  That is, this is a product in its infancy, give it time to grow.

To Adolf...  You wrote, "It's the windows 98 of the database world."  B.S.!!  It's free.  Go buy Informatica or something.  Jamie's comment regarding the budget is true.  These guys built a product right out of the gate to fulfill an ETL space in their tool belt.  They did it, and packaged it with SQL Server.  You buy SQL Server you get SSIS.  Pretty neat, huh!  So until you have to pay for it, it's a touch absurd to say this product sucks.  It's pretty dang powerful (you mean I can program in .Net INSIDE my SSIS package?!?!?) for a first generation product.

July 27, 2007 19:28
 

AJ said:

Jamie - came across that post today and went through to your counterpost. Keep up the good work. I too agree that since the beta we have identified numerous issues with the implementation of SSIS...with that being said, all of this arguing that it is 'lame' obviously comes from individuals that do not know how to properly use the product. I am using SSIS on numerous projects ...everything from data warehousing to simplistic desktop processes that run to give me reporting data to keep my employees on track....in fact BI is becoming the fastest growing business driver where I am at.

I just believe that everyone who wants to use a new technology needs to understand that there is going to be a learning curve. Yes, the learning curve for SSIS is steep in some aspects but once mastered opens up a whole world of opprotunities...

Keep the up the good work ...

Cheers,

Aj

July 27, 2007 19:36
 

jamie.thomson said:

Hi Aj,

Thanks for the comment.I'm not sure those individuals would be too happy with your assertion that "all of this arguing that it is 'lame' obviously comes from individuals that do not know how to properly use the product" but that's an argument I'll gladly steer clear of :)

Your point about SSIS having a steep learning curve is true. Very very true indeed.

-Jamie

July 27, 2007 20:09
 

jamie.thomson said:

Further to my reply to Adolf above. I take back what I said about Soma and Scott Guthrie's division having hundreds of people in it. Its actually got thousands:

"The largest deployment is our own Developer Division server supporting 2,500 users. "

http://blogs.msdn.com/somasegar/archive/2007/06/18/so-what-does-microsoft-use-for-software-development.aspx

Do you agree that comparing SSIS side-by-side with .Net products is unwarranted?

-Jamie

July 27, 2007 20:35
 

Kyle S said:

Jamie,

Keep up the good work.  

I usually don't post, but I felt compelled.  On my current project we are using SSIS to communicate with 12 databases on 3 different servers (one on the west coast, one in the midwest and one in, you guessed it, the east coast).  We call one package 12 times, just pass a different connection string.  We are also connecting to Oracle and using some views.

Don't get me wrong, it took some time to configure it to work correctly, but once you know the order of execution for configurations it's a breeze.

(http://weblogs.sqlteam.com/dmauri/archive/2006/04/02/9489.aspx)

Believe it or not, the link above helped a lot in understanding the order of things.

Anyways, I guess my point is... one can't expect any piece of software to be "perfect." But we can provide our complaints / constructive criticism and hope for the best.  :-)

SSIS is a great, FREE product.  And when the client hears "free" or that they already have the product, they just don't know it... that makes their day.

Just my .02 cents.

July 27, 2007 20:41
 

Ayende Rahien said:

> Oren: "Working offline is not a valid option, because then you get a whole lot of validation errors"

> JT: So what? As long as you know that they're due to you working offline what's the problem? Just ignore them. I know I know...we won't agree on this.

because then I have to go to each of the components and refresh them, meaning that I still have the perf issue, but I also have invalid package.

July 27, 2007 20:41
 

Ayende Rahien said:

> Do you agree that comparing SSIS side-by-side with .Net products is unwarranted?

No, because supporting basic operations like  source control is something that I would expect from a team of one, frankly.

> Headline though is that we have a framework that doesn't require multiple developers to work on the same package because we split "work" over seperate packages and that approach works really well for us. If your complaint is around the lack of ability to work on a task in isolation then again I am 100% with you and I have been complaining about that since I first got my hands on the product.

That is the classic Source Safe model, and it breaks down if you have any common thing that you need to change. This means that if I spot a bug in your package, I need _you_ to fix it, because there is no way to merge the changes.

> Branching and merging? Never had a requirement to do that yet but if/when we do and if/when I run into problems I'll (try to) remember to let you know.

A key point here is that multi-user development is possible by seperating work over separate packages and I would view that approach as best practice anyway.

Production database and dev database, both under active development, that seems to be fairly common scenario to me. I need to make changes to one part of the package (fix a join predicate) (production) and add a column to the query in another (dev).

No way to merge those changes manually.

What is worse, there is no way to actually diff the original version to the current one and know what has changed.

July 27, 2007 21:13
 

jamie.thomson said:

Hello again Oren,

As I have alluded to I agree the lack of merge capability is a problem. A frustrating one. We solve this problem by engraining into each developer the need to do a Get Latest ... before they start to work on a package.

Its crude but it works for us. If someone forgets to 'Get latest' then we don't have too much trouble backing up and applying the changes into the latest version.

I can only speak from experience and that is that I am working on a 3-SSIS-developer project which went live last Wednesday after 10 months (yeah, its been a long haul) of development and we have not had any insurmountable issues, or even temporarily blocking issues, arising from having multiple SSIS developers.

"Production database and dev database, both under active development, that seems to be fairly common scenario to me. I need to make changes to one part of the package (fix a join predicate) (production) and add a column to the query in another (dev)."

I fully understand your point here. The branching and merging thing does worry me a bit but we haven't had tackle that one. Yet.

How about all the other points that I responded to? Or is it best just to let them lie for fear of falling out? :)

As an aside, this blog entry has resulted in more activity in one day than any of the other 473 posts I have ever written so for that, I thank you.

-Jamie

July 27, 2007 21:31
 

SimonS' SQL Server Stuff said:

I recently visited a client that was achieving wonders with DTS, I was there to give them pointers on...

July 27, 2007 23:17
 

Ayende Rahien said:

Jamie,

Here is what I didn't respond to so far:

Configuration - I disagree with you, but I don't see much point continuing the discussion in this subject. At least until I have a repro that I can post.

Jobs & SSIS - You say it is a SQL Agent issue, I don't care what is the root cause, I care for the results. It is still an annoyance that I see only with SSIS.

Envrionment Variales for configuration - No further comment.

Relative paths - I'll take a look at this, but it is probably too late for this project, and I don't intend to use SSIS again.

Slow VS - We agree here.

July 28, 2007 19:00
 

Mack Davis said:

>>JT: Hmm...not sure about this one. UPSERT is an operation that would have to be supported by the database platform being inserted/updated wouldn't it?<<

MS unceremoniously dropped upsert support (in the form of the MERGE statement) from the production version of SQL 2005.  To date I haven't seen any explanation...  it was in the pre-release versions, and then *poof* it was gone.  I even have an old book ("First Look at SQL Server 2005"?  or something to that effect) that touts this feature before it was ripped out.

I have personally written custom ETL tools in C++ for SQL 2000, 7.0, and even 6.5 (using Bulk Copy API and temporary tables) that do have "upsert" capability.  I've even created ETL tools that allow you to specify the action to take with flat-file records on a per record basis (insert, update, merge, delete).  Presumably this could be duplicated in SSIS in a fairly efficient manner, though it may be more complex than necessary.  I haven't had opportunity or reason to try it in SSIS myself yet.

As for some of the other issues raised, I haven't had the opportunity to create very complex SSIS packages yet.  That's why I find it a little surprising that I've run into so many of the problems Oren and his responders mentioned.  I am relieved to know I'm not the only one having so much trouble with SSIS.

Fortunately I've learned (the hard way) that early adoption of MS software does not pay great dividends in most cases.  It's great if your business can afford to spend a lot of time and money in R&D with the newest stuff, or if you're a consultant being paid on an hourly basis and your client has plenty of money to throw at the problem.

Microsoft has conditioned many of its customers to adopt a very measured (almost cynical) response concerning adoption of their new releases.

July 29, 2007 03:37
 

david Cohen said:

Hey Jaimie,

I have the problem about external validation.  And I believe I have come up with a good soltion. I ahve seen near 50% increases in package execution times.

1) always use a variable as a select command.  The variable string should start with a select statment in it.  The select statement should be the same as the eventual statement but with and "1=0" in the where clause.  This will let validation happen extremely fast, as validation happens with the initital value. (in fact, the1=0 select statement does return metadata, just not any rows)

2) During the execution,before the statement executes, change the value in the variable, and remove the 1=0.  Now your select runs as supposed to, and validation is done almost instantly.  

July 29, 2007 14:44
 

jamie.thomson said:

Oren,

This seems like a good place to end it. Thanks again for your work here - you've brought some important issues out into the open.

Mack,

I didn't know that MERGE was slated for Yukon. That's interesting to note. I guess that there's is only so many things they can do and other engine features (e.g. Service Broker) were considered higher priority. Note that MERGE is very much part of katmai and is in the current available CTP.

As I said above, upsert capability is very much supported in SSIS2005 through use of the OLE DB Command.

Interesting point about Microsoft v1 products. Given my experience over the past 7 years I'm inclined to agree. Certainly traditionally Microsoft's priority is to "get something, anything, out the door".

David,

Interesting comment. Given what I know about validation I'm surprised that "WHERE 1=0" has an effect but obviously that's something I need to look into. Thanks for making me aware of it.

-Jamie

July 29, 2007 18:12
 

Denise Draper (MSFT) said:

Wow --- there's a lot of good material in here, and it will be required reading for the SSIS team, along with Oren's original post and Phil Brammer's response as well.  Everyone has done a great job of covering a lot of the specific details already (thank you Jamie, Phil and others), so I won't try to cover everything.  I'll just add some thoughts that come to mind:

DTS was pretty popular --- in fact it was hands down the most popular ETL utility out there.  But we knew the design of DTS wouldn't scale to upper-0end ETL requirements, and we heard from customers that they wanted and expected us to do that.  So we bit the bullet and built a completely new product.  There are lots of aspects of the product that I'm very proud of (I say "I" but it wasn't me who shipped it, it was my predecessor) --- the clean separation of control from data flow, the way error flows are handled within data flows, and the open extensibility through scripting and coding are some of my favorites.  There are some places where we didn't get mature enough in the design and we tried in those cases to have a very open solution, so that our users could adapt the product to their own needs --- package management and deployment, configuration and logging all fall into those categories.   In all of these cases, there is confusion (justified) as to how to use the product, but it is usually possible to adapt to the needs of any particular installation.

So one question I ask is: did we achieve a platform on which we can continue to build functionality that will scale up to data integration generally?  Is this the right architecture?  I think the answer is yes.  I know customers who have built astoundingly large solutions --- large in terms both of the amount of data handled and of the number and complexity of the packages that they manage.  And SSIS works for them reliably.   I know of lots of things that I want to change in the product, but I think SSIS was the right step in the right direction.

Some architecture choices are not so obvious.  For example, we have chosen to use generic open interfaces to access data.  On one hand, this lowers our cost (and hence your cost) greatly --- the portfolio of special-purpose connectors is a big part of the (huge) price of upper-end ETL.  But the downside is that we inherit whatever issues come with the data source drivers, which are not under our control.  We get a lot of issues about Oracle connectivity, for example, that have nothing to do with SSIS.  But if you are an SSIS user, that distinction doesn't really matter, does it?  This is an area we are still debating --- it really is expensive to own and ship our own connectors.

Another example of a choice that isn't obvious is the whole question of integration with Visual Studio and the development environment generally: I get feedback from some customers that SSIS is too "developer oriented" but from some developers that it isn't developer oreiented enough.  It looks as though folks on this thread are more from the developer camp, but let me ask:  in an ideal world, do you switch back and forth between graphic ETL design and code, or do you never see code, or something else?  Do you prefer very simple tools that you scale up to large solutions (by creating subroutines, macros, etc.), or do you prefer tools that have enough smarts in them to know most of the problems you want to solve?

There is still lots of things we want to improve --- we wouldn't be software engineers if we didn't feel that way, would we :-)   The next release of SQL Server is a small one, but it will fix some big annoyances: better scripting support, better debugability, better wizard.  But some areas, like deployment, were just too big to fit into this release --- we'll try to provide better guidelines in those cases.

Better organization to the documentation and better help getting started generally --- these are things that we know are needed and do intend to keep working on.  I look at something like Oren's original criticism of SSIS configuration and that is what I see --- it may be that there is something funky in Oren's setup, or it may be that he's come across a bug --- either way, what is clear is that the expected behavior of SSIS isn't clear enough to him for him to tell what is going on.

We have a lot of good plans on the drawing board.  Feedback like this helps us --- we hear about issues that we maybe didn't know about, and we use it to prioritize all the stuff we'd like to work on.  I know it takes longer to get new stuff out than any of us want, and that it's frustrating to deal with the flaws in the meantime.  But I still think there's already a lot of great value in SSIS once you get to learn it, enough to make it worth learning it.

July 30, 2007 06:42
 

adolf garlic said:

It's not free - how much of the company's time has it cost trying to figure out obscure errors?

To PB:

If a developer spends a month developing a calendar control vs. buying one off the shelf, which is cheaper? It would appear the extra purchase is more expensive, but if the developer can be coding the app from day 1, it is much cheaper.

london lite is free, so is metro :)

RS is a way better first generation MS product than SSIS.

To JT:

Do you work for Microsoft? No? Then how do you know how much is spent on team size, resources and so on. This at best speculative and at worst wrong. I can't imagine the RS team is too big and their stuff works.

July 30, 2007 09:49
 

Andy Clark said:

"Oren:Setting 'sort order' on a column, does not actually sort the data JT: Its not supposed to."

This clearly describes my key gripe with SSIS, the UI is often not intuative. The power is there to make SSIS do much more than we could with handcoded transfers or with DTS but if we can't work out how to use them then we won't select SSIS as our tool of choice.

The one thing that really flumoxed me was dealing with corrupt data in a Foxpro table which was happily skipped by DTS but caused SSIS to reject the whole lot.

July 30, 2007 09:49
 

jamie.thomson said:

adolf,

I don't know how much is spent but I know how many people there are. Were we discussing offline then I could give you all their names if you like.

I would be doing a disservice if I based what i say on assumptions so you have my word that if I were knowingly assuming something I would explicitly say so.

Point taken about the RS team. Remember I'm only trying to defend what I think is unwarranted criticism - as I said at the top I agree that there are a whole host of things in SSIS that need improving.

Regards

Jamie

July 30, 2007 15:12
 

Raul Black said:

The expression builder, SQL editor, etc. must have been designed and built by some genius over in China who is only two-years-old. I'm pretty sure that's why it sucks so bad. But that goes along with MS' motto of thinking outside the box and hiring the youngest employees possible with the highest IQ's.

But despite MS' lack of sense when it comes to hiring practices, SSIS is the ONLY MS product I have ever condoned using. I feel that its strengths more than make up for its faults. Perhaps you  have to be a code tech in order to fully appreciate the environment. And that of course leads to maintainability issues. So there the vicious circle begins. But really, what do people expect? Are they forgetting that SSIS if first and foremost a Microsoft product?

July 30, 2007 17:58
 

Brandie Tarvin said:

Jamie,

Actually, I understand Oren's comments about the config files and connection managers.  For the first three months we used SSIS, we had similiar issues out the ying-yang.

The main problem seems to be learning where all the little bits of data are hiding.  You can change a connection manager to point to a different server, but if you forget to change the config file, the connection manager gets overridden the instant you go into Debug mode.  Yet, when you look at the ConnMgr, it says what you told it to connect to.  Additionally, if you're using multiple config files for different environments, you have to double & triple check all your paths, including in your DTEXEC utility or the command line in the SQL Server job running the package.  Another trouble point is if you're using expressions to build your connections and then you save those variables to a config file.  

The major problem with SSIS is there are so many little nooks and crannies to search for that one rotten piece of information that is killing your package.  The package explorer, the advanced editors, the meta-data in the precedence links, the variables, the expressions, the multiple config files and don't ever forget the error handling.  Change one thing and suddenly you're spending the rest of the day fixing your package until you're sure you've fixed every last little bit of related data.  This lesson we learned the hard way while trying to figure out how to promote our config files (along with the packages) up through our environments.

It is a major pain.  But once you figure out what the problem is with your particular package, I can guarantee you'll never make the same mistake again.  If only because solution was branded into your fingertips from all the typing & mouse-click you did when you fixed it. @=)

July 30, 2007 19:04
 

SSIS Junkie said:

Last week was an interesting week in the SSIS community instigated fairly and squarely by Oren Eini&#39;s

July 30, 2007 19:26
 

David Cohen said:

Hey Jaimie,

 The problem occurs with implementations of Ole-DB and and ODBC drivers that are used as source.  

If you ever run a profiler on Sql Server, while you are validating a table load (from a sql server data-source) you can actually notice that the sql server ole-db driver (or maybe some other part) will actually replace the where clause with a 1=0 (I was looking into why Sql server validation was so much faster than non-sql server)  after this discovery my validation times were reduced to next ot nothing

July 31, 2007 03:41
 

Raul Black said:

Okay, first I apologize for the little stab I took at MS. But being fairly new to SSIS, my initial impression, metaphorically speaking, is that it seems to come from a place far away, so you really don't know who created it, at times the architecture seems to be built by brilliant minds, and at other times, well, it seems that someone forgot to contact professional end users before the design was finalized. Thus, my previous colorful analogyJ

Truthfully, the status quo these days is to not release finished software in V1.0, so I really don’t expect MS to break from the herd anytime soon. That being said, I am actually quite impressed with SSIS as an ETL tool.

Brandie brought up some points above about using SSIS, and I also experienced these shortcomings. However, this forced me to contemplate alternative design methodologies in order to circumvent these and the many other problems I encountered using the SSIS tool. What I found is that I was able to build an ETL system that far exceeded my initial expectations.

The dynamic possibilities of SSIS are the first thing that lit my fire. For instance, I built an ETL package that uses a ForEachADO loop that loops through a control table that provides the control info to populate some 60 DW tables. This way, if my organization decides to change our overall ETL process, we only have to modify one package and it affects all 60 tables (my boss suggested this idea, which really challenged me at first). Of course, we have a few different processes that require different looping packages, but the point is, it is much better to maintain two or three packages than it is to maintain over a hundred proprietary packages—one for each table in your DW.

The package design is simple and straightforward and requires no parameter changes when moving from test to production. Nor do the control tables or associated objects. Yet, it is heavily parameterized so that it can be used in other environments or as a starting template for building other processes.

You might be thinking, “Yes but that would never work on my system because we…” But the point I am making is not that my design will work in your organization; it is that SSIS has a robust enough feature-set to allow for doing some really impressive stuff. But it does require considering alternative approaches to what you might have become accustomed to.

The main shortcoming I see with SSIS at this point is that MS doesn’t appear to be interested in it enough to devote enough resources to finish creating the product. Despite this, I remain a convert. All in all, my opinion is that it represents a positive step forward.

July 31, 2007 07:16
 

Wanderer said:

I haven't seen it mentioned yet - a lesson I learnt regarding XML configuration files and Connections is that they are case SenSitive.... So if you rename your connection, neaten up the name etc, AFTER having created your configuration file - your configuration file no longer configures that connection string. The error messages (for example, when you open the dtsx in BIDs, and work through them) DO tell you this ... "Unable to configure..." Those errors messages were enough to make me realize this, and I'm hardly the shining light of SSIS.

Many of the other things I can't comment on, but I think(hope), if we all visit this thread in a years time, some of us will be able to say: "ahh - well, now I know why" to SOME things, and other we will say: "Yeah - remember how bad it was!". Hopefully, there won't be too many where we say "man, it is still bad at/can't do XXX or YYY"

I agree, editing in BIDs is very slow, and I too would like a smart diff'ing of SSIS packages - does anyone know if Data Dude - or whatever MS called the Visual Studio for Database Developers - Team Edition (is that the full name) can handle diff'ing on SSIS packages?

July 31, 2007 18:20
 

rbellamy said:

I am currently dealing with a very real problem with indirect configuration using Environment variables.

In a 64-bit Windows 2003 box which is also a Terminal Server, neither the CmdExec step of a job run by the SQL Server Agent Service account nor devenv run under my account is picking up the environment variables.

When looking at the devenv process image using procexp, it very clearly has the proper environment attached to it... and just as clearly, all the variables that SHOULD be set using indirect configuration are ALL showing their values as they were set during development, instead of the correct values.

I'm reasonably certain this is an issue with 64 vs 32 bit processing. Regardless, this is a concrete example of SSIS indirect configuration being completely borked. I have screenshots if you wanna see...

August 1, 2007 07:22
 

ST said:

In regards to the comments below about multi-developers working on the same package along with Source Control (Banching and Merging), I would remind everyone that you do not have to use the SSIS UI to create ETL Processes. The SSIS is fully exposed within .NET and therefore you can create your own .NET Assemblies using C# or VB.NET to produce your own custom ETL Package (exe or dll). If you do not like the SSIS UI, then don't use it as the MS Team has given us an alternative means to create our ETL Process.

Oren: "how do you handle two developers working on the same package? How do you handle branching and merging?"

JT: I hope to expand on this in a dedicated blog post soon (someone replied to my post asking me to do so as well) so I'll talk more about it there. Headline though is that we have a framework that doesn't require multiple developers to work on the same package because we split "work" over seperate packages and that approach works really well for us. If your complaint is around the lack of ability to work on a task in isolation then again I am 100% with you and I have been complaining about that since I first got my hands on the product.

Branching and merging? Never had a requirement to do that yet but if/when we do and if/when I run into problems I'll (try to) remember to let you know.

A key point here is that multi-user development is possible by seperating work over separate packages and I would view that approach as best practice anyway.

August 3, 2007 13:29
 

Alex Crockett said:

I use the Slowly Changing Dimension to facilitate UPSERTS.  Its works fine on low volumes (<10,000,000 rows) but anything above needs perf tuning attention.  

August 6, 2007 16:49
 

Brian said:

Regarding the use of environment variables for configurations...

I picked this up from Jamie's blog (http://blogs.conchango.com/jamiethomson/archive/2005/11/02/2342.aspx) but I use the environment variable to point to the config file that I want to use and nothing else.  This works great for me.  Anyway, saw the conversation of environment variables drop off without mentioning that and it's made the whole config piece much easier for me at least.

August 6, 2007 19:35
 

SimonS SQL Server Stuff said:

I recently visited a client that was achieving wonders with DTS, I was there to give them pointers on

August 12, 2007 00:48
 

HappyCow said:

rbellamy: I think I have the same problem. Check out this thread:

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1935358&SiteID=1

August 16, 2007 08:16
 

Lee Cascio said:

I'm late on this thread but found it after googling "SSIS BIDS slow" trying to find relief from my own pain.

I'd say the biggest improvement to SSIS needs to be BIDS which is an achilles heal of the whole product.  I completely agree with the slowness which I think, along with the expression editor, are the worst problems.  We've noticed that the combination of merge joins, lookups and many column dataflows (e.g. several hundred columns) seems to cause an exponential degredation on BIDS.  BIDS becomes so slow, just moving a component, let alone making a change to a merge join or something, may cause the design surface to take minutes to refresh and/or you keep getting the infuriating "...visual studio is busy..." message.  From what I've seen, it is not a connection problem as the CPU will spike to near 100% for several minutes, BIDs memory will increase to over a gigabyte, and there be no Db activity.  As a result we've moved some of the dataflow into database views (e.g. using DB joins vs. SSIS merge joins), and have broken up dataflows into smaller ones that do not necessarily make sense from a process perspective but let us modify packages without waiting 5 minutes between changes.  From a runtime perspective SSIS has more than enough capability to handle the complex dataflows, it is just that BIDS can't handle the specification of them, which to me is the definition of an achilles heal.

September 12, 2007 14:54
 

jamie.thomson said:

Hey Lee,

This is bad and the SSIS team need to be aware. Have you reported it at http://connect.microsoft.com?

-Jamie

September 12, 2007 16:30
 

Lee Cascio said:

I'm reporting it, thanks for the link.  I just put together the steps to recreate.  I am surprised that others have not hit the problem unless it is uncommon to have the wide dataflows that my current client has.  Either that, or we're missing a patch or something.

That said, SSIS has been a fabulous tool set.  The 15 "gripes" do not appear to be a fair assessment and there are ways to work around most of the shortcomings.  For example, I've developed a 3 tier architecture to packages maximumizes code re-use and greatly reduced the "cut, paste, and copy" templating where you end up with many packages that, as you noted above, have essentially the same components, just different metadata.

Another nice to have improvement would be for the variable pane to also show whether the variable is an expression, the expression, and any config file(s) that populate the variable.  In training people to develop with SSIS, variable mgmt seems to be one of the more confusing activities because they have to bounce around between the variable pane, the properties pane ifthe variable is expression driven, and the package configurations dialog if it is populated from a config.

BTW, the performance bug is very easy to re-create:

(1) Just create two tables with 500 columns each (haven't played with it to see if only one has to be large or if datatypes play a role).  

For example:

Create Table Test1

(

Column1 varchar(10),

Column2 varchar(10),

...

Column500 varchar(10)

)

Create Table Test2

(

Column1 varchar(10),

Column2 varchar(10),

...

Column500 varchar(10)

)

(2) Then create an SSIS package with 1 dataflow.  In the dataflow create an OLE DB source for each table, a sort for each flow (I use Column1, Column2, and Column3 as my "pretend key"), and a merge join (I use a left join and select all columns in the 1st table to pass).  Voila, when you click OK on the merge join, it will take 5-10 minutes to complete whatever operation it is doing and CPU will spike (in my case at 25% on a 4 CPU box so I assume that is 1 full CPU).

I put no data in the tables.  Simply trying to re-open the MergeJoin will cause it occur let alone making a change like selecting a column from the second table to pass.  Merge has the same problem and possibly other components.  It seems like the metadata update is caught is some kind of inefficient recursion or loop.

September 14, 2007 14:52
 

SimonS Blog on SQL Server Stuff said:

Well the first that is close to my heart is Integration Services and its really annoying. I tried to

November 6, 2007 15:12
 

Mark Godfrey said:

SSIS is a horrible product.  I'm not just trying to jump on the bandwagon, I loved DTS and was anxious to see the next-gen product.

Wow.

Even simple things, like seeing what is in a deployed package, is impossible.

What a shock.  It is the clumsiest software I've worked with in years.  To say it is non-intuitive, bloated, slow, and unnecessarily-complicated is to give it way too much credit.

Why did MS fix what wasn't broken in the least?

April 9, 2008 23:09
 

Albert Benjamin said:

Once you understand SSIS you will see that it is a rough diamond.  All it need is a good cut and polish then it could stand proud next to other jewel in ETL crown.

Sure it is bugy (tell me what software which doesn't have one) but it would be foolish to ignore it's potential.  I am sure there is a lot of customer out there who can appreciate an ETL solutions without needing to spend big bucks for the tools.

The question is how important is SSIS for Microsoft in its pursue to be among the leaders in the BI arena.  This in turn will transalte to how much effort (budget) that will be poured in for the development.

At this point allow me to thanks everyone who worked hard to build SSIS and individual like Jamie Thomson whos tirelessly dig deep for hidden gems then shared them with us free of charge.

I can't wait till SSIS 2008, keep up the good work !!!

July 4, 2008 07:27
 

Albert Benjamin said:

Oh, and one more thing:

Remember a wise man said:

You can have it quickly, You can have it cheap, You can have it good; choose two.

July 4, 2008 07:33
 

jamie.thomson said:

I like that quote Albert. Noted for the future :)

July 4, 2008 09:48
 

walkair said:

Wow, quite impressive - “good” things that started by Oren in 2007 are still relevant in mid 2008.

JT,

Let me thank you (and other “conchangees”) from all of us (developers and Microsoft) for providing a tremendous help with SSIS. SSIS adoption levels are heavily dependant on you and many other mostly non-MS contributors.

SSIS BIDS slow described by LC

Looks like some Visual Studio 2005 memory leak - it happens to me quite often. Surprisingly enough, just moving controls or constraints a couple of pixels could take 30-60 seconds (mostly in a Data Flow screen). I think that this is more Visual Studio 2005 related problem, since I don’t see the same problem in my Visual Studio 2008 Integration Services projects.

Free SSIS argument by Phil

Phils arguments that SSIS is free is not acceptable. First and foremost SSIS is included feature of SQL Server, not a free tool. The only way to get SSIS is by buying a SQL Server license.

SSIS sucks

The main reason why I think that SSIS is NOT sucks is that you have too many people blogging/talking about it. This “army” of disappointed high-end users will rather highlight the flaws of underperforming product than waste their time for a useless product. Nonetheless, it does have a side effect - its leaves a bad smell after all.

SSIS Future

Now, what SSIS needs is a careful review by Microsoft.

The real questions are (it’s really the same question):

1. Does Microsoft wants to invest more time/money and take SSIS to a new level?

2. SSIS was created to lure new SQL Server customers or to keep some existing customers happy?

Thanks,

Steve.

P.S. Give me some credit - English is my third language :)

July 11, 2008 04:29

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems