blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Conchango are busy and need talented consultants in and around London. Interested? Email me or send me a message

SSIS: Concurrently running packages

Updated at the bottom

Concurrent/parallel execution is not a particularly hot topic in SSIS. In the future I expect functionality to be introduced into SSIS to enable things such as spreading parallel executing tasks over multiple boxes and seperating data in the data-flow over multiple boxes. Until then though I shall sit and hope.

In the meantime, its obvious that concurrency can be achieved currently by executing tasks in parallel or having multiple data paths in a data-flow. However, you should also be aware that it is possible to run whole packages in parallel by executing them from the Execute Package Task.

Taking this one step further, its also possible to execute a package in parallel with itself. This could be useful if, for example, you wanted to run a package on multiple disparate but identical sources in parallel. Its this scenario of executing a package in parallel with itself that I want to expand on a little bit here. There are a couple of things you should be aware of if you intend to attempt this.


Cannot distinguish between tasks in the concurrent packages

A fairly common approach to logging in SSIS is to have a parent package catch all events fired by itself and all descendant packages. The method of distinguishing different events fired by the same container is to use the @[System::SourceID] variable which contains the global unique identifier of the container raising the event.

However, if the package is running concurrently with itself events fired by containers in those seperately executing packages will be raised with the same value in @[System::SourceID]. It is quite conceivable that you may want to track all events raised by the same container and in this concurrent execution scenario you are going to run into problems.

For example, you have a custom logging method for which you want to insert a record in an audit table when a container starts to execute and then update that record with the container execution duration when the container completes. If multiple instances of a package were executing concurrently then you would not be able to tie the OnPreExecute and OnPostExecute events fired by the same container together.

I have heard there are plans afoot for a future version of SSIS in which a ForEach loop can execute all of its iterations in parallel. This would cause more problems of the type already explained.

There is not really a workaround to this anomoly. It is just something that you need to be aware of.

 

Checkpoints files cannot be shared

Another problem regarding a package being executed multiple times is that any checkpoint files created by the package would not, by default, be attributed to a specific execution instance of that package. Hence you have the situation where a package could incorrectly use a checkpoint file that it was not supposed to.

There IS a workaround to this problem. If you can give each execution instance some form of ID (typically passed from a parent package) that will be repeated when that instance of the package is next fired. That ID could be concatenated to the name of the checkpoint file using an expression, thus making a checkpoint file execution dependant!

 

These are just some random musings. Perhaps they might help people be aware of some issues they may enounter when calling packages multiple times.

-Jamie

 

Update:

Joachim made a good point in the comments section below. The @[System::ExecutionInstanceGUID] variable can be used to distinguish between different executions of the same package.

Unfortunately this wouldn't be any good in the situation where a package is called multiple times from the same parent package that is used for centralised logging. This is because any reference to that variable in the parent package would use @[System::ExecutionInstanceGUID] that is scoped to the parent package.

 

This variable could also not be used in the name of a checkpoint file because system variables are not persisted to a checkpoint file and hence each execution of a package would have a different @[System::ExecutionInstanceGUID].

No doubt there are situations where @[System::ExecutionInstanceGUID] would prove useful and indeed Joachim states that he is using them successfully.

Definately something to be aware of! Thanks for the comment Joachim!

 

-Jamie

 

Published 03 April 2006 21:38 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

Davide Mauri said:

Hi Jamie i've got some little problems with SSIS Logging too, expecially when having packages that runs other packages and where the outer package should catch all event to log them into a daatabase.
I've wrote two articles here: http://weblogs.sqlteam.com/dmauri/.
Anyway i think that the best way to achieve logging is to create a customzied DTExec tool. In this way you can log everything you need (even information about in which container is placed a task!) and you don't have to define log during package design but you can manage it just before execution.
Here you can download the tool i'm writing:
http://www.davidemauri.it/files/CustomSSISLogging_Beta1.zip
as soon i'll finish to do all my experiments i'll put it on my blog for free.
April 4, 2006 07:29
 

Anthony D'Angelo said:

We implemented Custom Logging inside our SSIS Custom Components by calling the Microsoft Enterprise Library (Jan 2006-Custom Logging) directly from our code:

http://www.microsoft.com/downloads/details.aspx?FamilyId=0325B97A-9534-4349-8038-D56B38EC394C&displaylang=en

This helps us manage individual task performance and errors.  The location of the log file is in a SSIS variable, which can be set using SSIS config files.
April 4, 2006 16:33
 

jamie.thomson said:

Interesting stuff Anthony. We are using Enterprise Library on my current project as we are calling .Net components from teh Script Task and Script Component. We have not gone to the extent that you have though.

Is there any way you can formally publicise your experiences?

-Jamie
April 4, 2006 16:54
 

Joachim said:

We are doing such logging as you mention in the first problem, but we use a combination of @[System::SourceID] and @[System::ExecutionInstanceGUID].

Haven't much experimented yet with concurrency but would that not solve the distinguishing problem?

J.
April 5, 2006 08:02
 

jamie.thomson said:

Joachim,
Unfortunately that wouldn't work in the scenario that I talked about above. I have edited the original post with a reflection on your comment.
I do definately think though that there is value in using @[System::ExecutionInstanceGUID], just not in the scenario that I mentioned!

Thanks very muich for the comment.

-Jamie
April 5, 2006 09:07
 

Joachim said:

Jamie,

Thank you for your comments. Indeed we don't use a parent package, we use an external application to manage package execution. So it didn't occur to me that the ExecutionGUID is not that much unique in case you were describing.

You mention the use of a user variable set by the parent package (in the part about checkpoint files). I guess the two pieces together might work: logging @[System::SourceID] in combination with a custom execution identifier generated and set by the parent package.

The same could be applied in the case of a container executing tasks (like iterations) in parallel.

Joachim.
April 5, 2006 09:36
 

jamie.thomson said:

Joachim,
Yeah, that's an idea. It'd be nice if you didn't have to do this workaround though.

The obvious answer is for each container to raise its own Execution identifier (e.g. [System::SourceExecutionInstanceGUID]) as well as [System::SourceID]. I have previously requested that here: http://lab.msdn.microsoft.com/productfeedback/viewfeedback.aspx?feedbackid=c8572b18-0e4d-4ceb-8cf5-57fd09ee7091

-Jamie
April 5, 2006 09:49
 

matt said:

Hello,

I'm new here, but have been lurking about for a while now as this seems to be a great place to get information on SSIS.  In anycase, I have a problem dealing with running the same package in parallel with different parameters.  Can anyone suggest a way to do this.  

Basically, I have to run the exact same query against multiple databases (56) every 5 minutes.  By creating 4 identical packages that divy up the work increases my performance significantly.  Duplicating packages seems like a hack, is there some way to do this - seems like I can not do it with Parent Package configurations.

Thanks

Matt

November 28, 2007 14:42
 

jamie.thomson said:

Matt,

Essentially all you need to do when you execute the package is pass it a parameter which indicates what work it is supposed to go and do. (There are a number of ways of doing this but I do't think its in the scope of what you're asking.)

What you seem to be asking is "how do I pass in a parameter". I'm curious why you think parent package configurations can't be used although I do accept that parent package configurations are not precisely analogous to passing in a value at execution time. Unfortunately, there is no way to do exactly that using the Execute Package Task but you CAN do it using the /SET option of dtexec.exe. Perhaps that is an option for you?

The inadequacies of the Execute Package Task have been highlighted here: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=295885

-Jamie

November 28, 2007 15:20
 

Umesh said:

Hi,

 I am a newbie here. I have a problem with executing a package parallel with itself. I am using App.package.execute() to execute the SSIS package. The problem is i need to do it in a loop for 12 diff countries. When the loop runs only once, the package takes 5 mins to run. But when it runs for 12 it takes 5 times 12. Note, I am kicking of new thread for each country...All of them pause at the same package.Execute() step. which obviously suggests that the thread are waiting for something & not runnning the package. I was doing the same with DTS, it worked a lot better than this. Any suggestions anyone ? Thanks in advance.

PS: Sorry if this is not the right place to post this.

May 29, 2008 11:55

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems