blogs.conchango.com

SSIS Junkie

Dryad

I've been meaning to talk about Dryad on here for a while, but I've been busy elsewhere and probably a tiny bit lazy as well. So what's Dryad, I hear you say?

Dryad is a research project within Microsoft Research and "is an infrastructure which allows a programmer to use the resources of a computer cluster or a data center for running data-parallel programs. A Dryad programmer can use thousands of machines, each of them with multiple processors or cores, without knowing anything about concurrent programming" or so it says here.

Where Dryad gets interesting is when you read about one of its proving grounds. Get a load of this:

  • SSIS on Dryad executes many instances of SQL server, each in a separate Dryad vertex, taking advantage of Dryad's fault tolerance and scheduling. This system is currently deployed in a live production system as part of one of Microsoft's AdCenter log processing pipelines.

That sounds very interesting. It's hard to decipher what is actually going on here, but it sounds as though these guys have used Dryad to distribute SSIS workload across multiple cluster nodes. Unfortunately detail is a bit lacking, so it's hard to know exactly what has been done. But if they have managed to parallelise an SSIS dataflow across the cluster then this is truly exciting stuff, and it raises SSIS into the echelons of Ab Initio, which leads the way in terms of raw processing power with its highly distributed parallel architecture.

I'll keep my ear to the ground about Dryad and if I find out more I'll let you know. The real question is, will this be available for Microsoft customers (i.e. you and me) to use as well? I'll endeavour to find out.

Just to whet your appetite some more, here's a little more detail from their research paper, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks":

SQL Server Integration Services (SSIS) supports workflow-based application programming on a single instance of SQLServer. The AdCenter team in MSN has developed a system that embeds local SSIS computations in a larger, distributed graph with communication, scheduling and fault tolerance provided by Dryad. The SSIS input graph can be built and tested on a single computer using the full range of SQL developer tools. These include a graphical editor for constructing the job topology, and an integrated debugger. When the graph is ready to run on a larger cluster the system automatically partitions it using heuristics and builds a Dryad graph that is then executed in a distributed fashion. Each Dryad vertex is an instance of SQLServer running an SSIS subgraph of the complete job. This system is currently deployed in a live production system as part of one of AdCenter’s log processing pipelines.
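
To make the idea in that excerpt a bit more concrete, here is a toy Python sketch of "partition a dataflow into subgraphs, run each subgraph as a vertex, and re-run a vertex if it fails". This is not Dryad's or SSIS's API, and it is not how the AdCenter system is built; the paper excerpt doesn't say what the heuristics are. Every name here (partition_pipeline, run_vertex, run_job, the sample stages) is hypothetical, the "heuristic" is simply a fixed number of stages per vertex, and the whole thing runs in one process rather than on a cluster.

```python
from typing import Callable, List

# A stage is a plain function from rows to rows (a stand-in for a dataflow component).
Stage = Callable[[List[int]], List[int]]


def partition_pipeline(stages: List[Stage], max_per_vertex: int) -> List[List[Stage]]:
    """Cut a linear pipeline into contiguous subgraphs (assumed toy heuristic)."""
    if max_per_vertex < 1:
        raise ValueError("max_per_vertex must be at least 1")
    return [stages[i:i + max_per_vertex]
            for i in range(0, len(stages), max_per_vertex)]


def run_vertex(subgraph: List[Stage], rows: List[int], max_attempts: int = 3) -> List[int]:
    """Run one subgraph; on failure, re-run it from its original input."""
    last_error = None
    for _ in range(max_attempts):
        try:
            out = rows
            for stage in subgraph:
                out = stage(out)  # stages must not mutate their input
            return out
        except RuntimeError as err:
            last_error = err
    raise RuntimeError("vertex failed after retries") from last_error


def run_job(stages: List[Stage], rows: List[int], max_per_vertex: int = 2) -> List[int]:
    """Feed each vertex's output into the next vertex, in order."""
    for subgraph in partition_pipeline(stages, max_per_vertex):
        rows = run_vertex(subgraph, rows)
    return rows


# Sample stages, purely illustrative.
def drop_non_positive(rows: List[int]) -> List[int]:
    return [x for x in rows if x > 0]


def double(rows: List[int]) -> List[int]:
    return [x * 2 for x in rows]


def total(rows: List[int]) -> List[int]:
    return [sum(rows)]


result = run_job([drop_non_positive, double, total], [1, -2, 3])
```

The retry only works because each stage returns a new list instead of changing its input, so a failed vertex can safely start again from the same rows. A real system would also have to move data between machines and schedule the vertices, which this sketch ignores entirely.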

Thanks to Howard for the tip about Dryad.

-Jamie

Published 13 November 2007 04:19 by jamie.thomson

Comments

 

 

Matt Masson said:

"... if they have managed to parallelise a SSIS dataflow across the cluster then this is truly exciting stuff ..."

Yes, that's pretty much what they've done. They use a set of heuristics to decide how to split up their dataflow across the grid. We got a demo of it sometime last year, and it was very impressive. It's very cool stuff, but of course, is completely dependent on Dryad. Hopefully the technology progresses to a point where it can be easily deployed... definitely a project worth keeping an eye on.

November 21, 2007 05:53
