blogs.conchango.com


SSIS Junkie

Pervasive BI for London commuters

Recently I began a new project in the centre of London, which means I am back to a daily train commute after my slightly more laid-back time in Aberdeen, which involved taking taxis everywhere. As such I am about to purchase an Oystercard to cover my travel expenses.
For those of you not living around and about London -which is most of you- Oystercard is a pre-payment swipe card used for London's tube/underground/subway* network. As far as I know it is administered by Transport for London (TfL).

Oystercard is a fairly new innovation, having recently appeared as an alternative to paper tickets. What got me thinking was the plethora of information that TfL now have available to them. Previously they would have known how many people enter and leave the tube network at which stations, but what I suspect they DIDN'T know (or didn't capture) is exactly what journeys people are making and, crucially, who is making them.

A conservative estimate is that there are 3 million people travelling into London every day of the working week. Assuming that half of those people use the tube at some point, and that they each make two journeys a day, you're already talking about 3 million tube journeys every working day. Given that every journey comprises two customer interactions (swipe-in and swipe-out), that is 6 million swipe events a day: really huge data volumes and, more than that, huge potential for discerning really useful information from that data. In a nutshell, a great opportunity to leverage business intelligence.
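As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The inputs are simply the assumptions stated above, and the variable names are mine; treat it as a back-of-envelope calculation, not real TfL data.

```python
# Back-of-envelope estimate using only the assumptions in the paragraph above.
commuters_per_day = 3_000_000   # people travelling into London each working day
tube_share = 0.5                # assumed fraction who use the tube at some point
journeys_per_person = 2         # assumed: two journeys a day
swipes_per_journey = 2          # swipe-in plus swipe-out

journeys_per_day = int(commuters_per_day * tube_share * journeys_per_person)
swipes_per_day = journeys_per_day * swipes_per_journey

print(f"{journeys_per_day:,} tube journeys per working day")
print(f"{swipes_per_day:,} swipe events per working day")
```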

I'm loath to get into political arguments but it's probably fair to say that TfL aren't the most popular public service organisation in London. That got me thinking - are they making enough use of this vast amount of data that they have available to them? Think of some of the questions that they could be getting answers to:

  • There are 275 tube stations on the tube network. By my reckoning that is 75,350 potential point-to-point journeys (275 x 274, counting each direction separately) that a tube customer could take. Which of those is the most/least popular journey and therefore which routes should TfL be investing money in?
  • Where are the potential bottlenecks when routes go out of service (quite regularly - ask any London commuter about the dreaded "Signal Failure" and you will know what I mean)? There is potential for real-time monitoring of where the hotspots are and customers can be redirected accordingly.
  • The potential for demographic analysis of TfL's customer base is enormous. Sure, they can ask simple questions like "How many men and women journey into London?" and "What is the age of people entering London?" but this gets much more interesting when you combine it with the places that people are journeying to. Take tourist attractions for example, most of which have an affinity with a tube station. Is it possible to look at the types of people that visit, say, Tower Hill (i.e. the dropping-off point for The Tower of London) and then jointly market special offers to that demographic? Do families or couples travel there together?
  • Data Mining and customer modelling have the potential to uncover many characteristics of people's travel habits. Are we generally creatures of habit or do we like to vary our journeys? Is it possible to predict the implications of a single train going out of service unexpectedly and in those events would a different course of action have been preferable? If I make a journey how long is it likely to be until I make the return journey? How is the network going to stand up to large sporting events at (for example) Wembley, Lord's or Twickenham?
  • Other organisations would no doubt be interested in making use of this mass of data so there is even the potential of selling the information on. I'm sure people at the Police, Congestion Charging company, train operators, security organisations & retailers all have a vested interest in knowing where people are going to be, where they were, and at what time.
  • A lot of stations don't force people to swipe-in and swipe-out. By analysing how many apparently incomplete journeys occur, TfL could assess the potential revenue loss that they face, not to mention the loss of that valuable data for analysis.
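To make a couple of those questions concrete, here is a minimal Python sketch. It checks the point-to-point count for 275 stations, then ranks journeys by popularity and estimates the share of incomplete journeys. The journey records are invented purely for illustration, and the convention that a missing swipe-out is stored as None is my assumption.

```python
from collections import Counter

def point_to_point_journeys(stations: int) -> int:
    """Ordered (origin, destination) pairs where origin != destination."""
    return stations * (stations - 1)

print(point_to_point_journeys(275))

# Hypothetical journey records: (origin, destination).
# A destination of None stands for a journey with no swipe-out recorded.
journeys = [
    ("Waterloo", "Bank"),
    ("Waterloo", "Bank"),
    ("Tower Hill", "Victoria"),
    ("Victoria", None),
]

# Only complete journeys can be ranked by route popularity.
complete = [j for j in journeys if j[1] is not None]
popularity = Counter(complete)

# Share of journeys that look incomplete (no swipe-out).
incomplete_rate = (len(journeys) - len(complete)) / len(journeys)

print(popularity.most_common(1))
print(f"{incomplete_rate:.0%} of journeys appear incomplete")
```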


When you consider that Oystercard is also valid for London's bus network then the staggering amount of information that is retrievable increases two-fold. It is even possible to track a person's entire end-to-end journey.

Supermarkets have of course been doing this type of analysis, particularly demographic-type analysis, for years and for all I know our travel companies are already doing it as well. I do suspect however that there is much more information to be gleaned than they are currently harvesting - information that could help with the inherent problems that London commuters face every day.

 

We constantly hear marketing-speak from big tech companies about the value a pervasive BI strategy can bring to an organisation but rarely do we hear of organisations exploiting this to its full potential. When I look around my everyday life I see the potential for BI in ways that probably haven't even been considered yet. The problem isn't capturing the data, it's already there. The problem is educating people as to what to do with it!


Quite apart from the business opportunities that exist, a geek like me inevitably starts to think about the implications of having to manage this amount of data on a day-to-day basis. I would love the opportunity to throw this much data at an SSIS/SSAS/ProClarity solution and see how it performs.

  • How would SSIS perform when implementing the algorithm for matching three million journey-begin events with three million journey-end events?
  • How would SSAS cope with multiple users querying over 2 billion rows per year of underlying fact data?
  • And how would SSAS cope storing a Customer dimension with millions of members?
  • What sort of hardware would be required to facilitate this?
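To give a feel for the matching problem in the first question, here is a minimal Python sketch of one way to pair journey-begin events with journey-end events. It is not how SSIS would do it, and it is certainly not TfL's method; the Swipe and Journey record layouts, the minute-of-day timestamp and the rule "pair each swipe-in with the same card's next swipe-out" are all my assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Swipe:
    card_id: str
    station: str
    minute: int   # minutes since midnight (hypothetical simplification)
    kind: str     # "in" or "out"

@dataclass
class Journey:
    card_id: str
    origin: Optional[str]        # None if no swipe-in was seen
    destination: Optional[str]   # None if no swipe-out was seen

def match_journeys(swipes):
    """Pair each swipe-in with the same card's next swipe-out."""
    open_journeys = {}   # card_id -> swipe-in still waiting for a swipe-out
    journeys = []
    for s in sorted(swipes, key=lambda s: s.minute):
        pending = open_journeys.pop(s.card_id, None)
        if s.kind == "in":
            if pending is not None:
                # The previous swipe-in was never closed: incomplete journey.
                journeys.append(Journey(pending.card_id, pending.station, None))
            open_journeys[s.card_id] = s
        else:
            # A swipe-out with no pending swipe-in has an unknown origin.
            origin = pending.station if pending is not None else None
            journeys.append(Journey(s.card_id, origin, s.station))
    # Anything still open at the end of the day has no swipe-out.
    for pending in open_journeys.values():
        journeys.append(Journey(pending.card_id, pending.station, None))
    return journeys

swipes = [
    Swipe("A", "Waterloo", 480, "in"),
    Swipe("B", "Victoria", 485, "in"),
    Swipe("A", "Bank", 495, "out"),
    Swipe("C", "Tower Hill", 500, "out"),
]
result = match_journeys(swipes)
for journey in result:
    print(journey)
```

The sketch keeps just one pending swipe-in per card in memory, so the working set grows with the number of cards currently "inside" the network instead of the total number of events. Whether a real pipeline could rely on events arriving in time order is another assumption that would need checking.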

I would love the chance to find out.


OK, have you digested all that? Now consider how much more could be learnt (and how much more data would need to be processed) by embedding an RFID tag in each Oystercard and having strategically placed RFID readers throughout the network. Phew - frankly the mind boggles, doesn't it? Instead of knowing which stations someone travels between we could know where they actually are within those stations. Perhaps I'll save those thoughts for another day!

 

The real point I want to make here isn't what TfL should be doing - I've merely used them as an example. The opportunities to exploit the masses of data that companies are now warehousing and don't know what to do with are huge. I feel that we're reaching a precipice and we don't know which way we're going to drop. Are people really going to buy into this idea of Pervasive BI or are we forever going to be shackled by a maelstrom of data that nobody has a clue what to do with? I hope it's the former and I know that if it is, Conchango have the expertise to help companies achieve whatever they want to achieve.

-Jamie


*Delete as appropriate

Published 20 June 2006 08:44 by jamie.thomson
