So far in this series I have introduced you (somewhat) to SoBI and now I'd like to introduce you to one of the enabling initiatives that we have undertaken. First, a bit of history.
In the summer of 2005 the purveyors of SoBI realised that, for the architecture to work, there had to be some agreement on what identified each business entity about which we would be providing data. I'm usually better at explaining things with simplistic examples, so let me do that here...
Imagine we have 2 service facades. One of them provides information about how much oil each oil well sucks out of the ground every day - let's call that WellProductionService. The other tells us what maintenance jobs have been scheduled on each oil well in the coming week/month/year/whatever - we'll call this service facade WellMaintenanceService. Behind the facades, each service draws its data from a separate system of record (SoR), and the two SoRs identify oil wells differently. The data consumers do not want to use different identifiers when they query these facades; they want to use the same identifier for an oil well regardless of whether they are querying WellProductionService or WellMaintenanceService. So we have a problem: how can we use the same identifier for a well when the underlying SoRs refer to it differently?
As Roger Wolter puts it: "In addition to having all your systems agree on what a customer message looks like [in a SOA], they are going to have to agree on customer identifiers before the systems can exchange data in a meaningful way."
Or to quote from SoBI: "For any entity aggregation service within a Service Oriented design it is important to reach agreement on common meanings for the entities that the services will operate on."
Or Easwaran G. Nadhan: "It is clear from the experiences in the (relatively small) number of organizations that have moved aggressively into SOAs that coordinating reference data is the required first step toward service orientation" (see Further Reading below)
Clearly the master data problem was going to occupy a lot of our time, and so it did. And still does.
Just to digress slightly for a moment: this is a cultural problem as well as a technical one. If you change the identifier for a particular entity, you are most likely going to upset someone. The people who use the well production SoR wouldn't want to change the data they use in their daily working lives, and neither would those using the well maintenance SoR. Roger Wolter (again) expands on these cultural problems in his blog post MDM and EAG and CDI Oh My! (Part 2).
Anyway, back to the technical problem at hand. The main guy on the SoBI team wrestling with this problem was, that man again, Mick Horne. Mick had to work out how we could use a common well identifier that could be understood across all SoRs, and how we could maintain a common relationship between those wells and their natural classification within the business. The solution we chose, of which Mick has been the main driver ever since, was to build a new, separate system to carry out a number of important functions:
- Define an agreed identifier for each instance of a business entity (which in this scenario is an oil well, but of course there are many others). The business needs to be fully engaged in deciding what that identifier should be. We chose the term common reference data for this group of identifiers.
- Provide a translation between our common well identifiers and the well identifiers stored in each SoR.
- Define a single hierarchy to organise all the wells. The hierarchy could be based on location, usage type, longevity...anything really, as long as there was one hierarchy that all groups in the business could use.
- Provide a list of attributes for each entity.
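To make those four functions a little more concrete, here is a minimal in-memory sketch of a common reference data store. It is not the system we built: the class, its method names, the SoR names ("production", "maintenance") and all identifiers are hypothetical, and a real implementation would sit on a database with governance around who may change the mappings.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CommonReferenceData:
    """Hypothetical in-memory store of common (master) well identifiers."""
    # common_id -> {attribute name: value}
    attributes: dict[str, dict[str, str]] = field(default_factory=dict)
    # (sor_name, local_id) -> common_id
    to_common_map: dict[tuple[str, str], str] = field(default_factory=dict)
    # (common_id, sor_name) -> local_id
    to_local_map: dict[tuple[str, str], str] = field(default_factory=dict)
    # child node -> parent node in the single business hierarchy
    parent: dict[str, str] = field(default_factory=dict)

    def add_node(self, node: str, parent_node: str) -> None:
        """Add a non-well node (e.g. a field or region) to the hierarchy."""
        self.parent[node] = parent_node

    def add_well(self, common_id: str, attrs: dict[str, str], parent_node: str) -> None:
        """Register an agreed common identifier, its attributes and its place in the hierarchy."""
        self.attributes[common_id] = dict(attrs)
        self.parent[common_id] = parent_node

    def map_local_id(self, common_id: str, sor: str, local_id: str) -> None:
        """Record the translation between a common id and one SoR's local id."""
        if common_id not in self.attributes:
            raise KeyError(f"unknown common id {common_id!r}")
        self.to_common_map[(sor, local_id)] = common_id
        self.to_local_map[(common_id, sor)] = local_id

    def to_local(self, common_id: str, sor: str) -> str:
        return self.to_local_map[(common_id, sor)]

    def to_common(self, sor: str, local_id: str) -> str:
        return self.to_common_map[(sor, local_id)]

    def hierarchy_path(self, node: str) -> list[str]:
        """Walk up the single hierarchy from a node to its root (assumes no cycles)."""
        path = [node]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return list(reversed(path))


crd = CommonReferenceData()
crd.add_node("North Field", "All Wells")
crd.add_well("WELL-0001", {"name": "Alpha 1"}, "North Field")
crd.map_local_id("WELL-0001", "production", "P-7731")
crd.map_local_id("WELL-0001", "maintenance", "MX/0042")

print(crd.to_local("WELL-0001", "maintenance"))  # MX/0042
print(crd.to_common("production", "P-7731"))     # WELL-0001
print(crd.hierarchy_path("WELL-0001"))           # ['All Wells', 'North Field', 'WELL-0001']
```

In this sketch a facade such as WellMaintenanceService would accept "WELL-0001" from a consumer, call `to_local` to get the identifier its SoR understands, and translate any SoR identifiers in its results back with `to_common`.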
Given these fairly loose requirements, we set about building a system that could do what we needed. At the same time, a new discipline within our industry was starting to gather pace, and approximately a year ago, about 6 months after SoBI had been architected, the term Master Data Management (MDM) began to enter the mainstream. If you look around the web you'll find a lot of descriptions of what MDM actually is, but I rather like the following one, which sums up my own conceptions fairly accurately and concisely:
"Master Data Management (MDM) ... is a discipline that focuses on the management of reference or master data that is shared by several disparate IT systems and groups." - Bitpipe.com, http://www.bitpipe.com/rlist/term/Master-Data-Management.html
The important words are at the end. "Shared by several disparate IT systems and groups". That's exactly the situation that we were faced with.
At this point it is worth pointing out the fundamental difference between an MDM system and other systems: an MDM system does not own any transactional data. It only owns reference data, i.e. data about which you may capture transactional data. So a product can be considered master data, and so can a customer. The action of a customer purchasing a product is transactional data. It helps to think of master data as nouns, and transactions as the verbs that link those nouns together.
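The nouns-and-verbs distinction can be sketched in a few lines. The types and identifiers below are hypothetical and only illustrate the split: `Customer` and `Product` are the master data an MDM system would own, while `Purchase` is transactional data that lives elsewhere and refers to the nouns purely by identifier.

```python
from dataclasses import dataclass
from datetime import date


# Master data: the "nouns". An MDM system would own these records.
@dataclass(frozen=True)
class Customer:
    customer_id: str
    name: str


@dataclass(frozen=True)
class Product:
    product_id: str
    description: str


# Transactional data: a "verb" linking nouns together. It would live in a
# transactional system and refer to master data only by identifier.
@dataclass(frozen=True)
class Purchase:
    customer_id: str
    product_id: str
    quantity: int
    purchased_on: date


def references_known_master_data(p: Purchase, customers: dict, products: dict) -> bool:
    """Check that a transaction points at identifiers the master data knows about."""
    return p.customer_id in customers and p.product_id in products


customers = {"C-1": Customer("C-1", "Acme Ltd")}
products = {"P-9": Product("P-9", "Drill bit")}

ok = Purchase("C-1", "P-9", 3, date(2007, 5, 1))
unknown_customer = Purchase("C-2", "P-9", 1, date(2007, 5, 2))

print(references_known_master_data(ok, customers, products))                # True
print(references_known_master_data(unknown_customer, customers, products))  # False
```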
We quickly realised that the system we had been building was an MDM solution, and it was heartening to see that we were already doing a lot of the things that people were at that time (and still are) largely only theorising about. The only thing we did differently was to refer to our identifiers as common data rather than master data. At the time (back in 2005) we did weigh up whether to use the term common or master, and it was I who insisted on the common terminology, so I have to take the blame for that one. :)
The MDM system that we have built has gone through many incarnations and in my next post I will tell you more about them and hopefully share some of the lessons that we have learned along the way.
-Jamie
Further Reading: