blogs.conchango.com

welcome to the conchango blogging site
Welcome to blogs.conchango.com Sign in | Join | Help
in Search

SSIS Junkie

Service Orientation in Oil and Gas (Part 3) - An industry standard data model

My last post provided a concise overview of what I think SoBI is. It talked about service facades, BI caches and data consumers, and it placed an emphasis on data quality in the Systems Of Record (SoRs).

In that post I talked about how data that is required for BI purposes is cached above the service facades and I now want to talk a little about the data model that we have used on my present project in order to realise that cache. As such, this post isn't so much about Service Orientation but IS fairly and squarely about Oil and Gas, as you shall soon see.

 

Business Intelligence (BI) professionals will instantly see a parallel between the SoBI cache and a traditional data warehouse. The data in the cache is there primarily for BI purposes and in an ideal world would be stored in a tried and tested dimensional model. The distinction between the two comes when you consider the SoBI tenet that the delivered solution does not own any data.

When I started on the project back in September 2005 I was told "Don't worry about designing the data model for the cache, someone else has built it for us". The reason being that a decision had been made to leverage the Public Petroleum Data Model (PPDM). PPDM is a data model governed and published by the not-for-profit organisation of the same name and is owned and funded by over 100 Oil and Gas business companies. The data model is an attempt to incorporate industry knowledge and requirements into a single standard data model that can be leveraged by all. According to their website PPDM provides many benefits:

  • eliminates the need to develop, evolve and maintain internal data models and custom software
  • reduce take-up time for new software applications,
  • lower systems costs to update and maintain duplicate information,
  • improves the quality, quantity and timeliness of information,
  • effective business processes through clarification of data ownership,
  • reduces risk through improved reliability with clear, concise data definitions,
  • minimize data transfer between software applications or multiple databases.

Sounds good doesn't it? Unfortunately my experiences give me a slightly different opinion. PPDM is an attempt to be everything to all people (within the industry) but therein lies the problem. I applaud any attempt at standardisation but in trying to suit everyone it suits no-one. Well, not us anyway.

For example, PPDM contains a table called WELL. No great surprise there. This is an Oil and Gas data model after all and storing information about oil wells is central to the Oil and Gas business. The WELL table contains 110 columns, the majority of which are all the attributes that the PPDM data modeller *thought* we would ever need. When we came to use it we only actually wanted to store 8 attributes and the WELL table only had columns available for 4 of them. So, not only were the vast majority of those 110 columns useless to us but we had to extend the model to suit our requirements anyway.

PPDM also tries to enforce relationships upon us rather than letting us define them for ourselves. It isn't flexible enough to allow for the different requirements of different companies, or even different business units within the same company. Allow me to explain. In Scotland (where our project began) oil production is an offshore business whereas in California where we are now it is onshore. In Scotland oil fields are physically split into areas (many areas to one field) whereas in California the opposite is true, a group of oil fields collectively make up a physical area (many fields to one area). PPDM is not flexible enough to allow storage of these two differing business models so instead it allows a many-to-many relationship between fields and areas and that isn't satisfactory for either the Scottish or Californian business models. Furthermore, PPDM tries to enforce the entire oil field hierarchy upon us and does not allow for storage of ragged hierarchies (i.e. entities of the same type appear with differing numbers of nodes between them and the root of the hierarchy). We have had to spend alot of time learning the data model and then adapting it to suit requirements, a task that, arguably, has taken longer than designing from scratch.

PPDM in its current incarnation contains 1519 tables. I'll write that again just so that you know that is not a typo. One thousand five hundred and nineteen tables. In fact, a complete PPDM intallation contains over 11000 database objects. The overhead of managing and deploying that many objects put a burden on us from day one even before we really started to use it. Furthermore, some of those tables have more foreign keys pointing to them than the query optimiser of our RDBMS can handle. Indeed, one table is referenced by itself and all the other 1518 tables in the model. Deleting data from this table would mean having to drop all but 264 (the maximum number of foreign keys that our RDBMS allows allows) of those 1518 keys, deleting the data, and then laboriously recreating them again.

That type of complexity is alien to me - I am a fan of simplicity. When I was in Scotland we only had use for about ten of those tables and yet due to declarative RI defined within PPDM we had to manage the population of over 50 of them, most of which were completely useless to us - we were just satisfying someone else's data model. To make it worse we even had to introduce new tables to hold behavioural based safety (BBS) information - and that is data that is central to the oil and gas industry. We have had similar experiences in California.

There are some glaring data-modelling no-nos in there as well. There is a table called LEGAL_NORTH_SEA_LOC for storing information pertinent to, unsurprisingly, the North Sea. That means that domain specific information is included in the object name, Database object names should never contain data. Period.

My other big complaint is the physical model isn't condusive to high performance. For example, one of the tables that we use most often, PDEN_VOL_SUMM_OTHER, contains a primary key that is 180 bytes wide.

I won't go on because I think you get the idea. And besides, I already have a reputation as a PPDM naysayer so I have no need to enhance that reputation anymore here.

 

In short PPDM is, as far as I am concerned,  a solution looking for a problem and we have been tasked with fitting that solution to our requirements. That's never a good idea but regardless, I do understand the motive behind it. The company that we are working for is implementing many IT projects as part of a global program of modernisation and, whilst our experiences have been acknowledged and understood, PPDM has been chosen to maintain consistency between those multitude of projects. The benefit of this consistency in a company the size of which we are working for cannot be underestimated and for that reason alone I support the decision to use PPDM. Its a fine line though.

This post has introduced PPDM. In the future I'll talk more about it and delve into the detail - both good and bad.

-Jamie

 

Further Reading:

Published 04 February 2007 19:13 by jamie.thomson

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

 

SSIS Junkie said:

For most of the past 18 months I have been working on a series of projects for one of the world's

February 7, 2007 05:09
 

SQL Server, BI said:

February 7, 2007 12:14
 

SSIS Junkie said:

Recently on this blog I mentioned in passing that I have been working for some considerable time now

August 6, 2007 20:57

Leave a Comment

(required) 
(optional)
(required) 
Submit

This Blog

Syndication

News

Powered by Community Server (Personal Edition), by Telligent Systems