Robert's Blog


Friday, April 11, 2008

A DB2 Perspective on Database Separation

First, an explanation of what I mean by "database separation."

One of the key concepts of SOA (service-oriented architecture) is something known as "loose coupling." Basically, the idea is to reduce the number of dependencies between functional components of an application system. The "why" of loose coupling is primarily about application change isolation. If the major parts of an application system are tightly coupled (the term "monolithic" is sometimes used to describe the architecture of an application with very tightly coupled components), a change made to one part can necessitate changes to other parts - and if these co-requisite changes (and you're usually talking about program code changes) are not made, chances are that something's gonna break. [Matt Humphrey, a friend of mine and an application architecture expert, calls this the "Jenga effect," after the popular board game - will the tower tumble when the next block is removed?]

I think of loose coupling in two ways: what I call "horizontal loose coupling," by which I refer to the reduction of dependencies between functional layers of an application (i.e., presentation, business logic, and data access - often represented pictorially as differently-colored blocks separated by horizontal lines), and its cousin, "vertical loose coupling," the latter referring to loosening, in a dependency sense, different application subsystems within an enterprise (e.g., order management and inventory management application subsystems, which might be shown in a picture as being "next to" each other, separated by a vertical line). Discussions of vertical loose coupling often get into the importance of applications "owning" their data (example: if a program that's part of the order management application needs access to data that "belongs" to the inventory management application, it doesn't go directly to that data - instead, it accesses the data by way of an interface to the inventory management application), and here some folks (especially those with a "systems" bent) can become rather uncomfortable with the thrust of the conversation.

Why the discomfort? Well, application ownership of associated data implies a separateness of databases (if databases are involved), and a lot of people who are less familiar with DB2 than they are with other relational database management systems jump to the conclusion that said separateness has to be physical in nature (i.e., what might today be one database used by two applications will have to become two physically separate databases in an SOA environment). As another of my application architect friends, Eric Drudge, once told me, "I think that there is a fear of a lot of... small [databases] popping up all over the place that logically are related, thus making referential integrity a challenge." The RI challenge that Eric mentioned has to do with the fact that DBMS-defined RI constraints can't cross database lines. Another potential problem when one shared database becomes two (or more) physically separate databases is related to recovery/restart. Think about a data-changing logical unit of work that now spans two databases following a data separation action performed in the course of loosening the coupling between two application subsystems. As a result if the database separation there are now two commit scopes within the one logical unit of work (i.e., the application updates database A and commits that update, then updates database B and commits that update). What happens if the system crashes after the database A update has been committed, and before the database B update has been committed? Here's what will happen: following system restart, there will be a data integrity problem because databases A and B will be out of synch, reflecting a partially done piece of work. There are ways of dealing with this, including the use of distributed two-phase commit (if provided by the DBMS) and the coding of idempotent database update logic (logic that if re-executed with the same inputs will not generate an error and will not violate data integrity - useful if your approach is to just re-drive the whole logical unit of work following restart, without worrying about where it was when the failure occurred), but there are costs associated with both of these work-arounds (distributed two-phase commit tends to increase CPU consumption, and making database update operations idempotent could require a lot of code re-write).

So, the concerns of IT systems people about database separation are legitimate, but I mentioned that these concerns are more likely to surface amongst people who are not familiar with the DB2 database management system. What does DB2 have to do with this? Well, the aforementioned RI and recovery/restart issues go away if the databases in question are only LOGICALLY separated (i.e., if tables "belonging" to two different applications are divided into two different schemas within one physical database), and this kind of logical-within-physical separation is more easily accomplished with DB2 than it would be in some non-DB2 environments (note to DB2 for z/OS people: I'm using the term "database" to refer to what you might call a DB2 subsystem). DB2 facilitates logical database separation within one physical database because:
  1. When you create a DB2 table, it’s easy to provide a qualifying name (the schema name) that has NOTHING to do with table “ownership,” and
  2. Even with multiple schemas (collections of related tables) within a single DB2 database, it is easy to use unqualified table names in SQL statements: if the SQL is static, you supply the schema name via the QUALIFIER option of BIND, and if the SQL is dynamic, you provide the schema name by way of a SET SCHEMA statement.
People who have worked exclusively with DB2 tend to take these things for granted. Folks, believe me when I tell you that logical database separation without physical database separation is NOT such an easy thing in some non-DB2 environments. I’ve worked with people who’ve gotten into DB2 after working previously with some other DBMSs, and I've seen in those situations a tendency to physically separate databases when that is not necessary. Sometimes, some explanation is required to help these experienced database people who are new to DB2 to understand that a logical-only separation of databases is a perfectly viable - and often preferable - alternative to physical database separation.

Here's the bottom line: logical database separation along application subsystem lines can be an important step in the implementation of a service-oriented architecture, helping to achieve the goal of loose coupling, which itself is aimed at reducing the impact that changes made to one part of an application system have on other parts of that system. At the same time, placing various applications' tables in one physical database is great if you want to leverage DBMS-defined referential integrity, and if you don't want to worry about partially-done units of work following a system failure event. DB2 lets you have it both ways.

Small plug: I'll be teaching a one-day seminar, "SOA in the Real DB2 World," at the 2008 IDUG North American Conference in Dallas this May. If you register for the conference, I hope that you'll register for my seminar, as well (you can also register ONLY for the one-day seminar, without registering for the rest of the conference).

3 Comments:

Blogger Rick Butler said...

Hi Rob, Thanks, this was an interesting article.
You may find the following redbook to be of interest:
Data Integrity with DB2 for z/OS
SG24-7111-00

Section 1,4 "Example of integrity needed across applications" mentions some points about cross application dependencies and IBM's Master Data Management functionality

May 5, 2008 at 11:43 AM  
Blogger Rick Butler said...

Hi Rob, Thanks, this was an interesting article.
You may find the following redbook to be of interest:
Data Integrity with DB2 for z/OS
SG24-7111-00

Section 1,4 "Example of integrity needed across applications" mentions some points about cross application dependencies and IBM's Master Data Management functionality

May 5, 2008 at 11:43 AM  
Blogger Robert Catterall said...

Thanks for pointing that out, Rick. That kind of functionality will likely become more important as DB2-using organizations do more in the way of database-spanning application development.

May 6, 2008 at 7:31 AM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home