Robert's Blog


Thursday, October 25, 2007

Getting DB2 Data From A to B

It's a fairly common challenge: you have an operational DB2 database (meaning, one that supports one or more of an organization's core business applications), and a query/reporting database, and you want to make sure that the data in the latter stays fairly current with respect to update actions (insert/update/delete) affecting data in the former. How should you address this challenge? The answer, of course, depends on a number of factors. In this post, I'll describe some data-change replication solutions that might be a good fit, given certain circumstances.

Often, development of an effective strategy for data-change replication begins with an understanding of the organization's priorities in this area. For one company, the priorities could be minimization of dollar cost and implementation time. In that case, one might opt to go with a trigger-driven approach. Database triggers provide some important benefits when the goal is a relatively low-cost replication solution that is up and running in a relatively short period of time:
  • You don't have to buy them - Trigger functionality is built into DB2.
  • You don't have to alter existing source-data-change programs - The triggers are fired by what these programs are already doing.
  • They provide a lot of flexibility - A trigger can initiate an action before or after the triggering data change operation has taken effect. The triggered action can be anything doable by way of an SQL statement or statements, and since a stored procedure call is an SQL statement, you can do just about anything that's programmatically possible using a trigger.
A very important thing to keep in mind when contemplating a trigger-based replication solution is this: the action taken via a trigger is synchronous with respect to the triggering data-change operation. In other words, an insert statement (for example) aimed at a table on which an insert trigger has been defined will not complete successfully unless and until the action taken by the trigger has completed successfully. This being the case, one thing that you should NOT do through a trigger is apply a captured data change to the target query/reporting database. This would take too long, and would slow down source-data change operations too much. Instead, you should use the trigger to place the captured data change information into a holding tank from which an apply process running on the target system can retrieve it. The holding tank is sometimes in the form of a DB2 table (or tables) on the source system. An alternative for temporary storage of the captured, to-be-applied data changes is a message queue, such as that implementable through IBM's WebSphere MQ product (hereinafter referred to as MQ).
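As a sketch of that holding-tank approach, here's roughly what an AFTER INSERT trigger feeding a staging table could look like on the source system (the table and column names are illustrative assumptions, not from any particular system):

```sql
-- Hypothetical staging ("holding tank") table from which a
-- target-side apply process can later retrieve captured changes.
CREATE TABLE ORDER_STG (
  CHANGE_TYPE CHAR(1)      NOT NULL,   -- 'I', 'U', or 'D'
  CHANGE_TS   TIMESTAMP    NOT NULL,
  ORDER_ID    INTEGER      NOT NULL,
  ORDER_AMT   DECIMAL(9,2)
);

-- Fires synchronously after each insert into the (hypothetical)
-- ORDERS source table; note that it only records the change locally -
-- the apply against the target database happens later, asynchronously.
CREATE TRIGGER ORDER_INS_TRG
  AFTER INSERT ON ORDERS
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  INSERT INTO ORDER_STG (CHANGE_TYPE, CHANGE_TS, ORDER_ID, ORDER_AMT)
    VALUES ('I', CURRENT TIMESTAMP, N.ORDER_ID, N.ORDER_AMT);
```

The apply process on the target side would periodically read rows from the staging table in CHANGE_TS order, replay them against the query/reporting database, and then delete them - keeping the synchronous work done by the trigger as cheap as possible.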

Here are some of the things I like about the use of an MQ queue for captured data changes that will be applied on the target database system:
  • MQ provides assured delivery of messages (such as those containing captured source-system data changes) from one location to another (or to several target locations through publish and subscribe functionality, also known as pub/sub).
  • An MQ queue can be configured as a recoverable resource, so that even if a server on which a queue is located crashes, no messages will be lost.
  • If a burst of source-system data changes causes messages to be placed on a queue (via triggers, for example) faster than they can be taken off the queue by target-system processes, that's not a problem - the queue depth just increases for a while, until the PUT (i.e., the placing of messages on the queue) and GET activity (referring to the retrieval of messages from the queue) returns to a state of equilibrium.
  • MQ runs on just about any server platform, under most of the major operating systems.
  • An MQ GET can be executed as a "destructive read" (i.e., a message is retrieved from and removed from a queue in a single operation), providing a very CPU-efficient way to consume messages.
  • MQ messages can be processed on a FIFO basis (first in, first out) - important if the order of data change application on the target system is important.
  • Built-in DB2 SQL functions such as MQSEND and MQRECEIVE make it much easier to code triggers that interact with MQ.
Good stuff, yeah, but it must be said that at some companies, the volume of source-system data changes is too high for triggers to be a top-choice solution (there are systems that process 1000 or more database changes per second during peak times). These organizations are more likely to opt for a data replication solution that interfaces with the source system's database transaction log manager. DB2 for z/OS users can write such routines themselves, but most choose to purchase a replication tool that will do the job for them. Real-time data-change replication tools that can capture from a DB2 for z/OS system and apply on target systems running DB2 or other DBMSs are available from several vendors.
The important thing is to know that you have options when it comes to implementing a DB2 data replication solution. Think about the characteristics of your environment, such as the rate of data change activity for source tables of interest, the allowable lag time between a source data change and the corresponding target data change (could be an hour, could be seconds), the tolerance for a replication solution's impact on CPU consumption and throughput on the source side (this depends largely on current source system CPU utilization, transaction response times, and batch run times), and budget constraints (vendor tools might or might not be a possibility). With your situation well understood, determine the approach that makes sense for you.
