Robert's Blog

Tuesday, October 13, 2009

Wow - DB2 Data Sharing Comes to the AIX/Power Platform

When my youngest child - now 8 years old - was younger still, I would read her a story at bedtime. One of her favorites was "Lilly's Purple Plastic Purse," by Kevin Henkes. Lilly was enthralled by her teacher, Mr. Slinger, and expressed her admiration for him pithily: "'Wow,' said Lilly. That was just about all she could say. 'Wow.'"

That pretty much sums up my reaction to IBM's recent announcement of DB2 pureScale, which essentially brings mainframe DB2 data sharing technology to IBM's Power Systems platform running the AIX operating system: "Wow."

I got involved with DB2 for z/OS data sharing in the mid-1990s, while DB2 Version 4 (in which the feature was delivered) was still in the beta-test phase (I was in IBM's DB2 National Technical Support group at the time). I remember being pretty excited about shared-data architecture (in which multiple DB2 systems share concurrent read/write access to a database stored on shared disk volumes) and the potential for the solution to meet formerly unattainable objectives in terms of workload growth and database uptime. Sure enough, potential became reality, and DB2 for z/OS data sharing on the IBM Parallel Sysplex mainframe cluster became (and still is) the gold standard for enterprise data-serving scalability and availability. It was a huge jump forward, capability-wise, for DB2 on the mainframe platform.

Now here we are in 2009, and DB2 for AIX has taken that same big leap forward. pureScale doesn't just meet the shared-data competition in the UNIX marketplace - it changes the game. It will deliver levels of scalability and availability that simply were not possible before. How? Simple: it uses the same centralized shared-memory approach to global lock management and data coherency that has worked wonders for organizations that run DB2 for z/OS in data sharing mode.

Here's the deal: if you're going to give multiple data servers read/write access to one database, you have a couple of choices when it comes to keeping those data servers from trashing the consistency of said database. One option is to have each node communicate directly with all the other nodes about the data rows it's changing and the data it has cached locally. The other is a centralized approach, in which a data server node posts global lock and global buffer pool information to structures residing in devices that provide a shared-memory resource to the group (here, "global" refers to lock and buffer pool information that a node has to make known to other nodes in order to preserve data integrity). The problem with the former solution is that it doesn't scale well: go beyond 4 nodes or so, and the increase in messaging overhead largely negates the processing capacity of each added node.
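To put a rough number on that scaling difference, here's a back-of-the-envelope sketch (the function and counts are invented for illustration, not anyone's actual implementation) comparing the coordination messages generated when every node broadcasts its lock activity to its peers versus posting once to a shared structure:

```python
def coordination_messages(n_nodes, locks_per_node, centralized):
    """Rough message-count model for global lock coordination."""
    if centralized:
        # each lock acquisition is one post to the shared lock structure
        return n_nodes * locks_per_node
    # each lock acquisition is broadcast to the other n-1 nodes
    return n_nodes * locks_per_node * (n_nodes - 1)

for n in (2, 4, 8, 16):
    print(f"{n:2d} nodes: broadcast={coordination_messages(n, 1000, False):7d}  "
          f"centralized={coordination_messages(n, 1000, True):6d}")
```

At 16 nodes, the broadcast approach generates 15 times the messages of the centralized approach per lock taken - the kind of overhead growth that eats up the capacity each new node is supposed to add.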

DB2 for z/OS data sharing, as people familiar with the technology know, was implemented with the centralized approach. The structures are known as the lock structure, the group buffer pools, and the shared communications area (the latter used to keep member nodes apprised of database objects in an exception state), and the shared-memory devices are called coupling facilities (originally external devices that are increasingly implemented as logical partitions within mainframe servers). The lock structure functions, in part, as a "bulletin board," to which nodes can post global lock information - and at which nodes can access global lock information - in microseconds (the lock structure also stores information about currently held data-changing locks, which helps to speed recovery in the event of a node failure). Members of a DB2 for z/OS data sharing group use the group buffer pools in the coupling facilities to "register interest" in database pages cached locally, so that they can be informed when a locally cached page has been changed by another member (changed pages are written to group buffer pools as part of commit processing, and they can be accessed there by other members in MUCH less time than a retrieval from disk - even from disk controller cache - would require).
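A toy model may make that "register interest" idea concrete. Everything below is invented for illustration - these class and method names are not the coupling facility protocol - but it captures the essence of cross-invalidation through a shared structure:

```python
class GroupBufferPool:
    """Shared structure: tracks who has cached which page, holds changed pages."""
    def __init__(self):
        self.registrations = {}   # page_id -> set of interested members
        self.changed_pages = {}   # page_id -> latest committed page image

    def register_interest(self, member, page_id):
        self.registrations.setdefault(page_id, set()).add(member)

    def write_changed_page(self, writer, page_id, image):
        # Done at commit: store the changed page, then cross-invalidate
        # every other member's locally cached copy.
        self.changed_pages[page_id] = image
        for member in self.registrations.get(page_id, set()):
            if member is not writer:
                member.invalidate(page_id)


class Member:
    """One data-sharing member with its own local buffer pool."""
    def __init__(self, name, gbp):
        self.name, self.gbp = name, gbp
        self.local_pool = {}      # page_id -> (image, valid_flag)

    def read_page(self, page_id, image_from_disk):
        entry = self.local_pool.get(page_id)
        if entry and entry[1]:
            return entry[0]       # locally cached copy is still valid
        # Check the group buffer pool before going all the way to disk.
        image = self.gbp.changed_pages.get(page_id, image_from_disk)
        self.local_pool[page_id] = (image, True)
        self.gbp.register_interest(self, page_id)
        return image

    def invalidate(self, page_id):
        image, _ = self.local_pool[page_id]
        self.local_pool[page_id] = (image, False)
```

If members A and B both cache a page and B commits a change, A's copy is flagged invalid, and A's next read picks up the fresh image from the group buffer pool rather than from disk - the fast path the paragraph above describes.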

This centralized global lock and global page cache mechanism has scaled up very effectively, and I mean in the real world, not just in a demo setting: at the company for which I worked when I was on the user side of the DB2 community, we had a 9-way mainframe DB2 data sharing group that handled a huge workload with very little overhead. I know of a 15-member DB2 for z/OS data sharing group at a large bank, and there could be systems out there with more nodes than that. Centralized management of global lock and page cache information has also paid dividends in the area of availability: the impact of a node failure is minimized, and restart of a failed DB2 member is accelerated. At my former workplace, a member of a production DB2 for z/OS data sharing group terminated abnormally in the middle of the day. It was automatically - and quickly - restarted, and the failure event did not impact our clients (the application workload continued to run on the surviving nodes while the failed member was restarted).

With pureScale, DB2 for AIX users can realize these same shared-data scalability and availability advantages. DB2 pureScale basically provides the functionality that coupling facilities do in a mainframe DB2 data sharing group, housing the global lock, group buffer pool, and shared communications area structures in a super-high-performance shared memory resource. Member systems running AIX and DB2 9.8 (the DB2 release that enables participation in a multi-node shared-data system) connect to the pureScale servers, and the increase in overall processing power as nodes are added is almost linear. In other words, the overhead of concurrent read/write access to the shared database increases only slightly as nodes are added to the group (IBM has demonstrated pureScale configurations with scores of nodes). On the availability front, there's automatic and fast (seconds) restart of a DB2 member in the event of a failure, fast (seconds) release of locks on rows that were being changed by a member DB2 at the time of a failure, and automatic routing of incoming transactions to other members during restart of a failed DB2 member.
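The "route transactions around a failed member" behavior can be sketched from the client's point of view. This is a hypothetical round-robin router, not pureScale's actual client reroute logic; the names are made up:

```python
import itertools

class MemberDown(Exception):
    """Raised when a transaction hits a failed member."""
    pass

class Router:
    """Toy workload balancer: round-robins transactions over members,
    and skips a member that's down while it restarts."""
    def __init__(self, members):
        self.members = members            # name -> callable that runs a txn
        self.down = set()
        self._cycle = itertools.cycle(members)

    def run(self, txn):
        for _ in range(len(self.members)):
            name = next(self._cycle)
            if name in self.down:
                continue                  # member is restarting; skip it
            try:
                return self.members[name](txn)
            except MemberDown:
                self.down.add(name)       # reroute to a surviving member
        raise RuntimeError("no members available")
```

The point of the sketch: when one member fails, in-flight work is simply retried on a survivor, and the application keeps running while the failed member is restarted.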

Allow me to restate the point for emphasis: the DB2 for z/OS data sharing/parallel sysplex architecture has proven itself for nearly 15 years in tremendously demanding conditions, in terms of throughput and availability requirements, at sites all over the world. Developers at IBM's labs in Toronto (DB2 for Linux/UNIX/Windows) and Austin (Power Systems) have worked for years to bring that architecture to the DB2/AIX/Power platform, leveraging the advanced technology originally brought to the market by their colleagues in San Jose (DB2 for z/OS) and Poughkeepsie (System z). DB2 pureScale is the result of those efforts. As DB2 for z/OS data sharing took the already-high scalability and availability standards of the mainframe DB2 platform and raised them still higher, so pureScale will do for DB2 on AIX/Power, the platform that already sets the standard for UNIX system reliability.

There are other great parallels between DB2 for z/OS data sharing and DB2 pureScale: both are application-transparent, both provide system-managed workload balancing, and both allow for very granular increases in system processing capacity.

There's plenty more to report about pureScale, and I'll try to provide additional information in future posts. For now, I'll highlight a few items that I hope will be of interest to you:
  • The DB2 for LUW Database Partitioning Feature (DPF) isn't going anywhere. Particularly for data warehouse/business intelligence systems, the shared-nothing clustering architecture implemented via DPF is the best scale-out solution.
  • Transaction log record sequencing is there. As in a DB2 for z/OS data sharing group, each member in a DB2 pureScale configuration logs changes made by SQL statements executed on that member. If a table that has been changed by SQL statements executing on multiple members has to be recovered, a log record sequence numbering mechanism, together with each member's read access to all other members' log files, ensures that the roll-forward operation (following restoration of a backup) applies redo records in the correct order.
  • pureScale licensing is very flexible. If you need to temporarily add processing capacity to a DB2 pureScale configuration to handle a workload surge, you pay for the additional peak capacity only when you use it.
  • Talk to your tool vendors about pureScale. IBM has been working for some time with several vendors of DB2 for LUW tools, to help them get ready for pureScale.
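That log-sequencing bullet is worth a quick sketch. Assuming each member's log is already in local sequence order, recovery just has to merge the members' logs by a global sequence number before applying redo records (the record format below is invented for illustration, not DB2's actual log layout):

```python
import heapq

def merge_logs(member_logs):
    """member_logs: one list per member of (sequence_number, (key, value))
    redo records, each list already sorted by sequence number.
    Returns a single stream in global sequence order."""
    return list(heapq.merge(*member_logs))

def roll_forward(table, member_logs):
    """Apply redo records from all members' logs in sequence order."""
    for seq, (key, value) in merge_logs(member_logs):
        table[key] = value
    return table
```

For example, if member A logged changes to a row at sequence numbers 1 and 4, and member B changed the same row at sequence number 2, the merge guarantees A's later change wins - which is exactly why the global sequencing matters.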
As I mentioned, there's more information to come. Stay tuned. This is big.

