Robert's Blog


Monday, March 30, 2009

A Refreshingly Cloud-y DB2 Forecast

A couple of months ago, I posted an entry to this blog on the topic of cloud computing. I was spurred to write this entry by an article on cloud computing that I'd recently read in InformationWeek magazine. While I found that article to be quite interesting, I was somewhat alarmed that DB2 was not among the database management systems mentioned therein. "Where is IBM?" I wondered. Sure, I'd noticed bits and pieces of cloud-related DB2 activity, including a ChannelDB2 video by IBM's Bradley Steinfeld that showed how to set up a DB2 Express-C system in Amazon's EC2 cloud computing infrastructure (Express-C is a full-function version of DB2 for Linux, UNIX, and Windows that can be used - even in a production environment - on a no-charge basis), but what I really wanted to see from IBM was a cohesive and comprehensive strategy aimed at making DB2 a leader with respect to cloud-based data serving.

I'm very pleased to report that this strategy does exist. It was laid out by IBM during a "Chat With The Lab" conference call conducted on March 25 (and available soon for replay - check the DB2 "lab chats" Web page, and the "DB2 and Cloud Computing" link on that page, for more information and to download the associated presentation). Let me tell you, I liked what I saw and heard during this session. IBM's Leon Katsnelson, Program Director for Data Management Portfolio and Product Management (and the guy behind the FreeDB2 Web site), said during the call that development of the strategy for driving DB2 utilization in the Cloud was guided by this question: "What technologies and business models can we bring to market to help our customers realize the promise of cloud computing?" That "promise" refers to the potential of cloud computing to lower costs, enhance flexibility, and increase agility for organizations large and small, across all industries.

Here are the key elements of IBM's DB2 strategy as it relates to cloud computing:
  • Deliver key technologies to support the private cloud initiatives of DB2-using organizations. Public clouds (those hosted outside the using enterprise) often come to mind when one thinks about cloud computing, but leaders at many larger organizations are keenly interested in developing and exploiting private (i.e., in-house) clouds that would function as public clouds do in terms of real-time Web, application, and data server instantiation and scaling. To aid these initiatives, IBM is delivering full support for DB2 in virtualized environments, enhancing and standardizing DB2 server provisioning and automation, and providing sub-capacity pricing for cost-effective virtualization (a virtual DB2 "machine" will often use a subset of the processing "cores" available on a physical server, and sub-capacity pricing accommodates this reality).
  • Partner with key public cloud providers to fully integrate DB2 into the ecosystem. Perhaps the best-known of the public clouds is Amazon Web Services (AWS). IBM has partnered with Amazon to provide several options for individuals and organizations seeking to use DB2 in a cloud environment. These include: 1) a pre-built DB2 Amazon Machine Image (AMI) that can be used for development purposes with no associated DB2 software charges (you pay only for the incremental use of Amazon's infrastructure), 2) pre-built DB2 AMIs that can be used for production purposes (pricing for these is expected to be announced in the second quarter of this year), and 3) the option to create your own DB2 AMI using your existing DB2 licenses. In addition to working with Amazon, IBM has partnered with other leaders in the cloud space to help organizations deploy DB2 in a cloud setting. Representatives of two of these partners - Corent and RightScale - participated in the "Chat With the Lab" call. Corent provides a set of software products, called SaaS Suite, that can enable companies to quickly develop and deploy sophisticated, turnkey, DB2-based "Software as a Service" (SaaS) applications in Amazon's Elastic Compute Cloud (aka EC2, the compute infrastructure used by Amazon Web Services customers). RightScale provides products and services that help organizations to effectively and efficiently manage their cloud servers (including DB2 servers) and cloud-deployed applications, in Amazon's EC2 and other cloud environments.
  • Provide a robust DBMS for SaaS vendors. Cloud computing is a terrific resource for companies that want to develop and sell "Software as a Service" applications (customers of these companies - Salesforce.com is an example of a SaaS vendor - use an application's functionality but don't run the application software in-house), and IBM wants DB2 to be these firms' DBMS of choice. With its combination of attractive pricing, advanced autonomic features (e.g., self-tuning memory management and automated updating of catalog statistics), and industry-leading technologies such as Deep Compression and pureXML, DB2 presents a compelling choice for cloud-based SaaS vendors looking to gain a competitive edge in the marketplace (a brief configuration sketch of those autonomic features follows this list).
  • Offer terms and conditions and pricing to make DB2 the best DBMS for the Cloud. Advanced technology is great, but as mentioned previously, many companies looking to leverage cloud computing resources are aiming for cost savings, improved flexibility, and enhanced agility. If these are your goals, you won't be interested in DBMS software (or any other kind of software) that burdens you financially or overly restricts your deployment options. The folks at IBM get this, and they are determined to make DB2 as attractive to business people as it is to IT pros with respect to running in the Cloud.
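Regarding the autonomic features mentioned a couple of bullets up: here's a minimal configuration sketch for a DB2 for Linux, UNIX, and Windows system - the database name (SAASDB) and buffer pool name (BP_APP) are invented for the example, and the right settings for a real system will of course depend on the workload:

    UPDATE DATABASE CONFIGURATION FOR SAASDB USING SELF_TUNING_MEM ON;
    UPDATE DATABASE CONFIGURATION FOR SAASDB
      USING AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_RUNSTATS ON;
    ALTER BUFFERPOOL BP_APP SIZE AUTOMATIC;

The first two commands switch on self-tuning memory and automated collection of catalog statistics; the ALTER BUFFERPOOL statement lets DB2's memory tuner grow and shrink the pool as the workload dictates - the kind of hands-off operation a lean SaaS shop wants.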
Cloud computing is a disruptive technology, and some companies may see it as a threat. IBM's leaders see opportunity in the Cloud, and I believe that they have a strategy in place that will make DB2 a big part of their - and their customers' - success in the cloud computing arena.

Tuesday, March 24, 2009

DB2 for z/OS: Overlooking Writes is Wrong

I recently got into an e-mail discussion about DB2 for z/OS I/O performance. The particular concern that sparked the discussion was the impact of synchronous remote disk mirroring on I/O response times ("synchronous remote disk mirroring" refers to a high-availability feature, provided by several vendors of enterprise-class disk subsystems, that keeps data on volumes at a local site in synch with volumes at another site, with the remote site typically being within 30-35 kilometers of the local site). The person who initiated the online exchange was focused on DB2 synchronous read performance, and for him I had a couple of nuggets of information: 1) synchronous remote disk mirroring will generally have little or no impact on DB2 read I/O performance, since only disk writes are replicated to the remote site; and 2) regardless of whether or not disk volumes are remotely mirrored, you don't want to base your assessment of a DB2 for z/OS subsystem's I/O performance solely on the read response times provided by your DB2 monitoring product. In the remainder of this blog post, I'm going to expand on that second point, because DB2 write I/Os do matter.

One type of DB2 write I/O that can obviously affect application performance is a write to the active log - this because it's a synchronous event with respect to commit processing (a commit operation cannot complete until associated records in the DB2 log buffers have been externalized to disk). The good news here is that you can easily check on an application's wait time due to log writes using a DB2 monitor (it's one of the fields that shows up among the "class 3 suspension" times in a DB2 monitor accounting report or online display of thread detail information). In most cases, this wait time will be quite small (it should be zero for a read-only transaction). DB2 active log data sets associated with high-volume production systems are generally placed on disk volumes fronted by non-volatile cache memory (non-volatile meaning that a battery backup will keep data that's been written to cache and not yet to spinning disk from being lost in the event of a power failure). When this is the case, the log write is considered to be complete, from a z/OS (and DB2) perspective, once the log records have been written to disk controller cache memory (they'll be asynchronously destaged to spinning disk by the disk controller at a later time). That's a very fast write. If you see that wait time due to log writes is more than a small percentage of total class 3 suspend time for an application process, it's possible that you have some device contention that needs to be resolved, perhaps by relocating some heavily-accessed data sets that might now be interfering with active log I/Os. Also, ensure that copy 1 and copy 2 of any given active log data set are not on the same disk volume (that's important for availability as well as performance).

What about the writing of updated pages to tablespaces and indexes on disk? Plenty of people think that these are a non-factor in terms of their effect on application performance. Folks think this way because the writes are deferred with respect to the actions (e.g., UPDATE, INSERT) that change the pages. The application processes that update DB2 data are not charged for the externalization of changed index and tablespace pages to disk (the DB2 database services address space, aka DBM1, bears this cost), and this leads people to suppose that "no one waits for a DB2 database write." Au contraire, mes amis. Application programs can indeed end up waiting for DB2 database writes, and this wait time is recorded in the "other write I/O" field in the "class 3 suspensions" section of a DB2 monitor accounting report or online display of thread detail information. Here's why you should not be surprised to see a non-zero value in this field: a DB2 application process cannot access a page in the buffer pool if said page is scheduled for write (i.e., if a write I/O operation that will externalize the page to disk is underway but has not yet completed). That write I/O operation could be related to DB2 checkpoint processing or to a buffer pool deferred write threshold being reached or to a DB2 data set being pseudo-closed, but in any case it can cause a delay in the execution of an application process that needs a page that is in the process of being externalized.

Now, "other write I/O" wait time will typically be a low value, since at any given time one would expect only a small percentage of the pages in the DB2 buffer pools to be scheduled for write (a tablespace or index page could of course be changed a number of times before being externalized to disk). If you do see a time for wait due to "other write I/O" that is more than a small percentage of overall class 3 suspension time for a DB2 application process, what could you do about it? For one thing, you could work to reduce contention within the disk subsystem. A good way to do that is to enlarge your DB2 buffer pool configuration. If you can't do that (perhaps you don't have enough system memory on your z/OS system to support a larger buffer pool configuration), look at redistributing buffer space among your pools (i.e., allocate more buffers to pools with lots of I/O activity, and fewer to pools with less I/O activity). Also take a look at pseudo-close activity (check the number of data sets converted from read/write to read-only in the "open/close activity" section of a DB2 monitor statistics report or online display of subsystem activity). Pseudo-close is good for quicker restart in the event of a DB2 failure, but too much pseudo-close activity can mean a lot of page externalization. Personally, I like to see a number in the low double digits per minute for data sets converted from read/write to read-only during busy times. If you see a good bit more than that, consider adjusting the values of the ZPARM parameters PCLOSEN and PCLOSET upward somewhat.

If you need to, consider moving some data sets around within the disk subsystem so as to relieve I/O "hot spots." I don't like to get into "hand placement" of DB2 data sets on disk, preferring instead to let the operating system handle this (I like to define STOGROUPs with a VOLUMES specification of '*'), but sometimes manual intervention is needed.
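For the record, here's what that hands-off approach looks like in DDL form (the STOGROUP and ICF catalog names are made up for the example):

    CREATE STOGROUP SGPROD01
      VOLUMES('*')
      VCAT DB2PCAT;

With the '*' specification, volume selection for data sets created in the STOGROUP is left to the system (SMS), rather than being hand-managed by a DBA.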

DB2 synchronous read wait time gets the lion's share of attention when it comes to analyzing DB2 for z/OS I/O performance, and that's as it should be - this often accounts for a big chunk of overall in-DB2 time for an application process. That said, one should recognize that DB2 application programs can end up waiting for write I/Os to complete, too. Just keep an eye on the write-related suspension times (for log writes and for database writes, keeping in mind that the latter are often labeled as "other write I/O" by DB2 monitor products), and be prepared to act if these numbers are more than a small percentage of total class 3 suspend time for an application process.

Tuesday, March 17, 2009

SOA Hits a Speed Bump (but not a Dead End)

In the February 23, 2009 issue of InformationWeek magazine, there's an interesting article on the "state of SOA (Service-Oriented Architecture)" titled, "Trouble Ahead, Trouble Behind". That title happens to be a line in a song ("Casey Jones") written and recorded by the Grateful Dead, and it makes you wonder if the article's author, Roger Smith, sees SOA as having one foot (or more) in the grave. Some folks do (the article cites a blog post, "SOA is Dead; Long Live Services," issued by Anne Thomas Manes), but Smith makes it clear at the outset that he's optimistic about the architecture's long-term prospects: "Reports of SOA's demise have been greatly exaggerated." To back up this contention, Smith shares insights gleaned from a recent InformationWeek Analytics survey of 270 business technology professionals. Responses to this survey do indeed indicate a slowdown in SOA implementations - a finding that corroborates a report on the topic published this past November by Gartner (also cited in the article). A slowdown, however, is not the same as a full-stop, and Smith emphasizes that the InformationWeek survey data show continued forward progress on the SOA adoption front, albeit at a more deliberate pace than in years past.

The current economic downturn surely has something to do with this, since SOA isn't free and getting project funding is increasingly tough, but Smith points out that "Far and away the major reason respondents who aren't evaluating or implementing SOA cite for not pursuing the initiative is a lack of a viable business case - 43% say it's because SOA initiatives have developed a reputation for overpromising and underdelivering." Overpromising and underdelivering? Is it possible that SOA has been over-hyped? Oh, yeah. Here, Smith puts much of the blame on vendors that "sold the concept to CIOs and other corporate decision makers as being about specific (and expensive) products like Web services or SOA management products, enterprise service buses, SOA gateways, and hardware acceleration devices for Web services." The hangover resulting from drinking in all the vendor hype about SOA is also mentioned in an article, written by Steve Craggs and published last May in Mainframe Executive magazine, titled "Why Do SOA Projects Fail?" Craggs writes that plenty of people in IT departments are also guilty of having oversold SOA, becoming more enamored with SOA technology itself than with its application in support of business objectives.

Craggs, like Smith, sees SOA as being a positive for adopting organizations, and in his article he provides a lot of useful information in a "lessons learned" format. In so doing, he lays out the primary reason for optimism with respect to the long-term viability and growth of SOA: fundamentally, it's all about the reuse of code associated with application services, and "It's not until a critical mass of reusable services has been assembled that the benefits [of SOA] start to mount to measurable levels. However, once this point is reached, benefits seem to grow almost exponentially." Roger Smith makes a similar point in the aforementioned InformationWeek article: "we believe that a snowball effect will arise over the coming years: As more Web services can be invoked, more applications will be written to invoke them. With the increased availability of Web services components, application designers will evolve from thinking about application architectures as monolithic, siloed software efforts and move toward the exploitation of configurable, component-based SOAs."

Now for a DB2 tie-in. Last week, I delivered a couple of presentations at a meeting of the Michigan DB2 Users Group. The first of these was on DB2 stored procedures, and I was really struck by the level of interest in the topic as indicated by all the follow-on questions and discussions that proceeded from the presentation. I see the continued growth in organizations' use of, and plans for, DB2 stored procedures as supportive of SOA thinking, given that it shows:
  1. An inclination towards abstraction and looser coupling of application system components. A developer writing a program that will call a DB2 stored procedure to accomplish a database function (lookup or update) does not need to have knowledge of the database schema that would otherwise be required if the program were to issue the SQL DML statements (SELECT, UPDATE, INSERT, DELETE, etc.) associated with the database function. That makes it much easier to effect back-end database design changes in a way that minimizes disruption of front-end application code. It also promotes reuse of encapsulated database functions (a DB2 stored procedure can be invoked by any language that supports the call syntax, including Java, C#, Ruby, Perl, Python, and PHP, to name a few).
  2. A "database-layer" mind set. An application developed in accordance with SOA principles is characterized by a distinct layering of functional components. Of course, you could stuff a lot of business logic in a stored procedure, but in my experience organizations have used DB2 stored procedures to encapsulate data-access logic in a form that can easily be invoked by business-layer programs (which often run in servers that are physically separate from the database server, though SOA does not depend on such physical separation of layers). Because these stored procedures are focused on data retrieval and/or update, as opposed to business actions based on retrieved values or results of updates, they tend to be relatively simple versus the more "vertical" (referring to inclusion of business and perhaps even presentation logic) programs associated with a monolithic-style application architecture. This facilitates stored procedure development and deployment, and that story gets even better when you look at SQL itself as the stored procedure programming language (as I've mentioned in an earlier blog post, mainframers should be psyched about the delivery of native SQL procedures with DB2 for z/OS V9).
SOA dead? Not hardly. Has the pace of SOA implementation efforts slowed recently? Yes, but an SOA project is not a sprint-type event. It takes time to realize the benefits of SOA adoption. As a longtime distance runner, I know the importance of going at a sustainable pace if you want to reach your goal. DB2 people, in gravitating more towards stored procedures and making life easier for application developers wanting to access DB2 databases using all kinds of programming languages and application servers, are helping to lay the groundwork for future SOA success stories.

Monday, March 9, 2009

Claims and Drains on the Main(frame)

It's nice when knowledge gained some time ago becomes useful again in a new context. Take DB2 for z/OS drain locking. This technology was delivered in the mid-1990s, I think via DB2 V3 - though it may have been V4. In any case, drains - and their counterpart, claims - were initially spoken of primarily in the context of DB2 utility operations (more on this momentarily). Just last week, I had the opportunity to share some of what I know about drains and claims in responding to a question about EXCHANGE, a new SQL statement associated with the clone table functionality of DB2 for z/OS V9. Basically, EXCHANGE can be used with a clone table to effect a very quick (usually) replacement of a table's contents with new data by way of the DB2 SQL interface - this in contrast to the traditional utility-driven LOAD REPLACE approach. The person asking the question wanted to know about the locking and concurrency implications of EXCHANGE, and that prompted the exchange of information about drains and claims. As the question was posted to the popular DB2-L forum, several other DB2 experts chimed in with related and very useful comments. I'll cite these as I go along.

First, a bit of background. A number of DB2 for z/OS utilities require some degree of exclusive access to a target tablespace during one or more phases of their execution - either total exclusivity (no other programs can read data from, or update data in, the tablespace), or write-exclusivity (concurrent read access by other programs is permitted, but concurrent data-change access is not). In the early years of mainframe DB2, this exclusive access was secured through the acquisition of tablespace locks. This was problematic when an application process held a tablespace lock for a long period of time, thereby preventing the utility from acquiring the lock it needed to operate on the targeted database object. Quite often, the contention was not between a utility and a long-running batch job; rather, it was between a utility and one or more threads used by DB2-accessing online transaction programs. A batch workload tends to be fairly predictable: you know when jobs will start (especially when they are submitted through a job scheduling system), you know how long they usually run, and you know which tables (and, therefore, tablespaces) they access. You can schedule utilities around them.

An online transaction-associated thread can be a different matter when 1) it is reused by many transaction programs and 2) some of those programs are bound with RELEASE(DEALLOCATE). The first of these factors leads to thread persistence, and the second (in combination with the first) enhances CPU efficiency (as described in a previous post to this blog). In particular, there are plenty of sites with high-volume CICS-DB2 transaction workloads that aim for high levels of thread reuse (aided by the utilization of so-called protected entry threads) and have a lot of DB2-accessing CICS programs bound with RELEASE(DEALLOCATE). In former times this could lead to utility lock-out situations because a given CICS-DB2 thread could persist for hours (or even days for a 24X7 transactional workload), and tablespace locks acquired by programs executed through the thread and bound with RELEASE(DEALLOCATE) would not be released until the thread was terminated (while a utility might be locked out in that case, other application processes typically would not be, as user programs almost always get non-exclusive locks on tablespaces). Drains to the rescue!
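For reference, the bind option at the heart of this scenario is specified as sketched below (the collection and member names are invented for the example):

    BIND PACKAGE(CICSCOLL) MEMBER(TXNPGM1) -
      ACTION(REPLACE) ISOLATION(CS) RELEASE(DEALLOCATE)

With RELEASE(DEALLOCATE), tablespace-level locks acquired on the package's behalf are held until the thread is deallocated - great for CPU efficiency with reused threads, but (in the pre-drain days) tough luck for a utility waiting on one of those locks.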

The drain locking mechanism enabled DB2 utilities (and certain commands) to "break in" on a persistent (i.e., long-lived) thread holding one or more tablespace locks, even if those locks were acquired for programs bound with RELEASE(DEALLOCATE). Here's why: a DB2 application process has to acquire a read or a write claim on an object to be accessed, and that claim will be released (and must subsequently be reacquired if need be) at each commit point, regardless of the RELEASE option specified at program bind time. When a DB2 utility or command issues a drain lock request for an object (such as a tablespace), no new claims can be acquired on that object by application processes. When claims already held at the initiation of the drain process are released in the course of commit activity, the drain lock is obtained and the utility or command can proceed with execution. When execution is complete and the drain lock is released, application processes can again acquire claims on the target object (note that some drains affect only data-change activity and allow a continuance of read access, while other drains affect all claimers).

So, what does this have to do with the EXCHANGE statement? Well, when the statement is issued, DB2 will drain both the base table and the corresponding clone table (the one that will be switched with the base table to achieve what appears to application programs to be a table-content replacement). If frequent commits are the rule in the DB2 environment, the drain locks should be acquired in short order, the switch will occur (also quickly), and program access to the "new" base table (formerly the clone table) can resume (again, there is no visibility at the program level of the fact that access after the EXCHANGE operation is to a physically different table - it just appears that it's the same table with new content). Workload disruption should be minimal. Suppose, though, that the drain locks on the base and clone tables can't be acquired because an application process has a claim on the base or the clone table and does not release that claim through a commit? Because the drain process (as previously mentioned) prevents new claims from being acquired on a target object, you are now looking at a potentially very noticeable workload interruption for some users. That's something to consider. If you want to use EXCHANGE (and it is definitely a cool feature of DB2 V9), you might want to review your DB2 application workload to ensure that programs commit frequently (this is, of course, most often an issue for programs that run in batch mode). In doing that, don't overlook read-only programs - they need to commit to release claims, just as data-changing programs do. If certain long-running programs don't issue frequent commits, and if they can't be changed to do so (or if they commit frequently but utilize cursors defined WITH HOLD, as explained below), you'll need to schedule EXCHANGE operations accordingly if said programs access tables named in the EXCHANGE statement.
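Here's a bare-bones sketch of the clone table flow (the table names are invented):

    -- One-time setup: create a clone of the base table
    ALTER TABLE PRODCAT ADD CLONE PRODCAT_CLONE;

    -- After loading replacement data into the clone, switch the two:
    EXCHANGE DATA BETWEEN TABLE PRODCAT AND PRODCAT_CLONE;
    COMMIT;

Note the prompt COMMIT following the EXCHANGE - more on why that matters in the first bullet point below.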

Now, to give associated credit where it's due, as promised (referring to people who contributed useful information to the DB2-L thread that sparked this blog entry):
  • Phil Grainger, DB2 expert and Senior Principal Product Manager at CA, noted that if an EXCHANGE operation is to be minimally disruptive in terms of application access to affected data, it must be committed or rolled back in a timely manner. So, the program issuing the EXCHANGE relies on the commit frequency of programs that are concurrently accessing the base (and/or clone) table, and those programs in turn rely on the EXCHANGE-ing program to commit following the table switch.
  • Peter Backlund, a DB2 consultant based in Sweden, reminded us that 1) claims are acquired on database objects even for programs bound with ISOLATION(UR), and 2) claims are NOT released at commit when they are associated with cursors defined using the WITH HOLD option.
  • Peter Vanroose, a DB2 specialist at ABIS Training and Consulting in Belgium, followed up on Peter Backlund's comment regarding cursors defined WITH HOLD, explaining why we should be GLAD that DB2 behaves this way: in retaining a claim through commits, the WITH HOLD option of DECLARE CURSOR provides a means whereby a programmer can be assured that the content of a table from which a long-running FETCH loop is retrieving rows will not be switched for a clone table's content before the FETCH loop completes (a sketch of this appears just after the list).
  • Steen Rasmussen, a DB2 expert and Principal Technical Specialist at CA, mentioned that he had delivered a presentation about DB2 for z/OS V9 clone tables at IDUG last year. Steen's presentation is currently available on the IDUG Web site (http://www.idug.org/), in the Premier Technical Library in the Members Access Area of the site (IDUG Premier-level membership is included in the registration fee for IDUG conferences, and is available to others for only $25 per year). In a few months, the presentation will also be available in IDUG's basic Technical Library (basic membership in IDUG is available free of charge). The title of Steen's presentation is "The Clones Have Landed - Watch Out!"
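To make the two Peters' point concrete, here's a sketch of a WITH HOLD cursor (the cursor and table names are hypothetical):

    DECLARE BIGSCAN CURSOR WITH HOLD FOR
      SELECT ORDER_ID, ORDER_TOTAL
        FROM ORDER_HISTORY;

    -- The program OPENs BIGSCAN, then FETCHes in a loop with periodic
    -- COMMITs. Thanks to WITH HOLD, the cursor stays open across those
    -- commits - and the read claim on ORDER_HISTORY is retained, so an
    -- EXCHANGE can't switch the table's content out from under the
    -- FETCH loop before it completes.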
Happy cloning, and remember: release those claims, so that others may drain.

Monday, March 2, 2009

pureXML: Three Years On, Still a Big DB2 Draw

Back in 2006, I wrote an article about XML for IBM Database Magazine (known then as DB2 Magazine). With the availability that year of DB2 9.1 for Linux, UNIX, and Windows (LUW), IBM delivered pureXML, a technology that enabled truly integrated management of XML and non-XML data. Right from the start, pureXML generated interest amongst non-DB2-using organizations like nothing I'd ever seen before. Yes, there was enthusiasm for pureXML within the established DB2 community, but I was struck by the way that this new feature caused companies that had previously utilized other relational database management systems to evaluate - and, quite often, to adopt - DB2 (DB2 9 for z/OS brought pureXML capabilities to the mainframe platform in 2007).

Skipping ahead to the spring of 2008, I found myself working with a company that was transitioning to DB2 from a non-IBM database management system (in this case, an open-source DBMS). The primary motivation for undertaking that migration project was the client's desire to take advantage of pureXML. I ended up discussing that experience in a video that IBM posted to ChannelDB2 and to other Web sites.

Moving now to the present time, just last week I participated in my first virtual event, an IBM-organized online get-together called Data in Action. Together with Dave Beulke, a fellow IBM Data Champion, I answered DB2-related questions during a "Chat with the Champions" session. One of this session's attendees, who mentioned that he had worked previously with non-IBM DBMSs, was seeking to learn more about DB2. Why? pureXML.

Now, DB2 has plenty in the way of advanced functionality, including best-of-breed data compression, robust multi-node clustering, a rich SQL procedure language, and pureQuery for superior Java application efficiency and manageability. All this and more is wrapped up in a package that delivers - across platforms - what in my opinion is the best combination of scalability, availability, and value offered in today's database software marketplace. Why, then, is pureXML still such a standout DB2 feature? I think it has to do with the right technology coming along at precisely the right time.

XML is very much in the business mainstream. It was important when I wrote the aforementioned IBM Database Magazine article, and it has become more pervasive since then. A growing number of organizations have to deal with very large amounts of data in XML format, perhaps because XML is the medium of data exchange in their industry (in other words, you handle XML or you don't play the game), or possibly because XML was selected as the best means of representing a logical data structure that is hierarchical in nature. Many of these companies use relational database management systems, and they want to manage their XML data in a way that provides the query performance and flexibility, data validation, security, and reliability that they've come to count on in their RDBMS environments. On top of that, they want to manage and use their XML and non-XML data in an integrated fashion, versus having XML data in silo A and non-XML data in silo B.

Prior to DB2 V9, organizations with XML data management needs as described above faced an array of unappealing options:
  • Get high-performance XML search and retrieval and XML-centric services through the use of a specialized XML data store (not only does that work against integration - it can also require the use of different groups of technical staff for support of XML and non-XML data administration).
  • Go for "integration" of XML and non-XML data, but only in a gross sense, by storing XML documents as large objects (LOBs) or long strings in a relational database (thereby losing efficient search based on values of particular nodes buried within these big chunks of character data).
  • Make do with "shredding," another attempt at XML and non-XML data "integration" that involves placing data values corresponding to the nodes of XML documents into standard columns of tables in a relational database (this approach can bog down quickly when XML schema changes necessitate the frequent addition of columns to existing tables).

Along comes DB2 pureXML, and POW! - there's the answer:
  • An XML data type that preserves the structure of a stored XML document in a way that is visible to - and can be exploited by - the DBMS, enabling DB2 to quickly navigate to a particular node within an XML document, and allowing efficient indexing of XML data based on values in particular nodes of a schema.
  • XPath expressions in queries and in index definitions (and, on Linux, UNIX, and Windows servers, the availability of the XQuery XML query language).
  • Catalog extensions by which XML schemas can be stored and used to validate the structural correctness of XML documents at INSERT time.
  • True integration of XML and non-XML data, with utility support and the ability to retrieve XML data - either whole documents or parts thereof - from a table based on values in non-XML columns, or to SELECT non-XML data column values based on node values in XML columns.
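To make this concrete, here's a small sketch - the table, index, and node names are all invented - showing an XML column, an index on a node value, and a query that mixes XML and non-XML predicates:

    CREATE TABLE PURCHASE_ORDERS
      (PO_ID    INTEGER NOT NULL,
       CUST_ID  INTEGER NOT NULL,
       PO_DOC   XML);

    -- Index on a value buried inside the stored XML documents
    CREATE INDEX PO_STATUS_IX ON PURCHASE_ORDERS(PO_DOC)
      GENERATE KEY USING XMLPATTERN '/purchaseOrder/@status'
      AS SQL VARCHAR(10);

    -- Whole documents for one customer, filtered on a node value
    SELECT PO_DOC
      FROM PURCHASE_ORDERS
      WHERE CUST_ID = 1234
        AND XMLEXISTS('$d/purchaseOrder[@status = "shipped"]'
                      PASSING PO_DOC AS "d");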

What draws many organizations to pureXML - including those that have not previously used DB2 - is the quest for competitive advantage. If company XYZ is one of several suppliers to a large firm that requires XML-based data exchange, and if XYZ leverages DB2 pureXML to respond much more quickly to the client company's data requests or to client-dictated changes in the structure of the XML data-interchange format, XYZ could gain more of the client's favor - and business - relative to competing suppliers. If an enterprise uses pureXML to efficiently and effectively manage both its XML and non-XML data resources, it can operate in a leaner and more agile way, passing cost savings on to customers and getting to market first with new services that address new market opportunities (or threats). These organizations got the pureXML message, and they're responding to it.

For people who support DB2 at companies that have long used the DBMS, the charge is to make sure that your colleagues - particularly in application development and on the business side of the organization - are aware of the capabilities provided by the pureXML technology that you already have in-house (assuming that you're on DB2 V9). Numerous organizations opted to go with DB2 and pureXML because it relieved the XML data management headaches from which they'd been suffering. If you already have pureXML technology in your shop, you can help to prevent those headaches from coming on in the first place - so spread the word.