Robert's Blog

Monday, March 2, 2009

pureXML: Three Years On, Still a Big DB2 Draw

Back in 2006, I wrote an article about XML for IBM Database Magazine (known then as DB2 Magazine). With the availability that year of DB2 9.1 for Linux, UNIX, and Windows (LUW), IBM delivered pureXML, a technology that enabled truly integrated management of XML and non-XML data. (DB2 9 for z/OS brought pureXML capabilities to the mainframe platform in 2007.) Right from the start, pureXML generated interest amongst non-DB2-using organizations like nothing I'd ever seen before. Yes, there was enthusiasm for pureXML within the established DB2 community, but I was struck by the way that this new feature caused companies that had previously utilized other relational database management systems to evaluate - and, quite often, to adopt - DB2.

Skipping ahead to the spring of 2008, I found myself working with a company that was transitioning to DB2 from a non-IBM database management system (in this case, an open-source DBMS). The primary motivation for undertaking that migration project was the client's desire to take advantage of pureXML. I ended up discussing that experience in a video that IBM posted to ChannelDB2 and to other Web sites.

Moving now to the present time, just last week I participated in my first virtual event, an IBM-organized online get-together called Data in Action. Together with Dave Beulke, a fellow IBM Data Champion, I answered DB2-related questions during a "Chat with the Champions" session. One of this session's attendees, who mentioned that he had worked previously with non-IBM DBMSs, was seeking to learn more about DB2. Why? pureXML.

Now, DB2 has plenty in the way of advanced functionality, including best-of-breed data compression, robust multi-node clustering, a rich SQL procedure language, and pureQuery for superior Java application efficiency and manageability. All this and more is wrapped up in a package that delivers - across platforms - what in my opinion is the best combination of scalability, availability, and value offered in today's database software marketplace. Why, then, is pureXML still such a standout DB2 feature? I think it has to do with the right technology coming along at precisely the right time.

XML is very much in the business mainstream. It was important when I wrote the aforementioned IBM Database Magazine article, and it has become more pervasive since then. A growing number of organizations have to deal with very large amounts of data in XML format, perhaps because XML is the medium of data exchange in their industry (in other words, you handle XML or you don't play the game), or possibly because XML was selected as the best means of representing a logical data structure that is hierarchical in nature. Many of these companies use relational database management systems, and they want to manage their XML data in a way that provides the query performance and flexibility, data validation, security, and reliability that they've come to count on in their RDBMS environments. On top of that, they want to manage and use their XML and non-XML data in an integrated fashion, versus having XML data in silo A and non-XML data in silo B.

Prior to DB2 V9, organizations with XML data management needs as described above faced an array of unappealing options. One was to get high-performance XML search and retrieval and XML-centric services through the use of a specialized XML data store (not only does that work against integration - it can also require separate groups of technical staff to support XML and non-XML data administration). Another was to go for "integration" of XML and non-XML data, but only in a gross sense, by storing XML documents as large objects (LOBs) or long strings in a relational database (thereby losing efficient search based on values of particular nodes buried within these big chunks of character data). A third was to make do with "shredding," another attempt at XML and non-XML data "integration" that involves placing data values corresponding to the nodes of XML documents into standard columns of tables in a relational database (this approach can bog down quickly when XML schema changes necessitate the frequent addition of columns to existing tables).
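To make the LOB drawback a little more concrete, here is a minimal sketch in Python (not DB2 code). The documents, element names, and helper functions are all made up for illustration. It contrasts a plain substring search over XML stored as opaque strings with a search that actually looks at a particular node:

```python
import xml.etree.ElementTree as ET

# Hypothetical documents, stored the "LOB" way: as opaque character strings.
docs = [
    "<customer><name>Toronto Widgets</name><city>Ottawa</city></customer>",
    "<customer><name>Acme Supply</name><city>Toronto</city></customer>",
]

def substring_search(documents, text):
    """Opaque-string search: matches the text wherever it happens to appear."""
    return [d for d in documents if text in d]

def node_search(documents, path, value):
    """Node-aware search: matches only when the node at `path` equals `value`.

    Note that every document has to be parsed on every search, because the
    stored string itself carries no usable structure.
    """
    hits = []
    for d in documents:
        root = ET.fromstring(d)
        if root.findtext(path) == value:
            hits.append(d)
    return hits

lob_hits = substring_search(docs, "Toronto")
node_hits = node_search(docs, "city", "Toronto")
print(len(lob_hits), len(node_hits))  # 2 1
```

The substring search returns both documents, including the customer located in Ottawa, while the node-aware search returns only the one whose city really is Toronto. Getting the right answer from string storage means parsing every document at query time, which is the kind of inefficiency I'm referring to above.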

Along comes DB2 pureXML, and POW! - there's the answer: an XML data type that preserves the structure of a stored XML document in a way that is visible to - and can be exploited by - the DBMS (enabling DB2 to quickly navigate to a particular node within an XML document, and allowing efficient indexing of XML data based on values in particular nodes of a schema); the availability of XPath expressions in queries and in index definitions (and, on Linux, UNIX, and Windows servers, the availability of the XQuery XML query language); catalog extensions by which XML schemas can be stored and used to validate the structural correctness of XML documents at INSERT time; and true integration of XML and non-XML data, with utility support and the ability to retrieve XML data - either whole documents or parts thereof - from a table based on values in non-XML columns, or to SELECT non-XML data column values based on node values in XML columns.
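For readers who like to see the integration idea in miniature, here is a rough sketch in Python. It is only a stand-in for the concept, not DB2 syntax: in DB2 these lookups would be expressed in SQL with XPath expressions. The table, column names, and document contents are all hypothetical. Each "row" has ordinary columns plus an XML document held in parsed form, and the two lookups go in both directions:

```python
import xml.etree.ElementTree as ET

def make_row(order_id, region, xml_text):
    """Build a hypothetical row: two ordinary columns plus a parsed XML document."""
    return {"order_id": order_id, "region": region, "doc": ET.fromstring(xml_text)}

rows = [
    make_row(101, "EAST", "<order><item sku='A1'><qty>5</qty></item><status>shipped</status></order>"),
    make_row(102, "WEST", "<order><item sku='B2'><qty>1</qty></item><status>open</status></order>"),
    make_row(103, "EAST", "<order><item sku='A1'><qty>2</qty></item><status>open</status></order>"),
]

def ids_where_node_equals(table, path, value):
    """Select a non-XML column value based on a node value in the XML column."""
    return [r["order_id"] for r in table if r["doc"].findtext(path) == value]

def fragments_where_column_equals(table, column, value, path):
    """Retrieve parts of XML documents based on a value in a non-XML column."""
    fragments = []
    for r in table:
        if r[column] == value:
            for node in r["doc"].findall(path):
                fragments.append(ET.tostring(node, encoding="unicode"))
    return fragments

open_ids = ids_where_node_equals(rows, "status", "open")
east_items = fragments_where_column_equals(rows, "region", "EAST", "item")
print(open_ids)  # [102, 103]
print(len(east_items))  # 2
```

The sketch scans every row, so it says nothing about performance. With pureXML, the DBMS knows the document structure and can index on node values, so it does not have to take that brute-force approach.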

What draws many organizations to pureXML - including those that have not previously used DB2 - is the quest for competitive advantage. If company XYZ is one of several suppliers to a large firm that requires XML-based data exchange, and if XYZ leverages DB2 pureXML to respond much more quickly to the client company's data requests or to client-dictated changes in the structure of the XML data-interchange format, XYZ could gain more of the client's favor - and business - relative to competing suppliers. If an enterprise uses pureXML to efficiently and effectively manage both its XML and non-XML data resources, it can operate in a leaner and more agile way, passing cost savings on to customers and getting to market first with new services that address new market opportunities (or threats). These organizations got the pureXML message, and they're responding to it.

For people who support DB2 at companies that have long used the DBMS, the charge is to make sure that your colleagues - particularly in application development and on the business side of the organization - are aware of the capabilities provided by the pureXML technology that you already have in-house (assuming that you're on DB2 V9). Numerous organizations opted to go with DB2 and pureXML because it relieved the XML data management headaches from which they'd been suffering. If you already have pureXML technology in your shop, you can help to prevent those headaches from coming on in the first place - so spread the word.

