Catterall Consulting: May 2009

Robert's Blog

Thursday, May 28, 2009

Much Ado About DB2 Indexes (Part 2)

Last week, I posted an entry in which I described the numerous index-related enhancements delivered via DB2 9 for z/OS. In this related "part 2" entry, I'll cover new features of DB2 9.7 for LUW (Linux, UNIX, and Windows) - announced a few weeks ago and available next month - that pertain to indexes. My thanks go out to Matt Huras, Mike Winer, Chris Eaton, and Matthias Nicola of IBM's DB2 development organization, who recently delivered presentations on DB2 9.7 features that provided very useful information to me and others in the DB2 community.

First up: index compression. DB2 9.7 for LUW index compression differs from the index compression feature of DB2 9 for z/OS in several ways. For one thing, leaf pages of DB2 9.7 compressed indexes are compressed in memory as well as on disk, whereas DB2 9 for z/OS compression is disk-level only. Secondly, while DB2 9 for z/OS compression is based on squeezing the contents of an 8K, 16K, or 32K index leaf page in memory onto a 4K page on disk, DB2 9.7 index compression is accomplished via three algorithms that are chosen automatically by the DBMS (these algorithms, which can be used in combination, are briefly described below). Additionally, DB2 9.7 index compression is activated automatically when row compression is activated for a table on which indexes are defined (DB2 9.7 index compression can also be activated independent of row compression by way of an ALTER INDEX or CREATE INDEX statement with the new COMPRESS YES option, with a REORG required after ALTER INDEX).

I expect that index compression will prove to be very popular among users of DB2 9.7, especially in large-database environments, as it offers substantial disk space savings (likely to be in the range of 35% to 55%), better buffer pool hit ratios (with correspondingly reduced I/O activity), fewer page requests (because index leaf pages will hold more key values, and index levels may be reduced), and fewer index page splits. There will be some CPU overhead cost associated with index compression, but this should be offset to some degree by the aforementioned reductions in I/O activity and page access requests.

Now, a little about the algorithms by which DB2 9.7 index compression is achieved (again, these are selected - and combined, if appropriate - automatically by DB2):

RID list compression. In an index leaf page, each entry contains a key value and list of RID (row ID) values, the latter indicating the location of rows containing the key value. A RID value will take up 4, 6, or 8 bytes of space, depending on the base table's tablespace type (e.g., LARGE or REGULAR, partitioned or non-partitioned). For a LARGE non-partitioned tablespace, for example, a RID will occupy 6 bytes of space: 4 bytes for a page number and 2 for a slot number within the page. If an index on a table in that LARGE tablespace is compressed, and if, say, 10 rows within a given page contain a certain key value, the full RID value only has to be stored once in the RID list for those 10 rows. For the other 9 rows containing the key value, only the delta values between one row's RID value and the next have to be stored (RID values are always stored in ascending sequence). Because that delta value can be stored in as little as one byte of space, substantial savings can be achieved. RID list compression delivers maximum benefit for indexes that have relatively low cardinality (i.e., lots of duplicate key values and, therefore, relatively long RID lists) and a relatively high cluster ratio (making it likely that multiple rows with duplicate key values will be found on a given page).
Prefix compression. Key values are stored in an index leaf page in ascending sequence. Sometimes, adjacent key values will be very similar (consider, for example, timestamp values that have year, month, day, and hour values in common; or a multi-column key for which leading columns have low cardinality). In such cases, DB2 9.7 can store the full key with the common prefix once in a page, with subsequent entries containing only the differentiated values that follow the common prefix.
Slot directory compression. A certain amount of the space in an index leaf page is occupied by something called a slot directory. It used to be that the size of the slot directory - determined based on the maximum number of index entries that could be stored on the page - was fixed. For a compressed index, the size of the slot directory is variable and can be reduced based on factors such as common prefix entries, variable length key parts, and duplicate key values.

Note that a new DB2 9.7 table function, ADMIN_GET_INDEX_COMPRESS_INFO(), can be used to obtain an estimate of the space savings that would result from activating compression for a given non-compressed index. This same function can be used to get the actual space savings for an index after it has been compressed.

Note also that compression can't be used for all indexes in a DB2 9.7 environment. Compression is not available for indexes on catalog tables, block indexes (these enable multi-dimensional clustering), XML path and meta indexes, and index specifications.

Next up, partitioned indexes. These are indexes, defined on a range-partitioned table, that are themselves partitioned along the lines of the underlying table's partitions. In comparison with global (i.e., non-partitioned) indexes on range-partitioned tables, partitioned indexes will allow for more efficient partition roll-in and roll-out operations (i.e., ATTACH and DETACH of partitions), as they eliminate the global index maintenance (and associated logging) that would otherwise be required. Partitioned indexes will also enable users to run REORG at the partition level.

In a DB2 9.7 system, all indexes on range-partitioned tables will be created, by default, as partitioned indexes as long as this is possible. It is not possible for a unique index when the index key is not a superset of the underlying table's partitioning key.

And, last but not least, DB2 9.7 delivers online creation of XML indexes and online REORG of same. In both cases, use of the ALLOW WRITE ACCESS option will enable the XML index CREATE or REORG operation to proceed without blocking writers.

So, for both the mainframe and Linux/UNIX/Windows platforms, IBM DB2 development keeps delivering good news on the index front. I expect more of the same in the future.

Tuesday, May 19, 2009

Much Ado About DB2 Indexes (Part 1)

I was in Denver last week for the North American Conference of the International DB2 Users Group (IDUG). More so than in recent years past, plenty of the talk during this conference was about index enhancements. Several important index-related features have come out over the past several DB2 releases, notable examples being online index reorganization for DB2 for z/OS and DB2 for Linux, UNIX, and Windows (LUW); block indexes (these enabled multi-dimensional clustering) with DB2 8.1 for LUW; and data-partitioned secondary indexes (aka DPSIs, referred to as "dipsies") with DB2 8 for z/OS. That said, the current versions of the DBMS - DB2 9 for z/OS and DB2 9.7 for LUW (announced last month and available in June) - pack in more index goodies than I've seen since type 2 indexes were delivered with DB2 for z/OS Version 4 in the mid-1990s. In this blog entry, I'll summarize what's new in the world of indexes with DB2 9 for z/OS. Next week, I'll post a "part 2" entry that will describe index enhancements delivered with DB2 9.7 for LUW.

Here, then, is my list of DB2 9 for z/OS features that pertain to indexes (while not necessarily an exhaustive list, it's fairly comprehensive):

Index-on-expression: As I wrote in my previous blog entry, this enhancement provides quick relief to a headache-inducing situation with which many a DBA has dealt: you have a query with a predicate involving a column expression (e.g., WHERE SUBSTR(COL1, 1, 2) = 'AB' or WHERE COL1 + COL2 = 100). The column expression makes the predicate non-indexable, and if there are no other indexable predicates in the query you're looking at a tablespace scan. With DB2 9 for z/OS you can create an index on a column expression, enabling you to make previously non-indexable predicates indexable. The potential payoff: orders-of-magnitude improvement in query performance.

Larger index page sizes: DB2 users have long had a choice with respect to the size of a data page in a table. 4 KB and 32 KB page-size options have always been there, and 8 KB and 16 KB page sizes were added to the mix several years ago. Indexes were a different story, with 4 KB being the only page size supported. That changed with DB2 9 for z/OS and its support for 8KB, 16 KB, and 32 KB index page sizes. Some might think of larger index page sizes only as a means of achieving index compression (see the next item in this list), but they can deliver benefits outside of compression enablement. Consider index page splitting, which occurs when a key value has to be inserted into an already-full index leaf page (not uncommon when an index key is not a continuously-ascending value): a portion of the leaf page's entries (traditionally, half of the entries, but that's also changed with DB2 9, as you'll see when you read about "adaptive index page splitting" a little further down in this list) are moved to an empty page to make room for the new entry, and the whole index tree is latched while this occurs. Larger index page sizes mean less index page splitting. Another potential benefit of a larger index page size is a reduction in the number of levels for an index. Suppose, for example, that an index with 4 KB pages has four levels: a root page that points to level-2 non-leaf pages, which in turn point to level-3 non-leaf pages, which point to the leaf pages. That same index might require only three levels with a larger page size, and that would reduce the CPU cost of each index probe operation (from root-level down to leaf-level) by 25%.

Index compression: Mainframe DB2 users have enjoyed the benefits of tablespace compression since Version 3 (if memory serves me correctly). For some DB2 subsystems, particularly in data warehouse environments, in which the average number of indexes defined on a table tends to be higher than in online transactional systems, the disk space occupied by indexes can exceed the amount used for tablespaces (especially if the latter are compressed, as is very commonly the case). With DB2 9, index compression is an option. It's different from tablespace compression in that 1) it's not dictionary-based and 2) the compression is only at the disk level (index pages are uncompressed in server memory). To be compressed, an index has to use a page size greater than 4 KB (for existing indexes, this can be accomplished via an ALTER INDEX followed by a REBUILD - and go down further in the list to see a REBUILD INDEX enhancement). DB2 then takes that 8 KB, 16 KB, or 32 KB index leaf page (only leaf-level pages are compressed, but the vast majority of an index pages are on this level) and compresses the contents onto a 4 KB page on disk. You might be tempted to think that a 32 KB index page size if best for compression purposes, but you have to keep in mind that DB2 will stop putting entries in a leaf page in memory once it has determined that no more will fit onto the compressed 4 KB version of the page on disk; thus, the aim is to strike a balance between maximizing space savings on disk and minimizing wasted space on index pages in memory. Fortunately, the DSN1COMP utility provided with DB2 9 will give you information that will help you to choose the optimum page size for an index that you want to compress.

Adaptive index page splitting: As previously mentioned (see the "larger index page sizes" item in this list), when an index leaf page must be split in order to accommodate a new entry, DB2 for z/OS will - before DB2 9 - move half of the page's entries to an empty page. That was OK unless entries were inserted in a sequential fashion within ranges. For example, suppose that an index is defined on a column that contains ascending values within the ranges of A001 to A999, B001 to B999, C001 to C999, and so forth. If a leaf page with the highest Annn value - say, A227 - is full and must be split to accommodate a new Annn entry (e.g., A228), half the entries in that page will be moved to a new page. Trouble is, the resultant 50% free space on one of those two pages (the one that does not contain the new highest value in the Annn range) will not be reused because nothing lower than A228 (using my example) will be added to the index (more precisely, that space won't be reused until the index is reorganized). DB2 9 improves on that situation by tracking value-insert pattern for an index. If it detects a sequential-within-range pattern (versus continuously-ascending overall, such as a timestamp or sequence number, in which case no splits will occur because new entries will always be at the "end" of the index), it will change the split process so that fewer than 50% of the split page's entries will be moved to the new page (or, if the insert pattern is descending within ranges, more than 50% of the split page's entries will be moved to the new page). The result: fewer page splits, leading to reduced CPU and elapsed time for application processes.

Online index rebuild: What I'm specifically referring to here is the introduction in DB2 9 for z/OS of a SHRLEVEL CHANGE option for the REBUILD INDEX utility. Formerly, an index rebuild operation would cause the underlying table to be in a read-only state for the duration of the rebuild process. Now, a table can be updated while a REBUILD INDEX operation is underway - DB2 deals with these data-changing operations by using the associated log records to apply the corresponding changes in an iterative fashion as needed to the index being rebuilt (during a final "catch-up" phase of this log apply processing, write activity against the underlying table is drained, as is the case for an online REORG running with SHRLEVEL CHANGE). This utility enhancement is good news for organizations (and there are many) at which new indexes on existing tables are commonly created with the DEFER YES option with a follow-on execution of REBUILD INDEX to physically build the index, and it means better data accessibility when REBUILD INDEX is run for an index in rebuild-pending status. Note, however, that if REBUILD INDEX is run with SHRLEVEL CHANGE for a unique index, inserts and updates (if the latter target a column of the unique index key) will not be allowed for the underlying table, because uniqueness cannot be enforced while the index is being rebuilt.

RENAME INDEX: Online schema evolution - the process by which DB2, in succeeding releases, allows more and more database object change operations to be performed without the need for a drop and re-create of the target object - marches on. In DB2 9 for z/OS, the functionality of the RENAME statement has been extended to include indexes. Note that renaming an index will not cause invalidation of packages (or of DBRMs bound directly into plans), because static SQL statements reference indexes by their object identifier (aka OBID), not by name. Prepared dynamic SQL statements in the dynamic statement cache, on the other hand, reference indexes by name, so those that use a renamed index will be invalidated (they'll of course be re-prepared and re-cached at the next execution following invalidation).

Greater leveraging of index lookaside: Index lookaside, a feature that allows DB2 to repeatedly access entries on a leaf page (and on the associated parent non-leaf page) without having to do a full index probe (root-to-leaf level transit of the index tree) each time, was introduced way back in Version 2 Release 3. It greatly reduced GETPAGEs (and thus, CPU time) for many application processes that used a file of search values sorted according to an indexed table column to retrieve DB2 data. In DB2 for z/OS Version 8, the use of index lookaside was finally extended to data-changing processes, but only for INSERT, and only for the clustering index on a table. With Version 9, DB2 can use index lookaside for INSERT operations with indexes other than the clustering index (assuming that these indexes have an ascending - or, I believe, descending - key sequence), and can also use index lookaside for DELETE operations. IBM performance guru Akira Shibamiya noted in a presentation given at last year's IDUG North American Conference that a test involving heavy insert into a table with three ascending-key indexes showed a reduction in average GETPAGEs per INSERT to 2 in a DB2 9 environment versus 12 in a DB2 for z/OS Version 8 system.

Closing the DPSI performance gap: Data-partitioned secondary indexes (DPSIs), introduced with DB2 for z/OS V8, are indexes over range-partitioned tables (referring to table-controlled versus index-controlled partitioned tablespaces) that are themselves partitioned in accordance with the partitioning scheme of the underlying table. DPSIs are nice for improving performance and availability with respect to some partition-level utilities and for FIRST TO LAST partition-rotation operations, but restrictions on their use for SQL statement access path purposes meant that DPSIs had a "performance gap" versus non-partitioned indexes. In the DB2 9 environment, this gap is made considerably smaller, thanks to these enhancements:

Enhanced page-range screening: Page-range screening refers to DB2's ability to avoid accessing table or index partitions in the course of executing an SQL statement, when it determines based on one or more predicates that qualifying rows or index entries cannot possibly be located within said partitions. Page-range screening can have a VERY beneficial impact on query performance, and in the Version 9 environment DB2 can apply page-range screening more broadly to DPSIs.
More parallelism: There are more situations in a Version 9 system in which DB2 can parallelize data retrieval for a statement that uses a DPSI for data access.
A DPSI can provide index-only access for a SELECT statement with an ORDER BY clause.
A DPSI can be defined as UNIQUE in a DB2 9 environment, if the DPSI key columns are a super-set of the table's partitioning columns.

And a few more: Just to wrap up with a few quickies:

DB2 9 can use a non-unique index to avoid a sort for a SELECT DISTINCT statement.
The calculation of CLUSTERRATIO by the DB2 9 RUNSTATS utility provides the optimizer with a more accurate indication of a table's "clusteredness" with respect to an index, particularly if the indexed key has a significant number of duplicate values. This can enable the optimizer to make better decisions regarding the use of such indexes (if desired, the old CLUSTERRATIO calculation can be retained through the ZPARM parameter STATCLUS).
The LASTUSED column in the real-time statistics table SYSINDEXSPACESTATS (part of the DB2 9 catalog) shows the last time that an index was used for data access (e.g., for SELECT, FETCH, searched UPDATE, or searched DELETE) or to enforce a referential integrity constraint. This should be VERY helpful when it comes to identifying indexes that are no longer used and which therefore would be candidates for dropping in order to reduce disk-space consumption and CPU costs for inserts and deletes and utilities.

That's a lot of good index stuff. As I mentioned up top, tune in next week for a look at some cool index-related enhancements delivered in DB2 9.7 for Linux, UNIX, and Windows.

Wednesday, May 13, 2009

In Denver: Mainframes, DB2, COBOL, and SOA

I'm in Denver, Colorado this week for the 2009 North American Conference of the International DB2 Users Group (IDUG). Yesterday, I moderated a Special Interest Group session (also known as a SIG - basically, a "birds of a feather" discussion group) on the topic of "mainframes, DB2, COBOL, and SOA." The conversation was interesting and lively, and I'll summarize it for you by way of this post, with key discussion threads highlighted.

Is Service-Oriented Architecture still relevant? Absolutely. Sure, the buzz around SOA has diminished lately, thanks to more recent arrivals on the "You've GOT to get into this!" scene (see: cloud computing), but that's probably a good thing, as SOA was being hyped to the point that unrealistic expectations were leading to disappointing results (see my recent blog entry on this subject). With SOA frenzy in abatement, organizations can get about the work of implementing SOA initiatives that have been well researched, properly scoped, and properly provisioned (referring to having the right tools on hand). One of the participants in our SIG session mentioned that SOA is a very high priority at his organization, a large department of the United States federal government (at which point two other SIG participants, Susan Lawson and Dan Luksetich of consulting firm YLA, spoke up about the SOA work that they've been doing at another large U.S. government department). And SOA is not just a big deal in public sector circles - we had people in our SIG group from a wide variety of industries.

What does COBOL have to do with it? The SIG was titled as it was because many people have this idea that SOA requires the use of "modern" programming languages such as Java, C#, Ruby, and Python. That, of course, is totally untrue. COBOL programs can be a very important part of an SOA-compliant application. One thing that's happening in a lot of mainframe DB2 shops is the implementation of data access logic in the form of COBOL stored procedures that are called by business-tier programs running in off-mainframe app servers such as WebSphere. As we discussed this particular subject, one participant noted that his company has hardly any COBOL programmers. No problem. DB2 for z/OS stored procedures, which are very well suited to the data tier of an SOA, can be written in a variety of languages, including SQL (I'm particularly bullish on DB2 for z/OS V9 native SQL procedures, about which I blogged a few weeks ago). Organizations are also exposing CICS-DB2 and IMS-DB2 transaction programs, written in COBOL, as Web services.

Don't get too caught up in the technology behind an SOA. One of our SIG participants made the very important point that a successful SOA project has more to do with process and governance (and, some would say, with cultural change) than with technology. There are all kinds of options with regard to tools, languages, platforms, and protocols, but getting SOA right depends largely on changing the way an IT organization works: more discipline, more standards, better business-IT alignment. Because change makes a lot of people uneasy, executive-level support is usually critical to the achievement of a positive SOA outcome.

Give plenty of thought to service granularity. Susan and Dan of YLA talked of a client company that ran into major performance problems with an SOA-oriented application, with the key factor being an inordinately high number of calls to the back-end DB2 database. Sometimes, a situation of that type can result when the services provided by application programs are too fine-grained. More coarsely-grained services can allow for greater back-end efficiencies, but they can also reduce flexibility when it comes to reusing blocks of code to build new services. There's no one-size-fits-all solution when it comes to determining the granularity of services that an SOA-compliant application should provide, but it's probably a good idea to avoid the extremes at either end of the spectrum. An application architect friend of mine liked to put it this way: "What do people [meaning the folks who write service-consuming programs] want? Do they want water, or do they want to be able to get an atom of oxygen and a couple of atoms of hydrogen?" The right answer is the one that makes sense in your environment.

Can you do SOA and still have good end-user response time? With its emphasis on abstraction and loose coupling of application system components (the better to achieve agility with respect to extending application capabilities, and flexibility in terms of mixing and matching computing platforms at various application tier levels), SOA tends to increase an application's instructional path length (meaning, the app will consume more CPU cycles than one architected along more monolithic lines). Thus, going the SOA route could lead to elongated end-user response times. This performance hit can be mitigated through several means, one being the use of message queuing software (such as IBM's WebSphere MQ, formerly known as MQSeries) to de-couple back-end database processing from the front-end response to the user (in other words, make back-end processing asynchronous from the perspective of the end user). Another SOA performance-boosting technique involves the use of cache servers to speed response for certain data-retrieval requests (you can read more about the use of message queues and cache servers to enhance SOA application performance in an article I wrote on the topic for IBM Database Magazine).

Dan Luksetich talked up another option for improving the performance of an SOA application: drive multitasking. If the back-end processing associated with a transaction involves the execution of, say, three discrete tasks, see if you can kick off three processes that can do the required work in parallel. This is where enterprise service bus (ESB) and workflow orchestration software (sometimes referred to as a "process engine") can really come in handy (read more about this in my IBM Database Magazine column titled "Get on the Enterprise Service Bus").

SOA can impact database design as well as application design. Often, an SOA project will result in a DB2 database design that is more highly normalized versus a database designed for an application with a monolithic architecture. This has to do with the goal of loose coupling (i.e., dependency reduction) that is a key aspect of an SOA. What you want is a database design that is driven by the nature of the data in the database, as opposed to a design that is aimed at optimizing the performance of a particular application (the latter approach sounds good until you start thinking about other applications that could be built on the same database foundation - it can be to an organization's advantage to trade some database processing efficiency for improved flexibility).

SOA can be an impetus for database consolidation on a mainframe server. As previously mentioned, an important aspect of SOA is abstraction of one application system tier (e.g., the data layer) from another (such as the business layer). Another key characteristic of an SOA is standardization with respect to interactions between programs running in different tiers of the application system. Once this abstraction and standardization has been achieved, the platform on which data and data access logic resides should not be a concern to a business-logic programmer. The data server of choice should be the one that can deliver the scalability, availability, and security needed by the organization, and a mainframe (or parallel sysplex mainframe cluster) running DB2 for z/OS is not going to be beat on that score. Indeed, several of the SIG participants spoke of the momentum behind consolidation of databases from distributed systems servers to mainframes that is due in part to the progress of SOA implementation efforts.

That's pretty cool: the mainframe, referred to by some as a "legacy" (read: old-fashioned) server platform, is shining anew as a primo foundation for leading-edge enterprise applications designed in accordance with SOA principles. A very satisfying SIG, indeed.

Friday, May 8, 2009

DB2 9 for z/OS: Rx for Database Design Headaches

I spent most of this past week teaching a DB2 9 for z/OS Transition class. In such a situation, it's fun to get other people's take on the new features delivered with this latest release of DB2 on the mainframe platform. One of the students had an interesting comment regarding the ability to create an index on a key expression (that being an expression that reference's at least one of a table's columns and which returns a scalar value): "This is going to help me deal with some application performance problems that are database design-related." He went on to tell a tale familiar to many experienced DB2 DBAs: an application was migrated to DB2 from a non-relational database management system, but in moving the data there was no attempt to redesign the database to take advantage of relational technology. Instead, the records in the legacy database files were just plopped into DB2 tables that were designed according to the record layouts of the old system. Among other things, this led to a number of situations in which predicates of application program queries had to be coded with scalar functions in order to generate the required result sets. In particular, there were plenty of predicates of the form:

WHERE SUBSTR(COL, start, length) = 'some value'

Oops. That's a non-indexable predicate in a DB2 V8 environment. What are you going to do about that? Ask end users to endure tablespace scans? Store the predicate-referenced data twice in the target table - once in its original column form and again in a column that contains the desired substring information? That second option's no fun: more disk space consumption, and required modification of inserting programs and programs that update the column in question (or creation of triggers to maintain the "substring" column, knowing that the triggered actions will increase overhead for SQL statements that cause a trigger to fire).

Enter DB2 9. When you're running this DB2 release in New Function Mode, you can create an index on an expression. That index might look like this:

CREATE INDEX SUBSTRIX
ON TABLE_ABC
(SUBSTR(COLn, 2, 5))
USING STOGROUP XYZ
PRIQTY 64
SECQTY 64
BUFFERPOOL BP1;

And guess what? When a query comes along with a predicate of the form:

WHERE SUBSTR(COLn, 2, 5) = 'some value'

That predicate is now stage 1 and indexable. Now, there's no free lunch. Indexes defined on expressions will cause overhead to increase somewhat for insert operations and for updates that would change an expression-generated key value. Some utilities, such as LOAD and REBUILD INDEX, will also end up consuming a little more CPU time. Still, with the potential for orders-of-magnitude performance improvement for some queries containing predicates that had previously been non-indexable, the index-on-expression feature of DB2 9 for z/OS is, I think, going to end up being a big draw for many shops.

Another DB2 9 feature that can help ease the pain of database design-related problems is the new "instead-of" trigger category (added to the existing UPDATE, INSERT and DELETE trigger types). You may have a situation in which a view has been created to make life easier for programmers who have to code SELECTs for certain data retrieval operations. Trouble is, that view might be read-only - it might, for example, be based on a join of two or more tables. What then? Do you tell the programmers that they should target the view for reads, and the underlying tables for data-change operations? That would certainly fly in the face of the "make life easier" rationale behind the creation of the view. Should you dispense with the view and denormalize the database design to provide a single-table SELECT-result that matches what one can get from the view? Sure, if you want to increase disk space consumption, change update/insert/delete SQL statements accordingly, and decrease flexibility with respect to the design of future applications that might need to access the database.

A better solution: go to DB2 9 (if you're not already there), get to New Function Mode, and define INSTEAD OF triggers that will enable programmers to both read from, and change data in, the view that had formerly been read-only.

Here's an illustrative example of what I'm talking about: suppose you have an EMPLOYEE table and a DEPARTMENT table. They both have a DEPTNO column, but only the DEPARTMENT table contains department names (as should be the case for a third-normal-form database design). If you want to make it really easy for programmers to retrieve department names along with department numbers for employees, you can create a view based on a join of the EMPLOYEE and DEPARTMENT tables; however, the resulting view would be read-only absent an INSTEAD OF UPDATE trigger on the view.

Here's what I mean:

CREATE VIEW EMP_DEPT
AS SELECT E.EMPNO, E.FIRSTNME, E.LASTNAME, E.DEPTNO, D.DEPTNAME
FROM EMPLOYEE E, DEPTARTMENT D
WHERE E.DEPTNO = D.DEPTNO;

If you subsequently issue the following SQL statement to change the first name of employee 000100 to "CHUCK" (perhaps from "CHARLES")

UPDATE EMP_DEPT
SET FIRSTNME = 'CHUCK'
WHERE EMPNO = '123789';

You'll get a -151 SQL error code because the view EMP_DEPT is read-only (thanks to the fact that the SELECT defining the view has more than one table in the FROM-list.

You can take care of this problem with an INSTEAD OF UPDATE trigger like this one:

CREATE TRIGGER OK_UPDTE INSTEAD OF UPDATE
ON EMP_DEPT
REFERENCING NEW AS N OLD AS O
FOR EACH ROW MODE DB2SQL
UPDATE EMPLOYEE E
SET E.EMPNO = N.EMPNO,
E.FIRSTNME = N.FIRSTNME,
E.LASTNAME = N.LASTNAME,
E.DEPTNO = N.DEPTNO
WHERE E.EMPNO = O.EMPNO;

With this trigger in place, the update statement above that previously got the -151 SQL code will execute successfully. Ta-da! Now the programmers won't have to reference different objects in their SELECT and UPDATE SQL statements - they'll just target the EMP_DEPT view in either case. Problem solved, with no need to change the underlying database design.

So, if you find yourself dealing with predicates that are non-indexable because the database design necessitates the coding of column scalar functions in the predicates, you have a very attractive remedy in the form of DB2 9 indexes on expressions. Similarly, if you want to put views on top of a database design to make some data-retrieval operations easier, and you don't want to have to direct programmers to the underlying tables (versus the views) for data-change operations, DB2 9 INSTEAD OF triggers may be just what the doctor ordered. Look for opportunities to put these features to work in your shop.

Previous Posts

Archives