Catterall Consulting: Much Ado About DB2 Indexes (Part 2)

Last week, I posted an entry in which I described the numerous index-related enhancements delivered via DB2 9 for z/OS. In this related "part 2" entry, I'll cover new features of DB2 9.7 for LUW (Linux, UNIX, and Windows) - announced a few weeks ago and available next month - that pertain to indexes. My thanks go out to Matt Huras, Mike Winer, Chris Eaton, and Matthias Nicola of IBM's DB2 development organization, who recently delivered presentations on DB2 9.7 features that provided very useful information to me and others in the DB2 community.

First up: index compression. DB2 9.7 for LUW index compression differs from the index compression feature of DB2 9 for z/OS in several ways. For one thing, leaf pages of DB2 9.7 compressed indexes are compressed in memory as well as on disk, whereas DB2 9 for z/OS compression is disk-level only. Secondly, while DB2 9 for z/OS compression is based on squeezing the contents of an 8K, 16K, or 32K index leaf page in memory onto a 4K page on disk, DB2 9.7 index compression is accomplished via three algorithms that are chosen automatically by the DBMS (these algorithms, which can be used in combination, are briefly described below). Additionally, DB2 9.7 index compression is activated automatically when row compression is activated for a table on which indexes are defined (DB2 9.7 index compression can also be activated independent of row compression by way of an ALTER INDEX or CREATE INDEX statement with the new COMPRESS YES option, with a REORG required after ALTER INDEX).

I expect that index compression will prove to be very popular among users of DB2 9.7, especially in large-database environments, as it offers substantial disk space savings (likely to be in the range of 35% to 55%), better buffer pool hit ratios (with correspondingly reduced I/O activity), fewer page requests (because index leaf pages will hold more key values, and index levels may be reduced), and fewer index page splits. There will be some CPU overhead cost associated with index compression, but this should be offset to some degree by the aforementioned reductions in I/O activity and page access requests.

Now, a little about the algorithms by which DB2 9.7 index compression is achieved (again, these are selected - and combined, if appropriate - automatically by DB2):

RID list compression. In an index leaf page, each entry contains a key value and list of RID (row ID) values, the latter indicating the location of rows containing the key value. A RID value will take up 4, 6, or 8 bytes of space, depending on the base table's tablespace type (e.g., LARGE or REGULAR, partitioned or non-partitioned). For a LARGE non-partitioned tablespace, for example, a RID will occupy 6 bytes of space: 4 bytes for a page number and 2 for a slot number within the page. If an index on a table in that LARGE tablespace is compressed, and if, say, 10 rows within a given page contain a certain key value, the full RID value only has to be stored once in the RID list for those 10 rows. For the other 9 rows containing the key value, only the delta values between one row's RID value and the next have to be stored (RID values are always stored in ascending sequence). Because that delta value can be stored in as little as one byte of space, substantial savings can be achieved. RID list compression delivers maximum benefit for indexes that have relatively low cardinality (i.e., lots of duplicate key values and, therefore, relatively long RID lists) and a relatively high cluster ratio (making it likely that multiple rows with duplicate key values will be found on a given page).
Prefix compression. Key values are stored in an index leaf page in ascending sequence. Sometimes, adjacent key values will be very similar (consider, for example, timestamp values that have year, month, day, and hour values in common; or a multi-column key for which leading columns have low cardinality). In such cases, DB2 9.7 can store the full key with the common prefix once in a page, with subsequent entries containing only the differentiated values that follow the common prefix.
Slot directory compression. A certain amount of the space in an index leaf page is occupied by something called a slot directory. It used to be that the size of the slot directory - determined based on the maximum number of index entries that could be stored on the page - was fixed. For a compressed index, the size of the slot directory is variable and can be reduced based on factors such as common prefix entries, variable length key parts, and duplicate key values.

Note that a new DB2 9.7 table function, ADMIN_GET_INDEX_COMPRESS_INFO(), can be used to obtain an estimate of the space savings that would result from activating compression for a given non-compressed index. This same function can be used to get the actual space savings for an index after it has been compressed.

Note also that compression can't be used for all indexes in a DB2 9.7 environment. Compression is not available for indexes on catalog tables, block indexes (these enable multi-dimensional clustering), XML path and meta indexes, and index specifications.

Next up, partitioned indexes. These are indexes, defined on a range-partitioned table, that are themselves partitioned along the lines of the underlying table's partitions. In comparison with global (i.e., non-partitioned) indexes on range-partitioned tables, partitioned indexes will allow for more efficient partition roll-in and roll-out operations (i.e., ATTACH and DETACH of partitions), as they eliminate the global index maintenance (and associated logging) that would otherwise be required. Partitioned indexes will also enable users to run REORG at the partition level.

In a DB2 9.7 system, all indexes on range-partitioned tables will be created, by default, as partitioned indexes as long as this is possible. It is not possible for a unique index when the index key is not a superset of the underlying table's partitioning key.

And, last but not least, DB2 9.7 delivers online creation of XML indexes and online REORG of same. In both cases, use of the ALLOW WRITE ACCESS option will enable the XML index CREATE or REORG operation to proceed without blocking writers.

So, for both the mainframe and Linux/UNIX/Windows platforms, IBM DB2 development keeps delivering good news on the index front. I expect more of the same in the future.

Previous Posts

Thursday, May 28, 2009

Much Ado About DB2 Indexes (Part 2)

0 Comments:

Post a Comment