Robert's Blog


Thursday, March 25, 2010

A Closer Look at DB2 9 for z/OS Index Compression

In May of last year I posted a blog entry that included some information about the index compression capability introduced with DB2 9 for z/OS. It's a good time, I think, to add to that information, and I'll do that by way of this post.

How does DB2 do it? In that entry from last year, I noted that DB2 9 index compression is not dictionary-based, as is DB2 data compression (with dictionary-based compression, commonly occurring strings of data values are replaced with shorter strings, and this replacement is reversed when the data is accessed). For a tablespace defined with COMPRESS YES, DB2 will place as many compressed data rows as it can into a 4K page in memory (or an 8K or 16K or 32K page, depending on the buffer pool to which the tablespace is assigned), and the page size in memory is the same as the page size on disk. Index compression reduces space requirements on disk but not in memory: the size of a leaf page in a compressed index will be smaller on disk than in memory (only leaf pages are compressed, but the vast majority of most indexes' pages are leaf pages). Index compression is based on getting the contents of an 8K or 16K or 32K index leaf page in memory into a 4K page on disk, without using a dictionary (an index has to be assigned to an 8K or 16K or 32K buffer pool in order to be compressed). To do this, DB2 uses a combination of three compression mechanisms:
  • Prefix compression: Suppose you had a 3-column key on state, city, and telephone number. You might then have a LOT of duplicates of some combinations of column 1 and column 2 values (e.g., state = 'Texas' and city = 'Houston'). If compression is used for this index, DB2 will not repeatedly store those duplicate “prefix” values in leaf pages on disk; instead, DB2 will store a given key prefix once in a compressed leaf page on disk, and store along with that prefix the part of the whole-key key value that is different from one entry to the next (e.g., phone number = '713-111-2222', phone number = '713-222-3333', etc.). Note that while this example (for the sake of simplicity) presents a prefix that breaks along key-column lines, this is not a restriction. In other words, a prefix, in the context of prefix compression, can include just a portion of a key column value (for example, 'ROBERT' could be a prefix for the last names 'ROBERTS' and 'ROBERTSON').
  • RID list compression: If a given index key value has many duplicates, several of these duplicate values could be in rows that are located in the same page of a table. If the index is not compressed, the full RID (row ID) of each of these rows will be stored following the key value in a leaf page on disk, even though only one byte of that four- or five-byte RID (the byte that indicates the row's position in the data page) will be different from one value to the next in the RID chain (the page number, occupying 4 bytes for a partitioned tablespace with a DSSIZE of 4 GB or larger, and 3 bytes otherwise, will stay the same). If that index were to be compressed, DB2 would save space on disk by storing the multi-byte page number once in the RID chain, followed by the single-byte row location indicators, until the page number (or the key value) changes. This compression technique is particularly effective for indexes that have relatively low cardinality and are either clustering or have a high degree of correlation with the table's clustering key.
  • In-memory-only key map: An uncompressed index leaf page contains a key map, which itself contains a 2-byte entry for each distinct key value stored in the page. If the index is compressed, this map will not be stored on disk (it will be reconstructed, at relatively low cost, when the leaf page is read into memory). This compression technique nicely complements the RID list compression mechanism, as it is most effective for high-cardinality indexes (especially those with short keys, as the more distinct key values a page holds, the more space the key map occupies).
These compression techniques often deliver impressive results, with plenty of DB2 9 for z/OS users reporting disk savings of 50-70% after enabling compression for an index. Still, they have their limits, and when DB2 determines that a leaf page in memory already holds as much as can be compressed onto a 4K page on disk, it will stop placing entries in that page, even if that means letting a good bit of space go unused in the in-memory page. This is why you want to run the DSN1COMP utility for an index prior to compressing it. DSN1COMP will provide estimates of disk space savings on the one hand, and in-memory page space wastage on the other hand, that you could expect to see based on your choice of an 8K, 16K, or 32K page size for the to-be-compressed index. The right index page size will be the one that maximizes disk space savings while minimizing in-memory page space wastage.
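
If DSN1COMP gives you the green light, enabling compression comes down to assigning the index to a buffer pool with the chosen page size and altering it to COMPRESS YES. Here's a minimal sketch (the index name and the 16K buffer pool are just examples, not anything from a real system):

ALTER INDEX IX_PHONE
  BUFFERPOOL BP16K0
  COMPRESS YES

Keep in mind that this ALTER places the index in rebuild-pending status, so plan on following it with a REBUILD INDEX to get the index built in its compressed form.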

Index compression overhead: it's about I/Os, not access. The differences in the way that data and indexes are compressed in a DB2 9 for z/OS environment lead to differences regarding the associated CPU overhead. First of all, data compression is hardware-assisted (it takes advantage of a microcode assist built into the System z server line) while index compression is not. Second, in the case of data compression, the overhead cost is paid when data is accessed in a buffer pool in memory, as rows are not decompressed until they are retrieved by DB2 on behalf of an application process (similarly, new or changed rows are compressed and placed in pages in memory as part of insert and update operations). For a compressed index, the overhead cost is incurred at I/O time, since pages are decompressed when read into memory and compressed when written to disk. So, once a leaf page of a compressed index is in memory, repeated accesses of that page will not involve additional overhead due to compression, whereas data compression overhead is incurred every time a row is retrieved from, or placed into, a page in memory. With respect to the I/O-related cost of compression, the situation is reversed: there is no additional overhead associated with reading a compressed data page into memory from disk, or writing such a page to disk, while for a compressed index the CPU cost of reading a leaf page from disk, or writing a changed leaf page to disk, will be higher than it would be for a non-compressed index. One take-away from this is that large buffer pools are a good match for compressed indexes, as fewer disk I/Os means lower compression overhead.

This "pay at I/O time" aspect of the CPU cost of index compression has implications for where that cost shows up. If the I/O is of the prefetch read variety, or a database write, the leaf page compression cost will be charged to the DB2 database services address space (aka DBM1). If it's a synchronous read I/O, index compression overhead will affect the class 2 CPU time of the application process for which the on-demand read is being performed. Thus, for an application that accesses index leaf pages that are read in from disk via prefetch reads (as might be the case for a batch job or a data warehouse query), the cost of index compression may appear to be close to zero because it's being paid by the DB2 database services address space.

So, what kind of overhead should you expect to see, in terms of the in-DB2 CPU cost of applications that access compressed indexes? Because of the multiple variables that come into play, mileage will vary, but my expectation is that you'd see in-DB2 CPU consumption that is higher by a single-digit percentage versus the non-compressed case. Remember: keep the I/Os down to keep that cost down.

Wednesday, March 17, 2010

My Favorite DB2 9 for z/OS BI Feature

A lot of the consulting work that I do relates to the use of DB2 for z/OS as a data server for business intelligence (BI) applications. DB2 9 for z/OS delivered a number of new features and functions that are particularly attractive in a data warehousing context, including index compression, index-on-expression, instead-of triggers, the INTERSECT and EXCEPT set operators, and the OLAP functions RANK, DENSE_RANK, and ROW_NUMBER. That's all great stuff, but my favorite BI-friendly DB2 9 feature -- and the subject of this blog entry -- is global query optimization.

I'll admit here that I didn't fully appreciate the significance of global query optimization when I first heard about it. There was something about virtual tables, and correlating and de-correlating subqueries, and moving parts of a query's result set generation process around relative to the location of query blocks within an overall query -- interesting, but kind of abstract as far as I was concerned. Time and experience have brought me around to where I am today: a big-time global query optimization advocate, ready to tell mainframe DB2 users everywhere that this is one outstanding piece of database technology. I'll give you some reasons for my enthusiasm momentarily, but before doing that I'd like to explain how global query optimization works.

Adarsh Pannu, a member of IBM's DB2 for z/OS development team, effectively summed up the essence of global query optimization when he said in a recent presentation that it's about "improved subquery processing." That really is it, in a nutshell, and here's the first of the BI angles pertaining to this story: queries that contain multiple subquery predicates are common in data warehouse environments. These subquery predicates might be in the form of non-correlated in-list subqueries, like this one:

WHERE T1.C1 IN (SELECT C2 FROM T2 WHERE…)

Alternatively, a subquery predicate might be a match-checking correlated subquery like this one:

WHERE EXISTS (SELECT 1 FROM T2 WHERE T2.C2 = T1.C1…)

Prior to Version 9, in evaluating a query containing one or more subqueries, DB2 would optimize each SELECT in isolation, without regard to the effect that the access path chosen for a subquery might have on the performance of the query overall. Furthermore, DB2 would handle subquery processing based on the position of the subquery within the overall query -- in other words, if a subquery were a good ways "down" in an overall query, it wouldn't be evaluated "up front" at query execution time. Using this approach, DB2 might well choose the optimal access path for each individual SELECT in a query, but the access path for the query overall could end up being quite sub-optimal in terms of performance.

DB2 9 takes a different approach when it comes to optimizing a query containing one or more subquery predicates. For one thing, it will evaluate a subquery predicate in the context of the overall query in which the subquery appears. Additionally, DB2 9 can do some very interesting transformative work in optimizing the overall query. It might, for example, change a correlated subquery predicate to a non-correlated subquery, materialize that result set in a work file, and -- treating this materialized result set as a virtual table -- "move" it to a different part of the overall query and join it to another table referenced in the query. Needless to say, such transformations -- correlated subquery to non-correlated subquery (or vice versa), subquery to join -- are always accomplished in such a way as to preserve the result set of the query as initially coded.

An example can be very useful in showing how all this comes together in the processing of a subquery-containing query in a DB2 9 environment. Suppose you have a query like the one below, in which the WHERE EXISTS clause contains a correlated subquery predicate:

SELECT * FROM TABLE_1
WHERE EXISTS
(SELECT 1 FROM TABLE_2
WHERE TABLE_1.COL1 = TABLE_2.COL1
AND TABLE_2.COL2 = 1234
AND…)

The EXPLAIN output for the query, showing the working of global query optimization, might look like this (and here we see just a few of the relevant columns from a PLAN_TABLE):
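
What follows is a mock-up that is consistent with the description below -- not output captured from a real system -- with METHOD, SORTC_UNIQ, and SORTC_ORDERBY shortened to M, SCU, and SCO; the table names are the ones from the query above:

QBLOCKNO  PLANNO  TNAME         M  QBTYPE  SCU  SCO
--------  ------  -----------   -  ------  ---  ---
       1       1  DSNWFQB(02)   0  SELECT  N    N
       1       2  TABLE_1       1  SELECT  N    N
       2       1  TABLE_2       0  NCOSUB  Y    Y
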
Here's what the EXPLAIN output is telling you: the optimizer changed the match-checking correlated subquery shown in the query above to a non-correlated subquery (evidenced by “NCOSUB” in the QBTYPE column of the PLAN_TABLE). This result set (which you’d get if you removed the predicate in the correlated subquery that contains the correlation reference to a column in TABLE_1) is sorted to remove duplicates (this helps ensure that the overall query’s result set will not be changed due to the query transformation) and to get the result set rows ordered so as to boost the performance of a subsequent join step (see the “Y” values under SORTC_UNIQ (shortened to SCU) and SORTC_ORDERBY (SCO) of the PLAN_TABLE). Then, the materialized and sorted subquery result set, identified as “virtual table” DSNWFQB(02), with the “02” referring to the query block from which the result set was generated (the “de-correlated” subquery), is moved to the “top” of the query and nested-loop joined to TABLE_1 (see the “1” under the METHOD (shortened to M) column of the PLAN_TABLE). It all boils down to this: the correlated subquery predicate is used to match rows in TABLE_1 with rows in TABLE_2, and in this case the optimizer has determined that the required row-matching can be accomplished more efficiently with a join versus the correlated subquery. It's a big-picture -- as opposed to a piece-part -- approach to query optimization.
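
To make the transformation concrete: the access path above is logically equivalent to a hand-coded rewrite along the following lines (illustrative only -- DB2 doesn't actually generate new SQL text, and the nested table expression here stands in for the DSNWFQB(02) work file; any additional TABLE_2 predicates from the original subquery would go inside it):

SELECT T1.*
FROM TABLE_1 T1,
     (SELECT DISTINCT COL1
        FROM TABLE_2
       WHERE COL2 = 1234) AS VT
WHERE T1.COL1 = VT.COL1

The DISTINCT plays the role of the duplicate-removing sort: it ensures that a TABLE_1 row qualifies once, no matter how many matching TABLE_2 rows exist, which is what EXISTS semantics require.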

So, why am I big on global query optimization? Reason 1 is the potentially huge improvement in performance that can be realized for a subquery-containing query optimized by DB2 Version 9 versus DB2 Version 8 (I'm talking about an order of magnitude or more improvement in elapsed and CPU time in some cases). Reason 2 is the fact that global query optimization can greatly improve query performance without requiring that a query be rewritten. That's important because BI queries are often generated by tools, and that generally means that query rewrite is not an option. It's possible that some of the complex queries in a data warehouse environment might be generated by application code written by an organization’s developers (or by people contracted to write such code), and if such is the situation you could say that query rewrite (via modification of the query-generating code) is technically possible. Even in that case, though, rewriting a query could be very difficult. In my experience, custom-coded “query-building engines” may construct a query by successively applying various business rules, and this often takes the form of adding successive subquery predicates to the overall query (these might be nested in-list non-correlated subqueries). To “just rewrite” a query built in this way is no small thing, because coding the query with more in the way of joins and less in the way of subqueries would require more of an “all at once” application of business rules versus the more-straightforward successive-application approach. Thankfully, global query optimization will often make query rewrite a non-issue.

I'll conclude with this thought: if you have a data warehouse built on DB2 for z/OS Version 8, global query optimization is one of the BEST reasons to migrate your system to DB2 9. If you're already using DB2 9, but not in a BI-related way, global query optimization is a great reason to seriously consider using DB2 9 for data warehousing.

Friday, March 5, 2010

Another Note on DB2 for z/OS Buffer Pool Page-Fixing

In the summer of 2008, I posted a blog entry on page-fixing DB2 buffer pools, a feature introduced with DB2 for z/OS Version 8. A recent discussion I had with a client about buffer pool page-fixing brought to light two aspects of this performance tuning option that, I believe, are overlooked by some DB2 users. In this post I'll describe how you can make a quick initial assessment as to whether or not the memory resource of a mainframe system is sufficient to support buffer pool page-fixing, and I'll follow that with a look at the "bonus" performance impact that can be realized by buffer pool page-fixing in a DB2 data sharing environment.

Gauging the server memory situation. As pointed out in the aforementioned 2008 blog entry on the topic, page-fixing a buffer pool can reduce CPU consumption by eliminating the requests that DB2 would otherwise have to make of z/OS to fix in memory -- and to subsequently release -- a buffer for every read of a page from, or write of a page to, the disk subsystem. These page fix/page release operations are individually inexpensive, but the cumulative CPU cost can be significant when the I/Os associated with a pool run to hundreds (or thousands) per second. The prospect of removing that portion of a DB2 workload's CPU utilization may have you thinking, "Why not?" Well, there's a reason why PGFIX(NO) is the default setting for a DB2 buffer pool, and it has to do with utilization of a mainframe server's (or z/OS LPAR's) memory resource.

With PGFIX(NO), the real storage page frames occupied by DB2 buffers are candidates for being stolen by z/OS, should the need arise. If something has to be read into memory from disk, and there is no available page frame to accommodate that read-in, z/OS will make one available by moving its contents to a page data set on auxiliary storage (if that relocated page is subsequently referenced by a process, it will be brought back into server memory from auxiliary storage -- this is known as demand paging). z/OS steals page frames according to a least-recently-used algorithm: the longer a page frame goes without being referenced, the closer it moves to the front of the steal queue. If a DB2 buffer goes a long time without being referenced, it could be paged out to auxiliary storage.

So, page-fixing a buffer pool in memory would preclude z/OS from considering the associated real storage page frames as candidates for stealing. The important question, then, is this: would some of those pages be stolen by z/OS if they weren't fixed in memory from the get-go? If so, then page-fixing that pool's buffers might not be such a great idea: in taking away some page frames that z/OS might otherwise steal, buffer pool page fixing could cause page-steal activity to increase for other subsystems and application processes in the z/OS LPAR. Not good.

Fortunately, there's a pretty easy way to get a feel for this: using either your DB2 monitor (an online display or a statistics report) or the output of the DB2 command -DISPLAY BUFFERPOOL DETAIL, look for fields labeled "PAGE-INS REQUIRED FOR READ" and "PAGE-INS REQUIRED FOR WRITE" (or something similar to that). What these fields mean: a page-in is required for a read if DB2 wants to read a page from disk into a particular buffer, and that buffer has been paged out to auxiliary storage (i.e., the page frame occupied by the buffer was stolen by z/OS). Similarly, a page-in is required for a write if DB2 needs to write the contents of a buffer to disk and the buffer is in auxiliary storage.

If, for a pool, the PAGE-INS REQUIRED FOR READ and PAGE-INS REQUIRED FOR WRITE fields both contain zeros, it is likely that the pool, from a memory perspective, is "V=R" anyway (that is to say, the amount of real storage occupied by the pool is probably very close to, if not the same as, its size in terms of virtual storage). In that case, going with PGFIX(YES) should deliver CPU savings without increasing pressure on the server memory resource, since the page frames being stolen are probably not those that are occupied by that pool's buffers. If you want an added measure of assurance on this score, issue a -DISPLAY BUFFERPOOL DETAIL(*) command. The (*) following the DETAIL keyword tells DB2 that you want statistics for the pool since the time it was last allocated. That might have been days, or even weeks, ago (the command output will tell you this), and if you see that the "PAGE-INS REQ" fields in the read and write parts of the command output contain zeros for that long period of time, it's a REALLY good bet that the pool's occupation of real storage won't increase appreciably if you go with PGFIX(YES). For even MORE assurance that the memory resource of the z/OS LPAR in which DB2 is running is not under a lot of pressure, check the "PAGE-INS REQUIRED" numbers for the lower-activity pools (those with fewer GETPAGE requests than others). If even these show zeros, you should be in really good shape, memory-wise.
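
Putting that together, here's what you'd issue (BP1 is just an example pool name):

-DISPLAY BUFFERPOOL(BP1) DETAIL        <-- statistics since the command was last issued (the default)
-DISPLAY BUFFERPOOL(BP1) DETAIL(*)     <-- statistics since the pool was last allocated

In the output, check the "PAGE-INS REQ" values in both the read and the write sections; zeros over a long allocation interval are what you want to see.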

With all this said, keep a couple of things in mind. First, even though your "PAGE-INS REQUIRED" numbers may give you a high degree of confidence that going to PGFIX(YES) for a buffer pool would be a good idea, make sure to coordinate this action with your z/OS systems programmer. That person has responsibility for seeing that z/OS system resources (such as server memory) are effectively managed and utilized, and you need to make sure that the two of you are on the same page (no pun intended) regarding buffer pool page-fixing. If you've done your homework, and you let the z/OS systems programmer do his (or her) homework (such as looking at z/OS monitor-generated system paging statistics), getting to agreement should not be a problem. Second, be selective in your use of the PGFIX(YES) buffer pool option. The greater the amount of I/O activity for a pool, the greater the benefit of PGFIX(YES). I'd recommend considering page-fixing for pools for which the rate of disk I/O activity is at least in the high double digits (writes plus reads) per second (and be sure to include prefetch reads when calculating the rate of disk I/O operations for a buffer pool). By staying with PGFIX(NO) for your lower-activity pools, you ensure that DB2 will make some buffer pool-associated page frames available to z/OS for page-out, should something cause the LPAR's memory resource to come under significant pressure.
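
When you and the systems programmer agree that a pool is a good candidate, the change itself is a one-line command (BP1 again just an example):

-ALTER BUFFERPOOL(BP1) PGFIX(YES)

Note that a PGFIX change is pending until the next time the pool is allocated; for a pool that stays allocated around the clock, that generally means it takes effect at the next recycle of the DB2 subsystem.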

And for you data sharing users... Just a couple of weeks ago, someone told me that he was under the impression that page-fixing buffer pools would have a negative performance impact in a DB2 data sharing environment. NOT SO. Assuming (as mentioned above) that your server memory resource is sufficient to accommodate page-fixing for one or more of your buffer pools, the resulting CPU efficiency benefit should be MORE pronounced in a data sharing group versus a standalone DB2 system. How so? Simple: the buffer pool page fix/page release activity that occurs for DB2 reads from, and writes to, the disk subsystem with PGFIX(NO) in effect also occurs for writes of pages to, and reads of pages from, coupling facility group buffer pools. Like disk I/Os, page read and write actions involving a group buffer pool can number in the thousands per second. PGFIX(YES) eliminates the overhead of page fix/page release requests for disk I/Os AND for group buffer pool page reads and writes. So, if you're running DB2 in a data sharing configuration, you have another incentive to check out the page-fix option for your high-use buffer pools.