Catterall Consulting: April 2009

Robert's Blog

Tuesday, April 21, 2009

DB2 for z/OS Prefetch, Part 3 (DB2 9 Enhancements)

Prefetch (the basics of which I described in part 1 of this 3-part blog entry) has been a feature of DB2 for z/OS for as long as I can remember (I first got my hands on the DBMS around 1987, when Version 1 Release 2 was current). Over the years, prefetch has been enhanced in various ways through succeeding releases of DB2, and that process has continued with Version 9. In this post, I will cover three prefetch-related DB2 9 enhancements: an increase in the prefetch quantity, a new formula for calculating the cluster ratio of an index relative to the underlying table (along with a new and related catalog statistic), and a change in the DB2 optimizer's "thinking" with respect to dynamic prefetch versus sequential prefetch (dynamic and sequential prefetch - along with list prefetch - are described in the aforementioned part 1 of this 3-part entry on prefetch).

OK, prefetch quantity: for years, we've thought of a 32-page prefetch quantity for SQL access to tablespaces and indexes defined with a 4 KB page size, and a 64-page quantity for utility access to such objects. With DB2 9, the prefetch quantity for SQL access goes to 64 pages for sequential prefetch and LOB-related list prefetch (it's still 32 pages for dynamic prefetch and non-LOB list prefetch) for tablespaces and indexes assigned to 4 KB buffer pools for which VPSIZE (the number of buffers allocated to the pool) times VPSEQT (the sequential steal threshold, which defaults to 80% and is used to limit the number of buffers that can be occupied by pages read from disk in a sequential pattern) is greater than or equal to 40,000. For utility access to objects assigned to 4 KB buffer pools, the prefetch quantity in DB2 9 environment is 128 pages when VPSIZE times VPSEQT is greater than or equal to 80,000. The increase in the prefetch quantity should improve response times for application-related tablespace scans and for many utility operations. You can find information about the V9 prefetch quantities for SQL and utility access, for all buffer pool page sizes (4 KB, 8 KB, 16 KB, and 32 KB), in Table 117 in the DB2 V9 for z/OS Performance Monitoring and Tuning Guide (you can find this and other DB2 9 manuals in the DB2 9 "bookshelf" on IBM's Web site).

Next, the new cluster ratio formula. You probably know that CLUSTERRATIO, a column in the SYSINDEXES catalog table, indicates, in the form of a percentage, the degree to which the physical order of a table's rows align with the order of the index's key values (the CLUSTERRATIOF column in SYSINDEXES expresses the same value as a floating-point number between 0 and 1). CLUSTERRATIO influences the DB2 optimizer's access path selection process, particularly as it pertains to prefetch access. With DB2 9, the formula for calculating the cluster ratio of an index has been enhanced to improve the accuracy and usefulness of the calculated value. The principal sources of improved calculation are:

The formula now considers all of the RIDs (row IDs) in an index, as opposed to considering only succeeding key values. The old formula made the cluster ratio for indexes with lots of duplicate key values (e.g., an index on department number on a large employee table) smaller than it should have (keep in mind that RIDs for duplicate key values are stored in an index in ascending RID sequence - in other words, in physical sequential order).
The old formula counted a table row as being clustered if the next key value resided on the same page or a forward page with respect to the last RID of the current key value. Problem is, "forward" could mean 1000 pages forward, and that's really not useful from a prefetch perspective. With the new formula, a "forward" page, in terms of satisfying the "clustered" test, has to be within a sliding window that is based on the prefetch quantity and the size of the buffer pool to which the underlying table is assigned.
The new formula considers the "clustered-ness" of an index for both forward and backward scans, whereas the old formula considered only forward sequential access.

So, in going from V8 to V9 (and after running the RUNSTATS utility for objects in the DB2 9 environment, to update CLUSTERRATIO values), you might see some custer ratio figures increase (perhaps dramatically, for some indexes with lost of duplicate key values) and some decrease (for tables that were clustered relative to an index only if you considered large "skip-ahead" page distances from one key value to the next as being OK, as was the case with the old formula). The new cluster ratio values (which will require package rebinds if you want them to influence static SQL access paths) should improve performance in DB2 9 systems; however, on the off chance that the new values have a negative impact on performance, you can direct DB2 to go back to the old formula by setting the value of the ZPARM parameter STATCLUS to STANDARD.

Also, note that DB2 9 introduces a new catalog statistic, DATAREPEATFACTORF (in SYSINDEXES and related catalog tables), that gives the optimizer a much better feel for the "density" of an index's ascending key values with respect to the sequential arrangement of rows in the pages of the underlying table. In other words, are rows in a table sequential and "dense" with respect to the index, or are they sequential and not dense? This information helps the optimizer to further refine access path selection decision-making, particularly with respect to prefetch.

Finally, a bit on the optimizer's prefetch preference in a DB2 9 environment. What you are likely to see if you migrate to V9 from V8 is an increase in dynamic prefetch activity and a decrease in sequential prefetch activity - in fact, you may see sequential prefetch used only for tablespace scans. The leaning towards dynamic prefetch makes sense: it's "smarter" (it can be temporarily "turned off" during a scan, if the page access pattern becomes non-sequential in terms of overly large "skip ahead" or "skip behind" distances from page to page, and it doesn't require access to particular "trigger pages" to keep prefetch going) and it can work for backward as well as forward scans (sequential prefetch is forward-only). That said, you may think of the dynamic prefetch preference as being an example of the optimizer "punting" with respect to prefetch decisions (i.e., "I'm just going to let that be a run-time decision versus a bind-time decision"). You'd be wrong in so thinking. What you should expect in a DB2 9 system is an increased incidence of the value 'D' (for dynamic prefetch) in the PREFETCH column in EXPLAIN tables. That means that the optimizer expects that dynamic prefetch will be utilized in accessing data or index pages for a query, and will optimize the statement accordingly.

That leads me to the bottom line: DB2 prefetch, an oldie-but-goodie product feature, is better than ever in the DB2 9 environment. DB2 9 will use prefetch more effectively and efficiently, and the result should be improved application (and utility) performance.

Thursday, April 16, 2009

DB2 for z/OS Prefetch, Part 2 (Monitoring and Tuning)

A few days ago, I posted part 1 of a 3-part entry on prefetch in a DB2 for z/OS environment. That post covered the basics: the different types of prefetch and how they work. In this part 2 entry, I'll cover monitoring and tuning in relation to prefetch. I'll provide information in the form of questions and answers, as I've dealt with a number of such queries over the years. At the beginning of next week, I'll post part 3, which will describe some prefetch-related changes introduced with DB2 for z/OS Version 9. On, now, to the Q's and A's.

Q1: Where does the CPU time associated with prefetch activity get charged?

A1: Allied address spaces (those through which SQL statements get to DB2, examples being a CICS application-owning region, a batch address space, and - for statements sent from remote clients via the DRDA protocol - the DB2 Distributed Data Facility address space) get charged for the CPU time associated with synchronous reads (i.e., single-page, on-demand read I/Os), but for prefetch reads, which are asynchronous, the associated CPU time is charged to the DB2 Database Services Address space (also known as DBM1). In fact, in my experience I've seen that the bulk of DBM1 CPU time is related to prefetch reads and to database write I/Os. When DB2-using organizations take steps that lead to reduced prefetch I/O activity (such as making buffer pools larger - a prefetch request for 32 data or index pages does not lead to an I/O operation if all of the 32 pages are already in memory), they tend to see decreases in DBM1 CPU consumption. And note that we are talking about consumption of general-purpose CPU cycles here. It's true that mainframes have specialized processors that assist with I/O operations, but these reduce - as opposed to eliminate - the general-purpose cycles related to I/Os.

Q2: Where can one get information about prefetch activity?

A2: I tend to rely on two sources: a DB2 monitor product (these are available from IBM and from several 3rd-party software vendors) and the DB2 command -DISPLAY BUFFERPOOL DETAIL. With respect to the DB2 monitor product, I like to look over statistics and accounting detail reports (with "detail" referred to as "long" for some monitors) - the former provide a look at overall activity for a DB2 subsystem, and the latter let you restrict the view to activity related to (for example) a given connection type (e.g., CICS or DRDA), a given connection ID (e.g., a CICS region), or a given so-called correlation ID (this could be a batch job name or a CICS transaction ID). In either case, look at the buffer pool activity section of the report (if this activity is broken out by type, look at the read activity, and note that you'll see, certainly in a statistics report, information broken out by buffer pool). With regard to the CPU cost of prefetch and it's impact on DBM1 address space CPU consumption (as mentioned in the answer to question 1 above), check the DB2 address space CPU times in the statistics report. A statistics or accounting report can be set up to capture activity occurring in a given time period. I usually want to see data for a one- to two-hour period of "busy time" on a system, as opposed to a 24-hour period.

While I like the reports generated by DB2 monitor products, you can also get useful real-time information using the online monitor interface. Just look for the information related to buffer pools for overall numbers, and for "thread detail" information if you want to look at I/O activity for a particular application process.

The -DISPLAY BUFFERPOOL DETAIL command is a handy way to quickly generate very useful buffer pool-related information, including prefetch numbers. You can issue the command so as to get information for a single buffer pool, a list of buffer pools, all active buffer pools, or all pools (active and inactive). Because I like to look at an hour's worth of data, I usually issue the command with the INTERVAL option, wait an hour, and issue it again (also with INTERVAL specified); otherwise, you'll get information for the period of time since the pool (or pools) was last activated, and that could be days or weeks ago.

Whether using monitor reports (or screens) or -DISPLAY BUFFERPOOL DETAIL command output, keep in mind the important distinction between prefetch requests and prefetch I/Os. As previously noted, a prefetch request results in an I/O operation only if one or more of the pages requested are not already in memory. Generally speaking, your focus should be on prefetch I/Os.

Q3: Should one pay attention to prefetch I/Os when it comes to buffer pool sizing?

A3: YES! Often, people engaged in buffer pool analysis zero in on synchronous read activity. Synchronous reads are indeed important, and it's good to reduce that activity if you want to improve the performance of DB2-accessing applications, but prefetch I/Os matter, too, for two important reasons: first (reiterating yet again), they consume CPU cycles. Second, although they are asynchronous and anticipatory (as pointed out in my part 1 prefetch entry) and ideally aimed at getting pages into memory before they are requested by an application process, programs can - and quite often do - get suspended during SQL execution while awaiting the end of an in-progress prefetch read I/O. Here's why: suppose that program ABC has issued a SELECT statement, and suppose that DB2 has to examine page P1 from table T1 in the course of executing the SELECT. If, at the time of DB2's request for page P1 on behalf of program ABC, P1 is scheduled to be brought into memory by way of a prefetch I/O (and that I/O might have been started on behalf of another program), program ABC will wait on that request to complete. This wait time shows up in a DB2 monitor accounting report (or an online monitor display of thread detail data) as "wait for other read I/O", one of the so-called class 3 suspension times (this because the data reported comes from DB2 accounting trace class 3 records).

So, when you're looking for a buffer pool that could potentially be enlarged to good effect, performance-wise, look at the TOTAL rate of read I/Os per second for the various pools (using DB2 monitor or -DISPLAY DATABASE DETAIL data, as described in the answer to question 2 above). That would be all synchronous read I/Os (both random and sequential) PLUS all prefetch I/Os (that's prefetch I/Os, not prefetch requests, for sequential prefetch and list prefetch and dynamic prefetch). If you make a buffer pool larger and you want to see if you've done good, look to see if the TOTAL rate of read I/Os has gone down. If you've caused prefetch read I/O activity to go down a good bit, you might well see a reduction in "wait for other read" time in your DB2 monitor accounting reports, and in DBM1 CPU time in your statistics reports.

Q4: Can prefetch activity ever have a negative impact on performance?

A4: Occasionally, yes. Usually, prefetch is very good for performance, substantially reducing run times for DB2-accessing programs. Sometimes, it can be a drag on performance. I can think of three such scenarios. Scenario 1: a program runs longer than it should, due to dynamic prefetch activity. I explained in my part 1 prefetch post that dynamic prefetch gets turned on as a result of something called sequential detection. That generally means that most pages accessed by an SQL-issuing program are sequential with respect to each successive page access (and that means that the nth page in a table or index accessed by a program is usually within half of a prefetch quantity - typically 16 pages, when the prefetch quantity is 32 pages - of the page previously accessed by the program). Suppose that the typical skip-ahead (or behind, in case of a reverse index scan in a DB2 V8 or V9 environment) quantity for a program's data or index access is 16 pages. That's just within the sequential detection threshold, so prefetch gets turned on; however, for each 32-page chunk of table or index data brought into memory, only two or the pages will be requested by the program. It might be better for the program to just request the needed pages individually. If you have this situation (look at an DB2 monitor accounting report for the program, and check to see if there is a low ratio of GETPAGEs to dynamic prefetch requests), you might consider increasing the program's commit frequency, as a commit will "wipe the slate clean" with respect to sequential detection page-access tracking if the program was bound with the RELEASE(COMMIT) option (versus the RELEASE(DEALLOCATE) option).

Scenario 2: a program runs longer than it should because of sequential prefetch. As noted in my part 1 prefetch post, sequential prefetch brings a pretty big chunk of pages into memory right off the bat when an SQL statement with a sequential prefetch-using access path begins execution. That's usually good, but not so much if you really only want, say, 10 rows out of a 10,000-row result set. In that case, you might try adding OPTIMIZE FOR 10 ROWS or FETCH FIRST 10 ROWS ONLY to your SELECT statement. That will tell the optimizer that you don't want to access the whole result set, and sequential prefetch will not be selected at bind time.

Scenario 3: a program runs more slowly due to "abandoned list prefetch." Again, this doesn't happen too often, but there are times when DB2 will choose list prefetch (described in my part 1 prefetch post) for a statement at bind time, and then will abandon list prefetch in favor of a tablespace scan during statement execution. Check a DB2 monitor accounting or statistics report, and look in the "RID list" section (list prefetch usually involves the sorting of row IDs - aka RIDs - obtained from one or more indexes), and look for the number of times that a RID sort failed (some reports will use "terminated" instead of "failed") due to lack of storage. If that number is non-zero, consider enlarging your DB2 RID sort pool (accomplished by changing the value of the MAXRBLK installation parameter in the DB2 ZPARM module). Another figure that you might see in the "RID list" section of a DB2 monitor report pertains to RID sort failures due to "exceeding the RDS limit" (RDS being the Relational Data System, a core component of DB2 for z/OS). Here's what that means: if DB2 is engaged in RID list processing, and it determines that more than 25% of a table's RIDs will qualify for a RID sort, it will terminate that RID list processing operation. Roger Hecq, a longtime DB2 expert who works for UBS, recently made a very good point about this type of failure in responding to a question posted to the DB2-L forum: "If [a] table has grown significantly since the last RUNSTATS and program rebind, then a REORG, RUNSTATS, and rebind will increase the 25% limit, which may help avoid the RID list failure [due to RDS limit exceeded] problem." Excellent point: make sure that DB2 knows how many rows are actually in a table by keeping DB2 catalog data up-to-date via RUNSTATs, keep objects well-organized through regular REORGs, and rebind programs after catalog stats have changed significantly for DB2 database objects on which the program depends.

OK, that's a wrap for prefetch-related monitoring and tuning. As I mentioned up front, come back at the beginning of next week for part 3 of me 3-part prefetch entry, which will cover DB2 for z/OS V9 changes that have to do with prefetch.

Monday, April 13, 2009

DB2 for z/OS Prefetch, Part 1 (The Basics)

Prefetch is another of those DB2 for z/OS topics that has been around for a long time but which has recently attracted attention anew. In this post, I'll cover some prefetch basics. I'll follow up in a few days with a post on prefetch monitoring and tuning, and a few days after that I'll post a third entry covering some prefetch-related changes introduced with DB2 for z/OS Version 9.

A lot of DB2 people understand the prefetch concept quite well. Prefetch requests have two particularly important characteristics. First, they typically involve the reading of multiple data or index pages into a buffer pool from disk with a single I/O operation. Second, they are anticipatory in nature, meaning that DB2 prefetches pages into memory on behalf of an application process (or a utility) before that process explicitly requests those pages, based on the assumption that the process will request the pages (or at least a substantial percentage of them). In this sense, prefetch reads are asynchronous with respect to an application process, and they ideally will not cause the process to suspend SQL statement execution while waiting for requested pages to be brought into memory from the disk subsystem (though waits for prefetch reads are possible, as I'll explain in part 2 of my 3-part entry on prefetch - to be posted in a few days). Elimination (or at least significant reduction) of in-DB2 application wait time for read I/Os is what prefetch is all about, and when it works as desired (and it almost always does - I'll cover a few exceptions in my "prefetch, part 2" post) it can dramatically reduce the run time of a DB2-accessing program.

There are three types of DB2 for z/OS prefetch:

Sequential - Selected at SQL statement bind time (i.e., when the statement's data access path is optimized) if the DB2 optimizer expects that 1) at least a certain number of pages in a target table or index will be accessed in the course of executing the statement (I believe that the threshold is 8 pages), and 2) those pages will be accessed in a physically sequential manner. At statement execution time, as soon as the target table or index is accessed, two (if I recall correctly) prefetch quantities of sequential pages will be read into the appropriate buffer pool (the prefetch quantity is usually 32 pages for an object defined with a 4K page size). Subsequent to these initial two prefetch read operations, additional requests for a prefetch quantity of pages will be issued each time a "trigger" page is accessed by the executing SQL statement (a trigger page is one that is a multiple of the prefetch quantity relative to the first page accessed in the table or index; thus, DB2 tries to stay at least one prefetch quantity of pages "ahead" of the application process).
List - Row IDs (RIDs) for qualifying rows (per a query's predicates) are obtained from one or more indexes, and the corresponding data pages - which need not be sequential - are read into memory via multi-page I/O operations (a RID indicates the physical location of a row in a DB2 table). Note that in most cases, DB2 will sort the RIDs obtained from the index or indexes prior to initiating the multi-page read requests.
Dynamic - DB2 determines at query run time that the pattern of page access for a target table or a utilized index is "sequential enough" to warrant activation of prefetch processing. If the page access pattern subsequently strays from "sequential enough," prefetch processing will be turned off for the query (it will be turned on again if "sequential enough" access resumes).

Of these prefetch types, dynamic is the one that is most interesting. It's activated and deactivated according to a mechanism - known as sequential detection - that works as follows: DB2 tracks pages as they are accessed in the execution of a database-accessing program (the classic example is a singleton SELECT in a do-loop - at bind time DB2 doesn't know that the statement will be executed repeatedly, and that pages in the target table or index might be accessed in a sequential fashion). When the second page in the target table or index is accessed, DB2 checks to determine whether or not it's within half of the prefetch quantity (i.e., 16 pages, if the prefetch quantity is 32 pages) forward of the first page (or backward, if we're talking about the backward index scan capability introduced with DB2 for z/OS Version 8). If it is, that second page is noted as being sequential, access-wise, relative to the first. When the third page is accessed, DB2 checks to see that it's within half a prefetch quantity forward (or backward) of the second page. If it is, the third pages is noted as being sequential with respect to the second page. When 5 out of the last 8 pages accessed are sequential in this sense, DB2 turns on prefetch. It turns prefetch off if the number of sequential pages drops below 5 of the last 8.

OK, so much for the prefetch level-set. Come back in two or three days, and I'll have posted a "part 2" entry that will answer some questions pertaining to prefetch monitoring and tuning.

Previous Posts