DB2 for z/OS Prefetch, Part 3 (DB2 9 Enhancements)
Prefetch (the basics of which I described in part 1 of this 3-part blog entry) has been a feature of DB2 for z/OS for as long as I can remember (I first got my hands on the DBMS around 1987, when Version 1 Release 2 was current). Over the years, prefetch has been enhanced in various ways through succeeding releases of DB2, and that process has continued with Version 9. In this post, I will cover three prefetch-related DB2 9 enhancements: an increase in the prefetch quantity, a new formula for calculating the cluster ratio of an index relative to the underlying table (along with a new and related catalog statistic), and a change in the DB2 optimizer's "thinking" with respect to dynamic prefetch versus sequential prefetch (dynamic and sequential prefetch - along with list prefetch - are described in the aforementioned part 1 of this 3-part entry on prefetch).
OK, prefetch quantity: for years, we've thought of a 32-page prefetch quantity for SQL access to tablespaces and indexes defined with a 4 KB page size, and a 64-page quantity for utility access to such objects. With DB2 9, the prefetch quantity for SQL access goes to 64 pages for sequential prefetch and LOB-related list prefetch (it's still 32 pages for dynamic prefetch and non-LOB list prefetch) for tablespaces and indexes assigned to 4 KB buffer pools for which VPSIZE (the number of buffers allocated to the pool) times VPSEQT (the sequential steal threshold, which defaults to 80% and is used to limit the number of buffers that can be occupied by pages read from disk in a sequential pattern) is greater than or equal to 40,000. For utility access to objects assigned to 4 KB buffer pools, the prefetch quantity in DB2 9 environment is 128 pages when VPSIZE times VPSEQT is greater than or equal to 80,000. The increase in the prefetch quantity should improve response times for application-related tablespace scans and for many utility operations. You can find information about the V9 prefetch quantities for SQL and utility access, for all buffer pool page sizes (4 KB, 8 KB, 16 KB, and 32 KB), in Table 117 in the DB2 V9 for z/OS Performance Monitoring and Tuning Guide (you can find this and other DB2 9 manuals in the DB2 9 "bookshelf" on IBM's Web site).
Next, the new cluster ratio formula. You probably know that CLUSTERRATIO, a column in the SYSINDEXES catalog table, indicates, in the form of a percentage, the degree to which the physical order of a table's rows align with the order of the index's key values (the CLUSTERRATIOF column in SYSINDEXES expresses the same value as a floating-point number between 0 and 1). CLUSTERRATIO influences the DB2 optimizer's access path selection process, particularly as it pertains to prefetch access. With DB2 9, the formula for calculating the cluster ratio of an index has been enhanced to improve the accuracy and usefulness of the calculated value. The principal sources of improved calculation are:
Also, note that DB2 9 introduces a new catalog statistic, DATAREPEATFACTORF (in SYSINDEXES and related catalog tables), that gives the optimizer a much better feel for the "density" of an index's ascending key values with respect to the sequential arrangement of rows in the pages of the underlying table. In other words, are rows in a table sequential and "dense" with respect to the index, or are they sequential and not dense? This information helps the optimizer to further refine access path selection decision-making, particularly with respect to prefetch.
Finally, a bit on the optimizer's prefetch preference in a DB2 9 environment. What you are likely to see if you migrate to V9 from V8 is an increase in dynamic prefetch activity and a decrease in sequential prefetch activity - in fact, you may see sequential prefetch used only for tablespace scans. The leaning towards dynamic prefetch makes sense: it's "smarter" (it can be temporarily "turned off" during a scan, if the page access pattern becomes non-sequential in terms of overly large "skip ahead" or "skip behind" distances from page to page, and it doesn't require access to particular "trigger pages" to keep prefetch going) and it can work for backward as well as forward scans (sequential prefetch is forward-only). That said, you may think of the dynamic prefetch preference as being an example of the optimizer "punting" with respect to prefetch decisions (i.e., "I'm just going to let that be a run-time decision versus a bind-time decision"). You'd be wrong in so thinking. What you should expect in a DB2 9 system is an increased incidence of the value 'D' (for dynamic prefetch) in the PREFETCH column in EXPLAIN tables. That means that the optimizer expects that dynamic prefetch will be utilized in accessing data or index pages for a query, and will optimize the statement accordingly.
That leads me to the bottom line: DB2 prefetch, an oldie-but-goodie product feature, is better than ever in the DB2 9 environment. DB2 9 will use prefetch more effectively and efficiently, and the result should be improved application (and utility) performance.
OK, prefetch quantity: for years, we've thought of a 32-page prefetch quantity for SQL access to tablespaces and indexes defined with a 4 KB page size, and a 64-page quantity for utility access to such objects. With DB2 9, the prefetch quantity for SQL access goes to 64 pages for sequential prefetch and LOB-related list prefetch (it's still 32 pages for dynamic prefetch and non-LOB list prefetch) for tablespaces and indexes assigned to 4 KB buffer pools for which VPSIZE (the number of buffers allocated to the pool) times VPSEQT (the sequential steal threshold, which defaults to 80% and is used to limit the number of buffers that can be occupied by pages read from disk in a sequential pattern) is greater than or equal to 40,000. For utility access to objects assigned to 4 KB buffer pools, the prefetch quantity in DB2 9 environment is 128 pages when VPSIZE times VPSEQT is greater than or equal to 80,000. The increase in the prefetch quantity should improve response times for application-related tablespace scans and for many utility operations. You can find information about the V9 prefetch quantities for SQL and utility access, for all buffer pool page sizes (4 KB, 8 KB, 16 KB, and 32 KB), in Table 117 in the DB2 V9 for z/OS Performance Monitoring and Tuning Guide (you can find this and other DB2 9 manuals in the DB2 9 "bookshelf" on IBM's Web site).
Next, the new cluster ratio formula. You probably know that CLUSTERRATIO, a column in the SYSINDEXES catalog table, indicates, in the form of a percentage, the degree to which the physical order of a table's rows align with the order of the index's key values (the CLUSTERRATIOF column in SYSINDEXES expresses the same value as a floating-point number between 0 and 1). CLUSTERRATIO influences the DB2 optimizer's access path selection process, particularly as it pertains to prefetch access. With DB2 9, the formula for calculating the cluster ratio of an index has been enhanced to improve the accuracy and usefulness of the calculated value. The principal sources of improved calculation are:
- The formula now considers all of the RIDs (row IDs) in an index, as opposed to considering only succeeding key values. The old formula made the cluster ratio for indexes with lots of duplicate key values (e.g., an index on department number on a large employee table) smaller than it should have (keep in mind that RIDs for duplicate key values are stored in an index in ascending RID sequence - in other words, in physical sequential order).
- The old formula counted a table row as being clustered if the next key value resided on the same page or a forward page with respect to the last RID of the current key value. Problem is, "forward" could mean 1000 pages forward, and that's really not useful from a prefetch perspective. With the new formula, a "forward" page, in terms of satisfying the "clustered" test, has to be within a sliding window that is based on the prefetch quantity and the size of the buffer pool to which the underlying table is assigned.
- The new formula considers the "clustered-ness" of an index for both forward and backward scans, whereas the old formula considered only forward sequential access.
Also, note that DB2 9 introduces a new catalog statistic, DATAREPEATFACTORF (in SYSINDEXES and related catalog tables), that gives the optimizer a much better feel for the "density" of an index's ascending key values with respect to the sequential arrangement of rows in the pages of the underlying table. In other words, are rows in a table sequential and "dense" with respect to the index, or are they sequential and not dense? This information helps the optimizer to further refine access path selection decision-making, particularly with respect to prefetch.
Finally, a bit on the optimizer's prefetch preference in a DB2 9 environment. What you are likely to see if you migrate to V9 from V8 is an increase in dynamic prefetch activity and a decrease in sequential prefetch activity - in fact, you may see sequential prefetch used only for tablespace scans. The leaning towards dynamic prefetch makes sense: it's "smarter" (it can be temporarily "turned off" during a scan, if the page access pattern becomes non-sequential in terms of overly large "skip ahead" or "skip behind" distances from page to page, and it doesn't require access to particular "trigger pages" to keep prefetch going) and it can work for backward as well as forward scans (sequential prefetch is forward-only). That said, you may think of the dynamic prefetch preference as being an example of the optimizer "punting" with respect to prefetch decisions (i.e., "I'm just going to let that be a run-time decision versus a bind-time decision"). You'd be wrong in so thinking. What you should expect in a DB2 9 system is an increased incidence of the value 'D' (for dynamic prefetch) in the PREFETCH column in EXPLAIN tables. That means that the optimizer expects that dynamic prefetch will be utilized in accessing data or index pages for a query, and will optimize the statement accordingly.
That leads me to the bottom line: DB2 prefetch, an oldie-but-goodie product feature, is better than ever in the DB2 9 environment. DB2 9 will use prefetch more effectively and efficiently, and the result should be improved application (and utility) performance.