Robert's Blog

Tuesday, April 21, 2009

DB2 for z/OS Prefetch, Part 3 (DB2 9 Enhancements)

Prefetch (the basics of which I described in part 1 of this 3-part blog entry) has been a feature of DB2 for z/OS for as long as I can remember (I first got my hands on the DBMS around 1987, when Version 1 Release 2 was current). Over the years, prefetch has been enhanced in various ways through succeeding releases of DB2, and that process has continued with Version 9. In this post, I will cover three prefetch-related DB2 9 enhancements: an increase in the prefetch quantity, a new formula for calculating the cluster ratio of an index relative to the underlying table (along with a new and related catalog statistic), and a change in the DB2 optimizer's "thinking" with respect to dynamic prefetch versus sequential prefetch (dynamic and sequential prefetch - along with list prefetch - are described in the aforementioned part 1 of this 3-part entry on prefetch).

OK, prefetch quantity: for years, we've thought of a 32-page prefetch quantity for SQL access to tablespaces and indexes defined with a 4 KB page size, and a 64-page quantity for utility access to such objects. With DB2 9, the prefetch quantity for SQL access goes to 64 pages for sequential prefetch and LOB-related list prefetch (it's still 32 pages for dynamic prefetch and non-LOB list prefetch) for tablespaces and indexes assigned to 4 KB buffer pools for which VPSIZE (the number of buffers allocated to the pool) times VPSEQT (the sequential steal threshold, which defaults to 80% and is used to limit the number of buffers that can be occupied by pages read from disk in a sequential pattern) is greater than or equal to 40,000. For utility access to objects assigned to 4 KB buffer pools, the prefetch quantity in DB2 9 environment is 128 pages when VPSIZE times VPSEQT is greater than or equal to 80,000. The increase in the prefetch quantity should improve response times for application-related tablespace scans and for many utility operations. You can find information about the V9 prefetch quantities for SQL and utility access, for all buffer pool page sizes (4 KB, 8 KB, 16 KB, and 32 KB), in Table 117 in the DB2 V9 for z/OS Performance Monitoring and Tuning Guide (you can find this and other DB2 9 manuals in the DB2 9 "bookshelf" on IBM's Web site).

Next, the new cluster ratio formula. You probably know that CLUSTERRATIO, a column in the SYSINDEXES catalog table, indicates, in the form of a percentage, the degree to which the physical order of a table's rows align with the order of the index's key values (the CLUSTERRATIOF column in SYSINDEXES expresses the same value as a floating-point number between 0 and 1). CLUSTERRATIO influences the DB2 optimizer's access path selection process, particularly as it pertains to prefetch access. With DB2 9, the formula for calculating the cluster ratio of an index has been enhanced to improve the accuracy and usefulness of the calculated value. The principal sources of improved calculation are:
  1. The formula now considers all of the RIDs (row IDs) in an index, as opposed to considering only succeeding key values. The old formula made the cluster ratio for indexes with lots of duplicate key values (e.g., an index on department number on a large employee table) smaller than it should have (keep in mind that RIDs for duplicate key values are stored in an index in ascending RID sequence - in other words, in physical sequential order).
  2. The old formula counted a table row as being clustered if the next key value resided on the same page or a forward page with respect to the last RID of the current key value. Problem is, "forward" could mean 1000 pages forward, and that's really not useful from a prefetch perspective. With the new formula, a "forward" page, in terms of satisfying the "clustered" test, has to be within a sliding window that is based on the prefetch quantity and the size of the buffer pool to which the underlying table is assigned.
  3. The new formula considers the "clustered-ness" of an index for both forward and backward scans, whereas the old formula considered only forward sequential access.
So, in going from V8 to V9 (and after running the RUNSTATS utility for objects in the DB2 9 environment, to update CLUSTERRATIO values), you might see some custer ratio figures increase (perhaps dramatically, for some indexes with lost of duplicate key values) and some decrease (for tables that were clustered relative to an index only if you considered large "skip-ahead" page distances from one key value to the next as being OK, as was the case with the old formula). The new cluster ratio values (which will require package rebinds if you want them to influence static SQL access paths) should improve performance in DB2 9 systems; however, on the off chance that the new values have a negative impact on performance, you can direct DB2 to go back to the old formula by setting the value of the ZPARM parameter STATCLUS to STANDARD.

Also, note that DB2 9 introduces a new catalog statistic, DATAREPEATFACTORF (in SYSINDEXES and related catalog tables), that gives the optimizer a much better feel for the "density" of an index's ascending key values with respect to the sequential arrangement of rows in the pages of the underlying table. In other words, are rows in a table sequential and "dense" with respect to the index, or are they sequential and not dense? This information helps the optimizer to further refine access path selection decision-making, particularly with respect to prefetch.

Finally, a bit on the optimizer's prefetch preference in a DB2 9 environment. What you are likely to see if you migrate to V9 from V8 is an increase in dynamic prefetch activity and a decrease in sequential prefetch activity - in fact, you may see sequential prefetch used only for tablespace scans. The leaning towards dynamic prefetch makes sense: it's "smarter" (it can be temporarily "turned off" during a scan, if the page access pattern becomes non-sequential in terms of overly large "skip ahead" or "skip behind" distances from page to page, and it doesn't require access to particular "trigger pages" to keep prefetch going) and it can work for backward as well as forward scans (sequential prefetch is forward-only). That said, you may think of the dynamic prefetch preference as being an example of the optimizer "punting" with respect to prefetch decisions (i.e., "I'm just going to let that be a run-time decision versus a bind-time decision"). You'd be wrong in so thinking. What you should expect in a DB2 9 system is an increased incidence of the value 'D' (for dynamic prefetch) in the PREFETCH column in EXPLAIN tables. That means that the optimizer expects that dynamic prefetch will be utilized in accessing data or index pages for a query, and will optimize the statement accordingly.

That leads me to the bottom line: DB2 prefetch, an oldie-but-goodie product feature, is better than ever in the DB2 9 environment. DB2 9 will use prefetch more effectively and efficiently, and the result should be improved application (and utility) performance.


Anonymous Martin Hubel said...

Hi Robert,
Thanks for pointing me in the right direction, with your note above: "Table 184 the DB2 V9 for z/OS". I found this in Table 117.

January 19, 2010 at 12:11 PM  
Blogger Robert Catterall said...

Thanks for catching that, Martin -- I believe that the information might have changed location in the manual through a revision that came out sometime after I posted this entry in April of last year. I made the correction in the blog entry.


January 19, 2010 at 7:10 PM  
Anonymous Anonymous said...

Does db2 prefetch on client side and server side while using the db2 clp ? I am trying to tell DB2 to read from the disk ( for no particular reason ) but alas, this seesms impossible. Here is what i have done and would really appreciate if you could tell me what is happening

a) create data partitioned tables about 10 partitions
b) insert 10 rows , 1 row in each partition.
c) using the db2 clp declare and open a UR/CS reader
d) db2expln shows :
Data Prefetch none
e) Output the fmtlock
f) verify partition level locks across all the partitions.

If i fetch 1 row at a time, I touch 1 partition at a time ; Why are the locks being held across all the partitions ? This tells me that db2 has prefetched the data. But db2expln shows Data Prefetch none.
what is happening here ?

February 22, 2010 at 7:00 PM  
Blogger Robert Catterall said...

"I am trying to tell DB2 to read from the disk... but alas, this seems impossible." You don't tell DB2 to read from disk. You tell DB2 what data you want. If that data is on a page (or pages) that is already in the buffer pool, a disk read is unnecessary (and undesirable, for performance reasons). If the data is on a page that is not in the buffer pool, DB2 will read the required page (or pages) from disk.

"Why are the locks being held across all the partitions? This tells me that DB2 has prefetched the data." No, it does not tell you that. Prefetch and locking are independent processes -- the former has to do with performance, and the latter has to do with concurrency of access. I expect that the partition-level locks you see are of the intent (versus exclusive) variety, so they are not a concern, concurrency-wise. These intent locks are probably taken on all partitions because your cursor qualifies rows in each partition.

"But db2expln shows Data Prefetch none. What is happening here?" Prefetch is designed to read multiple pages from a file into the buffer pool with one I/O request. You have 10 storage objects (your table partitions), each with one row on one page. It would not make sense to use prefetch in that situation. If you had thousands (or millions) of rows per partition, prefetch could be a good choice for data retrieval if many pages have to be examined in the execution of a query.

Hope this helps.


February 23, 2010 at 6:50 AM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home