Robert's Blog

Tuesday, June 1, 2010

Nuggets from DB2 by the Bay, Part 3

Still more items of information from the 2010 International DB2 Users Group North American Conference held last month in Tampa, Florida.

A new chapter in the history of the DB2 optimizer. Terry Purcell, uber-optimizer-guy on IBM's DB2 for z/OS development team, delivered an excellent session on new query optimizer features in the DB2 10 environment. These goodies include:
  • An new option allows you to get dynamic statement cache matching for SQL statements that have different literal values but are otherwise identical. Prior to DB2 10, matching was possible only for statements that were identical on a byte-for-byte basis: such statements would either contain parameter markers or identical literal values. The CPU efficiency attained through statement matching with different literals won't be quite what you get when statements containing parameter markers are matched, but it should be pretty close.
  • How about PERSISTENT caching of prepared dynamic SQL statements? Dynamic statement caching is great, but when a prepared statement gets flushed out of the cache, it's gone, right? Not any more, folks. DB2 10 (in New Function Mode) will provide an access path repository in the catalog that will enable you to stabilize -- in a long-term way -- access paths for dynamic SQL statements (a "game-changer," Terry called it). When this option is in effect, DB2 will look first to match an incoming dynamic statement with a statement in the repository, then (if not found in the repository) in the dynamic statement cache. If neither of these matching attempts is successful, DB2 will dynamically prepare the statement. Want to change a path in the repository? You'll be able to do so by rebinding at the query level. By the way, the repository will also enable a more-robust implementation of DB2's access path hint functionality: it will be possible to put a hint into the repository, so you'll no longer have to provide a query number value in order to use a hint.
  • Mass rebind? No problem. A new REBIND option, APREUSE(YES), will instruct DB2 to generate a new control structure for a package (to take advantage of a service fix, for example) while retaining the existing access path, if possible. If the package's old access path can't be reused for some reason, a new one will be generated. And, speaking of new and different access paths, another DB2 10-delivered REBIND option, APCOMPARE(ERROR), can be used to tell DB2 to issue an error message if a rebind operation changes an access path (you can optionally have DB2 issue a warning instead of an error). Going forward, when you want to do a mass rebind of packages as part of a version-to-version DB2 migration, you may well want to do your rebinds with APREUSE(YES) and APCOMPARE(ERROR).
  • More user-friendly access plan stability. Lots of people like the access plan stability capability that was delivered with DB2 9 for z/OS via the new PLANMGMT option of the REBIND command. Nice as that is, it could be a hassle trying to get information about a version of a package's access plan other than the one currently in use. DB2 10 will address that problem with a new catalog table, SYSPACKCOPY, that will provide metadata for previous and original copies of access plans.
  • Playing it safe when choosing access paths. DB2 has always gone for lowest cost when choosing an access path for a query. Sometimes, that can be a problem for a statement with one or more host variables in its predicates, as the path identified as lowest-cost might result in really bad performance for certain variable values. The DB2 10 optimizer, older and wiser, will consider risk (i.e., the chance of getting poor performance for certain statement variable values) as well as cost in determining the optimal path for a SQL statement.
  • Staying with the RID option. Some folks cringe when DB2 starts down the path of using a RID list in executing a query (perhaps for multi-index access), and then switches to a tablespace scan because a RID limit was reached. DB2 10 can overflow a big RID list to a workfile and keep on trucking. Will you need more workfile space as a result? Perhaps, but note that the spillover effect is mitigated by a new larger default size for the RID pool in a DB2 10 environment.
  • In-list advances. DB2 10 extends its predicate transitive closure capability (the ability to determine a relationship between A and C based on A-B and B-C relationships) to in-list predicates. DB2 10 can also use matching index access for multiple in-list predicates in a query (prior to DB2 10, if a query had several in-list predicates, only one of these could be used for a matching index scan). And one more thing: DB2 10 can take several OR-connected predicates that match on one index and convert them to a single in-list predicate to generate a result set (that's more efficient than using "index ORing" for the predicates, as is done in a pre-10 DB2 system).
  • Query parallelism enhancements. With DB2 10, you can get parallelism for multi-row FETCH operations (though not for an ambiguous cursor). DB2 10 also enables parallel query tasks to share workfiles. And, in DB2 10, something called "dynamic record range partitioning" can be used to cause data in table A to be split into partitions that "line up" with the partitions of table B, the result being improved parallel table-join processing. This does introduce a data sort, but the expectation is that the technique will be used when table A is on the small side, so the sort shouldn't be a big deal.=
  • A RUNSTATS efficiency boost. It used to be that sampling provided some help in reducing the CPU cost of a RUNSTATS utility job. With DB2 10, sampling provides a LOT of help in this department, because the sampling percentage now applies to the percentage of pages examined (it used to refer to the percentage of data rows examined in gathering statistics for non-indexed columns -- you could sample 25% of the rows, but end up accessing ALL of the tablespace's pages). What's more, there's an optional specification that you can use to tell DB2 to figure out the sampling percentage to use for an execution of RUNSTATS.
Where to go for more DB2 10 information. IBM's Roger Miller has provided information about DB2 10 presentations available on IBM's Web site. They are in a folder accessible via this url:

In this folder, you'll find these presentations that were delivered at the IDUG conference in Tampa:
  • Session A01: DBA improvements, by Roger Miller
  • Session B02: What's new from the optimizer, by Terry Purcell
  • Session A03: DB2 10 Performance Preview, by Akiko Hoshikawa
  • Session A06: DB2 and System z Synergy, by Chris Crone
  • Session A08: DB2 10 Availability Enhancements, by Haakon Roberts
You'll also find these presentations from the recent IBM Information on Demand European Conference:
  • Session 2908: DB2 10 for z/OS security features help satisfy your auditors, by Jim Pickel
  • Session 2894: What’s coming from the optimizer in DB2 10 for z/OS, by Terry Purcell
  • Session 3010: New pureXML Features in DB2 for z/OS: Breaking Relational Limits, by Guogen (Gene) Zhang
I'll be back here in a few days with still more from my notes taken at the IDUG Conference in Tampa. Ciao for now.


Anonymous Anonymous said...

I have read in many places about the RID list failure solution in V10. However, it always refers to failure due to space. Is there anything in V10 which addresses the RDS over 25% limit failures ?


Rob Pearce
Swisscom IT Services

October 30, 2012 at 4:08 AM  
Blogger Robert Catterall said...

Sorry about the delayed response, Rob.

My understanding is that the limit to which you refer remains in a DB2 10 for z/OS environment: if, at run time, the optimizer determines that more than 25% of a table's rows will be qualified by a predicate for which RID list processing is the selected access path, DB2 will abandon list prefetch in favor of a table space scan.

People should keep in mind that for a static SQL statement, this 25% threshold is established at BIND time and is stored in the associated package; thus, if the table grows significantly after the package has been bound, so that the actual 25% threshold is higher than it was at bind time (e.g., 25% of 100 millions rows versus 25% of 10 million rows), RID list processing could be abandoned at run time when it shouldn't be. If that happens (and a DB2 monitor shows when RID list processing is abandoned due to a limit being exceeded), rebinding the package could rectify the situation by causing a recalculation of the RDS RID list processing threshold (you'd want to make sure that catalog statistics are current before performing that REBIND action - that could necessitate RUNSTATS execution for the target table space).


November 5, 2012 at 7:07 AM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home