Robert's Blog


Tuesday, December 18, 2007

Keeping Historical Data for DB2 Performance Analysis

I'm sure that many of you have seen this before: a performance problem arises in a DB2 for z/OS environment. The hunt begins for a root cause. Eventually, it comes to light that a significant application change implemented about four weeks ago might be the source of the problem. To help nail this down, someone requests some DB2 monitor data from 30 days ago - just prior to the implementation of the aforementioned application change. The person has the right idea (compare "before" data on subsystem activity with "after" data in an effort to isolate the problem), but he ends up frustrated because the data he wants is nowhere to be found - the data needed to generate DB2 monitor reports is kept for only a few days past the date of generation.

I understand why people toss SMF records (the raw trace data that is the key input to a DB2 monitor reporting job) while they're still pretty fresh: a large and busy system can generate TONS of these things in one day, and the longer you keep them, the more disk space you have to set aside for that purpose. Disk storage is cheaper than it used to be, but vendors don't give the stuff away.

Some folks deal with this dilemma (wanting to have historical DB2 performance data to help with problem diagnosis and resolution, but cringing at the associated storage cost) by extracting certain fields from the SMF trace records and storing them in a performance database, later generating reports from that database as needed. That's better than nothing, but I've seen a couple of problems with this approach. First, people sometimes end up not putting into their performance database certain trace record field values that are later identified as key to resolving a performance issue (keeping all trace record fields in the performance database is an option, but then you have those higher disk storage costs to deal with). Second, the reports generated from records in the performance database might provide misleading data - this because DB2 trace record formats and field meanings can change from release to release, and the routines used to generate reports from performance database data may not keep up with these changes.

I've seen an approach to collecting and keeping historical DB2 monitor data that I really like, and in this post I'm going to share it with you. Step 1 is to generate, on a daily basis, DB2 monitor accounting detail and statistics detail reports for your production DB2 for z/OS system(s). On the accounting side, the traces run to generate data for the report should include classes 1, 2, 3, 7, and 8. With respect to the grouping specification for the report data, I favor "connection type" as a nice balance between voluminous (grouping by, say, correlation ID would generate a LOT of report pages) and really broad-brush (grouping by subsystem ID would perhaps hide some important information). Grouping by connection type will give you one detailed report for each type of connection to the DB2 subsystem - DRDA, batch, CICS, etc. The time interval for the report could be a busy hour of the day, or a full 24-hour period (I lean towards the former). The statistics report data should include that generated by the default statistics traces (classes 1, 3, 4, 5, and 6). The report time interval should be consistent with that chosen for the daily accounting detail report. If you are running DB2 for z/OS in data sharing mode, generate both member-scope and group-scope statistics reports. After the accounting and statistics reports have been generated from a given day's DB2 SMF records, you can keep those records around for maybe another few days and then get rid of them.
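If the needed traces aren't already active on your subsystem, they can be started with DB2 commands along these lines. This is just a sketch: the statistics classes shown are typically started by default anyway, and many shops activate these traces at DB2 start-up via the SMFACCT and SMFSTAT ZPARMs rather than with commands - check with your systems programmer before turning anything on.

   -START TRACE(ACCTG) CLASS(1,2,3,7,8) DEST(SMF)
   -START TRACE(STAT) CLASS(1,3,4,5,6) DEST(SMF)

Your DB2 monitor's batch reporting job then reads the SMF records that these traces produce.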

Step 2 is to make these daily reports easily available to interested parties. The best way to do this, I think, is to "print" the reports to files in the disk subsystem, where they can be easily located and browsed through the TSO/ISPF interface. IBM for years sold a nifty product, called Report Management and Distribution System (RMDS), that greatly facilitated this process. Do any of you wonder whatever happened to RMDS? I did, and in looking around I found that it appears to have morphed a few times, most recently becoming an IBM product called Tivoli Output Manager for z/OS (something that does - I believe - what RMDS did, and more). You can check out documentation for Tivoli Output Manager on IBM's Web site. Something to make managing these files an easier task: consider "printing" them to a Generation Data Group (GDG). A GDG, in case you're not a z/OS systems programming type, is in essence a group of files that can be thought of as successive iterations of the same process (in this case, a DB2 monitor report-generating process). You specify a limit on the number of generations to retain, and the oldest ones are automatically deleted once that limit is reached. The file names all share the same base, with numerical suffixes distinguishing one generation from another. You'd have a GDG for each report type: one for the accounting detail reports and one for the statistics detail reports (and a third for the group-scope statistics reports, if you run a DB2 data sharing group). If you'd like more information on GDGs in an easily digestible form, check out the IBM "red book" titled ABCs of z/OS System Programming Volume 3.
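To make the GDG idea a little more concrete, here's a rough sketch of what setting one up might look like. The data set names, the limit, and the space and record-format figures are placeholders I made up - adjust them to your shop's standards. The GDG base is defined once with IDCAMS:

   DEFINE GENERATIONDATAGROUP -
     (NAME(DB2PERF.ACCTG.DETAIL) -
      LIMIT(60) -
      SCRATCH)

After that, the daily report job "prints" to the next generation with a DD statement something like the one below, and the oldest report rolls off automatically once the 60-generation limit is reached:

   //ACCTRPT  DD DSN=DB2PERF.ACCTG.DETAIL(+1),
   //            DISP=(NEW,CATLG,DELETE),
   //            UNIT=SYSDA,
   //            SPACE=(CYL,(5,5),RLSE),
   //            RECFM=FBA,LRECL=133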

How many days of these reports should you retain? I'd like to have at least 30, and 60 days would be great. Trust me, once you see how useful these retained reports are, not only for problem solving but also for general DB2 workload trend analysis, you'll really be glad you've got 'em.

Non-technical postscript: Christmas Day is right around the corner. Whatever your faith tradition, I wish you joy and peace as 2007 draws to a close.
