The Key to DB2 for z/OS DDF Application Performance
Happy New Year, folks.
In this first of my 2008 posts, I will write about the performance of applications that utilize the DB2 for z/OS Distributed Data Facility, aka DDF - the component of DB2 that accepts data requests issued by remote clients ("remote," in this context, refers to a different logical - usually separate physically, as well - server, as opposed to a considerable distance - the remote client system could be 1000 miles from the DB2 for z/OS server, or it could be 20 feet away). DDF also enables DB2 on a mainframe to act as a remote data requester, but my focus today is on DB2 for z/OS in the role of data provider.
In many ways, performance design and tuning for a DDF-using application is just as it is for a local (to the DB2 for z/OS subsystem) application (an SQL statement that runs for an overly long time when executed via a CICS transaction program will likely run for a long time if it gets to DB2 through the DDF). When it comes to queries with multi-row result sets, however, there is a very important difference. The performance of such queries in a DDF application environment is very much dependent on the use by DB2 of a capability called block fetch. Consider a query, issued from a remote client (or embedded in a local DB2 stored procedure called from a remote client), that generates a result set of 100 rows. If DB2 returns the result set to the remote client using block fetch, the rows will be sent in "chunks" through the network (generally speaking, the "chunking factor" depends on the number of rows that will fit into one 32-kilobyte block). If block fetch is NOT used, the result set rows will be sent to the client one at a time (one row in each network transmission). The difference in response time for a DDF-related query when block fetch is and isn't used can easily be an order of magnitude or more.
Here's the good news: it's pretty easy to get DB2 for z/OS to use block fetch. Basically, it's a matter of enabling DB2 to recognize that the query in question will be used solely for the purpose of retrieving data, as opposed to updating or deleting data (sometimes a query is used to fetch data values that are subsequently used to locate rows that will be updated or deleted by the application process). The most straightforward way to do this is to code the query in the form of a DECLARE CURSOR statement with FOR FETCH ONLY or FOR READ ONLY specified. If that option is not available to you for some reason, there are other ways to get DB2 to use block fetch. For more information about these other activators of block fetch, check the DB2 Administration Guide. In the DB2 for z/OS V8 Administration Guide, the section of interest is 220.127.116.11.3, "Block fetching result sets."
You may wonder why block fetch is such a big deal for DDF-using applications, when you're used to getting excellent response time for queries issued from local-to-DB2 programs that retrieve one result set row at a time from the subsystem. The reason is pretty simple: moving a result set row from the DB2 Database Services Address Space to an allied address space (e.g., a CICS region or a batch address space) on the same z/OS system is a virtually instantaneous event compared to the time required to move that same row from DB2 to a network-connected client system. [Note: this is not to say that block fetching is of no value outside the DDF environment. The multi-row fetch capability introduced with DB2 V8 can significantly reduce CPU time for queries issued from locally-connected applications.]
I'll close out this post with a bit of block-fetch-related information that I find to be interesting. Many organizations use DB2 for z/OS stored procedures to boost the scalability and security of their DDF-using applications (that's a good thing). In these stored procedure programs, you'll often see the clause OPTIMIZE FOR n ROWS (with some literal integer value substituted for "n") used as a DB2 optimizer influencer (absent this clause in a SELECT statement, DB2 assumes that the associated application process will retrieve all of the rows in a result set, and that is not always the case). Did you know that for DDF-using applications, OPTIMIZE FOR n ROWS controls result set row blocking, in addition to influencing the DB2 optimizer? If you didn't know that, and you have OPTIMIZE FOR 1 ROW specified for cursor-based SELECT statements in stored procedures that are called via DDF, don't freak out - DB2 will use a blocking factor of 16 for purposes of result set row transmittal to remote clients when a value of 1, 2, or 3 is specified for "n" in the OPTIMIZE FOR n ROWS clause (DB2 for z/OS V8 users who want to be assured of getting this DDF row blocking behavior may want to check out APAR PK57930).
Here's to a great 2008. I hope that you'll visit my blog regularly this year. I'll try to make it worth your while.
Keeping Historical Data for DB2 Performance Analysis
I'm sure that many of you have seen this before: a performance problem arises in a DB2 for z/OS environment. The hunt begins for a root cause. Eventually, it comes to light that a significant application change effected about four weeks ago might be the source of the problem. To help nail this down, someone requests some DB2 monitor data from 30 days ago - just prior to the implementation of the aforementioned application change. The person has the right idea (compare "before" data on subsystem activity with "after" data in an effort to isolate the problem), but he ends up being frustrated because the data he wants is nowhere to be found - the data needed to generate DB2 monitor reports is kept for only a few days past the date of generation.
I understand why people toss SMF records (the raw trace data that is the key input to a DB2 monitor reporting job) while they're still pretty fresh: a large and busy system can generate TONS of these things in one day, and the longer you keep them, the more disk space you have to set aside for that purpose. Disk storage is cheaper than it used to be, but vendors don't give the stuff away.
Some folks deal with this dilemma (wanting to have historical DB2 performance data to help with problem diagnosis and resolution, but cringing at the associated storage cost) by extracting certain fields from the SMF trace records and storing them in a performance database, later to generate reports from this database as needed. That's better than nothing, but I've seen a couple of problems with this approach. First, people sometimes end up not putting into their performance database certain trace record field values that are later identified as a key to resolving a performance issue (keeping all trace record fields in the performance database is an option, but then you have those higher disk storage costs to deal with). Second, the reports generated from records in the performance data might provide seemingly misleading data - this because DB2 trace record formats and field meanings can change from release to release, and the routines used to generate monitor reports from performance database data may not keep up with these changes.
I've seen an approach to collecting and keeping historical DB2 monitor data that I really like, and in this post I'm going to share it with you. Step 1 is to generate, on a daily basis, DB2 monitor accounting detail and statistics detail reports for your production DB2 for z/OS system(s). On the accounting side, the traces run to generate data for the report should include classes 1, 2, 3, 7, and 8. With respect to the grouping specification for the report data, I favor "connection type" as a nice balance between voluminous (grouping by, say, correlation ID would generate a LOT of report pages) and really broad brush (grouping by subsystem ID would perhaps hide some important information). Grouping by connection type will give you one detailed report for each type of connection to the DB2 subsystem - DRDA, batch, CICS, etc. The time interval for the report could be a busy hour of the day, or a full 24-hour period (I lean towards the former). The statistics report data should include that generated by the default statistics traces (1, 3, 4, 5, and 6). The report time interval should be consistent with that chosen for the daily accounting detail report. If you are running DB2 for z/OS in data sharing mode, generate both member-scope and group-scope statistics reports. After the accounting and statistics reports reports have been generated from a given days' DB2 SMF records, you can keep those records around for maybe another few days and then get rid of them.
Step 2 is to make these daily reports easily available to interested parties. The best way to do this, I think, is to "print" the reports to files in the disk subsystem, where they can be easily located and browsed through the TSO/ISPF interface. IBM for years sold a nifty product, called Report Management and Distribution System (RMDS), that greatly facilitated this process. Do any of you wonder what ever happened to RMDS? I did, and in looking around I found that it appears to have morphed a few times, becoming most recently an IBM product called Tivoli Output Manager for z/OS (something that does - I believe - what RMDS did, and more). You can check out documentation for Tivoli Output Manager on IBM's Web site. Something to make managing these files an easier task: consider "printing" them to a Generation Data Group (GDG). A GDG, in case you're not a z/OS system programming type, is in essence a group of files that can be thought of as successive iterations of the same process (in this case, a DB2 monitor report-generating process). You specify a limit on the number of iterations to retain, and the oldest ones are automatically deleted once that limit is reached. The file names are all the same base, with numerical suffixes distinguishing one from another. You'd have a GDG for each report type: one for the accounting detail reports, and one for the statistics detail reports (and a third for the group-scope statistics reports, if you run a DB2 data sharing group). If you'd like more information on GDGs in an easily digestible form, check out the IBM "red book" titled ABCs of z/OS System Programming Volume 3.
How many days of these reports should you retain? I'd like to have at least 30, and 60 days would be great. Trust me, once you see how useful these retained reports are, not only for problem solving but also for general DB2 workload trend analysis, you'll really be glad you've got 'em.
Non-technical postscript: Christmas Day is right around the corner. Whatever your faith tradition, I wish you joy and peace as 2007 draws to a close.
What Are Your Plans for DB2 for z/OS Stored Procedures?
If your organization runs DB2 on a mainframe server, are you using stored procedures? If so, good - plan on using them more extensively. If you're not using DB2 stored procedures, get moving with respect to putting an implementation plan in place.
As you've probably surmised, I'm very bullish on DB2 for z/OS stored procedures. I believe that they are the future of server-side SQL on the mainframe platform. I'll get to why in a moment. First, I'd like to address a question to those of you employed by DB2 for z/OS-using organizations that have either not started using stored procedures or are utilizing them only in a very limited way: what is holding you back?
Are you concerned that stored procedures - particularly when invoked by off-mainframe client programs through DB2's Distributed Data Facility (an application model that I favor) - simply can't handle a high-volume transactional workload? In fact, they can. There are organizations out there right now that are driving more than 1000 transactions per second using stored procedures and DDF. I'm currently working with a company that has a production DB2 DDF workload (much of it involving stored procedures) that regularly peaks at over 700 transactions per second, and they are nowhere near done with respect to bringing more application work of this kind online. This should not come as a surprise to anyone. DB2 for z/OS stored procedure programs, after all, can be written in a variety of languages, including tried-and-true (and CPU-efficient) COBOL. Execution of these programs can be spread across multiple stored procedure address spaces managed by Workload Manager (aka WLM, a component of the z/OS operating system). Not only that, but WLM in goal mode can dynamically fire up additional instances of a given stored procedure address space to better handle swings in application volume.
Are you concerned that DDF means dynamic SQL, especially when you're talking about database access interfaces such as Java Database Connectivity (JDBC) and ADO.NET that are frequently used with programs written for UNIX-, Linux-, and Windows-based application servers? Worry not. Yes, a stored procedure call issued via one of these database access interfaces will end up being processed as a dynamic SQL statement on the DB2 for z/OS system, but that's just the call. Static SQL embedded in a mainframe server-side stored procedure program is still static SQL, even if the associated stored procedure program is invoked dynamically. DB2 stored procedures, therefor, enable an organization to leverage the benefits of static SQL with respect to performance (pre-optimized SQL) and security (application authorization IDs require only the EXECUTE privilege on stored procedures and related packages, as opposed to direct table access privileges), while enabling off-mainframe client-side programmers to code in ways that come naturally to them.
Worried that you won't be able to monitor your DB2 for z/OS environment as you can the other parts of your mainframe system? Don't be. There are multiple products available from several vendors of DB2-related tools that can provide you with detailed information about your stored procedure workload. Stored procedures are definitely not some kind of black-box technology.
I mention these concerns sometimes on the minds of mainframe systems people because in my experience, programmers - whether mainframers themselves or distributed systems people - are already sold on the benefits of DB2 stored procedure usage. Mainframe-side developers like to be able to leverage their COBOL (or C or whatever) and SQL skills for projects that involve a lot of off-mainframe code written in Java (or C# or whatever). The off-mainframe coders like being able to use a familiar syntax when accessing DB2 stored procedures and associated query result sets (use of stored procedures is an established best practice in relational database environments that run on Linux, UNIX, and Windows servers).
What about the future of DB2 stored procedures? Lookin' brighter all the time, baby. I'm particularly pumped about what IBM is doing with respect to SQL procedures. These are stored procedures for which the program source is contained within the CREATE PROCEDURE statement (this versus a CREATE PROCEDURE statement that references an external-to-DB2 load module). Yeah, they're easy to develop, but to me the appeal was somewhat limited because, under the covers, DB2 would end up generating a C language program to be the SQL procedure executable, and the thing ended up running as a traditional external stored procedure does. That all changes with DB2 for z/OS V9 in New Function Mode: a SQL procedure executable will not be a C language program. Instead, the executable code will be stored in the DB2 directory (conceptually similar to the storage of DB2 package sections in the directory), and it will execute in the DB2 Database Services address space. That's cool enough (no need for a C compiler), but here's what has me really jazzed: these SQL procedures, when invoked via DDF (i.e., by off-mainframe client programs), will be able to use the processing power of a zIIP engine. zIIPs, as you may well know, are zSeries processors that can do real work (especially when enclave SRBs - a certain type of z/OS control block representing a dispatchable piece of work - are involved, as they are in the case of DDF threads), with the bonus being that they don't figure into the pricing of mainframe software. Boom! Instant TCO (total cost of ownership) benefit for mainframe-using organizations.
Great, huh? Are you as excited about this technology as I? Am I just over-caffeinated (I am enjoying a good-sized cup with my Wi-Fi)? Just check it out, OK? And if you're already realizing the advantages of DB2 stored procedures, tell some other folks about it. As we in the American South might say, give stored procedures a call, y'all.