Robert's Blog

Wednesday, June 24, 2009

DB2 9 Native SQL Procedures: One Address Space is OK

I've been doing a lot of presenting lately on the topic of DB2 for z/OS stored procedures. In these presentations, I've emphasized the benefits of native SQL procedures, introduced for the mainframe platform via DB2 9 for z/OS (I blogged on the importance of this development in an entry I posted late last year). During two different sessions held recently in two different cities, two different people asked me the same question pertaining to native SQL procedures versus external stored procedures (the latter being what you might think of as "traditional" stored procedures in a DB2 for z/OS environment). In this entry, I'll share with you that question and my response.

First, the common question: "When DB2 for z/OS Version 5 provided support for stored procedure address spaces managed by the Workload Manager (WLM) component of z/OS, we were told that an advantage of this enhancement was the ability to have multiple stored procedure address spaces, versus the one DB2-managed stored procedures address space (SPAS). With different stored procedures assigned to different WLM application environments and their associated address spaces, if a stored procedure program misbehaved in such a way as to bring down the address space in which it was running, the other stored procedure address spaces would not be impacted. Now, with DB2 9 native SQL procedures, we're back to one address space for stored procedure execution (native SQL procedures execute in the DB2 database services address space, also known as DBM1). Doesn't that mean that we now have the same risk we faced when using the old DB2-managed SPAS, namely, that one errant stored procedure could take down the one stored procedure address space (and this time, we're talking about losing DBM1)?"

My response: It's true that having multiple WLM-managed stored procedure address spaces can reduce the impact of an address space failure caused by an external stored procedure program, but that kind of failure has to do with stored procedure program code executing outside of DB2. Multiple DB2-accessing stored procedures running in multiple stored procedure address spaces are all executing code in DBM1 when they issue SQL statements (as is true of multiple DB2-accessing CICS transaction programs running in multiple CICS AORs), and that doesn't cause DBM1 to crash. Native SQL procedures running in DBM1 execute as packages - it's all DB2-generated and DB2-managed code. This means that the exposure mitigated by having multiple WLM-managed stored procedure address spaces - that user-written stored procedure program code running outside of DB2 could cause a problem leading to the failure of a WLM-managed address space - does not exist for native SQL procedures. To put it another way, having native SQL procedures execute in one address space - DBM1 - is no more risky than having multiple packages invoked by external callers all running in DBM1, and that's been standard operating procedure for DB2 since day one (execution of an embedded SQL statement involves, under the covers, a call to DB2 and a reference to a section of a package).
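To make the distinction concrete, here's a sketch of what a native SQL procedure might look like (the procedure, table, and column names are hypothetical). The key point: there is no external load module and no WLM application environment involved - the LANGUAGE SQL body is turned into a DB2 package.

```sql
-- Hypothetical native SQL procedure: the procedure body below is
-- DB2-generated, DB2-managed code (a package) that executes in the
-- DBM1 address space - no external program, no WLM environment.
CREATE PROCEDURE UPDATE_BALANCE
  (IN  P_ACCT_ID INTEGER,
   IN  P_AMOUNT  DECIMAL(11,2),
   OUT P_NEW_BAL DECIMAL(11,2))
  LANGUAGE SQL
BEGIN
  UPDATE ACCOUNTS
    SET BALANCE = BALANCE + P_AMOUNT
    WHERE ACCT_ID = P_ACCT_ID;

  SELECT BALANCE INTO P_NEW_BAL
    FROM ACCOUNTS
    WHERE ACCT_ID = P_ACCT_ID;
END
```

Contrast that with an external stored procedure definition, which names a separately compiled program and a WLM application environment.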

So, take advantage of the enhanced performance and simplified lifecycle management offered by DB2 9 native SQL procedures, and don't worry about not having multiple address spaces in which to run these stored procedures - you don't need them. Native SQL procedures are made up of SQL statements, and SQL statements - as always - run in DBM1.

Thursday, June 18, 2009

DB2, Stored Procedures, COBOL, and Result Sets

Last week, I helped a DBA at a large financial services firm with a couple of questions related to DB2 for z/OS stored procedures and result sets (referring to the row and column information accessed via a query included in a DECLARE CURSOR statement). Both of the DBA's questions had to do with COBOL programs called by DB2 stored procedures. A lot of mainframe sites have COBOL programs that are invoked by stored procedures, and plenty of people have some misconceptions with regard to the accessibility of result sets generated through cursors in stored-procedure-called programs, so in this post I'll share the information that I provided to the aforementioned DBA.

When it comes to DB2 for z/OS stored procedures calling COBOL programs, the situation is most interesting when both the stored procedure program and the program it calls are written in COBOL. In that case, you have a couple of viable options: the COBOL stored procedure program can invoke the target COBOL program by way of an SQL CALL statement (the target then runs as a nested stored procedure, assuming it has been set up to execute that way through a CREATE PROCEDURE statement referencing the program name), or it can execute the target program through a COBOL CALL (so that the target runs as a COBOL subroutine). Either way, both the COBOL stored procedure program and the target COBOL program run in a WLM-managed stored procedure address space: the same address space if the target executes as a COBOL subroutine, and the same or a different address space if the target is invoked via SQL CALL, depending on whether the same or a different WLM application environment was specified in the target's CREATE PROCEDURE statement. (If you are still using the DB2-managed stored procedure address space, move to WLM-managed address spaces soon - the DB2-managed space is not supported in a DB2 9 environment.)
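For reference, the SQL CALL route depends on the target COBOL program having been defined as an external stored procedure, along these lines (the procedure name, load module name, and WLM environment name here are hypothetical):

```sql
-- Hypothetical definition of the target COBOL program as an external
-- stored procedure, so it can be invoked via SQL CALL. The WLM
-- ENVIRONMENT value determines the address space in which it runs -
-- same as the caller's, or a different one.
CREATE PROCEDURE TARGET_PROC
  (IN P_CUST_ID INTEGER)
  LANGUAGE COBOL
  EXTERNAL NAME 'TARGPGM'
  PARAMETER STYLE GENERAL
  WLM ENVIRONMENT WLMENV1;
```

The COBOL CALL route needs no such definition - the target is simply link-edited as a subroutine.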

One of the differences between the nested-procedure and COBOL subroutine scenarios has to do with task control blocks (TCBs). If the target COBOL program is invoked via SQL CALL, it will run under its own TCB. If the target is executed as a COBOL subroutine, it will run under the TCB of the stored procedure program that issued the COBOL CALL for the subroutine.

Another difference - the one with which the question-asking DBA was concerned - has to do with access to a result set defined by a cursor declared in the target COBOL program. The DBA first brought up a situation in which a program running on an off-mainframe application server was seemingly able to fetch rows from a cursor declared in a "two levels down" stored procedure: the mid-tier program calls DB2 for z/OS COBOL stored procedure A, and stored procedure A calls COBOL stored procedure B. Stored procedure B issues a DECLARE CURSOR statement (on which the WITH RETURN option is specified) and opens this cursor. The mid-tier program subsequently fetches the result set rows associated with the cursor declared and opened in stored procedure B. That was working, but it shouldn't have been, because a DB2 for z/OS stored procedure generating a result set can return that result set only one level up within a series of nested calls. In other words, if stored procedure B declares and opens a cursor, stored procedure A (which called B via SQL CALL) can fetch rows from that cursor-defined result set by issuing an ASSOCIATE LOCATOR statement to get the locator value for the result set, and an ALLOCATE CURSOR statement to define a cursor and associate it with the result set locator value. If the program that called stored procedure A wants to retrieve the result set generated by the cursor declared in stored procedure B, it cannot use this ASSOCIATE LOCATOR/ALLOCATE CURSOR mechanism, because that mechanism only works one level up in a nested SQL CALL structure (DB2 for Linux, UNIX, and Windows allows result-set retrieval at both the one-level-up level and at the top level of a nested SQL CALL structure - "top" referring to the program that issued the initial CALL to a stored procedure).
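In embedded SQL terms, the one-level-up retrieval mechanism might look like the sketch below (procedure names, cursor names, and host variables are hypothetical; in a COBOL program each statement would appear within EXEC SQL ... END-EXEC):

```sql
-- In stored procedure B: declare and open a result-set cursor.
DECLARE CSR1 CURSOR WITH RETURN FOR
  SELECT ORDER_ID, ORDER_TOTAL
    FROM ORDERS
    WHERE CUST_ID = :P-CUST-ID;
OPEN CSR1;

-- In stored procedure A (one level up): call B, obtain a locator
-- for B's result set, tie a cursor to that locator, and fetch.
CALL PROC_B(:P-CUST-ID);
ASSOCIATE LOCATOR (:LOC1) WITH PROCEDURE PROC_B;
ALLOCATE CSR2 CURSOR FOR RESULT SET :LOC1;
FETCH CSR2 INTO :ORDER-ID, :ORDER-TOTAL;
```

The program that called stored procedure A cannot repeat this trick to reach B's result set - the locator mechanism works only one level up.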

So, how was the mid-tier program mentioned by the DBA able to get the two-levels-down result set generated by stored procedure B? Upon further investigation, the DBA found that stored procedure B, in addition to declaring and opening a cursor defining a result set, inserted the result set rows into a global temporary table (these come in two flavors, declared temporary tables and created temporary tables, with the latter usually being the best choice in terms of performance). Stored procedure A then declared and opened a cursor (WITH RETURN) referencing this global temporary table, and the mid-tier program (caller of stored procedure A) could then access the result set because it (the mid-tier program) was only one level up from stored procedure A. That's in fact an excellent way to make a stored procedure-generated result set available to a program several levels up in the nested call structure: put the result set in a global temporary table.
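Sketched in SQL (with hypothetical table, procedure, and host variable names), the temporary-table workaround looks something like this:

```sql
-- One-time setup: a created global temporary table. Each connection
-- gets its own instance of the table's contents.
CREATE GLOBAL TEMPORARY TABLE SESSION_RESULTS
  (ORDER_ID    INTEGER,
   ORDER_TOTAL DECIMAL(11,2));

-- In stored procedure B (two levels down from the mid-tier caller):
-- materialize the result rows in the temporary table.
INSERT INTO SESSION_RESULTS
  SELECT ORDER_ID, ORDER_TOTAL
    FROM ORDERS
    WHERE CUST_ID = :P-CUST-ID;

-- In stored procedure A (one level down from the mid-tier caller):
-- expose the temporary table's contents via a WITH RETURN cursor,
-- which the mid-tier program can then fetch from.
DECLARE CSR1 CURSOR WITH RETURN FOR
  SELECT ORDER_ID, ORDER_TOTAL FROM SESSION_RESULTS;
OPEN CSR1;
```

Because the temporary table's contents persist for the duration of the connection, any level of the call structure can repopulate or re-expose them as needed.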

So, we had one mystery solved. The DBA then pointed to another situation that had him scratching his head: a program (again running on an off-mainframe middle tier) called COBOL stored procedure X, stored procedure X invoked COBOL subroutine Y via COBOL CALL, and the middle tier program was subsequently able to access a result set generated through a cursor declared (WITH RETURN) and opened by COBOL subroutine Y.

This was actually a working-as-designed situation. The DBA thought it shouldn't have worked, because he was under the impression that a result set generated by program Y could be returned if program Y were invoked via SQL CALL, but not if program Y executed as a COBOL-called subroutine. You can in fact find passages in DB2 manuals and "red books" that appear to confirm this understanding of result set processing. It's not that the documentation is wrong - it's just that it can be easily misinterpreted if you consider it from a perspective different from that of the documentation authors. Here's what I mean: when you read in a DB2 book that a COBOL-called subroutine cannot return a cursor-defined result set, what's being communicated is that the subroutine can't return a result set to the program that invoked it via COBOL CALL. A subroutine called via COBOL CALL from a COBOL DB2 stored procedure program can return a result set to the program that called the stored procedure. This is consistent with the result set processing mechanism I described above for nested stored procedures: an SQL-called stored procedure can pass a result set to a one-level-up program (i.e., to the program that called it). In the context of result set processing, a subroutine called via COBOL CALL from a stored procedure program runs at the same "level" as the calling stored procedure; therefore, a result set generated by that subroutine, while not accessible by the calling stored procedure, can be accessed by the caller of the stored procedure (i.e., by the "one level up" program). Just remember that the cursor declared in the subroutine has to include the WITH RETURN option, and the stored procedure invoking the subroutine has to be defined with DYNAMIC RESULT SETS 1 (or more than 1, if multiple result sets will be generated by the stored procedure program and/or by COBOL-called subroutines invoked by the stored procedure program).
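Here's a hypothetical definition of stored procedure X illustrating that last point (procedure, load module, and WLM environment names are made up):

```sql
-- Hypothetical definition of stored procedure X, whose COBOL-called
-- subroutine Y opens a WITH RETURN cursor. DYNAMIC RESULT SETS must
-- account for every result set left open at return time, whether the
-- cursor was declared in the stored procedure program itself or in a
-- subroutine it invoked via COBOL CALL.
CREATE PROCEDURE PROC_X
  (IN P_CUST_ID INTEGER)
  LANGUAGE COBOL
  EXTERNAL NAME 'PGMX'
  PARAMETER STYLE GENERAL
  WLM ENVIRONMENT WLMENV1
  DYNAMIC RESULT SETS 1;
```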

Is that clear? I hope so. I'm very big on DB2 stored procedures, and I want people to know how they can use them.

Monday, June 8, 2009

Thoughts on DB2 Triggers

I was in the Upper Midwest of the USA for most of last week, presenting at three regional DB2 user group meetings - in Minneapolis, Chicago, and Milwaukee - on three consecutive days. One of the presentations I gave in each city covered DB2 for z/OS data warehouse performance. In that presentation, I provided some guidelines on the average number of indexes defined per table in a data warehouse database (I wrote of this in an entry posted to this blog last year). Following the meeting in Milwaukee, one of the attendees asked me if I had any recommendations pertaining to the number of triggers defined on a table. I don't, because trigger usage scenarios and environments vary so widely, but the question sparked an interesting discussion about DB2 triggers that covered a variety of sub-topics. By way of this entry, I'll commit these DB2 trigger thoughts of mine to paper (electronically speaking).

[Super-brief level-set: by way of a trigger defined on a DB2 table, one can cause an SQL-expressed action to be taken automatically in response to an update, delete, or insert targeting the base table. For example, one could use a trigger defined on table A to cause an insert into table A to drive an update of a column in table B.]
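In SQL, the level-set example could be sketched like so (trigger, table, and column names are hypothetical):

```sql
-- Hypothetical trigger matching the example above: an insert into
-- TABLE_A automatically drives an update of a column in TABLE_B.
CREATE TRIGGER NEW_A_ROW
  AFTER INSERT ON TABLE_A
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  UPDATE TABLE_B
    SET ROW_COUNT = ROW_COUNT + 1
    WHERE GROUP_ID = N.GROUP_ID;
```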

First, concerning that question on the number of triggers defined on a table, the answer is very much of the "it depends" variety. I recall a presentation, delivered at a DB2 user group meeting several years ago, in which a developer described a new application that his company had implemented entirely by way of triggers. The number of triggers created for that application was fairly large, and I'm thinking that quite a few triggers were defined on certain individual tables. The application was successfully put into production, and everything worked fine, so having a lot of triggers is not necessarily a bad thing. On the other hand, there are situations in which triggers can affect application performance in an undesirable way. In that regard, the story has gotten better in recent years, certainly on the mainframe platform. Triggers were introduced with DB2 Version 6 for z/OS (the functionality had previously been delivered for DB2 on Linux, UNIX, and Windows servers), and in that and the subsequent release the presence of a trigger defined with UPDATE OF COL5 on a table increased the CPU cost of any UPDATE statement targeting the table, even if the statement did not change data in column COL5. That trigger cost was eliminated in DB2 for z/OS Version 8, so that the aforementioned trigger would affect the performance only of UPDATE statements that changed data in COL5.

So, continuing with this example, how would the performance of a COL5-changing UPDATE statement be impacted by the UPDATE OF COL5 trigger? That would depend, of course, on the nature of the triggered action (i.e., the SQL statement executed as a result of the trigger being "fired" by the UPDATE). If the triggered action is an update of one row in one table, identified by a unique, indexed column referenced in a predicate, the impact of the trigger on the performance of COL5-changing UPDATE statements is likely to be minimal. If, on the other hand, the triggered action were more involved (and keep in mind that it could be a call to a stored procedure), the effect of the trigger on COL5-changing UPDATE statements would be more noticeable. The key here is to keep in mind that the action taken when a trigger is fired is synchronous with respect to the SQL statement that causes the trigger to fire. In other words, the trigger-firing SQL statement isn't finished until the triggered action is finished. This means that there are performance implications for "downstream" triggers that might be fired as a result of the initial trigger being fired (a trigger defined with UPDATE OF COL5 on table ABC could drive an update of COL7 on table XYZ, and that triggered action would fire a trigger if one were defined with UPDATE OF COL7 on table XYZ).

Does this synchronous business mean that triggers with more complex triggered actions are a performance no-no? Not necessarily. One way to have that cake and eat it, too, is to have the trigger place information of interest (e.g., certain column values) on a WebSphere MQ queue (a trigger can certainly do this - the triggered action has to be an SQL statement, and DB2 provides built-in functions, such as MQSEND, that can be used to send data to a designated MQ location). Once that's done, the statement that fired the trigger can complete execution. Asynchronously, with respect to the trigger-firing statement, the data sent to the MQ queue by the trigger can be processed as needed, perhaps by a DB2 stored procedure invoked by the MQ listener (the MQ listener function can automatically take an action, such as calling a stored procedure, when a message lands on a queue).
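A sketch of that decoupling pattern follows (trigger, table, and column names are hypothetical, and the exact MQSEND invocation - argument list, target service/queue configuration - depends on how the DB2 MQ functions are set up at your site; the single-argument form shown here would send to the default MQ service):

```sql
-- Hypothetical trigger that offloads work to WebSphere MQ: the
-- triggered action just puts the changed values on a queue via the
-- MQSEND built-in function, so the firing UPDATE completes quickly.
-- A listener-driven process (e.g., a stored procedure called by the
-- MQ listener) consumes the message asynchronously.
CREATE TRIGGER COL5_TO_MQ
  AFTER UPDATE OF COL5 ON ABC
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  VALUES (MQSEND(N.KEYCOL CONCAT ':' CONCAT CHAR(N.COL5)));
```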

How do triggers stack up, in terms of CPU efficiency, with other means of getting database action X accomplished as a result of action Y being performed? Suppose that you have a need to capture "before" and "after" values in certain columns of table ABC when those columns are updated by programs. If program PROG1 updates the columns of interest in table ABC, you could request that the program be modified to insert into table XYZ "before" and "after" values following the table ABC updates. This approach might well be the most CPU-efficient way to address your requirement, but it could prove to be impractical for at least a couple of reasons. For one thing, who would code the requested PROG1 functional enhancement? Will that person - likely engaged now in some other high-priority application development effort - be available to change PROG1 to your liking within the next year? Maybe not. Then there's potential problem number two: what if the table ABC columns for which you want to capture changes are updated by multiple programs besides PROG1? Are you going to try to get change-capture functionality added to all of those programs? How long will that take? You could opt to use a vendor tool to detect and capture changes made to the specified columns of table ABC, but if such a tool isn't currently part of your IT infrastructure, how long will it take to acquire it and how much will it cost?

You could certainly determine that a trigger on table ABC defined with UPDATE OF [the columns of interest] would be the right way to go, offering a quickly implementable solution with a modest CPU cost and a very low dollar cost (or euro cost, or whatever-currency cost). And, consider this: if the programs that update the table ABC columns in which you are interested are so response-time sensitive that even adding a fairly simple trigger to the mix raises performance concerns, having that trigger defined on a data warehouse table (or operational data store table) to which table ABC changes are propagated might do the trick for you.
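A change-capture trigger of that kind might be sketched as follows (table and column names are hypothetical): the OLD and NEW transition variables supply the "before" and "after" values, and the capture happens no matter which program performs the update - no changes to PROG1 or any other updater required.

```sql
-- Hypothetical change-capture trigger on table ABC: record the
-- "before" (OLD) and "after" (NEW) values of the columns of interest
-- in audit table XYZ whenever those columns are updated.
CREATE TRIGGER CAPTURE_ABC
  AFTER UPDATE OF COL5, COL6 ON ABC
  REFERENCING OLD AS O NEW AS N
  FOR EACH ROW MODE DB2SQL
  INSERT INTO XYZ
    (KEYCOL, COL5_BEFORE, COL5_AFTER,
     COL6_BEFORE, COL6_AFTER, CHANGE_TS)
  VALUES
    (N.KEYCOL, O.COL5, N.COL5,
     O.COL6, N.COL6, CURRENT TIMESTAMP);
```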

Flexibility, agility, and economy - that's what DB2 triggers offer. They should definitely be solution candidates when you have a need for timely implementation of incremental database application functionality.