Robert's Blog


Friday, August 14, 2009

DB2 for z/OS Stored Procedures and Large Parameters

I recently fielded a question, from a DB2 person who works for an insurance company, about the performance implications of defining large parameters for DB2 for z/OS stored procedures. At this company, DB2 stored procedures written in COBOL had been created with PARAMETER STYLE GENERAL. Now, for the first time, a stored procedure with really large parameters (VARCHAR(32700), to be precise) was to be developed, and the DB2 person was concerned that the entire 32,700 bytes of such a parameter would be passed between the calling program and the stored procedure, even when the actual value of the parameter was an empty string (i.e., a string of length zero). He was thinking that the only way to avoid passing 32,700 bytes to a called stored procedure in cases when no value was to be assigned to a parameter would be to create the stored procedure with PARAMETER STYLE GENERAL WITH NULLS, and have the calling program set the value of the parameter to NULL by way of a null indicator.

I'll tell you that I had been under this same impression for some time, based on my reading of material in various DB2 manuals; however, it turns out that passing an unnecessarily large number of bytes between a calling program and a stored procedure is not a concern when the stored procedure is defined with PARAMETER STYLE GENERAL. I like the way that Tom Miller, a stored procedures expert with IBM's DB2 for z/OS development organization, explains this. He pictures a VARCHAR data type as being a box, in which you have two bytes for a length indicator and then N bytes of what you could call the defined data area. When a program calls a stored procedure that has a VARCHAR-defined parameter, only "active" data in the parameter is passed to DB2. The "inactive" portion of the VARCHAR "box" - that being the part of the defined data area past the current data length (as specified in the 2-byte length portion of the "box") - is undefined and therefore ignored. So, if a program sets the length of a VARCHAR-defined parameter to zero and then calls the associated stored procedure, only an empty string will be passed to DB2 (in other words, the entire VARCHAR "box" will NOT be passed).

This being the case, the choice of PARAMETER STYLE GENERAL or PARAMETER STYLE GENERAL WITH NULLS comes down to addressing the needs of the application developer. PARAMETER STYLE GENERAL WITH NULLS can result in slightly higher overhead versus PARAMETER STYLE GENERAL (it costs a bit more to pass a null value than it does to pass a zero-length VARCHAR), but PARAMETER STYLE GENERAL WITH NULLS can accommodate the full range of possibilities with respect to the values that can be passed with parameters, including the null value. In this light, PARAMETER STYLE GENERAL can be seen as an application simplification choice for programs that don't need to deal with null values.

I have to tip my hat to the DB2 person who sent this question to me, because before I was able to get back to him with a response, he had already executed a test that demonstrated the actual behavior, in terms of bytes passed, for a COBOL stored procedure created with a VARCHAR parameter (specified as INOUT, meaning that it would be used as both an input and an output parameter) and PARAMETER STYLE GENERAL (the calling program, in the case of this test, was another COBOL program, and it invoked the stored procedure by way of an SQL CALL - NOT a COBOL subroutine CALL). The calling program set the VARCHAR parameter to a long string of G's, then set the length of the VARCHAR to zero and called the stored procedure. The stored procedure displayed the length of the parameter and its contents, then set the value of the parameter to a long string of asterisks, set the length of the parameter to 5, and returned to the calling program. The calling program then displayed the length and contents of the parameter. Sure enough, the stored procedure displayed a parameter length of 0 (as set by the calling program) and no content, and the calling program (following return from the stored procedure) displayed a parameter length of 5 (as set by the stored procedure) and content consisting of 5 asterisks followed by a lot of G's (the 5 asterisks returned by the stored procedure overlaid the first 5 of the G's that had initially been moved to the VARCHAR "box" in the calling program).

[Note that there is a caveat with respect to parameters in a REXX program. These never arrive with any undefined portion (with any "air", as the previously cited Tom Miller of IBM puts it), as there is no concept of that in REXX variables.]

I hope that this information will be helpful to your own DB2 stored procedure development efforts. As I've mentioned in other posts to this blog, I'm very big on DB2 stored procedures, and I want people to use them effectively and successfully.

12 Comments:

Anonymous Nuno Job said...

Really cool information. DB2 is smart.

Good job, I'm sure other z/OS customers would love to know the answer to this question.

August 14, 2009 at 8:43 PM  
Blogger Selva said...

Good piece of information!... Thanks for sharing.

August 16, 2009 at 1:59 AM  
Blogger Robert Catterall said...

Thanks for the positive feedback!

Robert

August 23, 2009 at 7:02 PM  
Anonymous Anonymous said...

Good information as always, I am also in favor of Stored Procedures (most certenly the native ones) but only if they are used as such. What is your opinion/idea about using native SQL Stored procedures in replacement of SQL to externalize it from distributed JAVA applications?
Best regards, Eddy.

August 26, 2009 at 12:09 AM  
Blogger Robert Catterall said...

That's an interesting question, Eddy. I think that it involves a trade-off involving architecture and CPU efficiency. From an application architecture perspective, I very much like the idea of having Java programs (probably running on off-mainframe app servers) access DB2 for z/OS-managed data by way of stored procedure calls (and I also favor the use of DB2 V9 native SQL procedures for this purpose). This approach has several advantages:
* It lets the Java developers use JDBC to invoke the stored procedures and to access any returned output parameters and/or result set rows, without the security and performance concerns associated with dynamic "table-touching" SQL. Those concerns are eliminated when SQLJ is utilized, but in my experience, Java developers prefer JDBC over SQLJ.
* It abstracts the back-end database schema from the perspective of the Java programmer. A lot of object-oriented developers would just as soon not have to know the particulars of the row-and-column layout of the back-end database.
* It "de-couples" the data-serving code (stored procedures) from the Java programs that issue the data requests via stored procedure calls. With the SQL packaged in stored procedures, DBAs can make database design changes aimed at boosting application performance and/or functionality, and only the stored procedures dependent on the modified tables need to be updated - not the callers of the stored procedures.
* It promotes the coding of efficient SQL, as the developers of SQL stored procedures are likely to have SQL coding expertise.
* It promotes the reusability of SQL code.

Of course, there's no free lunch, and these flexibility and abstraction benefits can end up increasing CPU consumption on the mainframe DB2 data server, particularly for very simple transactions. If, for example, a data-accessing transaction issues only 1 or 2 SQL statements, it's possible that those two statements would consume less CPU if issued directly from the Java program, as opposed to being packages in a stored procedure, even if they are coded using JDBC (dynamic statement caching can take a big bite out of statement preparation costs, as can IBM's Optim pureQuery Runtime offering). When the number of SQL statements associated with a transaction (excluding FETCHes, which are local to the distributed client program when DB2 block fetch is in effect) gets into the mid-single digits and beyond, stored procedures tend to have the performance advantage.

So, absolutely optimal CPU efficiency from the data serving perspective would likely be achieved through a mix of both Java program-issued SQL statements and stored procedure calls.

What would I do? Well, I'd probably lean towards providing all DB2 data access through stored procedures (and again, I favor the DB2 9 native variety), for the sake of consistency and structure, and for the aforementioned benefits associated with SQL code reuse, database schema abstraction, and looser coupling of the data-serving and data-requesting layers of the application. I'd back away from that in a SELECTIVE way if I had to, and I might have to if the System z data server were CPU-constrained and if I had some very simple transactions (1 or 2 SQL statements) executed very frequently.

Hope this helps.

August 27, 2009 at 7:33 AM  
Anonymous Anonymous said...

Nice overview on the subject, the overview you made is exactly the reason why we would like to do it. By the way the question comes from an application architecture perspective. It would even be used for separate SQL statements and not only for several statements in one Stored Procedure and that is what I was the most afraid of. I work on the technical part of it and as you mentioned there's no free lunch.

Four your information, in the mean time I have been doing some simple performance measuring from JAVA (used for most of our online applications) and PLI (mostly for batch where it will not be with Stored Procedures). The Java applications all use parameter markers and have about 99% cache hit rate so the fact of having Static SQL instead of Dynamic will not help a lot.

What I did was repeating a lot of times, in separate programs, an identical SQL returning 1, 10, 200 and 1000 rows based on the same table with only different where clauses. I did this to try to minimize the other DB2 CPU usage so that I can see the real overhead between an SQL call and the same SQL via a stored procedure call.
For the PLI there is a large overhead as I suspected and it decreases the more rows you fetch. For the JAVA part I see some very strange things happening which I can't explain (I will open a PMR for it) depending on how I do the JDBC call. The numbers below are for the DB2 CPU times in seconds from a DB2/PM report and are by type of JAVA call I used to get the data for 1 row. I didn't take into account any CPU consumed by DB2 but not attributed to the thread. For prepareStatement there were two ways of coding it and I called them type 1/2 and I hope the table still remains a bit readable.
Java coding ! CP CPU ! ZIIP CPU !
-------------------------+--------+----------+
prepareStatement type 1 ! 24,140 ! 29,760 !
prepareStatement type 2 ! 9,870 ! 11,847 !
createStatement ! 6,471 ! 7,837 !
call of Stored Procedure ! 7,705 ! 9,247 !
If it is not a bug or something I missed then the call of a Stored Procedure is even less CPU then the best prepareStatement type.

Best regards, Eddy.

August 28, 2009 at 1:33 AM  
Blogger Robert Catterall said...

Apologies for the delayed response, Eddy.

That's interesting information. A couple of questions:

1) Is the third column "total" CPU (zIIP + general-purpose CPU)? I ask because the numbers look high to me if column 3 is zIIP-only CPU (i.e., I wouldn't expect > 50% offload onto zIIP for these actions).

2) In what language is the stored procedure written? Is it COBOL? DB2 9 native SQL procedure? Something else?

Robert

September 7, 2009 at 3:02 PM  
Anonymous Anonymous said...

No problem, people should even be thankful that you want to do the BLOG as you most certainly have other work to-do.

1) For the zIIP it is effectively zIIP only. I constantly have about 54% of the workload that is done on zIIP. For BI specific queries I even see sometimes up until 90% on zIIP, I suppose it is due to the parallelism but didn't have the time to check it (by the way this is not the only site where I have seen this higher then the 30% you can read from time to time).

2) The stored procedure is a DB2 9 native SQL procedure, we also use some PLI but that was not in the scope for this project.


In the meantime I also have some extra information about the differences. The prepareStatement type 1 overhead is due to the application not doing a close of the statement in JAVA. For the prepareStatement type 2 the higher CPU usage comes from the fact that the parameter marker was set in a string in the JAVA code instead of with a numeric field as defined in the table. This resulted in a SQLCODE -301 at first and then it is retried by the driver with a numeric field giving a SQLCODE of 0.
The new table is:
Java coding ! CP CPU ! ZIIP CPU !
------------------------------------+--------+----------+
prepareStatement type 2 ! 7,647 ! 9,339 !
createStatement ! 6,471 ! 7,837 !
call of Stored Procedure ! 7,705 ! 9,247 !
The prepareStatement type 2 is now a little bit better then the SP but sill more then createStatement 7,647 <> 6,471.
Best regards, Eddy.

September 8, 2009 at 1:43 AM  
Blogger Robert Catterall said...

Now I really have to apologize for the VERY late response on my part. I was super busy last week and the week before.

Your numbers are quite interesting. Indeed, the native SQL stored procedure CPU time looks really good versus the "type 2" prepare of the single statement issued via JDBC call. That very favorable comparison would have me advocating pretty strongly for the native SQL stored procedure approach. The slightly higher CPU cost of the native SQL stored procedure would seem to me to be justified in light of associated security and code re-use benefits.

Thanks very much for sharing this data. I know that for some DBMS platforms, stored procedures are recommended over client-issued SQL DML statements, even for very simple transactions. With the native SQL procedure functionality of DB2 V9, it looks as though this recommendation might be just as applicable to DB2 for z/OS.

September 20, 2009 at 6:46 PM  
Anonymous Anonymous said...

No problem, I am still awaiting the answer of IBM on the difference between "parameter marker" and "createStatement", the last I heard is that it comes due to a DESCRIBE that is done but I don't see why it is necessary only for the prepared ones.

I did more testing with a real life application and although the CPU used stays about the same (CP + zIIP CPU) there is a large difference due to the fact that the native SQL stored procedure uses much less zIIP CPU and unfortunately that is the one you need to pay for.

Best regards, Eddy.

September 20, 2009 at 11:43 PM  
Blogger Robert Catterall said...

In your test with the real-life application, was the native SQL procedure invoked via a local (to the DB2 subsystem) CALL (perhaps from a batch job or a CICS tran), or by way of a CALL that came through the DDF?

September 21, 2009 at 8:22 PM  
Anonymous Anonymous said...

It was all Java with e T4 connection through DDF. All the online is done via JAVA from a WAS server. I suppose IBM did a better homework and lowered the percentage zIIP they send to the level first intended also for dynamic statements :-).
When it is done via PLI (I did some sandbox testing) then the overhead is even larger and everything is on the CP, but is descending the more you read.

Best regards, Eddy.

September 22, 2009 at 12:20 AM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home