Robert's Blog


Wednesday, October 3, 2007

Pros and Cons of Dynamic SQL

Static versus dynamic SQL is a debate that's gone on for years in DB2 circles. I'm weighing in on the matter with this post because the nature of the debate has changed recently in some interesting ways. I'll be writing primarily from the DB2 for z/OS perspective, but many of the issues are the same, or nearly so, in DB2 for Linux/UNIX/Windows (LUW) environments.

When dynamic SQL is the topic of a discussion, you typically have application developers on the one side ("Let us use dynamic SQL!") and DB2 systems programmers and DBAs on the other side ("Sure - as soon as pigs learn to fly"). The appdev people are usually seeking a programming productivity boost, while the systems folks see dynamic SQL as being a risk factor with respect to their twin goals of providing a highly-available and CPU-efficient DB2 subsystem. Both sides have valid points to make, and chances of reaching agreement are enhanced if each group makes an effort to understand the motivation behind the other group's arguments for and against the use of dynamic SQL.

The sysprogs and DBAs often argue that dynamic SQL will consume too many CPU cycles on the DB2 data-serving platform, compared to the same statements executed as static SQL. Let's look at this issue more closely. If the concern is over the cost of preparing a dynamic SQL statement (a cost incurred at package bind time for static SQL), it's worth considering the nature of the dynamic SQL statements that would be issued by an application program. If the statements are expected to be very short-running, the CPU cost of statement preparation could indeed be several times the cost of statement execution (unless DB2 dynamic statement caching is utilized - more on this momentarily). It could be, for example, that a dynamic SQL statement consumes half of a CPU millisecond in execution, and two CPU milliseconds in preparation. If, on the other hand, the dynamic SQL statements are expected to be relatively long-running, statement preparation cost could pale next to execution cost, making the CPU consumption of statement preparation pretty much a moot point (imagine a dynamic SQL statement that consumes a whopping 10 CPU milliseconds in preparation and then uses 5000 CPU milliseconds - 5 CPU seconds - in execution).

Sometimes, the DBAs and sysprogs are concerned that DB2 CPU efficiency could be negatively impacted by a "runaway" dynamic SQL statement. Here, it's important to make the distinction between dynamic SQL that is ad-hoc (i.e., statements that are very much unpredictable with respect to form and structure) and dynamic SQL that is not a mystery at all because it is written by programmers (and quite possibly reviewed by DBAs prior to program implementation). This not-ad-hoc stuff is what I call structured dynamic SQL. It might be that an application enables an end user to supply search arguments that will be used in a query's predicates by typing values into some or all of a group of fields on a screen. To handle the various possible predicate combos, the programmers could code a bunch of DECLARE CURSOR statements, with the one issued depending on the values a user supplies, or they could write code that dynamically builds the statement string based on the user's input and then passes that string to DB2 for preparation and execution. That's predictable dynamic SQL, and the statements can be tested before program implementation because their structure is known ahead of time: given the possible user inputs, the statements will have a common overall form, and any particular statement will come from a definable set of alternatives determined by the predicate values supplied.
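
To make that concrete, here's a minimal sketch of the string-building approach, in Java with JDBC (the CUSTOMER table and its columns are made up for illustration):

    import java.sql.*;
    import java.util.*;

    public class CustomerSearch {
        // Build the query from whichever search fields the user filled in.
        // Parameter markers (?) keep each statement string constant across
        // different input values - only the predicate combination varies.
        public static ResultSet search(Connection conn, String lastName,
                                       String city) throws SQLException {
            StringBuilder sql = new StringBuilder(
                "SELECT CUST_ID, LAST_NAME, CITY FROM CUSTOMER WHERE 1=1");
            List<String> params = new ArrayList<String>();
            if (lastName != null) {
                sql.append(" AND LAST_NAME = ?");
                params.add(lastName);
            }
            if (city != null) {
                sql.append(" AND CITY = ?");
                params.add(city);
            }
            PreparedStatement ps = conn.prepareStatement(sql.toString());
            for (int i = 0; i < params.size(); i++) {
                ps.setString(i + 1, params.get(i));
            }
            return ps.executeQuery();
        }
    }

Note that the set of possible statement strings is small and enumerable in advance - one per predicate combination - which is exactly what makes this "structured" rather than ad-hoc.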

Another example of structured dynamic SQL involves the use of a database-access API such as JDBC (Java Database Connectivity) or ODBC (Open Database Connectivity), which causes DB2 to process the resultant SQL statements dynamically. In such cases, the structure and form of the SQL statements are certainly known beforehand, and unpredictability is virtually a non-issue.
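
For example, even a query as simple and fully-known-in-advance as this one is, from DB2's perspective, dynamic SQL - it is prepared at run time (this fragment assumes an open Connection named conn, an int variable orderId, and an illustrative ORDERS table):

    // The statement text never varies, but because it arrives via JDBC,
    // DB2 prepares it dynamically at execution time.
    PreparedStatement ps = conn.prepareStatement(
        "SELECT ORDER_DATE FROM ORDERS WHERE ORDER_ID = ?");
    ps.setInt(1, orderId);
    ResultSet rs = ps.executeQuery();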

"Whoa," say some sysprogs and DBAs, "Even when we know what an incoming dynamic SQL statement will look like, we could still have unpredictability because a statement might get a different access path from one execution to another." Technically, that's true, given that the statements are prepared at each execution (again, deferring discussion of dynamic statement caching), and the catalog statistics on which the DB2 optimizer makes its access path decisions could change from one issuance of a statement to another. Here, as in many cases, it's important to dig deeper before letting access path change concerns be a dynamic SQL deal-breaker. First of all, if an access path changes because of updated catalog stats, the change should be for the better (DB2 makes better decisions when catalog stats are more up-to-date and accurate). Second, some statements are very unlikely to get a different access path, even after catalog stats changes. If, for example, a structured dynamic SQL statement uses an index to retrieve a row based on the match of a unique index key, that path is highly to differ from one execution of the statement to another.

Resource contention caused by dynamic SQL is another concern frequently cited by DB2 DBAs and sysprogs. A key worry is often the S-lock taken on a DBD (database descriptor) as a result of a dynamic SQL statement targeting one or more tables within a DB2 database (and here I'm using "database" in the narrow DB2 for z/OS sense: a user-specified set of one or more tables and related objects that comprise a subset of all the tables defined within a DB2 subsystem). This S-lock will block DDL statements (DROP, CREATE, ALTER) issued at the same time against one or more of the objects in that same database. OK, again, some digging is advisable. How often, and when, are DDL statements issued in the production environment? How long do the dynamic SQL statements typically run? Can objects accessed by the dynamic SQL statements be isolated within a certain database or databases within the DB2 subsystem (and likewise, can objects frequently targeted by DDL statements be isolated in smaller databases) so as to minimize DBD contention?

What about contention with utility processes, such as online REORGs of indexes and tablespaces? This should not be an issue, since IBM DB2 utilities utilize drain locking to gain restricted access to objects, and application processes that issue dynamic SQL statements must release claims (whether read or write) on objects at commit points - just as static SQL-issuing processes must do.

How about security? That's certainly an important consideration. When dynamic SQL is used, the DB2 authorization ID of a process must have the table access privileges (SELECT, INSERT, UPDATE, DELETE) needed for successful statement execution, whereas successful static SQL execution requires only that the DB2 auth ID of the application process have the EXECUTE privilege on the package containing the SQL statements. One oft-used solution here is to package the "table-touching" SQL statements in DB2 stored procedures, which can be invoked using JDBC or ODBC or whatever database-access API the application programmers want to use. When that's not an option (as when a program will dynamically build an SQL statement string based on user-supplied predicate values), some organizations will have programs provide an application-specific DB2 authorization ID that is not equal to any individual's ID - in that case, data security becomes mostly an application-side responsibility.
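
As a sketch of the stored procedure approach (the GET_CUSTOMER procedure and its parameters are invented for illustration; assume java.sql.* is imported and conn is an open Connection):

    // The caller's DB2 auth ID needs the EXECUTE privilege for the
    // procedure (and its package) - but no privileges on the tables,
    // because the table-touching SQL lives inside the procedure.
    CallableStatement cs = conn.prepareCall("CALL GET_CUSTOMER(?, ?)");
    cs.setInt(1, custId);                       // IN: customer number
    cs.registerOutParameter(2, Types.VARCHAR);  // OUT: customer name
    cs.execute();
    String custName = cs.getString(2);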

There are times, of course, when the DB2 DBAs and sysprogs have to go with dynamic SQL. The classic example here is a purchased application such as SAP or PeopleSoft that uses what DB2 sees as dynamic SQL because the application can work with different DBMSs and static SQL is pretty much a DB2-exclusive concept. When that is the case, the usual response is to heavily leverage DB2 dynamic statement caching so as to slash statement preparation costs. The amount of virtual storage an EDM pool can consume when dynamic statement caching is used with a big application (could be several hundred megabytes) used to worry some DB2 systems programmers, but that worry faded when DB2 for z/OS (starting with Version 6) allowed the part of the EDM pool used for caching prepared dynamic SQL statements to be kept in a z/OS data space, and just about disappeared when DB2 for z/OS Version 8 delivered 64-bit virtual storage addressing.

An important point about dynamic statement caching: it's of very little value if the dynamic SQL statements in question are not parameterized. That is to say, if predicate values are literals plugged directly into statement strings, dynamic statement caching won't do much to help reduce statement preparation costs unless the plugged-in predicate values hardly ever change. It's a much better coding practice to use parameter markers where predicate values would go, and then to supply the substitution values when the statement is issued by the program. Off-the-shelf software providers know this and code accordingly. When structured dynamic SQL is to be used in your environment, parameterized statements should be a requirement of the application programmers.
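
To illustrate the difference (the ACCOUNT table and column names are illustrative; assume conn and acctNo are defined):

    // Poor practice: a literal in the statement string. Each distinct
    // account number produces a different string, so each execution is
    // likely a cache miss that pays the full statement preparation cost.
    Statement s = conn.createStatement();
    ResultSet rs1 = s.executeQuery(
        "SELECT BALANCE FROM ACCOUNT WHERE ACCT_NO = " + acctNo);

    // Better: a parameter marker. The statement string is identical on
    // every execution, so after the first prepare, subsequent executions
    // can find the prepared statement in the dynamic statement cache.
    PreparedStatement ps = conn.prepareStatement(
        "SELECT BALANCE FROM ACCOUNT WHERE ACCT_NO = ?");
    ps.setInt(1, acctNo);
    ResultSet rs2 = ps.executeQuery();

Beyond the caching benefit, the parameter-marker form also closes the door that string concatenation opens to SQL injection.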

Here's a new reason for considering structured dynamic SQL: the emergence of application-enabling tools that map the object-centric data view of a modern application programmer to the schema-centric data view of a DBA. Through a tool of this nature, an application developer can request an object ("object" in this context meaning an informational instantiation of something - one example would be information about a particular customer), probably by calling a Web service. The request is transformed by a server-side piece of code into an SQL statement, and the desired information is returned to the requesting client program. The client program might update the object and then invoke a Web service to save the modified object. The aforementioned tool transforms the save request into the SQL statement or statements that will "harden" the data changes associated with the object modifications to the back-end database. The SQL statements sent to the data server are dynamic, and that might give some DBAs and sysprogs pause; however, by enabling business-logic-tier application programmers to access and manipulate information in the form of objects, an object-schema mapping tool could significantly improve developers' ability to quickly respond to calls for new functionality spurred by market changes or new opportunities. This would enhance the organization's agility, and that might be worth the expenditure of some extra CPU cycles to process dynamic SQL statements.
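
In rough terms, a mapping layer of the kind described might turn an object save into parameterized SQL along these lines (this is a hypothetical sketch - the Customer class, its accessors, and the table and columns are all invented for illustration):

    // Hypothetical mapping-layer method: the developer works with a
    // Customer object; the layer generates the parameterized UPDATE that
    // hardens the object's changes to the back-end DB2 table.
    public void save(Connection conn, Customer c) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "UPDATE CUSTOMER SET NAME = ?, CITY = ? WHERE CUST_ID = ?");
        ps.setString(1, c.getName());
        ps.setString(2, c.getCity());
        ps.setInt(3, c.getId());
        ps.executeUpdate();
    }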

The bottom line is this: programmers who request the use of dynamic SQL to boost their ability to quickly develop application functionality generally know what they're talking about, and the same is true of sysprogs and DBAs who want to handle dynamic SQL in such a way as to protect and make good use of an organization's DB2 data-serving resources. When people in these two groups make the effort to understand each other's objectives and concerns, a way can often be found that will satisfy all stakeholders. Don't stand behind outdated edicts. Dig into the issues and put your knowledge and creativity to good use.
