Robert's Blog


Wednesday, April 28, 2010

This blog has moved


This blog is now located at http://catterallconsulting.blogspot.com/.
You will be automatically redirected in 30 seconds, or you may click here.

For feed subscribers, please update your feed subscriptions to
http://catterallconsulting.blogspot.com/feeds/posts/default.

Using DB2 Stored Procedures, Part 3: Schema Visibility

This is the third of a three-part entry on various factors you could consider in determining how (or whether) DB2 stored procedure technology might be employed in your organization's application environment. In part one I described some advantages associated with client-side and server-side SQL and pointed out that, if you like your SQL on the database server side of things (and I do), DB2 stored procedures can help you go that route. In part two, I examined stored procedure usage from the static SQL angle, noting that stored procedures provide a way for client-side programs to easily -- and dynamically, if needs be -- invoke static SQL statements. In this final entry in the series, I'll look at stored procedures as a means of limiting database schema visibility in an application development sense.

"Schema" clarified. In a strictly DB2 context, the term "schema" essentially refers to a categorizing mechanism -- a way to logically set apart a set of set of database objects through the use of a common high-level qualifier in the objects' names. So, for example, I could have a REGION4 schema that might include tables, views, and stored procedures -- all with a high-level qualifier of REGION4 (table REGION4.SALES, stored procedure REGION4.NEW_ORDER, etc.). Schemas are handy because they allow me to code SQL statements with unqualified object names and to use these statements with a particular set of objects by supplying the schema name at package bind time (for static SQL) or via the CURRENT SCHEMA special register (for dynamic SQL).

With that said, this is NOT the way I'm using "schema" in this blog entry. Instead, I'm using the term in the more general sense, as a reference to the design of a database: its tables, columns, relationships, etc. There was a time, not so long ago, when people developing application programs that would retrieve or change DB2-managed data had to know a good bit about the design of the target database. In recent years, application architecture has shifted from monolithic to layered and service-oriented, and today's applications often have loosely-coupled tiers that broadly concern user interface (UI), business logic, and data access functionality. In such an application environment, you should think about how "high up," within an application's functional layers, database schema visibility should extend. To put it another way: at what levels within the functional stack should programmers be require to have knowledge of the underlying database design?

Who should know what? This isn't a question of turf I'm talking about -- it has nothing to do with database people saying, "Hey, developers! That's MY knowledge, and I'm not going to share it! Nyah, nyah!" Some of the more vocal advocates of database schema abstraction are application architects. I recall overhearing a top-notch programmer complaining to a colleague that visibility of the database schema extended all the way to the UI layer of an application in development. It wasn't that he couldn't deal with this -- he had excellent SQL skills, and he knew a lot about database design himself. The crux of this guy's argument was that he shouldn't HAVE to be concerned with details of the database design when coding at a higher level of the application.

That programmer is not a whiner. He has a good point. For many application architects, the database schema is "plumbing" about which most developers shouldn't be concerned. If coders can retrieve, manipulate, and persist data without having to know a lot about the design of the underlying relational database, they can focus more of their attention on providing functionality needed by the business and by application users, thereby boosting their productivity. DB2 stored procedures can facilitate database schema abstraction: they provide data access services needed by (for example) business-layer application programmers, and they limit the requirement for database design knowledge to the people who develop the stored procedures. [I should point out here that some people consider stored procedures themselves to be part of the schema of a database, and that, as such, they should also be abstracted from the perspective of a consumer of stored procedure-provided services. These folks would argue that stored procedures should be invoked by higher-level programs not through SQL CALLs, but by way of something like Web services calls. A DB2 stored procedure can indeed be exposed as a Web service, and going with this approach would involve weighing the benefit of more complete schema abstraction against the additional path length introduced by the Web services call on top of the SQL CALL for stored procedure invocation.]

A benefit for DBAs, too. When business-logic developers code to the schema of a database, the flexibility that DBAs have to effect performance-enhancing database design changes can be severely limited. I was once involved in a discussion among some DB2 DBAs about a database design change that had the potential to significantly improve the CPU cost-efficiency of a key application process. The proposed modification was ultimately shelved because one of the DB2 tables that would be redesigned was accessed by something like 1000 different application programs, and changing all that code was not a feasible proposition. When stored procedures provide access to the database, schema changes tend to be more do-able -- yes, code in affected stored procedures has to be altered, but the scope of this effort will often be less than that which would be faced if business-logic programs directly referenced database tables. If a given stored procedure services 10 business-logic programs, changing the one stored procedure to accommodate a database schema change looks better to me than modifying the 10 business-logic programs that would have to be updated were the stored procedure not serving their data access needs.

And don't forget security. Database schema information -- table names, column names, entity relationships -- can be of value to someone who wants to gain access to your organization's data in an unauthorized manner. The more limited the dissemination of database design details, the less likely it is that this information will be used for malicious purposes. If your DB2 database is accessed by way of stored procedures, only stored procedure developers need detailed knowledge of the database schema. It's a good risk mitigation move.

So, there's my reason number three for using DB2 stored procedures: they enable database schema abstraction, which 1) allows business-logic and UI developers to focus less on application "plumbing" and more on application functionality, 2) provides DBAs with more flexibility in terms of implementing performance-improving database design changes, and 3) limits the spread of detailed database design information that could potentially be used by someone with ill intent. I hope that you'll find a reason to put DB2 stored procedures to work at your organization.

Monday, April 19, 2010

Using DB2 Stored Procedures, Part 2: The Static SQL Angle

This is the second of a three-part entry on various factors you could consider in determining how (or whether) DB2 stored procedure technology might be employed in your organization's application environment. In part one I described some advantages associated with client-side and server-side SQL and pointed out that, if you like your SQL on the database server side of things (and I do), DB2 stored procedures provide a very useful means of going that route. In this entry I'll examine stored procedure usage in a static SQL context. In part three, I'll look at how your use of stored procedures might be influenced by your desire for more, or less, database schema visibility from an application development perspective.

The pros of static SQL. Static SQL is, as far as I know, an exclusively DB2 concept, so if you're kind of new to DB2 you might want to check out the blog entry that I posted a few weeks ago, on the basics of SQL (including static SQL) in DB2-accessing programs. Those of you who are familiar with static SQL probably know that it delivers two important advantages versus dynamic SQL:
  • Better performance -- In particular, I'm talking about the database server CPU consumption aspect of application performance. The CPU efficiency advantage delivered by static SQL is, of course, based on the fact that preparation of static SQL statements (referring to preparation for execution by DB2, which includes things such as data access path selection by the DB2 optimizer) is accomplished before the statements are ever issued for execution by the associated application program. This is important because for some simple, quick-running SQL statements, the CPU cost of statement preparation can be a several times the cost of statement execution. Can dynamic statement caching reduce the database-server-CPU-cost gap between static and dynamic SQL? Yes, but even if you have repeated execution of parameterized dynamic SQL statements (so as to have a high "hit ratio" in the prepared statement cache), generally the best you can hope for is database server CPU consumption that approaches -- but doesn't match -- the CPU cost of an equivalent static SQL workload (the effectiveness of dynamic statement caching gets a boost with DB2 10 for z/OS, which provides for better prepared statement cache matching for dynamic SQL statements that include literal values -- versus parameter markers -- in query predicates).
  • More-robust security -- For a dynamic SQL statement to execute successfully, the authorization ID of the application process issuing the statement has to have read (for queries) and/or data-change privileges on the tables (and/or views) named in the statement. Such is not the case for a static SQL statement. All that is needed, authorization-wise, for successful execution of a static SQL statement is the execute privilege on the statement-issuing program's DB2 package.
Now, I'll concede that there are times when dynamic SQL is a very good choice for an application program. A developer at one of my clients had to write a program that would retrieve information from a DB2 database according to a combination of search arguments that a user could enter on a screen. There were quite a few search options, and coding a static DECLARE CURSOR statement for every possible combination would have been a major piece of work. The developer opted instead to have his program build the SELECT statement dynamically, based on the choices entered by the user. This was a good choice, as it enabled him to deliver the needed functionality in a much more timely manner than would have been the case had he opted for the static SQL approach. Still, with exceptions such as this one duly noted, my preference is to go with static SQL in most cases, especially in OLTP and batch application environments (dynamic SQL often dominates in a data warehouse system, and that's OK).

Can static SQL statements be issued from client-side application programs? The answer to that question depends on the programming language used. If a client-side program is written in assembler, C, C++, COBOL, Fortran, PL/I or REXX, it can issue static SQL statements (for assembler and PL/I, the client platform has to be a mainframe). If the language is Java, static SQL can be used by way of SQLJ. Another Java option for static SQL is IBM's Optim pureQuery Runtime product, which can also enable the use of static SQL when client-side programs are written in VB.NET or C#. If the language used on the client side is Perl, Python, PHP, or Ruby, the only possibility is dynamic SQL.

This is where DB2 stored procedures come in. Suppose you want to use static SQL for your application, but you want to use, say, Ruby on the client side? Or, suppose you're using Java on the client side, but your Java developers don't want to use SQLJ (and you don't have Optim pureQuery Runtime)? Well, how about putting the static SQL in DB2 stored procedures, and having your client-side programs invoke these by way of dynamic CALLs? You get the performance and security benefits of static SQL, and your client-side developers get to use their language (and data-access interface) of choice. Everybody's happy!

That's reason number two for using DB2 stored procedures: they provide a means whereby static SQL can be easily -- and dynamically, if needs be -- invoked by client-side programs. Stop by the blog again soon for my reason number three.

Friday, April 16, 2010

Looking Forward to DB2 by the Bay

That would be Tampa Bay, and I'm talking about the International DB2 Users Group North America Conference that will be held May 10 - 14 at the Tampa Convention Center (you can get all the details at IDUG's Web site).

I've participated in every IDUG North America conference since 1997, and I've always found these events to be very well worth my time and dollars. I'll be delivering a presentation on mainframe DB2 data warehousing at this year's conference, and I'll also participate, along with Paul Zikopoulos of IBM's DB2 for Linux/UNIX/Windows (LUW) development organization, in a "Face 2 Face" interactive-discussion session on DB2 Data Warehousing (such sessions were known in the past as SIGs, or Special-Interest Groups).

Why do I attend the IDUG North American Conference every year? There are lots of reasons. For one thing, I really enjoy hearing the latest about DB2 technology from folks in the IBM DB2 development organization -- people like Curt Cotner, IBM Fellow and DB2 Chief Technology Officer; Terry Purcell, Mr. DB2 for z/OS Optimizer; Jeff Josten, from whom I've learned so much about DB2 data sharing; John Campbell, who brings a wealth of lessons learned working with early implementers of new DB2 for z/OS releases; Guy Lohman, a Big Thinker (and doer) from IBM's Almaden Research Center; Matt Huras, Chief Architect of DB2 for LUW; Chris Eaton, who always delivers a ton of DB2 for LUW "news you can use"; and Leon Katsnelson, DB2 for LUW jock and cloud computing savant (you can see the whole conference schedule on the IDUG Web site).

In addition to learning lots from IBM's DB2 top guns, I get a boatload of great information from fellow DB2 consultants who present at the conference: Bonnie Baker, Dave Beulke, Sheryl Larsen, Susan Lawson, Dan Luksetich, and Fred Sobotka, just to name a few. Also among the speakers are professional DB2 instructors like Themis's David Simpson, and technical experts from leading vendors of DB2 tools, such as Phil Grainger from Cogito, Steen Rasmussen from CA, and Rick Weaver from BMC.

And then you have the user presentations. These are what really make IDUG a special conference. You can't beat the in-the-trenches experiences shared by Dave Churn of DST Systems, Rob Crane of FedEx, John Mallonee of Highmark, Bernie O'Connor of Anixter, Bryan Paulsen of John Deere, Billy Sundarrajan of Fifth Third Bank, and others who work where the DB2 rubber meets the road.

Of course, I also learn plenty thanks to encounters in the "coffee track" -- a reference to the refreshment and meal breaks during which I'm likely to get in on conversations amongst people who are dealing with challenges and issues that are of particular interest to me. These networking opportunities alone are almost worth the price of admission. The learning continues in the exhibitors hall, where I can catch up on the latest DB2-related offerings from a wide variety of vendors (and pick up a few t-shirts, pens, and notebooks, to boot).

As if all that weren't enough, the IDUG North American Conference is a great place to take a DB2 certification exam (free for attendees) and get some hands-on at any of several lab sessions.

The location's a winner, too. IDUG negotiated a special rate at the Marriott Waterside Hotel, right next to the convention center, and some of those discounted rooms are, I think, still available. Tampa's a great town. Personally, I like to get up in the morning and take a run along Bayshore Boulevard, just outside of downtown. Here you'll find the world's longest continuous sidewalk (six miles), so there are no worries about cars -- just cruise along with the waters of Tampa Bay on your left (if outbound) and the beautiful historic homes of the Hyde Park neighborhood on your right. There's plenty of nightlife in Ybor City, a short streetcar ride away. Great Cuban food, fresh seafood, palm trees -- all this, and a great conference, too? Who wouldn't want to be there? I hope some of you will be able to attend. Find me and say hi if you do.

Sunday, April 11, 2010

Using DB2 Stored Procedures, Part 1: Where Do You Want Your SQL?

Over the past few weeks, I've had a number of interesting discussions with people from various organizations on the subject of DB2 stored procedures. These conversations have generally involved questions pertaining to the use of stored procedures in a DB2 environment. The question, "How should we use DB2 stored procedures?" will be answered differently by different organizations, based on their particular requirements and priorities. That said, their are some
factors that ought to be considered in any review of DB2 stored procedure utilization. I'll cover these factors in a three-part entry beginning with this post, which will focus on the issue of client-side versus server-side SQL ("server" here referring to a database server, versus an application server). In part two I'll take a look at static versus dynamic SQL, and in part three I'll examine the issue of database schema visibility.

Some advantages of client-side SQL: I've heard various opinions regarding client-side versus server-side SQL (and from the DB2 data server perspective, "client-side" will usually refer to programs, running in an application server, that access a DB2 database). Proponents of having SQL data manipulation language, or DML, statements (i.e., SELECT, INSERT, UPDATE, and DELETE) issued by client-side programs cite, among other things, these benefits:
  • Data-consuming and data-retrieval programs can be implemented in a single, client-side deployment -- This as opposed to a client-side deployment of data-consuming programs and a server-side deployment of associated data retrieval programs. A "one-side" deployment can indeed require less in the way of coordination, with maybe one programming team involved, versus two, and one platform team -- of application server administrators -- with primary responsibility for system management (though it's hoped that DB2 DBAs would be in the loop, as well).
  • Data-consuming and data-retrieval programs are likely to be written in the same language. It's true that client-side and server-side programs are often coded using different languages, especially when the application and database servers are running on different platforms (it's common, for example, to have sever-side SQL embedded in COBOL programs when the DB2 server is a mainframe, while data-consuming programs might be written in Java and might run on a Linux-based application server). When different programming languages are used on client and data-server platforms, communication between the respective development groups can be a little more challenging than it otherwise would be (particularly when an object-oriented language is used on the one side and a procedural language on the other).
Server-side SQL pluses: People who like their SQL on the server side favor this approach for several reasons, including the following:
  • It makes it easy to use static instead of dynamic SQL. I'll cover static SQL and its connection with stored procedures in part two of this three-part entry. For now, I'll point out that, compared to dynamic SQL, static SQL generally delivers better performance (especially in terms of database server CPU time) and provides for more robust security (an authorization ID does not have to be granted access privileges on database tables in order to execute static SQL statements). Can static SQL be issued from client-side programs? Of course, but it's not always a straightforward matter, as I'll point out in my part-two post.
  • It can increase the degree to which data-access code is reused. When data-access code is packaged in server-side programs, it's generally pretty easy to invoke that code from data-consuming programs written in a variety of languages and running on a variety of platforms (including the platform on which the DB2 data server is running). Is this technically possible when the data-access code is running on an application server? Yes, but -- as is the case regarding the use of static versus dynamic SQL -- I believe that data-access code reuse is a more straightforward proposition when that code runs on the data server.
  • It makes it easier to utilize developers with different programming language skills for the same application project. Admittedly, this is only an advantage if your organization HAS groups of developers with expertise in different programming languages and WANTS to use them for the same project. If that is indeed your situation, server-side SQL could be the ticket for you. It's quite common for data-consuming programs written in languages such as Java and C# to invoke data-access programs written in COBOL or SQL (the latter being a reference to SQL stored procedures).
  • It makes it easier to leverage the talents of skilled SQL coders. SQL becomes more and more rich with each release of DB2 (one indicator being the number of pages in the DB2 SQL Reference). That's a good thing, but it also means that SQL mastery becomes a taller order as time goes by. I've seen in recent years the emergence of professionals who consider themselves to be SQL programmers (this as opposed to, say, Java programmers who know SQL). I feel that server-side SQL facilitates the leveraging of individuals who have top-notch SQL skills.
  • It improves CPU efficiency for transactions that issue multiple SQL DML statements. When SQL DML statements are issued from client-side programs, there are network send and receive operations associated with each such statements. These network trips increase application overhead when transactions issue multiple SQL DML statements -- overhead that is reduced when a client-side program invokes a server-side program that issues the multiple SQL DML statements locally to the DB2 server.
The stored procedure angle: If you like your SQL on the DB2 server side of an application (I do), stored procedures provide a great way to go this route. They can be written in a number of languages (including, as mentioned, SQL), and they come with a nice familiarity factor, in that the stored procedure pattern is well-known to many client-side application developers, including those who have worked with DBMSs other than DB2. Note that a DB2 stored procedure
program can declare and open a cursor in such a way that the result set rows can be fetched by the calling client program -- a useful feature for set-level data retrieval operations. [Mainframers should be aware that native SQL procedures, introduced with DB2 9 for z/OS, can run substantially on zIIP engines when called by DRDA clients through the DB2 distributed data facility -- a prime opportunity to put cost-effective zIIP MIPS to good use.]

So, reason number one for using DB2 stored procedures: they are an excellent choice for the packaging of SQL statements in database server-side programs. My reasons two and three for using DB2 stored procedures will be explained in parts two and three of this three-part blog entry.