Robert's Blog

Monday, September 7, 2009

OLTP and BI on the Same DB2 for z/OS System (Part 2)

A few days ago, I posted part one of a 2-part entry on the subject of using the same DB2 for z/OS system (meaning a single logical DB2 system image - it could be a multi-subsystem DB2 data sharing group on a parallel sysplex) for both OLTP and business intelligence (BI) workloads. In that entry I focused on minimizing OLTP-BI workload contention on a number of levels: the disk subsystem, the DB2 buffer pools, DB2 locks, and CPU. With regard to that last contention category, I mentioned that "an important aspect of managing CPU contention between OLTP and BI applications running on the same system is the management of DB2 query parallelization, particularly as it pertains to the BI queries." In this part 2 post, I'll expand on that statement.

DB2 query parallelization is a very good thing when it comes to improving the performance of queries that involve scanning large numbers of data and/or index pages. In case you're not familiar with this DB2 feature or you need a little refresher, query parallelism has been around for quite some time, having been delivered in the mid-1990s with DB2 for z/OS Version 4. The technology enables DB2 to take a particular query and - on determining that parallelization would be beneficial for run-time reduction - split it into several tasks that can be executed concurrently on different engines within a mainframe server (or even on several servers, if we're talking about sysplex query parallelism in a DB2 data sharing group). Depending on the nature of the query, DB2 might start returning rows to the requester as they are qualified by the split queries, or the split-query result sets may be consolidated before any rows are returned (as when a result set sort is required or an aggregate function such as SUM is utilized). The more pieces into which a query is split, the greater the potential for better response time.

Generally speaking, parallelized queries split along tablespace partition lines, so greater degrees of parallelism can be expected when target tablespaces have a lot of partitions (assuming that qualifying rows will come from multiple partitions) and the mainframe server has a pretty good number of fast CPUs (the number of CPUs is, of course, NOT an upper bound on the degree of query parallelization, as the split queries are likely to be I/O bound and the CPU portion of these can be interleaved on one engine as I/O wait events occur). So, into how many pieces might DB2 split a query? The answer to that question depends in large part on the setting of two DB2 ZPARMs (i.e., subsystem-level parameters): CDSSRDEF and PARAMDEG.

CDSSRDEF specifies the default value of the CURRENT DEGREE special register for a DB2 subsystem. This value will be 1 if you don't change it, and that means that a DYNAMIC query will not be parallelized by DB2 unless it is preceded by the SQL statement SET CURRENT DEGREE = 'ANY' (a static SELECT statement will be a candidate for parallelization if it's associated with a package bound with the DEGREE(ANY) specification). This default value for CDSSRDEF is the right one for many - and perhaps most - situations because it gives you statement-level control over the use of parallelization by DB2 for dynamic queries (in a BI environment queries tend to be dynamic). Making ALL dynamic queries candidates for parallelization by setting CDSSRDEF to ANY would increase CPU overhead for your BI workload. Why? Because DB2 would have to consume extra cycles just to determine whether or not parallelization would be beneficial for each and every dynamic query. When the determination is "no, it would not" (as would likely be the case for a query targeting a non-partitioned table or for a query that would retrieve rows from one partition of a partitioned table), that extra CPU consumption in query optimization will not yield a benefit in terms of query execution time.
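
To make that statement-level control a little more concrete, here is a minimal sketch of what it could look like for a single dynamic query. The table BIDW.SALES_HISTORY, its columns, and the QUERYNO value of 101 are made-up names for illustration, and the EXPLAIN step assumes that a PLAN_TABLE exists under your authorization ID.

```sql
-- Hypothetical example: BIDW.SALES_HISTORY and its columns are invented names.
-- Make the dynamic statements that follow on this connection candidates for parallelism.
SET CURRENT DEGREE = 'ANY';

-- Ask DB2 to record the access path it would choose (QUERYNO 101 is arbitrary).
EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT REGION, SUM(SALE_AMT) AS TOTAL_SALES
    FROM BIDW.SALES_HISTORY
   WHERE SALE_DATE BETWEEN '2009-01-01' AND '2009-06-30'
   GROUP BY REGION;

-- A non-null ACCESS_DEGREE indicates that DB2 planned to parallelize that step.
SELECT QBLOCKNO, PLANNO, TNAME, ACCESS_DEGREE, PARALLELISM_MODE
  FROM PLAN_TABLE
 WHERE QUERYNO = 101
 ORDER BY QBLOCKNO, PLANNO;

-- Return to the non-parallel setting for subsequent statements.
SET CURRENT DEGREE = '1';
```

Keep in mind that the degree recorded in PLAN_TABLE is what DB2 planned when it optimized the statement; the degree actually used at run time can be lower.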

That said, sometimes a specification of ANY for CDSSRDEF is necessary for dynamic query parallelization, because the BI queries may be generated by PC-based end-user tools that do not allow for insertion of a SET CURRENT DEGREE = 'ANY' statement. If you have a DB2 data sharing group and your BI queries and your OLTP transactions run on different members, you can have CDSSRDEF = ANY on the BI-supporting subsystem (or subsystems), and CDSSRDEF = 1 on the OLTP-supporting DB2 members. If you have a single DB2 subsystem on which you run OLTP and BI work, what should you do if SET CURRENT DEGREE = 'ANY' is not an option for the BI queries? I'd lean towards setting CDSSRDEF to ANY, and then limiting the degree of parallelization for queries through the PARAMDEG specification in ZPARM. The default value of this parameter is 0, and that means that DB2 will determine the degree of parallelization for a query that it decides to split. I like that default because I feel that DB2 does this well and z/OS is very good at managing a complex and dynamic workload (as when it throttles down the processing resources allocated to a parallelized query in order to accommodate new work entering the system). If, however, an OLTP workload were running on the same DB2 subsystem as the BI workload, I'd want to put a relatively low upper bound on the degree of parallelization for dynamic queries, the better to deliver consistent response times for the OLTP transactions. I might go for something as low as 3 or 4 for PARAMDEG, so that I'd get some significant (if not huge) run-time reduction for some of the BI queries while limiting variations with respect to OLTP transaction execution times.

In addition to placing an upper bound on query parallelization when running OLTP and BI work on the same DB2 subsystem, you might want to think about limiting query parallelization to only a portion of the dynamic queries that run on your system. The DB2 for z/OS resource limit facility (RLF) provides a way to do this. What you can do is create a resource limit specification table (RLST) and put in it one or more rows with a value of '4' in the RLFFUNC column (this value disables query CP parallelism). In the RLFPKG column of those rows, put the names of the packages for which you DO NOT want associated dynamic queries to be parallelized. In the LUNAME column, you can have a blank value for the local location, or PUBLIC for TCP/IP-connected remote requesters. In addition (or instead), you could disable query parallelism by authorization ID using the AUTHID column of your RLST. Then you can set CDSSRDEF to ANY and know that dynamic queries associated with packages and/or auth IDs specified in your RLST rows will not be candidates for parallelization.
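
Here is a sketch of what such RLST rows might look like. The table qualifier and suffix (SYSADM.DSNRLST01), the package name BIPKG01, and the authorization ID BIUSER1 are hypothetical, and I'm assuming that the columns not listed in the INSERT statements take their defaults. Check the RLST column definitions for your DB2 version before using anything like this.

```sql
-- Hypothetical names: SYSADM.DSNRLST01, BIPKG01 and BIUSER1 are for illustration only.
-- Columns not listed here are assumed to be left at their default values.

-- Disable parallelism (RLFFUNC = '4') for dynamic queries issued through
-- package BIPKG01 by TCP/IP-connected remote requesters (LUNAME = 'PUBLIC').
INSERT INTO SYSADM.DSNRLST01 (RLFFUNC, RLFPKG, LUNAME)
  VALUES ('4', 'BIPKG01', 'PUBLIC');

-- Disable parallelism (RLFFUNC = '4') for dynamic queries that run under
-- authorization ID BIUSER1, again for remote requesters.
INSERT INTO SYSADM.DSNRLST01 (RLFFUNC, AUTHID, LUNAME)
  VALUES ('4', 'BIUSER1', 'PUBLIC');
```

Rows like these only matter once the resource limit facility is started with that table (for the 01 suffix assumed here, that would be the -START RLIMIT ID=01 command).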

In summary: take advantage of DB2 query parallelization for your BI queries, but use it conservatively when you have OLTP transactions running on the same subsystem.

