OLTP and BI on the Same DB2 for z/OS System (Part 1)
I recently had the opportunity to work on an interesting challenge, involving the use of one DB2 system for both an online transaction processing (OLTP) and a business intelligence (BI) workload. Now, the system in question happened to be a DB2 data sharing group running on a parallel sysplex, and that opens up some interesting possibilities, but before getting to that I'd like to deal with the overall issue: can you successfully support OLTP and BI applications using the same mainframe DB2 system?
Some people would be pretty quick to answer that question with a negative, and the chief concern of those folks might be the unfavorable impact of BI queries on the performance of online transactions. Is that a valid concern? It certainly can be, depending on the extent to which one can or cannot eliminate areas of contention between the two workloads. Put another way, if OLTP- and BI-related SQL statements can execute without interfering with one another, use of one DB2 system for both workloads is a viable option. In this post, I'm going to briefly describe some sources of single-system OLTP and BI workload contention, and ways in which such contention can be either eliminated or at least largely mitigated. In a Part 2 entry that will follow in a few days, I'll zero in on the challenge of managing query parallelization in a single OLTP- and BI-supporting DB2 environment. On now to potential sources of OLTP-BI interference:
The disk subsystem. This, of course, has very much to do with whether or not OLTP- and BI-related SQL statements are targeting the same DB2 tables. You might be thinking, "Well of course they wouldn't - you'd have one set of tables for OLTP programs and another set for the query and reporting applications." I'll agree that this is the PREFERRED scenario versus having OLTP and BI users access the same tables. I'd certainly want to place the OLTP database objects and the BI objects on two different sets of disk volumes. That said, some mainframe DB2 shops do, in fact, use the same tables for OLTP and for query/reporting. It's not that they're crazy. They may have had to give query and reporting capabilities to a group of users in a quick and inexpensive way, and allowing online access to the OLTP tables might have been seen as the best near-term alternative. Working in these folks' favor is the fact that disk-level contention, while definitely not a non-issue, is not as severe a performance concern as it was back in the 1990s, before people had gigabytes of cache memory on their disk control units and before DB2 and System z had 64-bit addressing (buffer pools can be MUCH larger in DB2 V8 and V9 subsystems versus prior releases). More DB2 buffer pool and storage controller cache hits mean fewer accesses to spinning disk, and that means less contention. Still, if you use the same tables for OLTP and BI, keep an eye (via your DB2 monitor) on wait time per DB2 synchronous read I/O. I like to see a figure that's below 10 milliseconds.
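If you do go the separate-tables, separate-volumes route, DB2 storage groups are one way to accomplish the volume-level separation. Here's a minimal sketch (and it is just a sketch - the storage group, database, tablespace, volume, and catalog names below are all made up, and in an SMS-managed environment you'd likely let SMS handle volume placement):

    -- Hypothetical storage groups on two different sets of volumes
    CREATE STOGROUP OLTPSG VOLUMES(OLT001, OLT002) VCAT DSNCAT;
    CREATE STOGROUP BISG VOLUMES(BIV001, BIV002) VCAT DSNCAT;

    -- OLTP and BI tablespaces then go to their respective storage groups
    CREATE TABLESPACE OLTPTS IN OLTPDB USING STOGROUP OLTPSG;
    CREATE TABLESPACE BIQTS IN BIDB USING STOGROUP BISG;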
Note that reduced disk contention is only one of the benefits associated with the use of separate tables for OLTP and BI applications. Physical table separation also provides an opportunity to make the BI tables more appropriate for a data warehouse environment, perhaps through database design changes (e.g., normalization/denormalization, or maybe a change to a dimensional design featuring so-called star schema table arrangements) and/or data transformations that might include replacement of codes with more user-meaningful data values.
The DB2 buffer pools. As with disk-level contention, reducing contention for buffer pool resources depends largely on the use of separate database objects for OLTP and BI purposes. Different tablespaces and indexes can be assigned to different buffer pools. If OLTP and BI applications have to use the same database objects (and I would hope that this would be a temporary set-up on the way towards physical separation), you will be primarily focused on keeping BI-related prefetch activity from pushing table and index pages accessed by OLTP programs out of the shared buffer pools. To this end, you might want to adjust VPSEQT (the threshold for the percentage of buffer pool space that can hold pages brought in via prefetch) down somewhat from the default of 80 for shared buffer pools. If you use query parallelism, you might also want to alter VPPSEQT (similar in concept to VPSEQT, but for pages brought in via queries that are split and executed in a parallel fashion) down somewhat from the default of 50 for shared pools.
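To make that a little more concrete, here's what the relevant statements and commands look like; the buffer pool names, object names, and threshold values below are purely illustrative, not recommendations for your system:

    -- Separate objects: give a BI tablespace its own (hypothetical) buffer pool
    ALTER TABLESPACE BIDB.BIQTS BUFFERPOOL BP8;

    -- Shared objects: dial a shared pool's prefetch thresholds down from
    -- their defaults of 80 (VPSEQT) and 50 (VPPSEQT) via DB2 commands
    -ALTER BUFFERPOOL(BP2) VPSEQT(50)
    -ALTER BUFFERPOOL(BP2) VPPSEQT(25)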
To protect OLTP transaction performance in a shared buffer pool environment, look to keep the rate of disk read I/Os at a reasonable level. I like it when the combined number of synchronous and asynchronous - i.e., prefetch - read I/Os is less than 100 per second. If the read I/O rate is high, you can generally reduce it by enlarging the buffer pool in question (but don't over-burden server memory - the rate of demand paging from auxiliary storage, available via your z/OS monitor, should ideally be in the single digits or low double digits per second).
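A quick way to check on a pool's read I/O activity is the -DISPLAY BUFFERPOOL command with the DETAIL option, and a pool is enlarged via -ALTER BUFFERPOOL; the pool name and VPSIZE value below are placeholders (keep in mind that VPSIZE is expressed in buffers, not bytes):

    -DISPLAY BUFFERPOOL(BP2) DETAIL(INTERVAL)
    -ALTER BUFFERPOOL(BP2) VPSIZE(100000)

The DETAIL(INTERVAL) form reports activity - including synchronous and prefetch read I/Os - since the last such display, which makes it easy to work out a per-second rate over a known interval.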
Locking. Once again, the best result is achieved through the use of separate tables for OLTP and BI. If dual-purpose tables are a necessity for a while, consider having the BI queries run with the UR (uncommitted read) isolation level (this can be accomplished by binding packages through which BI queries are executed with ISOLATION(UR), or by specifying WITH UR at the end of SELECT statements). UR results in essentially no locking. That, of course, means that queries could read uncommitted data changes that are subsequently rolled back because of an error situation encountered by a data-changing program, but this may not be a concern for BI users. BI queries often involve large-scale data aggregations, so a result obtained by a UR query is likely to be very close (if not identical) to one obtained with the default isolation level of cursor stability in effect, and very close is typically good enough for decision support applications.
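In practice, that looks like one of the following; the collection, package, table, and column names here are invented for illustration:

    -- Statement-level UR, appended to the query itself
    SELECT ACCT_TYPE, SUM(BALANCE)
      FROM BIDB.ACCT_SUMMARY
      GROUP BY ACCT_TYPE
      WITH UR;

    -- Or make UR the default isolation for the package through which
    -- the BI queries run (a minimal BIND sketch)
    BIND PACKAGE(BIQCOLL) MEMBER(BIQPGM) ISOLATION(UR)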
If you can use separate tables for BI applications, put these (and their associated indexes) in databases other than those used for the OLTP tables (and here I'm using the term "database" in the very specific DB2 for z/OS sense). Databases have associated with them control blocks called database descriptors, or DBDs, and the S-lock taken on a DBD by a dynamic SQL statement can conflict with data definition language statements (e.g., CREATE, ALTER, and DROP) targeting objects in the same database. If you have to use one set of tables for OLTP and BI, you can reduce DBD locks associated with dynamic SQL statements (and BI-related queries tend to be dynamic) by taking advantage of DB2's dynamic statement caching capability (dynamic statement caching can also provide you with a wealth of information about your dynamic SQL workload, this by way of the SQL statement EXPLAIN STMTCACHE ALL).
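Here's a hypothetical sketch of both points; the database name is made up, and the EXPLAIN STMTCACHE ALL example assumes you've created a DSN_STATEMENT_CACHE_TABLE under your auth ID to receive the output:

    -- Give the BI objects their own database (and thus their own DBD),
    -- so dynamic BI queries don't take DBD locks that could conflict
    -- with DDL against the OLTP database
    CREATE DATABASE BIDB;

    -- Externalize the dynamic statement cache contents for analysis
    EXPLAIN STMTCACHE ALL;
    SELECT STMT_TEXT, STAT_EXEC, STAT_ELAP
      FROM DSN_STATEMENT_CACHE_TABLE;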
Long-running dynamic queries can also interfere with the drain locking mechanism used by DB2 for z/OS utilities such as online REORG. A running SELECT statement will cause a read claim to be acquired and held by the issuing process, and that claim will be held until the process commits. In a BI environment, commits are typically taken - often automatically by a query tool - for each statement after it completes, but the read claim can be held for a long time in the case of a long-running query, and some BI queries can run for hours. If the same tables have to be used for OLTP and BI purposes, you might need to shut off query submission at some time (e.g., 7:00 PM), and then allow two or three hours for any long-running queries to complete. After that time you may need to cancel threads associated with still-executing BI queries so that utilities targeting dual-purpose tablespaces and indexes will be able to run unimpeded until BI activity starts up again at the beginning of the business day.
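Identifying and terminating those threads is done with a couple of DB2 commands; the thread token in the CANCEL example (1234) is just a placeholder for a value you'd pick up from the DISPLAY output:

    -DISPLAY THREAD(*) TYPE(ACTIVE)
    -CANCEL THREAD(1234)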
CPU. Ideally, there will be enough CPU capacity on the DB2 system to satisfy the requirements of both the OLTP and BI workloads. If there is not enough CPU to go around, you can try to free some up through tuning the performance of OLTP- and/or BI-related SQL statements; otherwise, you may need to adjust z/OS Workload Manager (WLM) settings to favor one workload over the other, or perhaps to give priority to SQL statements that run quickly (whether OLTP- or BI-related) versus those that take longer to complete.
Organizations running DB2 in data sharing mode on a parallel sysplex have an interesting option that's not available in a standalone DB2 environment: they can use one or more members of the data sharing group exclusively for BI work, while other members process the data requests of OLTP programs. This arrangement can virtually eliminate OLTP-BI workload contention with respect to server memory and server processing capacity. Combine that with the use of separate tables for OLTP and BI, and voila: you have the two different workloads executing on the same DB2 system (the data sharing group) in a most non-contentious way.
Whether you're using DB2 in standalone or in data sharing mode, an important aspect of managing CPU contention between OLTP and BI applications running on the same system is the management of DB2 query parallelization, particularly as it pertains to the BI queries. I'll cover that topic in the part 2 companion to this blog entry, which I'll post within the next few days. As we say here in Georgia, y'all come back.