Data Warehousing on DB2 for z/OS
Not so long ago, a lot of people had the idea that if you wanted to do advanced, large-scale (multi-terabyte) data warehousing with DB2, your best bet, platform-wise, was DB2 for Linux, UNIX, and Windows (LUW). This widespread impression was based on several factors, including:
- IBM's own product marketing and packaging efforts, which emphasized DB2 for LUW as the data warehouse platform of choice.
- The absence on the z/OS platform of some product features that were very useful from a business intelligence perspective (among them materialized query tables, advanced star-join query optimization, and 64-bit addressability).
- Richer SQL on the DB2 for LUW platform, including support for common table expressions (which enable powerful data manipulation via recursive SQL) and convenient result set comparisons via EXCEPT and INTERSECT operations.
- A perception that the mainframe platform was not a cost-effective data warehouse solution.
It's time now for people who have looked upon DB2 for z/OS as an also-ran BI platform to rethink their opinions, because the situation with respect to all of the factors cited above has changed:
- IBM is promoting DB2 for z/OS as well as DB2 for LUW as a prime-time data warehousing platform.
- DB2 for z/OS has materialized query tables (which can dramatically improve response times for queries that involve data aggregations and/or large-scale joins), advanced star-join query optimization (important for so-called dimensional data warehouses), and 64-bit addressing support (great for reducing I/O wait times via extra-large buffer pools).
- SQL on the DB2 for z/OS platform has been enriched with BI-friendly features such as common table expressions (enabling you to do with one SQL statement what might have taken several SQL statements and some user programming before), and (in V9) the INTERSECT and EXCEPT clauses of SELECT (both of which make the coding of result set comparisons much easier). Also available on System z now are advanced XML data management and query capabilities that were introduced with DB2 9 for LUW.
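To make those two SQL features concrete, here's a sketch (the table and column names are hypothetical, and both SELECTs feeding EXCEPT have to return compatible columns):

```sql
-- Rows in this month's customer extract that were NOT in last month's --
-- a one-statement result set comparison via EXCEPT:
SELECT CUST_ID, CUST_NAME
  FROM CUST_CURRENT
EXCEPT
SELECT CUST_ID, CUST_NAME
  FROM CUST_PRIOR;

-- And a common table expression that uses recursion to walk a
-- department hierarchy (DEPT is a hypothetical table with a
-- MGR_DEPT column pointing to each department's parent):
WITH DEPT_TREE (DEPTNO, DEPTNAME, LVL) AS
 (SELECT DEPTNO, DEPTNAME, 1
    FROM DEPT
   WHERE MGR_DEPT IS NULL
  UNION ALL
  SELECT D.DEPTNO, D.DEPTNAME, T.LVL + 1
    FROM DEPT D, DEPT_TREE T
   WHERE D.MGR_DEPT = T.DEPTNO)
SELECT DEPTNO, DEPTNAME, LVL
  FROM DEPT_TREE;
```

Before common table expressions, the hierarchy traversal above would likely have meant a temporary table and some looping program logic.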
- The pricing story on System z keeps getting better and better. On top of the ongoing price-performance gains delivered with successive generations of System z microprocessors, IBM recently provided the DB2 for z/OS Value Unit Edition, making DB2 on System z available for a One-Time Charge for eligible net-new applications or workloads, including data warehousing applications. And another thing: at a time of historically high energy costs, people are appreciating all the more the ability of System z servers to deliver maximum bang for your kilowatt expenditures.
It's actually a really exciting time for mainframers where data warehousing is concerned. Organizations are out there right now, running multi-terabyte data warehouse systems using dimensional databases and sophisticated front-end query and reporting tools, all on a DB2 for z/OS foundation. Not only is the technology first-class, the IBM support is first-rate, as a number of the best and brightest in IBM's DB2 for z/OS development organization are dedicated to helping companies achieve success with mainframe-based BI applications.
And what of DB2 for LUW? It's still an outstanding data warehouse platform, and it also gets additional BI goodies with each new release. That hasn't changed. What has changed is the choice between DB2 for z/OS and DB2 for LUW as a foundation for enterprise data warehouse solutions. That choice is now tougher -- in a good way.
A Note on DB2 for z/OS Page-Fixed Buffer Pools
I've recently had the occasion to give DB2 for z/OS page-fixed buffers a spin, and I've been pleased by the results I've seen. In this post I want to give you some information about this relatively new mainframe DB2 feature.
The ability to page-fix DB2 buffer pools was introduced with DB2 for z/OS V8, via the PGFIX option of the -ALTER BUFFERPOOL command. The default value of the parameter is NO, which gives you the business-as-usual situation with regard to the way in which z/OS treats real storage page frames occupied by DB2 buffers. [Page frames are like cubbyholes within the real storage resource managed by z/OS on a mainframe or a mainframe logical partition (LPAR). Real storage, in turn, is just a term for mainframe server memory -- so called to distinguish it from virtual storage.] Generally speaking, z/OS retains the right to move something out of a real storage page frame and into auxiliary storage (the latter likely being some files in the disk subsystem). z/OS does this so that it can support an amount of virtual storage, spread across multiple address spaces, that exceeds the real storage available on the system. Because they are usually referenced quite frequently, DB2 buffers in real storage page frames are not likely to be shipped off to auxiliary storage (z/OS tends to steal page frames occupied by pages that have gone a while without being referenced). That said, it's possible that a page frame occupied by a DB2 buffer could be stolen by z/OS.
Because a DB2 buffer could be moved out of real storage by z/OS, DB2 has to do something to ensure that the buffer will NOT be stolen in the midst of an I/O operation (i.e., while DB2 is in the process of reading a page into the buffer from the disk subsystem, or externalizing a previously updated page in the buffer to disk). DB2 does this by asking z/OS to fix (in other words, make non-stealable) the page frame holding the buffer, until the related I/O operation has completed. DB2 subsequently tells z/OS that the page frame can revert back to stealable status, and the previously fixed buffer becomes unfixed from a real storage perspective.
z/OS is very good at assigning to tasks (meaning, dispatchable pieces of work in the system) the costs associated with services that it performs on behalf of said tasks, and so it is that the CPU cost of DB2 buffer page fix and unfix requests is assigned to the DB2 database services address space (aka DBM1) for asynchronous (i.e., prefetch) read I/Os and database writes, and to allied address spaces (through which SQL statements get to DB2 -- examples include the DB2 DDF address space, CICS application-owning regions, and batch initiators) for synchronous (single-page) read I/Os. The CPU cost of fixing and unfixing a page frame is pretty small, but when disk I/O operations related to a buffer pool number in the hundreds per second, the cumulative cost adds up.
Enter buffer page-fixing. When PGFIX(YES) is specified for a buffer pool, the page frames that hold the pool's buffers will be fixed in real storage (this is sometimes referred to as V=R, which is short for virtual storage = real storage). Because these page frames can't be stolen, there is no need for DB2 to issue fix and unfix requests, and CPU consumption is reduced (again, for both the DB2 DBM1 address space and allied address spaces associated with SQL statements that target objects assigned to the buffer pool in question).
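Turning the feature on is a one-command proposition (a sketch -- BP1 is just an example pool name):

```
-ALTER BUFFERPOOL(BP1) PGFIX(YES)
```

One thing to keep in mind: as I understand it, the new PGFIX setting takes effect the next time the pool is allocated (e.g., following deallocation and reallocation of the pool, or a recycle of the DB2 subsystem). A -DISPLAY BUFFERPOOL(BP1) command will show you the pool's current PGFIX attribute.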
Sounds good, eh? You might wonder why the option to page-fix buffer pools wasn't delivered before DB2 for z/OS V8. My thinking is that the capability would not have been very useful prior to DB2's exploitation of 64-bit virtual and real storage addressing, something else delivered with V8. When the maximum amount of real storage addressable by z/OS was 2GB, you probably wouldn't have wanted to page-fix DB2 buffer pools for fear of putting too much stress on other users of mainframe memory (CICS, batch, TSO, etc.). Now that people are running DB2 V8 and V9 on System z servers with scores of gigabytes of real storage, using PGFIX(YES) to fix a chunk of that memory for the purpose of reducing the CPU consumption of a DB2 workload seems like a very reasonable proposition.
Now, do you want to page-fix all of your DB2 buffer pools? Probably not. I'd lean towards using the feature for the pools that have the most associated I/O activity. You can check this out using your favorite DB2 monitor, or by way of the good old DB2 -DISPLAY BUFFERPOOL command (what I like to do is issue the command -DISPLAY BUFFERPOOL(ACTIVE) DETAIL(INTERVAL), and then issue the same command 60 minutes later to get an hour's worth of activity). Also, make sure that the z/OS LPAR on which the DB2 subsystem is running has sufficient real storage.
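In sketch form, the hour-of-activity measurement looks like this (DETAIL(INTERVAL) reports activity since the last issuance of the command with that option -- or since DB2 start, if it hasn't been issued before):

```
-DISPLAY BUFFERPOOL(ACTIVE) DETAIL(INTERVAL)

   (wait 60 minutes)

-DISPLAY BUFFERPOOL(ACTIVE) DETAIL(INTERVAL)
```

Divide the read I/O counts in the second command's output by 3600 (the number of seconds in the interval) to get per-second I/O rates for each pool.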
One other thing: page-fixing a buffer pool has nothing to do with fixing the contents of the buffers. DB2 manages the moving of pages into buffers, and the replacement of pages already in buffers with other pages (buffer stealing), for a page-fixed buffer pool as it would for a non-fixed pool. Fixing a pool's buffers in mainframe memory and "pinning" an object, in its entirety, in a buffer pool are two very different things. Page-fixing a buffer pool isn't about keeping more data and index pages cached in server memory -- it's about caching the same amount of stuff in a more CPU-efficient manner.
So, take a look at your buffer pool configuration, and give serious consideration to page-fixing the pools that have the most associated I/O activity.
DB2 for z/OS People: Don't be Memory Misers
When I first started working with DB2 on the mainframe platform, back in the mid-1980s, plenty of people were still running a version of the operating system called MVS/370. That OS limited the size of an address space to 16 MB, and the amount that a subsystem such as DB2 actually had to work with was less than that, since some megabytes were occupied by system code and by common storage areas (i.e., areas of virtual storage that were shared across address spaces). Since server memory was tight, people were frugal in their use of it. Early on, it was common to find organizations running DB2 in production with a total buffer pool configuration of 1000 4K buffers (4 MB of virtual storage) -- often allocated to a single pool, BP0.
MVS/370 was supplanted in the late 1980s by MVS/XA, which upped the maximum address space size to a whopping 2 GB (the XA in the OS name stood for eXtended Architecture). Though the amount of memory available to a DB2 subsystem had expanded by more than two orders of magnitude, many mainframe systems programmers were hesitant to significantly increase the size of DB2 buffer pools. I vividly recall seeing eyes widen when I told people that a DB2 subsystem running under MVS/XA should have 10,000 buffers in the pool at a bare minimum. 10,000 buffers! That's 40 MB! True, but 40 MB was peanuts in a 2 GB address space. On top of that, DB2 was designed to utilize large server memory resources to maximum advantage: the more virtual storage you gave it, the better it performed. Still, widespread acceptance of big buffer pools (meaning at least several hundred megabytes) was slow in coming.
Jump ahead to the late 00s (or whatever you call our present decade), and you see a repeat of that old treat-mainframe-memory-like-gold mindset. This time around, the big change was 64-bit hardware and virtual storage addressing on the System z platform, initially exploited by DB2 for z/OS Version 8. Organizations now have System z servers with dozens of gigabytes of central storage, on which they have production DB2 V8 and V9 subsystems running with buffer pool configurations that use a gigabyte (or even less) of virtual storage for buffers. Come on, people! As was true when MVS/XA hit the scene, DB2 doesn't just tolerate large address spaces - it THRIVES in large address spaces. Here's a modest proposition for you: if you have a production DB2 V8 or V9 subsystem running in a z/OS LPAR (logical partition) that has a central storage resource of X gigabytes, think in terms of using at least 25% of X for DB2 buffers. Recall that folks routinely used 70-80% of a 2GB address space for DB2 buffers. Twenty-five percent is conservative. So, if your DB2-hosting z/OS LPAR has 40 GB of central storage, go for a 10 GB buffer pool configuration for starters, and be ready to expand further from there.
If you're planning on growing your DB2 buffer pool configuration to leverage 64-bit mainframe storage technology, should you do this by just enlarging all of your existing pools (assuming that you are using multiple pools -- and you should be -- for different categories of database objects)? No! Be smart about buffer pool expansion, and add buffers to pools in a way that will give you the biggest bang for your central storage buck. Your principal aim in bulking up buffer pools should be to reduce the rate of DB2 disk I/O operations. If you have two pools of equal size, and the rate of synchronous read I/Os (and/or the rate of prefetch I/Os -- both are important) is 100 per second for pool A and 5 per second for pool B, add buffers to pool A (and if you decide to make both pools bigger, add more buffers to A than to B). The rationale for focusing on disk I/O is simple: doing so gives you a double-barreled benefit. Fewer disk I/Os mean reduced elapsed times for DB2 queries and DB2-accessing programs (these are typically I/O-bound), AND reduced CPU consumption for a given DB2 workload (it doesn't take much CPU time to drive a disk I/O operation, but a small amount of CPU processing resource expended a zillion times can really add up). You can get I/O rates for DB2 buffer pools from several sources, including DB2 monitor products and the good old DB2 -DISPLAY BUFFERPOOL command.
Something else to keep in mind: mainframe memory tends to cost less than mainframe MIPS. If you want to boost DB2 throughput, leverage the central storage resource at hand before adding engines.
Something else to keep in mind: fewer I/Os do NOT mean fewer GETPAGEs (a GETPAGE is a DB2 request to view a table or index page). Fewer I/O operations translate into CPU savings, but if you want to take a really big bite out of the CPU cost of a DB2 query or program or workload, you need to achieve a big reduction in GETPAGEs. That is basically a matter of access path analysis and associated query and/or database design changes. And that is a subject for another post. Till then, don't be shy about using more mainframe central storage for DB2 (V8 and above) buffer pools.
For Mainframers Interested in DB2 for Linux/UNIX/Windows...
Like many a veteran IT person, I knew a lot about DB2 for z/OS before I ever touched DB2 for Linux, UNIX, and Windows (DB2 for LUW was several years from existence when I started working with DB2 for the mainframe platform in 1986). By the late 1990s, DB2 was getting to be pretty interesting from my perspective, due to its growing sophistication and robustness (and due also to the enthusiasm of young colleagues of mine on IBM's DB2 National Technical Support team, such as Monty Wright and Michael Logan -- both still DB2 gurus in IBM's employ).
When I joined CheckFree Corporation (now part of Fiserv) in 2000 (thereby moving over to the user side of the DB2 community), opportunities for gaining DB2 for LUW knowledge expanded: shortly after I came on board, CheckFree decided to implement its enterprise data warehouse on a DB2 for AIX (with the Database Partitioning Feature) platform, and not long after that the company's CRM application was migrated to PeopleSoft with a DB2 for Solaris database. Both of these databases went relatively quickly into multi-terabyte territory, and I had a lot of fun learning about the technology (annual CheckFree visits to the Executive Briefing Center at IBM's Toronto Lab were highlights).
When I finally scratched my entrepreneurial itch by starting my own company last summer, I continued to work with DB2 for LUW as well as DB2 for z/OS. Take it from me: if you are a mainframe DB2 person, you can become productive as a DB2 for LUW person if you want to. In this post I'll share with you a few of the things I learned in the course of becoming a cross-platform DB2 practitioner; to wit:
- Run it on your own system. Not all of us have our own mainframes. One thing that I really like about DB2 for LUW is having it on my laptop, in two flavors, to boot: under Windows, and also under Linux in a virtual machine. For me, DB2 9.5 Express-C is the ideal choice for my PC. The free version has just about everything with the exception of HADR (High Availability Disaster Recovery, an advanced failover clustering technology) and built-in data replication capability (these features are available with the DB2 Express-C paid subscription option).
- Get a compact Linux helper guide (if you decide to run DB2 under Linux). There are several of these little books from different publishers. You don't have to be a Linux jock to use DB2 for Linux, but a small-footprint reference is nice to have when you need to look up a command to invoke an editor or create a directory or display the contents of a file.
- Get some freeware for access to Linux from Windows (again, if you run DB2 under Linux). PuTTY is a nifty open-source program that enables you to securely log in to a remote Linux server (such as one on which DB2 is installed) from a Windows system. WinSCP, another piece of open-source software, makes it easy to move files between Windows and Linux systems. If you want to be able to use, on your local Windows system, a server-side DB2 GUI (such as the Setup Wizard that can be used to install DB2) that runs on a remote Linux system, a free program called Cygwin/X is your ticket. [This is NOT needed to use a client-side DB2 GUI such as the Control Center, which runs on YOUR system and enables you to interact with a remote DB2.]
- Get comfortable with the file system. As far as I'm concerned, the biggest difference between DB2 for z/OS and DB2 for LUW, from an administrative perspective, is the need to know more about the file system when you're using the latter. At one time, mainframe DB2 DBAs had to know something about Access Method Services, the operating system component used to create the VSAM files used by DB2. Nowadays, just about all mainframe DB2 installations use DB2-defined objects via STOGROUPs, and people let DB2 handle the Access Method Services stuff. Not only that, but it's now common practice to have z/OS System Managed Storage take care of placement (within the disk subsystem) of DB2 tablespace and index files through the use of an asterisk as the volume designator on CREATE STOGROUP statements. While DB2 for LUW has made great strides in reducing one's need to deal with file-related plumbing (especially with the automatic storage feature introduced with DB2 9.1 for LUW), you still have to know more about files on this platform versus System z. Don't be intimidated. You can pretty quickly learn enough about drive letters in Windows and file paths in Linux environments to get by.
- FixPacks versus PUT tapes. DB2 maintenance is packaged differently on LUW systems. One thing I learned early on: installing a DB2 for LUW FixPack is a two-step process. Know about the installFixPack and db2iupdt commands.
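A sketch of those two steps (the install path and instance name below are hypothetical -- substitute your own, and check the FixPack's readme for the specifics of your release):

```
# Step 1: install the FixPack code into the DB2 installation path
./installFixPack -b /opt/ibm/db2/V9.5

# Step 2: update each DB2 instance to pick up the new code level
/opt/ibm/db2/V9.5/instance/db2iupdt db2inst1
```

If a server hosts several instances, step 2 is repeated for each one -- the instance update is what actually puts the new code level into effect for a database environment.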
- Parameters here, parameters there. On System Z, you have DB2 ZPARMs, and the technical term "database" is not much beyond a designation for a set of tables. On LUW systems, you have instance-level DB2 parameters and database-level parameters. An example of the former is the agent pool size (don't worry -- it can be set to AUTOMATIC), and an example of the latter is the log file size (each database -- and there can be several under one DB2 for LUW instance -- has its own transaction log files).
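Here's what the two levels look like from the DB2 command line processor (SAMPLE is a hypothetical database name; the parameter names are as I recall them from the DB2 9.5 for LUW documentation -- verify against the Information Center for your release):

```
# Instance-level (database manager) configuration:
db2 get dbm cfg
db2 update dbm cfg using NUM_POOLAGENTS AUTOMATIC

# Database-level configuration -- each database under the instance has its own set:
db2 get db cfg for SAMPLE
db2 update db cfg for SAMPLE using LOGFILSIZ 4096
```

Note that LOGFILSIZ is expressed in 4K pages, so the value above means log files of 16 MB each.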
- Google, baby! Need to untar a file? Need to know what "gzip" means? Need to know how to change the behavior of SELinux? The answers are out there, my friends, and Google gets you to them PDQ. No kidding: when in doubt, use the Great Search Engine, and chances are you'll see that the same questions have been asked by others -- and answered!
There's more, of course, but I mostly want you to know that what you need to learn CAN be learned, and learned readily. Also, know that the great bulk of your DB2 for z/OS knowledge is directly applicable to the DB2 for LUW world. SQL programming is virtually identical on z/OS and LUW. In both places you have tablespaces, and buffer pools, and archive logs, and catalog tables (actually, catalog views on LUW), and clustering indexes, and range-partitioned tables, and online REORGs, and RUNSTATS, and prefetch, and query parallelism, and materialized query tables, and so on and so on. They really are much more alike than they are different. So, broaden your DB2 horizons, and get to know DB2 for LUW. It's fun to do, and who knows? It could open some new career doors for you.