Catterall Consulting: December 2008

Robert's Blog

Friday, December 26, 2008

Whither DB2 for z/OS Self-Tuning Memory Management?

In my own experience as a DB2 consultant, I've seen organizations put their trust in DB2 for Linux, UNIX, and Windows (LUW) when it comes to automatic management of the server memory resources utilized by DB2 - with positive results. Self-tuning memory management (STMM) was activated for databases by default starting with DB2 9.1 for LUW, and DB2 9.5 added several memory-related configuration parameters to the list of those for which a value of AUTOMATIC can be specified, thereby turning over to DB2 the task of optimizing the use of server memory for associated functions (DB2 9.5 also unified the use of the threaded architecture across platforms, versus the process model previously used on Linux and UNIX servers - an advantageous change in regard to STMM, as I pointed out in a previous post on the topic). DB2 for LUW STMM is very broad in scope. Memory use for buffer pools, package caching, sort operations, locking - all can be managed dynamically and automatically by DB2. My impression is that STMM has been well received by the DB2 for LUW user community.

IBM's DB2 for z/OS development organization made an important move with respect to STMM in the form of the AUTOSIZE option of ALTER BUFFERPOOL, introduced with DB2 for z/OS Version 9 (generally available since March of 2007). I say important because this is something of an acid test. Of all the performance tuning knobs and levers exposed by DB2 for z/OS, perhaps none gets as much attention as buffer pool sizing. Will DB2 for z/OS systems people be willing to turn the management of this crucial performance factor over to DB2 itself? If they will, the door will be wide open for DB2 on the mainframe platform, in future releases, to make available to users the option of having the DBMS manage most all aspects of subsystem memory utilization in an automatic and dynamic fashion.

This is sure to be a hotly debated topic within the DB2 for z/OS user community. I'll give you my take: it may take a few years, but I believe that DB2 STMM on the mainframe platform will come to be as broad in scope as it is on LUW servers (encompassing not only the management of buffer pool sizing, but of the EDM pool, the RID pool, and the sort pool, as well), and that utilization of comprehensive STMM among DB2 for z/OS-using organizations will become commonplace.

I believe that a parallel development can be found in mainframe DB2's past, that being the introduction by IBM in the 1990s of automatic DB2 data set placement within a disk subsystem (made possible by advances in both DB2 and z/OS file management capabilities). For years, DB2 DBAs had carefully assigned DB2 data sets to particular volumes by using specific volume serial numbers (disk volume IDs) in CREATE STOGROUP statements, and then assigning data sets to STOGROUPs to (among other things) physically separate tables from associated indexes, and partitions of partitioned tablespaces from other partitions belonging to the same tablespace. When DB2 made it possible to leave data set placement up to the operating system via the specification of '*' for VOLUMES in a CREATE STOGROUP statement, a lot of DB2 people were very hesitant to make that leap. Those that did often ended up exercising quite a lot of control over data set placement via so-called ACS routines. Eventually, though, VOLUMES('*') came to be a very common CREATE STOGROUP specification, and many sites eschewed complex ACS routines in favor of very simple set-ups that sometimes involved a single big disk volume pool for all DB2 user tables and indexes. Why the change in attitudes? Simple: system-managed placement of DB2 data sets worked (meaning that people let the system manage this and found that they still got excellent DB2 I/O response time). Why did it work? One reason was the sophistication of z/OS file management capabilities, but at least as big a reason was the big change in disk subsystem technology that made the particular physical location of a DB2 data set on disk relative to the location of other data sets more and more of a non-issue: the advent of very large (multi-gigabyte) cache memory resources on disk controllers, along with sophisticated algorithms to optimize the use of same.

I believe that STMM will work well on the mainframe platform, and aside from the efforts of the DB2 for z/OS developers to this end, a big factor will be the growing use of very large server memory resources that were made possible by the move to 64-bit addressing. The availability of vast amounts of server memory will not make the sizing of buffer pools unimportant, but it stands to reason that sizing buffer pools for good performance in a 1000+ transactions per second environment is easier when you have 100 GB (for example) of memory to work with, versus 2 GB.

Will DBAs who have spent lots of time monitoring and tuning buffer pools end up out in the cold when DB2 for z/OS STMM becomes widely used? I think not. They'll have more time for application-enablement work that will boost the value that they deliver to their employing organizations.

Anyway, that's what I think. Your comments would be most welcome.

Tuesday, December 16, 2008

DB2 DR: Tape as a Bulwark Against Human Error

In an entry posted to this blog almost a year ago, titled "Aggressive DB2 Disaster Recovery", I wrote about the tremendous - and very positive - impact of remote disk mirroring technology on organizations' DB2 disaster recovery capabilities. Companies had developed DB2 DR procedures with dozens of steps, aiming to have systems restored and ready to take application traffic within 2-3 days (or more) of a disaster event, with a loss of the most recent 24 hours (or more) of database updates. Upon implementing a remote disk mirroring solution, those same companies could realistically expect to have DB2-based application systems up and running at a DR site within 1-2 hours of a disaster event, with a loss of maybe a very few seconds of previously-committed database changes (for asynchronous remote mirroring) or even zero data-update loss (in the case of a synchronous remote disk mirroring configuration, which has a practical distance limitation of around 20 straight-line miles or so between the primary and DR sites).

The quantum leap forward in DB2 DR capabilities delivered by remote disk mirroring motivated a number of companies - particularly financial services firms - to deploy the technology very soon after it became available back in the mid-1990s. The cost of these solutions, including the bandwidth needed to make them work, has declined over the years, and the costs of downtime and data loss have increased, so adoption has become more and more widespread. Many people are quite enamored with remote disk mirroring technology (I'm very big on it myself), and it is understandable that enthusiasts might come to have a "Forget about it!" attitude towards tape-based DR.

Such a dismissal of tape-based DR solutions might be ill-advised - a point that was brought home to me by a recent question I saw from a DB2 professional in Europe. The question had to do with a process for getting the latest DB2 for z/OS active log data archived and sent off-site for DR purposes in a parallel sysplex/data sharing environment (more on the particulars of the question momentarily). In providing an answer, I asked the questioner why he was interested in sending DB2 archive log files to a DR site, given that his company had implemented a remote disk mirroring solution (which included the all-important mirroring of the DB2 active log files). It turns out that this person's organization was required to have in place a DR procedure that could be used in case all data on disk - at both the primary and the DR sites - were to be lost. He went on to say that this requirement was a direct result of an incident - involving another company but of which his company had become aware - that caused an organization with a remote disk mirroring system in place to lose access to all data on disk at both the originating and replicate-to sites because of a human error (an accidental configuration change).

That brought to my mind a somewhat similar situation I read about some 15 years ago, in which a systems person made a mistake when entering a console command and all labels (containing control information) on all disk volumes were wiped out. The data was still there, but the system couldn't find it. A long outage ensued - lasting a day or two, as I recall. Ironically, this happened in a state-of-the-art data center that had been constructed so as to survive all manner of external disaster-type events. How do you protect mission-critical systems against internal threats of the human-error variety? Yes, you can (and should) put in place safeguards to help ensure that such errors will not occur (the WITH RESTRICT ON DROP option of the DB2 statement CREATE TABLE is an example of such a safeguard), but can you be certain that these measures are fail-safe? Can you anticipate any potentially devastating mistake that any person with access to your system might make (including your hardware vendors' maintenance and repair technicians)? Wonderful as remote disk mirroring is, you might sleep better at night knowing that a tape-based DR procedure (and "tape" is used loosely here - files could go to disk and be electronically transmitted to the DR site, perhaps to be transferred there to offline media) is documented, tested, and in operation (with respect to the regular sending of DB2 table backup and archive log files to the DR site). Hope that you won't ever have to use it (as previously mentioned, it does elongate disaster recovery time, and it risks loss of data changes made since the most recent log-archive operation), but know that elongated recovery with some data-update loss is way better than going out of business should your front-line DR solution fail you.

Now, about the particulars of that DB2 for z/OS DR question to which I referred: in a DB2 data sharing environment, tape-based recovery typically involves the periodic issuing of an -ARCHIVE LOG SCOPE(GROUP) command. This causes all active DB2 members in the data sharing group to truncate and archive the current active log data set. Output of command execution includes an ENDLRSN value (referring to a timestamp-based point in the log) from each DB2 subsystem, indicating the end-point of the just-archived log files. If you had to use these files to recover the data sharing group at the DR site, you'd use the smallest of these ENDLRSN values in a conditional restart to truncate all members' logs to the same point in time (important for data integrity).

Suppose that you do the -ARCHIVE LOG SCOPE(GROUP) every hour to minimize potential data loss (assuming that you either don't use remote disk mirroring or you're establishing a safety net - as herein advocated - in case of a failure of the disk mirroring system). What if a DB2 member has to be down for several hours for some reason, so that the ENDLRSN of the last archive log from that member sent to the DR site is hours older than the end-points of the other DB2 members' log files at the DR site? Do you have to toss all of those more-current archive log files and use the oldest ENDLRSN value (the one for the most recent archive log of the member that's been down for hours) for your conditional restart?

In fact, you don't have to do this, if you take the proper steps in shutting the member down. Here's what you'd do to shut down member A (for example): first, quiesce the subsystem, so that there are no in-flight, in-doubt, in-abort, or postponed-abort units of recovery. Then, just before shutting the subsystem down, do an -ARCHIVE LOG (and subsequently send that archived log file to the DR site). What you will then end up with on the new current active log data set for member A is just a shutdown checkpoint record, and that's not needed for recovery at the DR site (you could run DSN1LOGP off the end of that last member A archive log, generated via the just-before-shutdown -ARCHIVE LOG operation, to verify that there were no incomplete URs on member A at the time of the shutdown). With member A shut down in this way, you would NOT have to throw out the more-current log data from the other DB2 data sharing group members when recovering at the DR site.
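
The rule for picking the conditional restart point can be sketched in a few lines of Python. This is only an illustration of the reasoning above, not a DB2 utility: the MemberArchive class, the restart_truncation_lrsn function, and the LRSN values are all made up for the example, and the clean_shutdown flag is assumed to mean that the member was quiesced and archived just before shutdown, with DSN1LOGP confirming no incomplete URs.

```python
from dataclasses import dataclass


@dataclass
class MemberArchive:
    """Last archive log shipped to the DR site for one member (hypothetical model)."""
    member: str
    end_lrsn: int         # ENDLRSN reported by -ARCHIVE LOG (values below are invented)
    clean_shutdown: bool  # True if quiesced and archived just before shutdown


def restart_truncation_lrsn(archives):
    """Return the log point to which all members' logs would be truncated."""
    # A cleanly shut down member has no incomplete units of recovery past its
    # final archive, so its older ENDLRSN does not have to hold back the group.
    candidates = [a.end_lrsn for a in archives if not a.clean_shutdown]
    if not candidates:
        raise ValueError("expected at least one member that was still running")
    # Among the members that were still running, use the smallest ENDLRSN.
    return min(candidates)


archives = [
    MemberArchive("A", 0xC3A1B2000000, clean_shutdown=True),   # down for hours
    MemberArchive("B", 0xC3A1F4000000, clean_shutdown=False),
    MemberArchive("C", 0xC3A1F5000000, clean_shutdown=False),
]
print(hex(restart_truncation_lrsn(archives)))
```

If member A had simply been stopped without the quiesce and the final -ARCHIVE LOG, its flag would be False and its hours-old ENDLRSN would become the minimum, which is exactly the situation the shutdown procedure avoids.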

Tuesday, December 9, 2008

[Some of] the Best DB2 Things are Free

If you're a DB2 person, you've probably heard of DB2 Express-C. It's the full-function DB2 for Linux/UNIX/Windows (LUW) product that you can use and deploy - even in a production environment - for free. One of the really nice things about DB2 Express-C isn't even part of the product. I'm referring to the FreeDB2 blog written by IBM's Leon Katsnelson.

Leon has been an important part of the DB2 for LUW development organization for years and years, and throughout that time he's been an outstanding ambassador for the product, delivering great "news you can use" presentations at conferences all over the world and providing lots more DB2 information to users via electronic channels. A lot of what I know about DB2, especially from a client/server perspective, I learned from Leon.

FreeDB2.com has, as you'd expect, plenty of great information related to DB2 Express-C, but it also gets into free software in general, and free DBMS software in particular (I like the way Leon puts it: that's "free as in beer"). There's also a "Chat with Leon" feature on the site (Leon, energetic as he is, is not always online, but you can leave him a message when he isn't).

When the work that a person does reflects his passion, the result is usually very good work indeed. For as long as I've known Leon, I've been impressed with his passion for application development in a DB2 context. He knows that database software, however technically advanced it might be, is of little value if it is not serving applications that enable organizations to better serve their customers, employees, and stakeholders. DB2 Express-C is all about broadening the community of people who develop applications on a DB2 data-serving foundation. The same can be said of FreeDB2.com. Thanks, Leon, for continuing to carry that torch.

Monday, December 1, 2008

Virtual Storage Loses Mind Share

Last week, I taught a class on DB2 for z/OS data sharing recovery and restart to a group of people from a large Asia-based financial institution. The students appeared to be relatively young - I'd guess that most were in their 20s or 30s. At one point, I was explaining that a rebuild of the DB2 lock structure (an area of coupling facility storage used for global lock contention detection and for the recording of currently held data-change locks) is a fast operation because information required for the rebuild is in virtual storage on the z/OS systems on which the DB2 members of the data sharing group are running. A woman on the front row furrowed her brow and asked, "When you say that the needed data is in virtual storage, do you mean that it is in server memory?" I started to explain that data in virtual storage is not necessarily in server memory (aka "real" storage), owing to things like demand paging and auxiliary storage, but I ended up going pretty much with "Yes" as my answer. At that moment, I was struck by how irrelevant the concept of virtual storage is getting to be among people who are coming of age as IT professionals in a world of 64-bit addressing.

Now, don't get me wrong here. I know that virtual storage is still very important in some respects - particularly as it relates to software development. To make efficient use of server memory, an operating system has to be able to place chunks (4KB blocks, for example) of related code and/or data here and there in real storage in a non-contiguous fashion; otherwise, you'd have a lot of "holes" in server memory that would amount to a lot of wasted space. Programmers, on the other hand, need to be able to refer to areas of memory in which, for example, variable values are held, without worrying about where those areas of memory might physically be located in server memory. Virtual storage gives a developer a uniform, non-fragmented canvas on which to create software, while enabling the operating system to get maximum bang from real storage bucks through virtual-to-real address translation that makes the developer's view of memory compatible with the server resource as managed by the operating system. This is not what I'm talking about when I say that virtual storage is losing mind share. I refer instead to the way in which virtual storage can make a server memory of size X appear to be much larger.
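
To make the virtual-to-real translation idea concrete, here is a toy sketch in Python. It assumes a single flat page table held in a dictionary, which is a big simplification (real z/OS address translation goes through multiple levels of tables), and the names PAGE_SIZE, translate, and page_table are invented for the example.

```python
PAGE_SIZE = 4096  # 4 KB pages, matching the chunk size mentioned above


def translate(page_table, virtual_address):
    """Map a virtual address to a real address through a flat page table.

    page_table maps a virtual page number to a real storage frame number.
    """
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    frame = page_table.get(page_number)
    if frame is None:
        # In a real system this would be a page fault, resolved by bringing
        # the page into real storage from auxiliary storage.
        raise LookupError(f"page {page_number} is not in real storage")
    return frame * PAGE_SIZE + offset


# Three contiguous virtual pages backed by scattered real frames.
page_table = {0: 7, 1: 2, 2: 9}

for address in (0, 4100, 8200):
    print(address, "->", translate(page_table, address))
```

The program sees addresses 0 through 12287 as one unbroken range, while the frames behind them sit wherever the operating system found room.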

There was a time when this "objects in mirror appear larger than they really are" function of virtual storage was VERY important. When I started my IT career in 1982, IBM's MVS operating system (forerunner of z/OS) had 24 bits to work with as far as addressing is concerned, and that allowed for a 16 MB virtual storage address space (and support for 16 MB of real storage on a mainframe). The thing is, there were lots of active address spaces on a typical production mainframe system (every logged-in TSO user and every batch initiator had one, and there were also plenty of IMS message regions and CICS application-owning regions), and if you added up all the virtual storage being used in those address spaces you'd get an amount that was WAY bigger than the real storage resource on the server. To give users the impression that all of their in-memory stuff actually was in memory, MVS would shovel not-very-recently-referenced pages of code or data out of real storage and into auxiliary storage (so-called page data sets on disk), and bring said pages back into real storage when needed, to the tune of maybe several hundred per second during periods of peak workload. Whole address spaces (especially of TSO users) would get swapped out (i.e., paged out to auxiliary storage en masse) when they weren't active. We would tweak parameters used by the MVS System Resource Manager in an effort to fine-tune paging and swapping operations. In extreme cases we would "fence" certain address spaces to ensure that the amount of related in-memory data actually in real storage would not go below some minimum amount. Page data set I/O performance was monitored very closely. All this analysis and tuning of virtual and real storage occupied a good bit of a lot of people's time.

The move, in the late 1980s, to 31-bit addressing and 2GB address spaces (and servers that could be configured with 2GB of real storage) made a difference, for sure, but from the DB2 perspective it still wasn't enough. Pretty soon, we got expanded storage on mainframes (page-addressable versus byte-addressable), and exploitation of same via DB2 hiperpools. Even with 2000 megabytes of real storage available versus the 16 megabytes of old, we still used a second tier of storage (expanded) to enable a buffer pool configuration to sort of grow beyond the limit imposed by 31-bit addressability (I say "sort of" because only "clean" pages - those either not updated or updated and subsequently externalized to disk - could be written to a hiperpool, and a page in a hiperpool had to be moved back to a "regular" DB2 buffer pool before a program could access it). Lesson learned: an extra 7 bits of addressability can get used up pretty quickly. How about another 33 bits, taking the total available for address representation to 64 bits? Now, we're talking.

Do you know what an exabyte is? It's one quintillion bytes. One million terabytes. One billion gigabytes. 64-bit addressing can support 16 exabytes of byte-addressable memory. Already, you can get an IBM z10 mainframe with up to 1.5 terabytes of central storage, and you know it's going to go up from there. Want to configure a 4 GB DB2 buffer pool? Go ahead. 20 gigabytes? No big deal. 40 gigabytes? Why not, if you have the memory? We are entering an era in which people will still have page data sets on disk, just for insurance purposes, but they'll have enough server memory so that paging to auxiliary storage will basically be nil. In-virtual will in fact mean in-real, even for very large, very busy DB2 data-serving systems. No more having to make a real resource of size X appear to be size 2X (or whatever). The real resource of size X will be big enough for everything that needs to be in virtual storage. At a number of DB2-using organizations, this is true already.
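
For anyone who wants to check the addressing arithmetic, a quick Python sketch follows. The helper names are invented for the example, and the units are binary ones (1 KB = 1024 bytes), which is the sense in which 64 bits gives 16 exabytes; in decimal terms that is a bit more than 18 quintillion bytes.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def addressable_bytes(bits):
    """Number of distinct byte addresses available with the given address width."""
    return 2 ** bits


def human(n):
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    i = 0
    while n >= 1024 and i < len(UNITS) - 1:
        n //= 1024
        i += 1
    return f"{n} {UNITS[i]}"


for bits in (24, 31, 64):
    print(f"{bits}-bit addressing: {human(addressable_bytes(bits))}")
```

Going from 24 to 31 bits multiplies the address space by 2 to the 7th power (128), while going from 31 to 64 bits multiplies it by 2 to the 33rd power, which is why the second jump changes the picture so much more than the first.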

And I'll update a slide in a presentation to read, "Lock structure rebuild is fast because all the required information is in server memory on the systems on which the DB2 members of the data sharing group are running..."
