Catterall Consulting: IBM IOD 2009 - Day 4 (Belated Re-Cap)

Apologies for the delay in getting this entry posted to my blog -- the time since IBM's 2009 Information On Demand conference concluded on October 29 has been very busy for me. Now I have a little downtime, so I can share with you what I picked up on day 4 of the conference.

"Not Your Father's Database System," indeed - Guy Lohman, of IBM's Almaden (California) Research Center, delivered a very interesting presentation on the Smart Analytics Optimizer, a just-around-the-corner product (meaning, not yet formally announced) about which you'll be hearing a lot in the weeks and months to come. Developed jointly by the Almaden Research Center and IBM's Silicon Valley and Boeblingen (Germany) software labs, the IBM Smart Analytics Optimizer (ISAO) is a business intelligence query-acceleration system that network-attaches to a mainframe server running DB2 for z/OS. The way it works: using a GUI, a DBA copies a portion of a data warehouse (one or more star schemas -- fact tables and their associated dimension tables) to the ISAO (in effect, you set up a data mart on the ISAO). Thereafter, queries that are submitted to DB2 (the ISAO is transparent from a user perspective) will be routed by DB2 to the ISAO if 1) the queries reference tables that have been copied to the ISAO, and 2) DB2 determines that they will run faster if executed on the ISAO. Here's the interesting part: the longer a query would run if executed in the DB2 system, the greater the degree of acceleration you'll get if it runs on the ISAO.

When I say "acceleration," I mean big-time speed-up, as in ten to one hundred times improvement in query run times (the ISAO "sweet spot" is execution of queries that contain aggregation functions -- such as AVERAGE and SUM -- and a GROUP BY clause). How is this accomplished? The ISAO hardware is commodity stuff: multi-core microprocessors with a lot of server memory in a blade center configuration (and several of these blade centers can be tied together in one ISAO system). The query processing software that runs on the ISAO hardware is anything but commodity -- it's a built-from-the-ground-up application that implements a hybrid row-store/column-store in-memory data server. Want DBA ease-of-use? You've got it: there's no need to implement indexes or materialized query tables or any other physical database design extras in order to get great performance for otherwise long-running queries. This is so because the ISAO does a simple thing -- scan data in one or more tables -- in a very advanced, multi-threaded way to deliver consistently good response time (typically less than 10 seconds) for most any query sent its way by DB2. [Caveat: As the ISAO does no I/Os (all data that it accesses is always in memory), it runs its CPUs flat-out to get a single query done as quickly as possible before doing the same for the next query; thus, if queries are sent to the ISAO by DB2 at a rate that exceeds the rate at which the ISAO can process the queries, response times could increase to some degree -- this is just basic queuing theory.]

The ISAO is what's known as disruptive technology. As previously mentioned, you'll soon be hearing a lot more about it (the IOD session I attended was a "technology preview"). I'll be watching that space for sure.

A DB2 for z/OS data warehouse tune-up - Nin Lei, who works at IBM's System z benchmark center in Poughkeepsie (New York), delivered a presentation on performance management of a data warehouse mixed query workload ("mixed" referring to a combination of short- and long-running queries). A couple of the points made in the course of the session:

You might want to cap the degree of query parallelization on the system - There is a DB2 for z/OS ZPARM parameter, PARAMDEG, that can be used to set an upper limit on the degree to which DB2 will split a query for parallelized execution. For some time now, I've advocated going with a PARAMDEG value of 0 (the default), which leaves the max degree of parallelization decision up to DB2. Nin made a good case for setting PARAMDEG to a value equal to twice the number of engines in the z/OS LPAR in which DB2 is running. I may rethink my PARAMDEG = 0 recommendation.
The WLM_SET_CLIENT_INFO stored procedure is available on the DB2 for z/OS platform, too - This stored procedure, previously available only on the DB2 for Linux/UNIX/Windows and DB2 for System i platforms, was added to mainframe DB2 V8 and V9 environments via the fix for APAR PK74330. WLM_SET_CLIENT_INFO can be used to change the value of the so-called client special registers on a DB2 for z/OS server (CURRENT CLIENT_ACCTNG, CURRENT CLIENT_USERID, CURRENT CLIENT_WRKSTNNAME, and CURRENT CLIENT_APPLNAME). This capability provides greater flexibility in resource management and monitoring with respect to a query workload.

For fans of Big Memory - Chris Crone, Distinguished Engineer and member of the DB2 for z/OS team at IBM's Silicon Valley Lab, gave a presentation on 64-bit addressing in the mainframe DB2 environment. He said that development of this feature was motivated by a recognition that memory had become the key DB2 for z/OS system resource constraint as System z engines became faster and more numerous (referring to the ability to configure more central processors in a single z/OS image). Big DB2 buffer pools are needed these days because even a really fast I/O operation (involving a disk subsystem cache hit versus a read from spinning disk) can be painfully slow when a single mainframe engine can execute almost 1000 million instructions per second.

Here are a few of the many interesting items of information provided in Chris's session:

You can currently get up to 1.5 TB of memory on a System z server. Expect memory sizes of 3 TB or more in the near future.
The largest buffer pool configuration (aggregate size of all active buffer pools in a subsystem) that Chris has seen at a DB2 for z/OS site is 40 GB.
It is expected that the default RID pool size will be 400 MB in the next release of DB2 for z/OS (the RID pool in the DB2 database services address space is used for RID sort operations related to things such as multi-index access, list prefetch, and hybrid join).
The maximum size of the EDM pool components (EDM pool, skeleton pool, DBD pool, and statement pool) is expected to be much larger in the next release of DB2 (commonly referred to as DB2 X -- we'll get the actual version number at announcement time).
In the DB2 X environment, it's expected that 80-90% of the virtual storage needed for DB2 threads will be above the 2 GB "bar" in the DB2 database services address space. As a result, the number of threads that can be concurrently active will go way up with DB2 X (expect an upper limit of 20,000 for a subsystem, versus 2000 today).
DB2 data sharing groups (which run in a parallel sysplex mainframe cluster) could get really big -- IBM is looking at upping the limit on the number of DB2 subsystems in a group (currently 32).
Solid state storage is going to be a big deal -- the DB2 development team is looking at how to best leverage this technology.

After Chris's session, it was off to the airport to catch the red-eye back to Atlanta. I had a great week at IOD, and I'm looking forward to another great conference next year.

Previous Posts

Tuesday, November 3, 2009

IBM IOD 2009 - Day 4 (Belated Re-Cap)

0 Comments:

Post a Comment