Robert's Blog

Wednesday, September 30, 2009

DB2-Managed Disk Space Allocation - An Overlooked Gem?

Last week I was again teaching a DB2 for z/OS database administration course, this time for the half of the client's DBA team that minded the store while their colleagues attended the prior week's class. As I mentioned in my last blog entry, the organization had recently completed the migration of their production system to DB2 for z/OS Version 8, so we spent a good bit of class time discussing new features and functions delivered in that release of the product. In such a situation, it's always interesting to see what really grabs the attention of class participants. This time, a top attention-grabber was the automatic data set extent-size management capability introduced with DB2 V8 and enabled by default with DB2 9 for z/OS (more on this DB2 9 change momentarily). Facial expressions during our talks on this topic communicated the unspoken question, "DB2 can do that?"

It does seem almost too good to be true, especially to a DBA who has long spent more time than desired determining appropriate primary and - especially - secondary space allocation quantities for DB2-managed (i.e., STOGROUP-defined) tablespace and index data sets. As you're probably aware, a non-partitioned tablespace (typically a segmented tablespace) can grow to 64 GB in size by occupying thirty-two 2 GB data sets; however, the first 2 GB data set has to fill up before DB2 can go to a second (and the second has to fill up before DB2 can go to a third, and so on), and DB2 won't be able to fill up a data set if it first runs into the data set extent limit (255 extents). The same applies to non-partitioned indexes, except that the data set size is not fixed at 2 GB - rather, it's determined by the PIECESIZE specification on a CREATE INDEX or ALTER INDEX statement. In either case, you want to set the primary and secondary space allocation quantities (PRIQTY and SECQTY on the CREATE or ALTER statement for the object) so that the object can reach the data set size limit (and thus expand to multiple data sets) without first hitting the aforementioned limit on the number of extents for a data set. A partition of a partitioned tablespace or partitioned index isn't going to spread across multiple data sets, but you still want to be able to reach the maximum data set size (specified via DSSIZE on the CREATE TABLESPACE statement) before hitting the data set extent limit.
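As an aside, here's a sketch of what a PIECESIZE specification looks like on a non-partitioned index (the index, table, and STOGROUP names are made up for illustration, as are the space quantities):

```sql
-- Hypothetical example: cap each data set ("piece") of a
-- non-partitioned index at 4 GB
CREATE INDEX IX_CUSTNO
  ON CUSTOMER (CUSTOMER_NUMBER)
  USING STOGROUP SGDB2
  PRIQTY 720000
  SECQTY 72000
  PIECESIZE 4 G;
```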

What should your PRIQTY and SECQTY amounts be? Too small, and you might hit the extent limit before reaching the desired maximum data set size. Too large, and you might end up with a good bit of wasted (allocated and unused) space in your tablespaces and indexes. Even if reaching a maximum data set size is not an issue (and it won't be for smaller tablespaces and indexes), you still want to strike a balance between too many extents and too much unused space. Multiply this by thousands of objects in a database, and you've got a big chunk of work on your hands.

Enter the "sliding scale" secondary space allocation algorithm introduced with DB2 for z/OS V8. Here's how it works: First, set three ZPARM parameters as follows (all are on the DSNTIP7 panel of the DB2 installation CLIST):
  • TSQTY (aka "table space allocation"): 0 (this is the default value)
  • IXQTY (aka "index space allocation"): 0 (this is the default value)
  • MGEXTSZ (aka "optimize extent sizing"): YES (for DB2 V8 the default value is NO, and for DB2 V9 it's YES - this is what I meant by my "enabled by default" comment regarding DB2 V9 in the opening paragraph of this blog entry)
Then - and here's the really good part - you alter an existing tablespace or index with PRIQTY -1 and SECQTY -1, and voila! DB2 will manage primary and secondary allocation sizes for you. Specifically, the primary allocation for the tablespace or index will be 1 cylinder (thanks to your having specified 0 for the TSQTY and IXQTY ZPARMs, as recommended above), and the initial secondary space allocation will also be 1 cylinder (note that the primary space allocation for a LOB tablespace in this scenario would be 10 cylinders versus 1). After that, subsequent extents - up to the 127th - for the data set will be progressively larger, with the sizes determined by a sliding-scale algorithm used by DB2. The size of extents beyond the 127th will be fixed, depending on the maximum size of the data set: 127 cylinders for data set sizes up to 16 GB, and 559 cylinders for 32 GB and 64 GB data sets. For new tablespaces and indexes, DB2-managed primary and secondary space allocation sizing is enabled by simply not including the PRIQTY and SECQTY clauses in the CREATE TABLESPACE or CREATE INDEX statement.
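To make that concrete, here's a sketch of the DDL involved, assuming TSQTY=0, IXQTY=0, and MGEXTSZ=YES in ZPARM (the database, tablespace, and index names are hypothetical):

```sql
-- For existing objects: hand space allocation over to DB2
ALTER TABLESPACE DBPROD.TSORDERS
  PRIQTY -1 SECQTY -1;

ALTER INDEX RSMITH.IX_ORDERNO
  PRIQTY -1 SECQTY -1;

-- For a new object: just leave PRIQTY and SECQTY out of the DDL
CREATE TABLESPACE TSNEW
  IN DBPROD
  USING STOGROUP SGDB2
  SEGSIZE 32;
```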

Pretty great, huh? No muss, no fuss with regard to space allocation for DB2-managed data sets. Best of all, it works. When you let DB2 handle data set extent sizing, it is highly unlikely that you'll hit the data set extent limit before reaching the maximum data set size, and the start-small-and-slowly-increase approach to secondary allocation requests keeps wasted space to a minimum. What I find interesting is the fact that many DB2 people don't know about this great DBA labor-saving device. In a recent thread on the DB2-L discussion list, Les Pendlebury-Bowe, a DB2 expert based in the UK, referred to DB2-managed data set space allocation as "a real success story that never seems to get much press." In that same thread, DB2 jocks Myron Miller (a Florida-based consultant) and Roger Hecq (with financial firm UBS in Connecticut) added their endorsements of DB2-managed space allocation. Myron noted that with a specification of PRIQTY -1 and SECQTY -1, "I never have to even worry about the number of rows at any time in the tablespace" (and keep in mind that the -1 values are for ALTER TABLESPACE or ALTER INDEX - as noted previously, just don't specify PRIQTY and SECQTY on CREATE statements to enable DB2 management of space allocation for new objects). Roger stated that he's been pleased to "let DB2 do all the work" related to disk space management, and that his organization has "not had any issues" with their use of this DB2 capability.

[By the way, if you are not a DB2-L subscriber, you should be. It's a great - and free - DB2 technical resource.]

There you have it - a real gem of a DB2 feature that has been overlooked by plenty of DB2 people. It's easy to use, and people who have implemented DB2 management of data set space allocation like it a lot. Give it a go.

Sunday, September 20, 2009

Of DB2 for z/OS Indexes, PartitioneDUH and Otherwise

Last week I taught a DB2 for z/OS database administration class for an organization that fairly recently completed the migration of their production environment to DB2 V8 in Conversion Mode. That being the case, a good bit of class time was spent in discussion of new features and functions introduced with DB2 V8. One topic of particular interest to the class attendees was table-controlled partitioning. Our conversations on this subject became rather amusing, for reasons I'll describe momentarily. First, a little background.

For years and years, there was only one way to partition a mainframe DB2 table: you created a tablespace with the NUMPARTS clause (through which you specified the number of partitions into which a table's rows would be divided), created a table in the tablespace, and created on the table a partitioning index. The index controlled the partitioning of the data, as it was at this level that you defined the partitioning key (composed of one or more columns) and the limit values for each partition (e.g., with a partitioning key of CUSTOMER_NUMBER, partition 1 might hold rows with a CUSTOMER_NUMBER value of '05000000' or less, while partition 2 would hold rows with CUSTOMER_NUMBER values greater than '05000000' and less than or equal to '10000000', and so on).

The table-controlled partitioning feature of DB2 V8 was a big improvement over index-controlled partitioning, largely because it provided users with the ability to partition data with one key while clustering data within partitions by way of another key. With index-controlled partitioning, the partitioning index was also the clustering index, like it or not. With table-controlled partitioning, assignment of rows to partitions based on a key value is a table-specified thing, and clustering is an index-defined thing. So, if you want rows to be divided among partitions by date (perhaps one month per partition), you specify that on the CREATE TABLE statement, and if you want rows within a partition (i.e., for a given month) to be ordered by CUSTOMER_NUMBER, you accomplish that by defining an index, with the CLUSTER clause, on the CUSTOMER_NUMBER column of the table. That kind of set-up is often ideal for a data warehouse database, allowing a query searching across several months to be parallelized by DB2, while providing good locality of reference for the split queries within partitions if a certain customer number or set of customer numbers is targeted.
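Sketched in DB2 V8 DDL (with invented table, column, and database names), that warehouse-style set-up might look like this - rows go into partitions by month, and a separate index handles clustering by customer number within each partition:

```sql
-- Partitioning is specified on the table (one month per partition)...
CREATE TABLE SALES
  (SALE_DATE       DATE          NOT NULL,
   CUSTOMER_NUMBER CHAR(8)       NOT NULL,
   AMOUNT          DECIMAL(11,2) NOT NULL)
  IN DBPROD.TSSALES
  PARTITION BY (SALE_DATE)
    (PARTITION 1 ENDING AT ('2009-07-31'),
     PARTITION 2 ENDING AT ('2009-08-31'),
     PARTITION 3 ENDING AT ('2009-09-30'));

-- ...while clustering within partitions comes from a separate index
CREATE INDEX IX_SALES_CUST
  ON SALES (CUSTOMER_NUMBER)
  CLUSTER;
```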

Table-controlled partitioning also delivered very useful features such as dynamic addition of a partition to a table, and a ROTATE PARTITION FIRST TO LAST option of ALTER TABLE that enables the use of a fixed number of partitions for, say, a rolling 52 weeks of data (every week, data in the "oldest week" partition is purged and archived, and the empty partition is adjusted to receive data for the upcoming week). Still, with all this goodness there comes the task of learning to speak of indexes on table-controlled partitioned tables in a new way. Formerly, a partitioned tablespace had one partitioning index - that being the one by which the partitioning key and partition limit key values were specified. The partitioning index was also the only one on the table that could itself be partitioned (i.e., with index entries spread across index partitions that matched up with the corresponding partitions of the tablespace). With a table-controlled partitioned table, any index that starts with the partitioning key column (or columns) is referred to as a partitioning index, and any index that is defined with the PARTITIONED clause (whether or not it starts with the partitioning key) will have its entries divided among index partitions that correspond to the table's partitions.
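The two ALTER TABLE capabilities mentioned above look something like this (table names and date values are hypothetical):

```sql
-- Dynamically add a partition at the end of a table
ALTER TABLE SALES
  ADD PARTITION ENDING AT ('2009-10-31');

-- Reuse the oldest partition of a rolling-52-weeks table for the
-- upcoming week's data (RESET empties the partition)
ALTER TABLE WEEKLY_SALES
  ROTATE PARTITION FIRST TO LAST
  ENDING AT ('2009-10-10') RESET;
```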

Thus, the amusing (to me) discussions in the aforementioned class about indexes defined on table-controlled partitioned tables: to make ourselves clear, we very strongly emphasized the last consonant sound of the type of index about which we were speaking, as in:

DBA: Is a partitioninGUH index necessarily partitioneDUH?

Me: No. If a partitioninGUH index is defined without the partitioneDUH clause, it will not be partitioneDUH.

DBA: Why would someone create a partitioninGUH index without making it partitioneDUH?

Me: Good question. I'd think that you'd always want a partitioninGUH index to be partitioneDUH.

This kind of exchange would crack me up (I'm easily amused). Anyway, the discussions were much appreciated on my part, as an interactive class is a fun class for me. It's also rewarding to help people explore the possible uses of new DB2 features (think about the new flexibility of partitioning in a DB2 V8 or V9 environment, especially in light of the fact that you can now specify up to 4096 partitions for a single table - talk about slicing and dicing). Note, if you're on or moving to DB2 for z/OS V9, that all of this partitioninGUH and partitioneDUH talk is relevant to the new partition-by-range universal tablespaces, but not to partition-by-growth universal tablespaces, as there is no sense of a partitioning key with respect to the latter. I wrote about partition-by-growth tablespaces in a previous entry, and you can check that out if you're interested in learning more about this new type of database object.

That's all for now. Thanks for stopping by the bloGUH.

Monday, September 7, 2009

OLTP and BI on the Same DB2 for z/OS System (Part 2)

A few days ago, I posted part one of a 2-part entry on the subject of using the same DB2 for z/OS system (meaning a single logical DB2 system image - it could be a multi-subsystem DB2 data sharing group on a parallel sysplex) for both OLTP and business intelligence (BI) workloads. In that entry I focused on minimizing OLTP-BI workload contention on a number of levels: the disk subsystem, the DB2 buffer pools, DB2 locks, and CPU. With regard to that last contention category, I mentioned that "an important aspect of managing CPU contention between OLTP and BI applications running on the same system is the management of DB2 query parallelization, particularly as it pertains to the BI queries." In this part 2 post, I'll expand on that statement.

DB2 query parallelization is a very good thing when it comes to improving the performance of queries that involve scanning large numbers of data and/or index pages. In case you're not familiar with this DB2 feature or you need a little refresher, query parallelism has been around for quite some time, having been delivered in the mid-1990s with DB2 for z/OS Version 4. The technology enables DB2 to take a particular query and - on determining that parallelization would be beneficial for run-time reduction - split it into several tasks that can be executed concurrently on different engines within a mainframe server (or even on several servers, if we're talking about sysplex query parallelism in a DB2 data sharing group). Depending on the nature of the query, DB2 might start returning rows to the requester as they are qualified by the split queries, or the split-query result sets may be consolidated before any rows are returned (as when a result set sort is required or an aggregate function such as SUM is utilized). The larger the number of pieces into which a query is split, the greater the potential is for better response time.

Generally speaking, parallelized queries split along tablespace partition lines, so greater degrees of parallelism can be expected when target tablespaces have a lot of partitions (assuming that qualifying rows will come from multiple partitions) and the mainframe server has a pretty good number of fast CPUs (the number of CPUs is, of course, NOT an upper bound on the degree of query parallelization, as the split queries are likely to be I/O bound and the CPU portion of these can be interleaved on one engine as I/O wait events occur). So, into how many pieces might DB2 split a query? The answer to that question depends in large part on the setting of two DB2 ZPARMs (i.e., subsystem-level parameters): CDSSRDEF and PARAMDEG.

CDSSRDEF specifies the default value of the CURRENT DEGREE special register for a DB2 subsystem. This value will be 1 if you don't change it, and that means that a DYNAMIC query will not be parallelized by DB2 unless it is preceded by the SQL statement SET CURRENT DEGREE = 'ANY' (a static SELECT statement will be a candidate for parallelization if it's associated with a package bound with the DEGREE(ANY) specification). This default value for CDSSRDEF is the right one for many - and perhaps most - situations because it gives you statement-level control over the use of parallelization by DB2 for dynamic queries (in a BI environment queries tend to be dynamic). Making ALL dynamic queries candidates for parallelization by setting CDSSRDEF to ANY would increase CPU overhead for your BI workload. Why? Because DB2 would have to consume extra cycles just to determine whether or not parallelization would be beneficial for each and every dynamic query. When the determination is "no, it would not" (as would likely be the case for a query targeting a non-partitioned table or for a query that would retrieve rows from one partition of a partitioned table), that extra CPU consumption in query optimization will not yield a benefit in terms of query execution time.

That said, sometimes a specification of ANY for CDSSRDEF is necessary for dynamic query parallelization, because the BI queries may be generated by PC-based end-user tools that do not allow for insertion of a SET CURRENT DEGREE = 'ANY' statement. If you have a DB2 data sharing group and your BI queries and your OLTP transactions run on different members, you can have CDSSRDEF = ANY on the BI-supporting subsystem (or subsystems), and CDSSRDEF = 1 on the OLTP-supporting DB2 members. If you have a single DB2 subsystem on which you run OLTP and BI work, what should you do if SET CURRENT DEGREE = 'ANY' is not an option for the BI queries? I'd lean towards setting CDSSRDEF to ANY, and then limiting the degree of parallelization for queries through the PARAMDEG specification in ZPARM. The default value of this parameter is 0, and that means that DB2 will determine the degree of parallelization for a query that it decides to split. I like that default because I feel that DB2 does this well and z/OS is very good at managing a complex and dynamic workload (as when it throttles down the processing resources allocated to a parallelized query in order to accommodate new work entering the system). If, however, an OLTP workload were running on the same DB2 subsystem as the BI workload, I'd want to put a relatively low upper bound on the degree of parallelization for dynamic queries, the better to deliver consistent response times for the OLTP transactions. I might go for something as low as 3 or 4 for PARAMDEG, so that I'd get some significant (if not huge) run-time reduction for some of the BI queries while limiting variations with respect to OLTP transaction execution times.
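For reference, statement-level control of parallelism for a dynamic query looks like this (the table is invented for illustration):

```sql
-- Make the following dynamic query a candidate for parallelization
SET CURRENT DEGREE = 'ANY';

SELECT CUSTOMER_NUMBER, SUM(AMOUNT)
FROM SALES
GROUP BY CUSTOMER_NUMBER;

-- Turn parallelism back off for subsequent statements
SET CURRENT DEGREE = '1';
```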

In addition to placing an upper bound on query parallelization when running OLTP and BI work on the same DB2 subsystem, you might want to think about limiting query parallelization to only a portion of the dynamic queries that run on your system. The DB2 for z/OS resource limit facility (RLF) provides a way to do this. What you can do is create a resource limit specification table (RLST) in which you put one or more rows with a value of '4' in the RLFFUNC column (this disables query parallelism) and the names of packages for which you DO NOT want associated dynamic queries to be parallelized in the RLFPKG column (in the LUNAME column of this RLST, you can have a blank value for the local location, or PUBLIC for TCP/IP-connected remote requesters). In addition (or instead), you could disable query parallelism by authorization ID using the AUTHID column of your RLST. Then you can set CDSSRDEF to ANY and know that dynamic queries associated with packages and/or auth IDs specified in your RLST rows will not be candidates for parallelization.
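Here's a sketch of what such RLST rows might look like (the table qualifier, package name, and auth ID are made up; DSNRLSTxx is the conventional RLST naming pattern):

```sql
-- Disable query parallelism (RLFFUNC = '4') for dynamic queries
-- associated with one package, for local requesters (blank LUNAME)
INSERT INTO SYSADM.DSNRLST01 (RLFFUNC, RLFPKG, LUNAME)
  VALUES ('4', 'BIPKG01', ' ');

-- Do the same for one auth ID, for TCP/IP-connected remote requesters
INSERT INTO SYSADM.DSNRLST01 (RLFFUNC, AUTHID, LUNAME)
  VALUES ('4', 'BIUSER1', 'PUBLIC');
```

Remember that the -START RLIMIT command has to be issued to put an RLST (or changes to one) into effect.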

In summary: take advantage of DB2 query parallelization for your BI queries, but use it conservatively when you have OLTP transactions running on the same subsystem.