Robert's Blog

Monday, January 26, 2009

DB2 Data Sharing and Row-Level Locking

As a member of IBM's DB2 National Technical Support team in the 1990s, I worked with data sharing technology before DB2 Version 4 (which introduced the feature) was generally available. An important part of my job at the time was to boost people's understanding of data sharing, and that included clearing up related misperceptions. Some of those misperceptions persist to this day, most notably the one about row-level locking in a data sharing environment. Just last week I was reminded that a number of people, including some who work for organizations that run DB2 for z/OS in data sharing mode, believe that row-level locking cannot be used in a data sharing system. I'm not talking about people who think that you shouldn't use row-level locking with DB2 data sharing. I'm talking about people who feel that you can't use row-level locking in that environment, even if you want to - that a DB2 data sharing system somehow can't process a request to globally lock table data at a row level. This is most definitely not the case, and I hope to make that clear via this blog entry.

First, a declaration: there is NO kind of lock request that DB2 can make on a program's behalf in a standalone-subsystem environment that it can't handle in a data sharing group on a parallel sysplex ("parallel sysplex" being IBM's term for the mainframe-cluster technology that provides the infrastructure on which DB2 data sharing operates). Tablespace-level locks? Table-level? Page-level? Row level? All are available whether DB2 is running in data sharing or non-data sharing mode. To put it another way, at the logical level (i.e., from an application programming perspective) there is no difference between locking in a data sharing environment and locking in a standalone DB2 system.

This is not to say that DB2 data sharing doesn't carry with it any changes with regard to data locking. In fact data sharing introduced many locking changes, but these basically amounted to additions to the lock-related operations seen in a standalone DB2 environment. To the "regular" locks that exist in both data sharing and non-data sharing systems (and which are known as "logical locks" in data sharing-speak), DB2 data sharing adds what are called physical locks. This is where the row-level locking connection comes in.

When there is inter-DB2 read/write interest in database objects (i.e., when multiple members of a DB2 data sharing group have a given database object open for read/write - versus read-only - access), DB2 will issue page-level physical lock requests for pages that cannot be locked at the logical level. These include index pages, space map pages, and pages of tablespaces defined with LOCKSIZE ROW. Thus, an important point can be made: while concomitant physical lock (aka P-lock) activity makes row-level locking more expensive in a data sharing environment than it otherwise would be (as will be explained momentarily), row-level locking does not introduce page P-locking to a data sharing system; rather, it simply adds to the page P-lock activity that would be there anyway because of P-lock requests related to index and space map pages.

So, about the overhead cost of row-level locking in a DB2 data sharing environment: let's suppose that a data sharing group has two members, DB2A and DB2B. Tablespace TS1 in the shared database is defined with LOCKSIZE PAGE, and tablespace TS2 with LOCKSIZE ROW. Suppose further that both tablespaces are group buffer pool-dependent (meaning, basically, that both DB2A and DB2B have read/write interest in both of these database objects). Program ProgA1 on DB2A issues an UPDATE for row R1 in table T1 in tablespace TS1, and program ProgA2 (also connected to DB2A) does the same for row R2 in table T2 in tablespace TS2. Right after that, programs ProgB1 and ProgB2, running on DB2B, issue updates for rows R11 in table T1 and and R22 in table T2, respectively. Rows R1 and R11 are on the same page, P1, in table T1, and rows R2 and R22 are in the same page, P2, in table T2. In the case of program ProgB1, DB2B requests for it an X-type logical lock on page P1 of table T1. It can't immediately get that lock, because ProgA1 already has it (assuming that ProgA1 hasn't yet issued a COMMIT). ProgB1 waits, and gets the requested X-type logical lock on page P1 after ProgA1 commits its update operation. Meanwhile, DB2B requests for ProgB2 an X-type logical lock on row R22 in table T2, and ProgB2 gets this lock because ProgA2 has an X-type logical lock on a different row, R2, that happens to be (as previously mentioned) in the same data page. Because row-level locking is in effect for tablespace TS2, DB2B also requests an X-type P-lock (physical lock) on page P2 in table T2. It doesn't initially get that P-lock, because DB2A holds the X-type P-lock on page P2; however, DB2A transfers the X-type P-lock on page P2 to DB2B (this can be done for P-locks, even before a program has issued a COMMIT). As part of this page P-lock transfer, DB2A sends page P2 (including the updated row R2) to DB2B by way of the group buffer pool associated with tablespace TS2. When DB2B gets this page in its local buffer pool, the update of row R22 by ProgB2 proceeds.

With row-level locking in effect, there is some additional CPU consumption, versus the page-level locking scenario, owing to extra global lock requests (page-level P-locks in addition to row-level logical locks) and the on-demand transfer of page P2 from DB2A to DB2B (in the page-level locking scenario, the page updated on DB2A will get to DB2B as part of a process related to the committing of the data change made on DB2A). This brings up another important point: in many cases, row-level locking can be used in a non-data sharing environment with virtually no increase in CPU cost versus page-level locking; however, there will almost always be an extra CPU cost to be paid for row-level locking in a data sharing system. So, can you use row-level locking with DB2 data sharing? Yes. Should you use row-level locking with DB2 data sharing? That depends on whether the extra CPU cost so incurred provides a justifying benefit by eliminating or at least mitigating a concurrency problem.

How would you make that call? Here's what I would do: in a data sharing environment, I'd have all my tablespaces defined with LOCKSIZE PAGE (or LOCKSIZE ANY, which makes page-level locking the first option for DB2 but enables automatic escalation to a less-granular locking scope in some situations). Absent any serious contention problems (i.e., if your DB2 monitor does not show a lot of timeouts or deadlocks), that's where I'd leave things. If I did have a serious locking problem, I'd trace it to the target tablespaces (DB2 provides you with the information to do this). With the contention-challenged tablespaces identified, I'd consider going with row-level locking for those tablespaces, and I'd probably effect that change if the programs getting the deadlocks and/or timeouts couldn't be readily modified so as to resolve the concurrency issues. If using row-level for the tablespaces caused only a small increase in CPU overhead for programs accessing data in the tablespaces, I'd consider the problem solved (DB2 accounting trace data displayed via your monitor will give you cost-comparison information). If the overhead of row-level locking turned out to be too high, I'd either stay with it and keep the tablespaces from becoming group buffer pool dependent (there are ways to ensure this), or go back to page-level locking and try other approaches to resolving the concurrency problem (one of which could be the choice of a different clustering key for associated tables).

OK, quite a few words to make a simple point: in a DB2 data sharing environment, you can use row-level locking, but you should only use it if you need to in order to resolve a concurrency problem. If you do have a concurrency problem in a data sharing system, keep in mind that going with row-level locking is probably one of several mitigating actions that could be taken. Go with it if it's your best option.


Anonymous Anonymous said...

In general what lock levels would you recommend when a web app is hit by 1000s of people in an hour, and hit same transaction tables but never lock same rows?

November 1, 2011 at 9:04 AM  
Blogger Robert Catterall said...

Assuming that you're talking about a Web app backed by a DB2 for z/OS database, I'd be inclined to use page-level locking UNLESS that level of granularity caused a lot of timeout and/or deadlock errors (-911 DB2 SQL code, also known by SQLSTATE 40001). IF you were using page-level locking and saw a lot of DB2 timeout/deadlock errors, I'd consider the use of row-level locking, but only for the table space(s) associated with the timeout/deadlock errors (the table space is named in the text of the error message). For other table spaces, I'd continue to use page-level locking.

November 1, 2011 at 1:13 PM  

Post a Comment

Subscribe to Post Comments [Atom]

<< Home