"What is DB2 data sharing?"
When data sharing was delivered with DB2 for z/OS (OS/390 at the time) Version 4, more than ten years ago, I was part of IBM's DB2 National Technical Support team at the Dallas Systems Center. I was tasked at the time with learning all that I could about data sharing, so that I could help IBM customers (and IBMers) understand, implement, and successfully use the technology. That was a really fun time in my IT career because people had lots of questions about DB2 data sharing, and I very much enjoyed answering those questions (sometimes in-person at locations in Asia, Australia, Europe, and throughout the USA).
Over time, as DB2 data sharing knowledge became more widely diffused, and as people such as my good friend Bryce Krohn shouldered more of the knowledge-transfer load, there were fewer questions for me to answer -- thus my delight in fielding a query a couple of weeks ago. The questioner was the general manager of a company that provides IT training services, and he and I were talking about a request from one of his clients for data sharing education. "So," asked Mr. General Manager, "what is DB2 data sharing?"
[Before I tell you what I told the GM, I'll provide a brief technical response to the question. Data sharing is a shared-disk scale-out solution for DB2 running on an IBM mainframe cluster configuration called a parallel sysplex. In a data sharing group, multiple DB2 subsystems co-own and have concurrent read/write access to a database (there is also a single catalog -- DB2's metadata repository -- that is shared by the DB2 subsystems in the group). Specialized devices called coupling facilities -- existing as standalone boxes and/or as logical partitions within mainframe servers -- provide the shared memory resources in which group buffer pools (for data coherency protection) and the global lock structure (for concurrency control and data integrity preservation) reside.]
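To make the global locking idea concrete, here is a toy sketch -- emphatically NOT DB2 internals, just an analogy in plain Python -- in which two "members" of a group update a shared page. A per-page lock plays the role of the coupling facility's global lock structure, and a shared dictionary stands in for the group buffer pool that keeps every member's view of the page coherent:

```python
import threading

# Toy analogy only: names like "page1", "DB2A", "DB2B" are invented for
# illustration. The point is that a single global lock per resource lets
# multiple members update shared data concurrently without losing updates.
shared_pages = {"page1": 0}                  # stand-in for a group buffer pool page
global_locks = {"page1": threading.Lock()}   # stand-in for the global lock structure

def member_update(member_name, page, increments):
    """One 'member' of the group doing read/write work against a shared page."""
    for _ in range(increments):
        with global_locks[page]:             # concurrency control: one writer at a time
            shared_pages[page] += 1          # update is coherent across all members

threads = [
    threading.Thread(target=member_update, args=("DB2A", "page1", 10_000)),
    threading.Thread(target=member_update, args=("DB2B", "page1", 10_000)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_pages["page1"])  # 20000: no updates lost despite two concurrent writers
```

The real mechanism is vastly more sophisticated (lock structures in coupling facility memory, page registration, cross-invalidation), but the integrity guarantee it provides is the same kind of thing this sketch demonstrates.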
I thought for a second or two, and told the GM: "DB2 data sharing is the most highly available, highly scalable data-serving platform on the market."
Now, it's been more than eight years since I last worked for IBM, so that wasn't a sales line. It is, I think, a factually correct answer. Consider those two qualities, availability and scalability. The thing about data sharing is, you start with the highest-availability standalone data-serving platform -- an IBM mainframe running the z/OS operating system -- and you make it even MORE highly available by clustering it in a shared-disk configuration. Does anyone dispute my contention that DB2 on a mainframe is the alpha dog when it comes to uptime in a non-clustered environment? Seems to me that fans of competing platforms speak of "mainframe-like" availability. The mainframe is still the gold standard with respect to rock-solid reliability -- the platform against which others are measured. It's legendary for its ability to stay up under the most extreme loads, able to cruise along for extended periods at utilization levels well over 90%, with mixed and widely varying workloads, keeping a huge number of balls in the air and not dropping a single one. Put several of those babies in a shared-disk cluster, and now you have a system that lets you apply maintenance (hardware or software) without a maintenance window. It also limits the scope of component failures to a remarkable extent, and recovers from them so quickly that a DB2 subsystem, a z/OS image, or an entire mainframe server can crash (and keep in mind that such events are quite rare) and the situation can be resolved -- typically in an automated fashion, with the system restoring itself in most cases -- without users ever being aware that a problem occurred. And I'm talking about the real world here: I have first-hand experience in this area.
How about scalability? You might think that the overhead of DB2 data sharing would reach unacceptable levels once you get to four or five servers in a parallel sysplex, but you'd be wrong. The CPU cost of data sharing is often around 10% in a 2-way data sharing group (meaning that an SQL statement running in a DB2 data sharing group might consume 10% more CPU time than the same statement running in a standalone DB2 environment). Add a third DB2 to the group, and you might add another half of a percentage point of overhead. Add a fourth member, and again overhead is likely to increase by less than one percent. So it goes with a fifth DB2 member, a sixth, a seventh, an eighth, a ninth -- the increase in overall system capacity is almost linear with respect to each extra server's cycles. Again, I'm talking about the real world, in which organizations treat a data sharing group like a single-image system and let the parallel sysplex itself distribute work across servers in a load-balancing way, with no attempt made to establish "affinities" between certain servers and certain parts of the shared database in order to reduce overhead. Intense inter-DB2 read/write interest in database objects (tables and indexes) is the norm in the data sharing systems I've worked on, and that's because DB2 (along with z/OS and Coupling Facility Control Code) handles that kind of environment very well. DB2 and parallel sysplex architecture allow for the coupling of up to 32 mainframes in one data sharing cluster, and no one has come close to needing that amount of compute power to handle a data-serving workload (keep in mind that the capacity of individual mainframes continues to grow at an impressive clip).
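The arithmetic behind that near-linear scaling claim can be sketched with a back-of-the-envelope model. The overhead figures below are the illustrative ones from the paragraph above (roughly 10% for a 2-way group, then about half a percentage point per additional member), not measured values from any particular system:

```python
def effective_capacity(members, base_overhead=0.10, per_member_overhead=0.005):
    """Rough usable capacity of an n-member group, in units of one server's cycles.

    Assumes ~10% overhead at 2 members and ~0.5% more per member beyond that --
    illustrative numbers only, matching the text above.
    """
    if members == 1:
        return 1.0  # standalone DB2: no data sharing overhead
    overhead = base_overhead + per_member_overhead * (members - 2)
    return members * (1.0 - overhead)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} members -> ~{effective_capacity(n):.2f} servers' worth of capacity")
```

Under these assumptions a 16-way group still delivers roughly 13 servers' worth of usable cycles out of 16 -- which is why adding a member buys you close to a full server's worth of additional capacity each time.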
Hey, there are plenty of good data-serving platforms out there, but when it comes to the ultimate in availability and scalability, the king of the hill is a DB2 for z/OS data sharing group on a parallel sysplex.
Any questions?