A DB2 Person in an Agile Software Development Discussion
I'm posting this week from Orlando, Florida, USA, site of the winter meeting of the SHARE user group (www.share.org). I came here primarily to deliver a presentation at the conference (the topic was SOA for DB2 people), but when I attend a conference as a speaker I also try to sit in on some presentations delivered by others (never stop learning, I say). Thus it was that I made my way yesterday afternoon to a session on agile software development. As it turned out, the speaker couldn't attend the conference due to a last-minute travel-related problem. I was thinking that we'd all just walk out, but the session moderator asked us if we wanted to spend the hour in an informal "free-for-all" discussion centered on agile development. Most of the people in the room liked that idea, and the ensuing discussion actually turned out to be, for me, one of the highlights of my day at the conference.
Most of the people in the room were, as you might imagine, software developers. I was the only database person there, and I fessed up to that early on. We had some interesting conversations about the ways in which database people can help their coding colleagues who want to do their thing in an agile way. I'm not an expert on the subject, but I'd say that agile development is largely about iterative development of an application, with lots of smaller-scale releases versus one or very few great big releases. It's also characterized by a cross-functional, team-based approach to development, with differently-skilled people (not just programmers, but folks with specialized skills in testing, data, requirements analysis, etc., as well) working together to get each iterative release ready within the defined (usually pretty short) time frame (a good overview of agile software development can be found in Wikipedia, at www.wikipedia.org). With respect to the role of data people in an agile development environment, this is what I heard from programmers during our impromptu "birds-of-a-feather" session:
- Database people are very much welcome at the table. All too often, an organization's database specialists are not involved up-front in the development of new applications, and this is not just a matter of application programmers and architects not issuing invitations. Frequently, it's DBAs themselves who pass on opportunities ("not my job"), or it's an IT organizational structure that has DBAs working only to put together development databases in response to requirements that are "thrown over the transom" by programming groups. Sometimes a DBA has to take it upon himself (or herself) to get engaged early on with developers working on an agile application development project, but if that initiative is taken, chances are the programmers and architects will be very pleasantly surprised.
- Speed is of the essence. The programmers need a data store, and they want it in a hurry. The database developed to facilitate agile development of an application doesn't have to be designed to the nth degree with the production environment in mind. Get something out there fast, and make schema changes (or what have you) as you deem advisable as you go along. Asking the developers to hold off on writing code until a really rigorous logical and physical database design is completed is probably not going to cut it. That's the heavy, old way of doing things.
- Schema abstraction is very much appreciated. The developers want a database, but they'd rather not have to know about the underlying schema. There are various ways to abstract the application-database interface. One way is to package the SQL in server-side stored procedures (a bare-bones example follows this list). These can be called by all kinds of programs in all kinds of environments (definitely including Java and .NET), and to the calling programs DB2 stored procedures don't look any different from stored procedures in other DBMS environments. An even better step might be to expose the stored procedures as Web services (note that this can be done quite readily using IBM's new Data Studio product, which can be downloaded for free if you're going to use it for development purposes - see http://www-306.ibm.com/software/data/studio/). In any case, abstraction is your friend, too, because it enables you to make those database design changes you want to make (perhaps to improve application performance) in a behind-the-scenes way, without changing the interface to the data as the programmers see it.
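To make the stored procedure idea concrete, here is a minimal sketch of a DB2 SQL PL procedure that hides the schema from callers. The GET_CUSTOMER_PROFILE name, the CUSTOMER table, and its columns are hypothetical, invented just for illustration:

-- A minimal sketch of schema abstraction via a stored procedure.
-- Procedure, table, and column names are hypothetical.
CREATE PROCEDURE GET_CUSTOMER_PROFILE (IN P_CUST_ID INTEGER)
  DYNAMIC RESULT SETS 1
  LANGUAGE SQL
BEGIN
  -- Callers see only the procedure name and its parameter; the
  -- table behind this cursor can be redesigned without changing
  -- the interface that the calling programs see.
  DECLARE C1 CURSOR WITH RETURN FOR
    SELECT CUST_NAME, CUST_STATUS, LAST_CONTACT_DATE
      FROM CUSTOMER
      WHERE CUST_ID = P_CUST_ID;
  OPEN C1;
END

A Java or .NET program just issues CALL GET_CUSTOMER_PROFILE(?) and fetches the result set, never touching the CUSTOMER table directly.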
Some of the other interesting comments made by session participants had to do with the CULTURAL change needed within many organizations to support an agile programming environment. You see, the traditional application development process has people working in a highly specialized way, as though they were standing by an assembly line in a manufacturing plant. You know, coders code, database people design databases, testers test, and so on, but instead of working in cross-functional teams they man their posts and work on the stuff that comes their way. It's like put on the tires, put on the tires, put on the tires, etc. in a car factory. The work that goes on ahead of what you do (up the assembly line) and after what you do (down the line) is not your concern. You just make sure that the tires go on right. That approach may create comfort zones for some people ("put the tires on right and you'll be OK"), but it can lead to tunnel vision and to communication that flows too much in a vertical way ("I just do what my manager tells me to do") and too little in a horizontal way. People feel less personally invested in the finished product, and that can result in people not performing up to their potential, since breakthrough results often rely in large part on the degree to which people feel an emotional attachment to what they do (i.e., when work really means something to people, work quality and productivity can soar).
The cultural change that may have to precede the establishment of a successful agile software development environment may be analogous to the revolutionary cross-functional team approach to automobile manufacturing introduced by Japanese car companies two or three decades ago. The face-to-face work done by cohesive groups of differently-skilled people engaged in the start-to-finish building of automobiles had a lot to do with the quantum leaps in quality, efficiency, and innovation seen in those factories. It set the standard that the rest of the auto-manufacturing world subsequently had to work towards. Similarly, agile development can be hindered, if not outright crippled, by an emphasis within IT organizations on hierarchy, turf definition and protection, and work silo-ization. Of course, big-time organizational change has to be led from the top, but that can be challenging because many at the top are products of the legacy way of doing things. In large part, change may come down to trust: can technical professionals be trusted to perform as needed in roles that are less rigid and less precisely defined than the historical norm, with more floor-level leadership and initiative and less in the way of management-imposed structure? Personally, I believe that the rewards to an organization that can come from extending that kind of trust to the IT tech folks - from giving cross-functional teams an end goal and setting them loose to achieve it using their own creativity and drive - can be enormous.
Not a bad way to spend an hour at a conference, eh?
DB2 for LUW Label-Based Access Control
Last week, I delivered a presentation on DB2 "ultra-availability" at a meeting of SIRDUG, the DB2 regional user group (RUG) in Charlotte, North Carolina, USA. One of the things that I really like about presenting at RUG meetings is hearing what other presenters on the agenda have to say. At the SIRDUG meeting I was treated to a very interesting presentation on DB2 for Linux/UNIX/Windows (LUW) label-based access control (aka LBAC - introduced with DB2 for LUW V9.1 and enhanced in DB2 V9.5). The presenter was Roger Sanders, a well-known DB2 author and consultant who recently joined the team at EMC.
I'd heard of label-based access control, but I hadn't thought much about how organizations would leverage the technology to their advantage (I was more focused on other capabilities of DB2 V9 for LUW, such as pureXML and Deep Compression). Roger's presentation opened my eyes.
I won't try to provide in this post the comprehensive overview of DB2 V9 LBAC contained in Roger's presentation (he'll be presenting it at the 2008 IDUG North American Conference in May, and he covered the topic in an article published last year in DB2 Magazine), but I'll give you a real-world example of a data protection situation that could be handled in an effective, robust, and straightforward way with DB2 LBAC, and which otherwise would pose a considerable challenge.
Consider a company's customer care application. The data for this application is stored in a relational database, and in that database is a table with lots of data rows. Some of the rows in the table can be viewed by customer care representatives (the people responding to phone calls from the organization's customers). Those plus some other rows can be viewed by the supervisors of the customer care reps. The Director of Customer Care Operations can view all the rows that are accessible to the Customer Care supervisors, plus additional rows.
Without LBAC, how would you deal with this situation? If the application SQL is dynamic, you could create a view for the customer care reps that excludes rows (and maybe columns, as well) that can't be accessed by this group of employees. You'd create another view for the supervisors, and another for the Director of Customer Care Operations. But the views can't all have the same fully qualified name, can they? So, you might use a different high-level qualifier for each of these views, and use fully-qualified names in the dynamic SQL statements (with the application providing the right high-level qualifier based on the user's ID). If the application SQL is static, you can again go with multiple views with differently-qualified names, but you can use unqualified view names in the application SQL statements if you bind the programs once for each view (placing the packages in different collections while supplying for each the appropriate high-level qualifier via the QUALIFIER option of BIND) and use SET CURRENT PACKAGESET to point to the right collection (and thus the right view) at execution time (again, this requires that the application recognize the user type and provide the appropriate collection name in the SET CURRENT PACKAGESET statement). A sketch of this approach follows.
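Here's a bare-bones sketch of that multiple-view approach. All of the table, view, qualifier, and collection names below are made up for illustration, and the BIND steps appear only as comments because the exact command syntax varies by platform:

-- One view per user category, differentiated by high-level qualifier.
-- All names here are hypothetical.
CREATE VIEW REP.CUST_CONTACT AS
  SELECT CUST_ID, CUST_NAME, CONTACT_NOTES
    FROM CARE.CUST_CONTACT
    WHERE SENSITIVITY <= 1;

CREATE VIEW SUPV.CUST_CONTACT AS
  SELECT CUST_ID, CUST_NAME, CONTACT_NOTES
    FROM CARE.CUST_CONTACT
    WHERE SENSITIVITY <= 2;

-- For static SQL, bind the same program once per view, e.g.:
--   BIND PACKAGE(REPCOLL) ... QUALIFIER(REP)
--   BIND PACKAGE(SUPVCOLL) ... QUALIFIER(SUPV)
-- Then, at run time, the application points to the right collection
-- (and thus the right view) based on the user's role:
SET CURRENT PACKAGESET = 'REPCOLL';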
You could also do without the views and have the application handle the security. Basically, the code would check a DB2 table (or some file) to determine the role of a given user and, based on that, use one set of static SQL statements or another (or build dynamic SQL statements of one type or another) so that restricted rows (and columns, if applicable) are not presented to the user.
There is a hassle factor for each of these approaches. Use of views means more work for DBAs, and application-based security puts more on the plate of developers who'd prefer to work on business functionality versus who-can-do-what logic (and who don't want to be concerned with different object qualifiers for different users if views are used). More important than the hassle factor is the lack of real robustness inherent in these solutions. The application-based approach to security makes an internal breach possible due to the malevolent coding of a disgruntled developer. Both the application-based and view-based approaches overlook the fact that certain DB2 support people - such as those with SYSADM or DBADM authority - can view ALL OF THE ROWS in ANY table in the database. I'm sure that you trust your SYSADMs and DBADMs, and probably with good reason, but you should know (if you don't already) that many (if not most) serious data breaches result from the actions of company insiders.
Not a pretty picture. Now, add label-based access control to the mix. With LBAC, DB2 itself can have knowledge of an installation-defined hierarchy of user categories (a simple straight-line hierarchy or one of a more complex nature) that have different data access levels, or a non-hierarchical set of such categories. Not only that, but the DB2 database can contain information about several such user-category-based data access arrangements (referred to as "security label components" in LBAC lingo). Security policies that reference these security label components are defined, as are security labels that reference policies and elements of components. Security labels are assigned to rows in a table, and they are granted to users. Once this is done, when a user attempts to access a given row in an LBAC-protected table, such access will be allowed by DB2 only if the row's label protection matches up with the user's label (in addition to rows, table columns can be label-protected).
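To make that a little less abstract, here's roughly what the customer care scenario might look like in LBAC terms: a bare-bones sketch using a simple array-type security label component, with all names invented for illustration (these statements would be issued by someone holding the SECADM authority discussed below):

-- Define an ordered set of user categories (highest access first).
CREATE SECURITY LABEL COMPONENT CARE_LEVEL
  ARRAY ['DIRECTOR', 'SUPERVISOR', 'REP'];

-- Define a security policy that references the component.
CREATE SECURITY POLICY CARE_POLICY
  COMPONENTS CARE_LEVEL
  WITH DB2LBACRULES;

-- Define labels that reference the policy and component elements.
CREATE SECURITY LABEL CARE_POLICY.DIR_LABEL
  COMPONENT CARE_LEVEL 'DIRECTOR';
CREATE SECURITY LABEL CARE_POLICY.SUPV_LABEL
  COMPONENT CARE_LEVEL 'SUPERVISOR';
CREATE SECURITY LABEL CARE_POLICY.REP_LABEL
  COMPONENT CARE_LEVEL 'REP';

-- Attach the policy to the table; each row then carries a label.
ALTER TABLE CUST_CONTACT
  ADD COLUMN ROW_LABEL DB2SECURITYLABEL
  ADD SECURITY POLICY CARE_POLICY;

-- Grant labels to users; DB2 itself then enforces row-level access.
GRANT SECURITY LABEL CARE_POLICY.REP_LABEL
  TO USER CARE_REP1 FOR ALL ACCESS;

With the DB2LBACRULES in effect, a user holding SUPV_LABEL can read rows labeled 'SUPERVISOR' or 'REP' but not rows labeled 'DIRECTOR' - just the arrangement described for the customer care hierarchy above.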
This may seem a little complicated at first, but a lot of that has to do with the new nomenclature that goes along with LBAC. Once you understand the terminology, it's fairly straightforward. Sure, there's up-front work to set this up, but once LBAC is in place for a database, ongoing maintenance is not an onerous task. And here's the really good part (depending on your point of view): data in an LBAC-protected table can't be accessed by ANY user lacking the required data access label, even if the user in question has SYSADM or DBADM authority. Furthermore, the ability to create and change LBAC-related controls does not belong by default to a DBADM or even to a SYSADM. LBAC set-up and administration is associated with a new DB2 authority type called Security Administrator, or SECADM for short. And, while a SECADM can set up LBAC security for a database, a person with SECADM authority cannot, by default, read any data in any table. It's truly a separation of control over data access from data access itself, and that can significantly reduce the threat of data security breaches for an organization.
I expect that data privacy and security people are going to eat this up (as DB2 for LUW V9 has been pretty widely implemented, I imagine that some organizations are already benefiting - in terms of less sleep lost over worries about data protection - from the LBAC advantage).
Hats off, again, to the folks at IBM's Toronto Lab (home of DB2 for Linux, UNIX, and Windows). LBAC is powerful stuff for the security-minded.
The SQL Programming Language?
During the mid-1990s, when I was with IBM, I did some consulting work for a large media company. These folks had a very sophisticated data warehouse set up, with DB2 for z/OS being at the heart of it all. One day, a person on the company's data warehouse team asked me if I wanted to see the cool program he'd written in SQL. "Excuse me?" I said. "Don't you mean the cool program you wrote that contains SQL?" No, he really did mean "written in SQL." He showed me an SQL statement that was six or seven pages long, with lots of CASE expressions, that did some very sophisticated data manipulation and generated some very useful information. About all I could say was, "Wow."
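To give you a (vastly scaled-down) taste of what that kind of SQL looks like, consider a statement along these lines, which pivots and summarizes data in a single pass (the SALES table and its columns are hypothetical):

-- A tiny example of CASE-expression-driven data manipulation.
-- The SALES table and its columns are hypothetical.
SELECT REGION,
       SUM(CASE WHEN QTR = 1 THEN AMOUNT ELSE 0 END) AS Q1_SALES,
       SUM(CASE WHEN QTR = 2 THEN AMOUNT ELSE 0 END) AS Q2_SALES,
       CASE WHEN SUM(AMOUNT) > 1000000
            THEN 'LARGE' ELSE 'SMALL' END AS REGION_SIZE
  FROM SALES
  GROUP BY REGION;

Now imagine six or seven pages of that, and you get the idea.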
Keep in mind, folks, that this was more than 10 years ago. Think of what's happened since then on the mainframe side of the DB2 family:
- DB2 for z/OS Version 6 added a ton of new scalar functions, including CEILING, DAYOFWEEK, IFNULL, REPEAT, and TRUNCATE. Also new with V6 were triggers, user-defined functions (aka UDFs), user-defined data types (UDTs), and large objects (LOBs).
- DB2 for z/OS V7 enabled SQL writers to order result set data according to an expression with operators (not just by a column name or integer). V7 also introduced scrollable cursors, declared temporary tables, and, of course, the ability to write stored procedures entirely in SQL (using what has come to be called SQL PL, or SQL Procedural Language).
- DB2 for z/OS V8 gave us common table expressions, which in turn enabled people to do things using recursive SQL that had previously been very challenging (such as returning multiple values of one column from rows with the same key - a table design meant to eliminate repeating groups - as multiple column values of a single row; an example of recursive SQL follows this list). V8 also enriched SQL PL with statements such as ITERATE and SIGNAL.
- DB2 for z/OS V9 delivered all kinds of XML-related functionality, along with OLAP specifications, ROW CHANGE expressions, and new scalar functions such as BIGINT, CHARACTER_LENGTH, LOCATE_IN_STRING, and SOUNDEX. There are also new SQL DML statements, including MERGE and TRUNCATE.
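As an example of the recursive SQL enabled by common table expressions, here's a sketch of a statement that walks up a management chain from a given employee (the EMPLOYEE table and its EMPNO and MGRNO columns are hypothetical):

-- A minimal sketch of recursive SQL via a common table expression.
-- EMPLOYEE, EMPNO, and MGRNO are hypothetical names.
WITH MGMT_CHAIN (EMPNO, MGRNO, LVL) AS
( SELECT EMPNO, MGRNO, 1
    FROM EMPLOYEE
    WHERE EMPNO = '000220'        -- start with one employee
  UNION ALL
  SELECT E.EMPNO, E.MGRNO, C.LVL + 1
    FROM EMPLOYEE E, MGMT_CHAIN C
    WHERE E.EMPNO = C.MGRNO       -- step up to the next manager
      AND C.LVL < 10              -- guard against runaway recursion
)
SELECT EMPNO, LVL FROM MGMT_CHAIN;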
The next version of DB2 for z/OS will make SQL even more powerful. The story on Linux/UNIX/Windows (LUW) platforms is much the same, with each release of DB2 expanding the range of things that can be done via SQL (SQL in the mainframe and LUW DB2 environments is almost entirely the same, with new SQL functionality sometimes showing up in DB2 for z/OS before being delivered in DB2 for LUW, and vice versa). Ten years ago, I wondered whether there would come to be a class of DB2 professionals who essentially work as SQL programmers. I wonder no more. Earlier today, in Googling "DB2 SQL PL", I saw a link to a job posting for a DB2 SQL PL developer.
This SQL coding specialization is a timely development (no pun intended) within the DB2 community. For some time now, I've been a proponent of an architecture in which a mainframe DB2 platform functions as a super-scalable, super-available, super-secure data server, processing requests from Java or .NET (or PERL or Python or Ruby or whatever) programs running on off-mainframe application servers (such an architecture is already relatively common in DB2 for LUW environments). I'm also big on SOA and the related concept of logically separate presentation, business, and data access logic layers within an application system. The data access layer is where SQL programmers can really shine (I'm a fan of server-side SQL).
Looking to get into the IT business, or looking for a new career direction? Consider becoming an SQL programmer. There's enough variety and creativity possible in that space to make for a rewarding work life.
An Object-ive View of Data
My introduction to the idea of a logical data structure came back in 1982, during my first year out of college (Rice U in Houston) and my first year as an IBM employee (first of my 17 years with Big Blue). I was training to become an IBM Systems Engineer (field technical support person), and as a part of that training I learned something about the hierarchical form of data organization implemented through IBM's IMS database management system. While that structure made some sense to me (I'd seen organization charts and family tree diagrams that were hierarchical in nature), it took me a while to understand how data elements that were on the same level of an IMS database hierarchy but under different parents were linked to each other. I wasn't so good with pointers (though I knew people who were pointer wizards and could fix these things if they broke).
A couple of years later, I attended a branch office meeting in which we learned of a newly announced database management product called DB2. To me, the relational model, with data elements arranged in tables with rows and columns, made all kinds of sense, as did the intuitive and set-oriented Structured Query Language (I was an applied math major in college, and relational database technology is a prime example of applied mathematics). Over the next 16 years, I learned a lot more about DB2 and happily worked with people who were leveraging the technology for their employers' benefit, serene in my belief that the theory of data structuring had advanced as far as it needed to. Relational was a great thing. Who would want to deal with data in any other logical form?
Then, early in 2000, I left IBM to work for CheckFree, a leading user of DB2 and other relational DBMSs in the financial services industry (CheckFree is now a part of Fiserv Corporation). That's where I first worked extensively with application architects, and that's where I got my first exposure to object-oriented programming. Let me tell you, I had a tough time getting my arms around the notion of data being logically presented in the form of an object. I went to a couple of my Strategic Technology Team colleagues, Rick McMichael (still at CheckFree, now a VP there) and Mark Harris (now a senior consultant at Magenic), and asked, "Would you explain object-oriented programming to me?" Rick and Mark are a couple of very sharp guys, and they were patient with me, but still I struggled mightily to understand even basic OO concepts. "Well, an object is an instance of a class," one of them would say. My response: "What's a class?" "Look, you might have a set of plans for a house, and then you have the house itself. An object is to a class kind of as the house is to the building plans. Got it?" Me: "Uh..." When Rick delved a little into user interface programming and started talking about how, in the OO world, it was pretty easy to programmatically tell a box to draw itself, my eyes probably started to glaze over, and that particular discussion didn't last much longer.
Over the course of the next several years, I gradually absorbed more object knowledge, first from Rick and Mark and later from other CheckFree colleagues such as Don Barton, Matt Humphrey (now a Senior Program Manager at Microsoft), and Eric Drudge (now with Moneta). Eventually, I came to be somewhat comfortable talking about objects. I'll try now to put the concept of objects, as they pertain to data, in my own words. If you're an OO expert, cut me a little slack (but feel free to offer critique via a comment).
At the most basic level, I think of an object as an informational instantiation of something. Going back to the analogy mentioned previously, a house is a material instantiation of a set of building plans. Computer programs deal with data, not with two-by-fours and nails and copper tubing, and that's why I call an object an informational instantiation of something. Of what? Of a class. My former CheckFree colleague Matt described a class as "a convenient way to package data and behavior (code)." Added Eric Drudge, "a class is sort of the definition of an object." A class has attributes that indicate the defining characteristics of an associated object, and methods that indicate what the object can do.
Eric provided me with an analogy that really helped me to get my arms around the object concept. Suppose there is a class called Dog. Attributes of this class might include name, breed, color, gender, and age. The class could contain methods such as Bark(), Sit(), Sleep(), Eat(), and Fetch(). These methods would form the interface by which other programs would interact with an instance of the class. One such instance of the class (a dog object) might be "Fifi," a "poodle" that is "white," "female," and "2" years old.
Now, OO programmers don't think of data in an object sense just to confuse us database people. The object representation of data actually has some very useful characteristics when it comes to developing applications. One of the most important of these object aspects is called inheritance, and it has to do with a new class (again, an object is an instance of a class) being an extension of an existing one, and having all the attributes of the class on which it's based while also having some specialized attributes that are not shared with the base class. Consider, for example, the relationship between employees and managers (the latter being employees, but with attributes that are not shared with non-management employees). This relationship is pretty straightforward from an object point of view. On the relational database side, this relationship can be represented in more than one way, involving perhaps an EMPLOYEE table and a MANAGER table, but that schema view of the data is going to differ from the object-oriented view. Presenting data from a relational database to programmers in object form generally requires some code (either user-written or purchased, in the form of an object-to-relational mapping tool), and that means an investment of time and/or money.
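For example, here is one common relational rendering of that employee/manager relationship (table and column names invented for illustration); note how the object view - a Manager class extending an Employee class - maps onto two tables rather than one:

-- A hypothetical relational rendering of employee/manager inheritance.
CREATE TABLE EMPLOYEE
( EMP_ID    INTEGER      NOT NULL PRIMARY KEY,
  EMP_NAME  VARCHAR(60)  NOT NULL,
  HIRE_DATE DATE         NOT NULL );

CREATE TABLE MANAGER
( EMP_ID     INTEGER       NOT NULL PRIMARY KEY
             REFERENCES EMPLOYEE (EMP_ID),   -- a manager IS an employee
  DEPT_ID    SMALLINT      NOT NULL,
  BUDGET_AMT DECIMAL(11,2) );                -- a manager-only attribute

An object-to-relational mapping layer has to bridge exactly this kind of gap: one Manager object on the programming side, a two-table join on the database side.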
Is making that investment important? If you're interested in programmer productivity, I believe that it is. The veteran OO programmers that I know don't want an object representation of data because they can't deal with schemas. They certainly know what a schema is and can deal with one, and many of them have a good knowledge of SQL. These people want to deal with an object representation of data because it enables them to do what they do best - write code that delivers valuable business functionality - in a more time-efficient manner than would be the case if they had to work with a schema representation of data. In addition, your organization can benefit from another important aspect of object-oriented programming, called encapsulation. The idea here is that a class should be designed with a well-defined interface, so that it can be operated on only via specific methods. In other words, programs dealing with the data part of a class should act on the objects defined by the class, as opposed to acting on the data directly. The advantage of encapsulation - and this is a recurring theme of service-oriented architecture, or SOA - is application change isolation. Application maintenance will be less onerous if aspects of a class that could change in the future are hidden from programs that might use the class.
Am I an object expert? Nope. Not by a long shot. I have found, however, that learning what I can about object-oriented development and the related view of data helps me to walk a ways in an OO programmer's shoes. That puts me in a better position to be of help, as a data-focused person, to OO developers. These are people that you want to help, because they turn the data you manage into information, and that is a key driver of organizational success.