Programmers Only: More Joys of Commitment
More Joys of Commitment
How to know when it's time to COMMIT
By Bonnie Baker
Quarter 1, 2003

Printer-Friendly Version
Email this Story
Bookmark to del.ico.us
Digg It!

In a column called " The Woes of Commitment ," ( DB2 Magazine , Programmers Only, Fall 1998), I discussed the negative impact of COMMIT logic and many methods of controlling those negative issues. In the most recent Programmers Only column (" The Joys of Commitment ," Quarter 4, 2002), I wrote about all reasons programs should issue COMMIT s. In this issue, I'll continue with the good reasons by discussing why even read-only programs should issue COMMIT s. I'll also give you some pointers on determining COMMIT frequency: What is the appropriate timing for COMMIT s? How often should you COMMIT , and how can you gracefully control the frequency?

SHOULD I PUT COMMITS IN READ-ONLY PROGRAMS?

Absolutely. The reasons include:

1. Utilities. In the last column, I discussed the use of CLAIM s and DRAIN s to manage the concurrency and serialization between SQL and utilities. Some utilities can't run concurrently with any SQL, not even read-only SQL. If someone kicks off a read-only program during your utility batch window, the program's thread will have a CLAIM on the tablespaces (and partitions) and indexspaces (and partitions) until COMMIT . The unfriendly utility, trying to take over the object of its affections, will establish a DRAIN on the object and then wait for a DSNZPARM-specified interval for the SQL to release its CLAIM on the object. CLAIM s are released at COMMIT points. If the read-only program does not issue COMMIT s, the CLAIM won't be released until the implicit COMMIT that occurs at program termination. In other words, if you want utilities (especially online utilities such as online reorg) to win out over SQL, the SQL must release its CLAIM on the object. CLAIM s are released at COMMIT unless there's an open CURSOR WITH HOLD on the object.

2. AUTO-REBIND. I received a call recently from a client who was experiencing ROLLBACK s in a read-only batch program. He described the problem this way:

Program A calls Program B. Both programs have SQL in them and both are read-only. Recently, a SQL statement in Program B received a -911 (timeout or deadlock with ROLLBACK ), and he noticed that the logs were used to roll back the unit of recovery. Program A had retry logic and retried three times unsuccessfully. Again, all three times Program B created undo/redo log records and then used the undo log records to roll back the unit of work. What work needed to be rolled back? And why would a read-only program write to the logs?

This one took a while to answer, but after asking a few questions, I figured it out. Program B was trying to run with an invalid package. The package is invalidated when a selected index is dropped, ALTER s are performed, and so on. The DBA either didn't notice the invalidation or assumed that AUTO-REBIND would take care of the problem the first time the program was run. So, when Program A called Program B, Program B did a successful AUTO-REBIND . AUTO-REBIND writes to the DB2 Catalog and Directory and becomes part of the first unit of work for a called program (in this case, Program B). Until a COMMIT occurs, the unit of work can be rolled back. A -911 has an implicit ROLLBACK incorporated into its logic. When the read-only program received a -911 , the unit of work rolled back and the work of the AUTO-REBIND was undone. Four times. The solution? Called Program B now does a COMMIT the first time it's called by Program A.

3. Temporary tables. Both created temporary tables (CTTs) and declared temporary tables (DTTs) consume resources. CTTs acquire space from the generic workspace (DSNDB07) inside DB2. This workspace is also used for sort-work files, sort-out files, and, when necessary, view materialization. DTTs consume space in the TEMP database that's also used by others for DTTs and scrollable cursors. When a program has used a CTT or a DTT and is through with the temporary table, it should deallocate the space. CTTs are deallocated according to the value of the BIND RELEASE parameter. If that value is COMMIT , the program can do a COMMIT and release this valuable space. That is, of course, if the reader recognizes that the read-only program has a responsibility to COMMIT . Because DTTs are created, they can be dropped using a DROP TABLE statement to free up the TEMP space.

4. ISOLATION RR or RS. Obviously, if you've bound your package with ISOLATION RR or RS and are acquiring (and acquiring and acquiring) S locks on page after page or row after row, the program must do COMMIT s in order to release those locks and avoid the negative impact of running out of Internal Resource Lock Manager (IRLM) resources, exceeding the maximum number of locks allowed, and enduring the unpopularity and havoc caused by lock escalation. Enough said.

To read more about COMMIT s in read-only programs, see Pranav Sampat's article " Go Ahead, Commit on a Read ," in the Quarter 4, 2002 issue of DB2 Magazine .

OTHER ITEMS TO CONSIDER

If you've read both last issue's Programmers Only column and the beginning of this column, you're now convinced that both maintenance and read-only programs should do COMMIT s. Generally, screen-driven programs handle COMMIT s by using transaction manager logic (such as CICS pseudo-conversational programming) to avoid holding locks and resources across screen displays. However, on some platforms (such as TSO ISPF, SPUFI, or QMF), the programmer must understand that COMMIT logic must be incorporated into the code. The programmer can either code standard COMMIT logic; or use autoCOMMIT(yes) in SPUFI; or use confirm(no) in QMF, always being sure to go to the bottom of the report and reset the query or save the data.

But what about batch programs? How do you decide how often to COMMIT ? Well, intelligent COMMIT logic considers many factors. The program logic may best be coded as: "During time window A, if some number of updates have been done or some number of seconds or minutes have passed or some number of checkpoint intervals have occurred, or, during time window B, if some number of updates have been done, then COMMIT ."

Let's look at each of the factors mentioned and any other factors you should consider when deciding what the "adaptive" COMMIT frequency should be. As I cover each of the considerations, I'll assume that the tablespace LOCKSIZE parameter is set to the most common value of PAGE .

1. Number of updates. I often see programs coded to do a COMMIT after every n updates. But that approach assumes that:

  • Throughput is static
  • Batch windows don't lengthen or shorten
  • Programs always run successfully at the regularly scheduled time
  • The number of concurrent conflicting users doesn't change
  • There are no fluctuations in device contention
  • DB2 performance doesn't improve with each release.

In fact, all of these factors vary from day to day and sometimes minute to minute. For example, back in the bad old days a program might take 15 minutes to do 10,000 updates. Today, the same updates may take three seconds. Sometimes a COMMIT may be appropriate every five updates, sometimes every 5,000. And the program should be coded to consider the variables (those I mentioned and many others) that should be factored in when making this decision. I'll show you some techniques for coding these variables later in this column.

2. Watching the clock. Update, insert, and delete logic causes your program to create log records and acquire locks. Log buffers and lock manager resources are shared by all DB2 subsystem users. It's your program's responsibility to be a good citizen and free up those resources before it consumes more than its fair share. Also, a unit of work that goes forward can go backward. Long running units of work have even longer running rollbacks. And locks will be held for the entire duration of the forward or backward cycle. You should COMMIT while the forward plus backward interval is within a tolerable length of time (lock duration) for concurrently running programs. But remember, if you decide to COMMIT every five minutes because you can tolerate a 10-minute lock duration (worst-case forward and rollback scenario), the five minutes that covers a reasonable number of updates today may improve to the point that far too much work (considering resources consumed and locks acquired) is accomplished in that same time interval tomorrow. Five minutes is a long time considering today's DB2 throughput. If you do have concurrent online access, five minutes means that you will probably have timeouts for online work. How long will your users wait? Those who are querying data wait only a few seconds in most situations. Those who are updating might wait a little more. If the updates are asynchronous, you have a little more time; however, you don't want to affect the consistency and coherency of your data.

DSNZPARM SETTINGS

The DB2 subsystem is set up with various parameters (DSNZPARMs) that are used to police and maintain the health of the system. COMMIT logic should weigh these parameters to ensure that the program does not become a victim of one of them:

System checkpoint interval. For example, the system checkpoint interval is used to determine how often to flush the log buffers to disk. You can set it to a time value (for example, every two minutes) or to a number of updates (for example, every 100,000 log records). Because log records are written to memory (read "fast"), for rollback you would probably prefer that the undo records be read from the same memory, not from disk. If the program doesn't COMMIT within the checkpoint interval and your program rolls back, the rollback will be from disk (read "slow") or, if the disk files have been archived, from tape (read "crawling"). That's the primary reason that rollbacks may take longer than go-forwards. One of your goals should be to code your program so that you will rollback from memory if at all possible.

Long-running unit of work detector. There is a DSNZPARM that allows the powers that be to find jobs that are running without doing COMMIT s. This long-running unit of work detector is a tattletale that will report your program if it passes more than some number of checkpoint intervals without a COMMIT . You certainly want to code your program to COMMIT before it gets reported.

Maximum locks for any tablespace and maximum locks for this tablespace. Another DSNZPARM you should consider is the one that sets the maximum number of locks allowed for one user on any one tablespace. When this limit is exceeded, one of three unpleasant things happens:

  • Page-level locking is escalated to table, partition, or tablespace level, blocking all users from accessing the resource.
  • The escalation is attempted and fails when the higher-level lock is unavailable. The user receives SQLCODE -911 because other users have incompatible locks.
  • No limit is set (the DDL MAXLOCKS parameter was set to 0), escalation is not attempted, and the greedy program acquires and acquires and acquires until some other resource limit is reached. COMMIT s should certainly be executed before the maximum number of locks per tablespace is reached for any of the tablespaces accessed in the unit of work. And they should most certainly be executed before the program consumes so many resources that the next limit, maximum locks per user, is reached. One more note: When this systemwide DSNZPARM is overridden for a single tablespace using the DDL MAXLOCKS parameter and the MAXLOCKS parameter is set to a smaller number than the DSNZPARM, you should shorten your COMMIT interval even further.

Maximum locks for any user. Another DSNZPARM sets the total (across all tablespaces) maximum number of locks allowed for any one user. This maximum can't be overridden at the DDL level and, when it's reached, the SQL receives a -904 resource unavailable. Your COMMIT timing should be cognizant of this parameter so that the -904 is avoided. An even worse scenario is one where the maximum number of locks per user is set unreasonably high. In this case, the IRLM can run out of memory and the user can receive a -904 just as DB2 is coming down around the user's ears.

These limits are set to protect shareable, limited system resources. If they are ignored or set to unrealistically high values, DB2 can become so constrained that it can't function. Two points need to be made here. Point 1: Programs should honor these limits and COMMIT well within them. Point 2: If programs are escalating or receiving -911 s or -904 s because the maximum number of locks is reached, don't reward the bad behavior by increasing the limits. If the limits are unreasonably high, your unruly program with its lack of COMMIT logic can cause the whole system to come down.

Lock wait time. Probably the most obvious DSNZPARM to consider is lock wait time. If the amount of time a user will wait for a lock is set to 30 seconds but your shop standard for COMMIT s for batch programs that run concurrently with other programs is every five minutes, then the wait is unlikely to be rewarded. The COMMIT frequency should be within the lock wait time. Program A needs a lock and can't get it because Program B has an incompatible lock. Program A waits. Good citizen B should COMMIT before Program A times out. Whenever I ask customers, "What is your lock timeout parameter?" and they say, "Five minutes," I know that their concurrently running programs aren't doing COMMIT s often enough. Who would want to be the online user who is waiting five minutes for a screen to come back, especially if the screen carries the message that the data is unavailable?

On the other end of the spectrum is the company with a rigid service level agreement of subsecond response for all online applications. That company needs its lock wait time to be less than one second. And, with a DB2 fix, the company is able to set it to a half second, putting the burden on concurrently running programs to COMMIT even more frequently in order for the wait to be successful.

Utility wait time. Often, utilities run concurrently with application programs. Some utilities are willing to share with everyone; others will share with readers only; and a few need exclusive access to the object for the entire duration of the utility (load replace) or for a small whisper of time (online reorg). Programs that will share with readers only and those that need exclusive access must drain the object of writers or readers for the duration required. For the drain to be successful, the SQL must release its claim on the object. CLAIM s are released at COMMIT . A DSNZPARM tells utilities how long to wait for the DRAIN to be successful and for the claim counters to decrement to zero so that the utility can take over the object. In order for utilities to run, the application programs must COMMIT within the utility wait time.

ADAPTIVE TECHNIQUES

A program that considers all the factors I've mentioned is a well-behaved program. If that program can adapt to changes in these DSNZPARMs, it's even better. One technique I use is a DB2 table that can be read to retrieve COMMIT scope parameters for batch programs. In this table, there's a row that applies to every program and rows that apply to each specific program. I call it a program control table. Using this table I pass the DSNZPARM values mentioned above and other information to consider. My program COMMIT s before any of the critical points I mentioned have been reached. My program also considers the time of day and the day of the week and knows about utility windows.

I wrote this logic once and I replicate the logic in every program. You can code a similar table and COMMIT logic routine that can be used to heuristically change COMMIT frequency appropriately for your mix of users, your DSNZPARMs, your time of day, and so on.

A FEW LAST WORDS

Jim Ruddy, utility developer and guru at the IBM Silicon Valley Lab, says, "Programs (even read-only ones) that don't do COMMIT s are the scourge of online utilities." Companies that don't logically and intelligently control COMMIT frequency will be hampering their future success with availability. And as more and more online utilities are developed, programs must be well behaved to allow the concurrently running utilities to be successful. Both availability and concurrency depend on well-thought-out patterns for units of recovery.


Bonnie Baker is a consultant and educator specializing in applications performance issues on the DB2 OS/390 and z/OS platforms. She is an IBM DB2 Gold Consultant, a five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame. She is best known for her series of seminars entitled "Things I Wish They'd Told Me 8 Years Ago. She can be reached through Bonnie Baker Corporation at 1-800-992-7192 or bkbaker@bonniebaker.com .


Comments? Questions?

Give us your feedback or ask a question of the author.

Please enter your e-mail address below:

CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS
RECENT JOB POSTINGS
CAREER NEWS
10 Search Engines You Don't Know About
Go beyond Google and get vertical. These specialized search sites will help you find the business information you need -- fast.

Subscribe to the new digital version of IBM Database Magazine
New Digital Version

Sponsored links:



Subscribe to the IBM Database Magazine Newsletter

Email Address *
First Name
Last Name
HTML Preference
HTML Text
 

Fields with * are required.

 


:: IBM Database Magazine ::