In a column
called "
The Woes of
Commitment
," (
DB2 Magazine
, Programmers Only, Fall 1998), I discussed the negative impact of
COMMIT
logic and many methods of controlling those negative issues. In the most recent Programmers
Only column ("
The Joys of Commitment
,"
Quarter 4, 2002), I wrote about all reasons programs
should
issue
COMMIT
s. In this issue, I'll continue with
the good reasons by discussing why even read-only
programs should issue
COMMIT
s. I'll also give you some pointers on determining
COMMIT
frequency:
What is the appropriate timing for
COMMIT
s? How often should you
COMMIT
, and how can you gracefully
control the frequency?
SHOULD I PUT COMMITS IN READ-ONLY PROGRAMS?
Absolutely. The reasons include:
1. Utilities.
In the last column, I discussed the use of
CLAIM
s
and
DRAIN
s to manage the
concurrency and serialization between SQL and utilities. Some utilities can't run concurrently with
any SQL, not even read-only SQL. If someone kicks off a read-only
program during your utility batch
window, the program's thread will have a
CLAIM
on the tablespaces (and partitions) and indexspaces
(and partitions) until
COMMIT
. The unfriendly utility, trying to take over the object of its
affections, will establish a
DRAIN
on the object and then wait for a DSNZPARM-specified interval for
the SQL to release its
CLAIM
on the object.
CLAIM
s are released at
COMMIT
points. If the read-only
program does
not issue
COMMIT
s, the
CLAIM
won't be released until the
implicit
COMMIT
that occurs at
program termination. In other words, if you want utilities (especially online utilities such as
online reorg) to win out over SQL, the SQL must release its
CLAIM
on the object.
CLAIM
s are released
at
COMMIT
unless there's an open
CURSOR WITH HOLD
on the object.
2. AUTO-REBIND.
I received a call recently from a client
who was
experiencing
ROLLBACK
s in a
read-only batch program. He described the problem this way:
Program A calls Program B. Both programs have SQL in them and both are read-only. Recently, a SQL
statement in Program B received a
-911
(timeout or deadlock with
ROLLBACK
),
and he noticed that the
logs were used to roll back the unit of recovery. Program A had retry logic and retried three times
unsuccessfully. Again, all three times Program B created undo/redo
log records and then used the
undo log records to roll back the unit of work. What work needed to be rolled back? And why would a
read-only program write to the logs?
This one took a while to answer, but after asking a few questions, I figured it out. Program B
was trying to run with an invalid package. The package is invalidated when a selected index is
dropped,
ALTER
s are performed, and so on. The DBA either didn't notice the invalidation or assumed
that
AUTO-REBIND
would
take care of the problem the first time the program was run. So, when Program
A called Program B, Program B did a successful
AUTO-REBIND
.
AUTO-REBIND
writes to the DB2 Catalog
and Directory and becomes part of the first unit of work for a called program (in this case, Program
B). Until a
COMMIT
occurs, the unit of work can be rolled back. A
-911
has an implicit
ROLLBACK
incorporated into its logic. When the read-only program received a
-911
, the unit of work rolled
back and the work of the
AUTO-REBIND
was undone. Four times.
The solution? Called Program B now does
a
COMMIT
the first time it's called by Program A.
3. Temporary tables.
Both created temporary tables (CTTs) and declared temporary tables (DTTs)
consume resources. CTTs acquire space from the generic workspace (DSNDB07) inside DB2. This
workspace is also used for sort-work files, sort-out files, and, when necessary, view
materialization. DTTs consume space in the TEMP database that's also used by others for DTTs and
scrollable cursors. When a program has used a CTT or a DTT and is through with the temporary table,
it should deallocate the space. CTTs are deallocated according to the value of the
BIND RELEASE
parameter. If that value is
COMMIT
, the program can do a
COMMIT
and release this valuable space.
That is, of course, if the reader recognizes that the read-only program has a responsibility
to
COMMIT
. Because DTTs are created, they can be dropped using a
DROP TABLE
statement to free up the
TEMP space.
4. ISOLATION RR or RS.
Obviously, if you've bound your package with
ISOLATION RR
or
RS
and are
acquiring (and acquiring and acquiring) S locks on page after page or row after row, the program
must do
COMMIT
s in order to release those locks and avoid the negative impact of running out of
Internal Resource Lock
Manager (IRLM) resources, exceeding the maximum number of locks allowed, and
enduring the unpopularity and havoc caused by lock escalation. Enough said.
To read more about
COMMIT
s in read-only programs, see Pranav Sampat's article
"
Go Ahead, Commit
on a Read
," in the Quarter 4, 2002 issue of
DB2 Magazine
.
OTHER ITEMS TO CONSIDER
If you've read both last issue's Programmers Only column and the beginning of this column,
you're
now convinced that both maintenance and read-only programs should do
COMMIT
s. Generally,
screen-driven programs handle
COMMIT
s by using transaction manager logic (such as CICS
pseudo-conversational programming) to avoid holding locks and resources across screen displays.
However, on some platforms (such as TSO ISPF, SPUFI, or QMF), the programmer must understand that
COMMIT
logic must be incorporated into the code. The programmer can either
code standard
COMMIT
logic; or use
autoCOMMIT(yes)
in SPUFI; or use
confirm(no)
in QMF, always being sure to go to the
bottom of the report and reset the query or save the data.
But what about batch programs? How do you decide how often to
COMMIT
? Well, intelligent
COMMIT
logic considers many factors. The program logic may best be coded as: "During time window A, if some
number of updates have been done or some number of seconds or minutes have passed or
some number of
checkpoint intervals have occurred, or, during time window B, if some number of updates have been
done, then
COMMIT
."
Let's look at each of the factors mentioned and any other factors you should consider when
deciding what the "adaptive"
COMMIT
frequency should be. As I cover each of the considerations, I'll
assume that the tablespace
LOCKSIZE
parameter is set to the most common value of
PAGE
.
1. Number of updates.
I
often see programs coded to do a
COMMIT
after every
n
updates. But that
approach assumes that:
- Throughput is static
- Batch windows don't lengthen or shorten
- Programs always run successfully at the regularly scheduled time
- The number of concurrent conflicting users doesn't change
- There are no fluctuations in device contention
- DB2 performance doesn't improve with each release.
In fact, all of these factors vary from day
to day and sometimes minute to minute. For example,
back in the bad old days a program might take 15 minutes to do 10,000 updates. Today, the same
updates may take three seconds. Sometimes a
COMMIT
may be appropriate every five updates, sometimes
every 5,000. And the program should be coded to consider the variables (those I mentioned and many
others) that should be factored in when making this decision. I'll show you some techniques for
coding these variables later in this column.
2.
Watching the clock.
Update, insert, and delete logic causes your program to create log records
and acquire locks. Log buffers and lock manager resources are shared by all DB2 subsystem users.
It's your program's responsibility to be a good citizen and free up those resources before it
consumes more than its fair share. Also, a unit of work that goes forward can go backward. Long
running units of work have even longer running rollbacks. And locks will be held for the entire
duration of the forward or
backward cycle. You should
COMMIT
while the forward plus backward
interval is within a tolerable length of time (lock duration) for concurrently running programs. But
remember, if you decide to
COMMIT
every five minutes because you can tolerate a 10-minute lock
duration (worst-case forward and rollback scenario), the five minutes that covers a reasonable
number of updates today may improve to the point that far too much work (considering resources
consumed and locks acquired) is
accomplished in that same time interval tomorrow. Five minutes is a
long time considering today's DB2 throughput. If you do have concurrent online access, five minutes
means that you will probably have timeouts for online work. How long will your users wait? Those who
are querying data wait only a few seconds in most situations. Those who are updating might wait a
little more. If the updates are asynchronous, you have a little more time; however, you don't want
to affect the consistency and coherency of your
data.
DSNZPARM SETTINGS
The DB2 subsystem is set up with various parameters (DSNZPARMs) that are used to police and
maintain the health of the system.
COMMIT
logic should weigh these parameters to ensure that the
program does not become a victim of one of them:
System checkpoint interval.
For example, the system checkpoint interval is used to determine how
often to flush the log buffers to disk. You can set it to a time value (for example, every two
minutes) or to a
number of updates (for example, every 100,000 log records). Because log records are
written to memory (read "fast"), for rollback you would probably prefer that the undo records be
read from the same memory, not from disk. If the program doesn't
COMMIT
within the checkpoint
interval and your program rolls back, the rollback will be from disk (read "slow") or, if the disk
files have been archived, from tape (read "crawling"). That's the primary reason that rollbacks may
take longer than
go-forwards. One of your goals should be to code your program so that you will
rollback from memory if at all possible.
Long-running unit of work detector.
There is a DSNZPARM that allows the powers that be to find
jobs that are running without doing
COMMIT
s. This long-running unit of work detector is a tattletale
that will report your program if it passes more than some number of checkpoint intervals without a
COMMIT
. You certainly want to code your program to
COMMIT
before it gets reported.
Maximum locks for any tablespace and maximum locks for this tablespace.
Another DSNZPARM you
should consider is the one that sets the maximum number of locks allowed for one user on any one
tablespace. When this limit is exceeded, one of three unpleasant things happens:
- Page-level locking is escalated to table, partition, or tablespace level, blocking all users
from accessing the resource.
- The escalation is attempted and fails when the
higher-level lock is unavailable. The user
receives
SQLCODE -911
because other users have incompatible locks.
- No limit is set (the
DDL MAXLOCKS
parameter was set to 0), escalation is not attempted, and the
greedy program acquires and acquires and acquires until some other resource limit is reached.
COMMIT
s should certainly be executed before the maximum number of locks per tablespace is reached
for any of the tablespaces accessed in the unit of work. And they should
most certainly be executed
before the program consumes so many resources that the next limit, maximum locks per user, is
reached. One more note: When this systemwide DSNZPARM is overridden for a single tablespace using
the
DDL MAXLOCKS
parameter and the
MAXLOCKS
parameter is set to a smaller number than the DSNZPARM,
you should shorten your
COMMIT
interval even further.
Maximum locks for any user.
Another DSNZPARM sets the total (across all
tablespaces) maximum
number of locks allowed for any one user. This maximum can't be overridden at the DDL level and,
when it's reached, the SQL receives a
-904
resource unavailable. Your
COMMIT
timing should be
cognizant of this parameter so that the
-904
is avoided. An even worse scenario is one where the
maximum number of locks per user is set unreasonably high. In this case, the IRLM can run out of
memory and the user can receive a
-904
just as DB2 is
coming down around the user's ears.
These limits are set to protect shareable, limited system resources. If they are ignored or set
to unrealistically high values, DB2 can become so constrained that it can't function. Two points
need to be made here. Point 1: Programs should honor these limits and
COMMIT
well within them. Point
2: If programs are escalating or receiving
-911
s or
-904
s because the maximum number of locks is
reached, don't reward the bad behavior
by increasing the limits. If the limits are unreasonably
high, your unruly program with its lack of COMMIT logic can cause the whole system to come down.
Lock wait time.
Probably the most obvious DSNZPARM to consider is lock wait time. If the amount
of time a user will wait for a lock is set to 30 seconds but your shop standard for
COMMIT
s for
batch programs that run concurrently with other programs is every five minutes, then the wait is
unlikely to be rewarded. The
COMMIT
frequency should be within the lock wait time. Program A needs a
lock and can't get it because Program B has an incompatible lock. Program A waits. Good citizen B
should
COMMIT
before Program A times out. Whenever I ask customers, "What is your lock timeout
parameter?" and they say, "Five minutes," I know that their concurrently running programs aren't
doing
COMMIT
s often enough. Who would want to be the online user who is waiting five minutes for a
screen to come back,
especially if the screen carries the message that the data is unavailable?
On the other end of the spectrum is the company with a rigid service level agreement of subsecond
response for all online applications. That company needs its lock wait time to be less than one
second. And, with a DB2 fix, the company is able to set it to a half second, putting the burden on
concurrently running programs to
COMMIT
even more frequently in order for the wait to be
successful.
Utility wait
time.
Often, utilities run concurrently with application programs. Some utilities
are willing to share with everyone; others will share with readers only; and a few need exclusive
access to the object for the entire duration of the utility (load replace) or for a small whisper of
time (online reorg). Programs that will share with readers only and those that need exclusive access
must drain the object of writers or readers for the duration required. For the drain to be
successful, the SQL must release its
claim on the object.
CLAIM
s are
released at
COMMIT
. A DSNZPARM
tells utilities how long to wait for the
DRAIN
to be successful and for the claim counters to
decrement to zero so that the utility can take over the object. In order for utilities to run, the
application programs must
COMMIT
within the utility wait time.
ADAPTIVE TECHNIQUES
A program that considers all the factors I've mentioned is a well-behaved program. If that
program can
adapt to changes in these DSNZPARMs, it's even better. One technique I use is a DB2
table that can be read to retrieve
COMMIT
scope parameters for batch programs. In this table,
there's a row that applies to every program and rows that apply to each specific program. I call it
a program control table. Using this table I pass the DSNZPARM values mentioned above and other
information to consider. My program
COMMIT
s before any of the critical points I mentioned have been
reached. My
program also considers the time of day and the day of the week and knows about utility
windows.
I wrote this logic once and I replicate the logic in every program. You can code a similar table
and
COMMIT
logic routine that can be used to heuristically change
COMMIT
frequency appropriately for
your mix of users, your DSNZPARMs, your time of day, and so on.
A FEW LAST WORDS
Jim Ruddy, utility developer and guru at the IBM Silicon Valley Lab, says, "Programs (even
read-only ones) that don't do
COMMIT
s are the scourge of online utilities." Companies that don't
logically and intelligently control
COMMIT
frequency will be hampering their future success with
availability. And as more and more online utilities are developed, programs must be well behaved to
allow the concurrently running utilities to be successful. Both availability and concurrency depend
on well-thought-out patterns for units of recovery.
Bonnie Baker
is a consultant and educator specializing in applications
performance issues on the DB2 OS/390 and z/OS platforms. She is an IBM DB2 Gold Consultant, a
five-time winner of the IDUG Best Speaker award, and a member of the IDUG Speakers' Hall of Fame.
She is best known for her series of seminars entitled "Things I Wish They'd Told Me 8 Years Ago. She
can be reached through Bonnie Baker Corporation at 1-800-992-7192 or
bkbaker@bonniebaker.com
.
Comments? Questions?
Give us your feedback or ask a question of the author.