Many people working with DB2 for z/OS or OS/390 believe that it's not necessary to
COMMIT
on a read-only access. These folks think that because they
haven't
changed any data, the
COMMIT
is unnecessary. But
COMMIT
does more
than simply apply changes; it also releases locks and claims. So
COMMIT
affects
application availability.
I'll explain the availability considerations of introducing a
COMMIT
in an
application doing read-only access and the concepts of claims and drains. The examples I'll
use are specific to read-only environments; they're not designed to cover
COMMIT
considerations with regard to update activity. The scope of this article is limited to
COMMIT
s for read-only application programs in an environment that may include
execution of utilities and other
applications doing updates. In this article, I'll use the
term "application availability" to mean availability with regard to permitting read access.
TAKING A CLAIM
If you're familiar with DB2, you're probably well aware that incompatible locks
(such as X and S) on an object (such as row, page, table, or tablespace) reduce concurrency.
But what happens if you bind your read-only program with
ISOLATION
Un
COMMIT
ted Read (UR)? Doing so means your read-only program may
not take any
locks. Keep in mind that programs bound with
ISOLATION UR
do take a
MASS DELETE
lock on the table or tablespace if a mass delete is involved.
An
IX
lock is also taken on a tablespace if it uses work file database. However,
this article covers a read-only scenario.
So the question is, will having a
COMMIT
in an application program bound with
ISOLATION UR
doing a read-only access have any impact on the overall application
availability?
The answer is yes.
Although the application program bound with
ISOLATION UR
and doing read-only
access doesn't take a lock, it does claim the object. And claims, as I'll explain later in
the article, can be the cause of reduced availability.
COMMIT
, as I mentioned
earlier, releases claims.
Now, in an environment in which application availability is of paramount importance,
it's best to schedule the database maintenance jobs (such as
REORG
or
IMAGE COPIES
) when the update activity is minimal. Doing so increases the
chances of the
REORG
job running concurrently with long-running, read-only batch
jobs and that's where availability becomes an issue.
REORG
, like other utilities, requires a drain lock on the object (in some phase
or another) and will wait until it can acquire one. Draining is a mechanism used to take over
an object and serialize access to it. A drain lock is
acquired when all the claims on the object
are released and there are no pre-existing drain locks. Drain requestors prevent any new claims
from being taken on the already drained object.
From an application standpoint, you might wonder how the availability issue will play out.
Will the REORG job wait until the application job finishes? It will if the application program
acquired the claim before the
REORG
utility could request a drain. But
REORG
won't wait if it requested a
drain first.
Of course, the application's job can be initiated prior to the
REORG
utility
to ensure that
REORG
waits for the batch job. However, if another batch job or
online transaction comes along, it will go into a wait because it can't acquire a claim on
the object. (The new job or transaction has to acquire a claim, but there's a drain requestor,
REORG
, that would prevent any claims on the object.)
The degree of application unavailability
caused as a result of both the read-only program and
the
REORG
utility being ready for execution (both may or may not execute
simultaneously, as I'll explain) depends on:
- The
SHRLEVEL OPTION
specified on the
REORG
- The
COMMIT
frequency of the batch job executing concurrently with
REORG
.
A REORG utility needs to drain all the claim classes before it terminates.
This drain happens in the
SWITCH
phase if the
REORG
is executing with
SHRLEVEL REFERENCE
or
SHRLEVEL CHANGE
and in the
RELOAD
phase if the utility is executing with
SHRLEVEL NONE
.
The
COMMIT
frequency of the long-running process also affects availability.
By introducing a
COMMIT
, the claim on the object is released. Note that if the
batch job is using a cursor defined with
WITH HOLD
, a claim is retained past a
COMMIT
point. In all other cases the claim on the object is released at
COMMIT
. The duration of the claim on the object doesn't depend on the
RELEASE
parameter (
COMMIT
or
DEALLOCATE
) specified
during the
BIND
of a plan or package.
Table 1
shows the effect of the
REORG SHRLEVEL
parameter
and the
COMMIT
frequency on the application availability. To keep things simple,
let's assume a
nonpartitioned tablespace. The table also assumes that the batch job is executing
and the
REORG
job is triggered at time
T1
. Let's examine the cases
shown in the table.
Table 1
: Comparing processes with and without
COMMIT
s
Case 1.
A process without any interim
COMMIT
s is executing. A
REORG
job with
SHRLEVEL
NONE
starts at time
T1
.
Case 2.
A process without any interim
COMMIT
s is executing. A
REORG
job with
SHRLEVEL REFERENCE
or
CHANGE
starts at time
T1
.
Case 3.
A process that does an interim
COMMIT
at time
T3
is executing.
A
REORG
job with
SHRLEVEL NONE
starts at time T1.
Case 4.
A process that does an interim
COMMIT
at
time
T3
is
executing. A
REORG
job with
SHRLEVEL REFERENCE
or
CHANGE
starts at time
T1
.
Case 5.
A process without any interim
COMMIT
s is executing, but the
execution takes a longer time to complete compared with Case 2.
Case 6.
A process that does an interim
COMMIT
at time T3 is executing,
but the execution takes longer to complete when compared to Case 4.
From Table 1, you can
draw the following conclusions:
- A batch job without interim
COMMIT
s executing concurrently with a
REORG
(
SHRLEVEL NONE
) job would result in maximum unavailability.
The degree of unavailability would depend on the duration of the process (most likely a
batch job) and the size of the tablespace being reorganized
- From Cases 2 and 4, it appears that the unavailability period is same when
REORG
executes
concurrently with the batch job
irrespective of the
COMMIT
frequency; however,
this isn't true. If the process in Case 2 were to execute longer, it would result in
REORG
waiting for a longer period, increasing the application unavailability time.
A similar situation in Case 4 would result in a small wait time for the process (at time
T5
), while the switch phase of the
REORG
utility executes and the utility
terminates. However, the unavailability period wouldn't increase (see Cases 5
and 6).
- If the wait time of the process exceeds the
IRLMRWT
value in the
DSNZPARM
, then the process will terminate with a
-911
(timeout).
If this is a common occurrence in the application, consider increasing the
IRLMRWT
value or incorporating retry logic in the program.
- If the wait time of the utility exceeds the utility timeout period then the utility times
out. The utility timeout period is determined by multiplying
UTIMOUT
and
IRLMRWT
. You might be tempted to increase the utility timeout period by increasing
either or both the parameters. But be careful with this approach. Note that in all cases,
whenever the
REORG
job goes to wait status (waiting to acquire a drain lock),
it results in application unavailability. Therefore, you should be cautious when increasing
the utility timeout value as it will have a direct impact on availability.
THE SOLUTION
Executing the
REORG
utility with
SHRLEVEL REFERENCE
(for permitting
concurrent read access) or
SHRLEVEL CHANGE
(to permit concurrent updates) and
having the long-running process issue interim
COMMIT
s is a good solution to the
availability problem. Key factors in determining the
COMMIT
frequency are the maximum
unavailability tolerance and the value of utility timeout as defined in
DSNZPARM
.
The
COMMIT
frequency should be less
than the maximum unavailability tolerance.
If not, you've learned in this article that it can result in potential unavailability (when
REORG
is waiting to drain all claims, no new claims are allowed).
Even if the
COMMIT
frequency is lower than the maximum unavailability tolerance,
with DB2 version 6 (or lower), the
SWITCH
phase for a partitioned tablespace can
take quite some time (for the
IDCAMS
rename), resulting in unavailability. With DB2
version 7,
using
FASTSWITCH YES
(which is the default) alleviates this problem. If the
COMMIT
frequency is greater than the utility timeout value, then it leaves
the room for the
REORG
utility to time out without accomplishing anything.
DB2 version 7 introduced new options,
DRAIN WAIT
(specifying a time that overrides
the utility timeout value) and
RETRY
(number of retries prior to issuing a timeout),
to prevent the
REORG
from timing out.
JUST DO IT
Although there's an additional overhead (at minimum, an extra database call) associated
with introducing interim
COMMIT
s, with careful planning it's possible to
achieve good returns (application availability) with a fairly low investment (CPU costs).
Pranav Sampat
is a consultant with Cognizant Technology Solutions.
He is an IBM certified DB2 DBA specializing in database design, performance,
recovery, and
availability. You can reach him at
spranav@email.cognizant.com
.
Comments? Questions?
Give us your feedback or ask a question of the author.