DB2 programmers: Are your DBAs holding out on you? If they haven't told you about utility functions you could use instead of maintaining Cobol or C programs, you should probably ask them to. Programmers Only columnist Bonnie Baker suggested that I put together a column about some topics typically addressed to DBAs that would also be useful to a programmer.
I've assembled a list of ways programmers can use DB2 utilities to lighten the programming load. (Bonnie Baker fans take heart she'll return to the Programmers Only slot in the next issue.)
REPLACE INSERT PROGRAMS WITH ONLINE LOAD RESUME
Sometimes, DB2 for z/OS
customers try (usually only once) to use the
LOAD
utility to add data to tables while SQL operations are underway. But they soon discover that regular
LOAD
drains all access to the tablespace. (DB2 keeps track of application interest in a tablespace or index with a mechanism called claim. Some utility functions may need to limit read and write access, so they drain an application's read and write claims. DB2 waits until all the applications have relinquished their claims and blocks
any new claims. See
Programmers Only, Quarter 4, 2002
for more information.) Although
LOAD
has better performance, none of the data in the entire tablespace is available for transaction
INSERT
,
UPDATE
, or
DELETE
while
LOAD
is running. So, customers turn to
INSERT
programs instead.
But
INSERT
programs cause contention with online transaction processing
programs if they don't commit or don't commit frequently. They're also costly to design, especially if you want them to be restartable or support different input formats. So programmers often end up writing another
INSERT
program and then another, and another. Soon, they find themselves wishing they could use the
LOAD
utility, even if they have to trade availability for performance.
DB2 UDB version 7 added the options
SHRLEVEL NONE
and
SHRLEVEL CHANGE
to
LOAD RESUME YES
to address some of these situations.
LOAD RESUME YES SHRLEVEL NONE
operates just the way regular
LOAD RESUME YES
did in prior releases. But online
LOAD RESUME YES SHRLEVEL CHANGE
will operate the way SQL
INSERT
programs do: It claims, as SQL transactions do, instead of using drains, as regular utilities do. It also uses Data Manager
INSERT
instead of Data Manager
LOAD
. And, because it's
inserting, it will try to maintain the order of the table, also known as the cluster order, using free space within a page and free pages (which regular
LOAD
doesn't). Only
LOG YES
is allowed, because
LOAD RESUME YES SHRLEVEL CHANGE
is intended to run concurrently with SQL, and no copy is required after the
LOAD
completes.
You can define table triggers as
FOR EACH STATEMENT
or
FOR EACH ROW
; however each row loaded is treated as
a single statement. So, both
BEFORE
and
AFTER INSERT
triggers are fired before or after each
row
is loaded. In contrast, a multirow
INSERT
with an
AFTER INSERT
...
FOR EACH STATEMENT
trigger would fire only once. If you don't have any
FOR EACH STATEMENT
triggers, then table triggers fire the same way for both
INSERT
s and the new
LOAD RESUME YES SHRLEVEL CHANGE
.
In order to avoid lock
timeout, lock escalation, and exceeding the number of locks per user, the
LOAD
utility intelligently manages the commit scope by monitoring the time since the last commit and comparing that time to the system lock timeout value, the number of locks held on the tablespace, and the number of locks allowed for each user. The commit scope for each unit of recovery is then adapted to these variables so that the
LOAD
job doesn't interfere with or become a victim of these parameters. (For more
information about the commit scope, see
Programmers Only, Quarter 1, 2003
issue.)
The
LOAD
is restartable, and the utility correctly positions the next input record. And
LOAD
partition parallelism allows multiple table partitions to be loaded concurrently from a single
LOAD
job.
IBM benchmarks have found that
LOAD RESUME YES SHRLEVEL CHANGE
is about 25 percent faster in elapsed time and
uses 30 percent less CPU time than a comparable
INSERT
program. On the other hand, it's about twice the elapsed time of
LOAD RESUME YES SHRLEVEL NONE
. You're trading performance for availability, but using
LOAD
is still much easier in maintenance terms than most customer-written
INSERT
programs.
REPLACE DELETE PROGRAMS WITH REORG DISCARD
Some applications periodically need to delete a significant percentage of data rows. Some companies in
certain industries can't delete data for many years. In those circumstances, the data must be archived. In either case, reorganizing the tablespace to reclaim space and improve index access efficiency is usually necessary after the data is deleted.
REORG TABLESPACE SHRLEVEL NONE
or
REFERENCE
(or
CHANGE
in v.8) deletes data during reorganization. The data to be deleted is qualified by the
DISCARD FROM TABLE
table name
WHEN
condition. As the data is unloaded, DB2 detects rows matching the specified discard conditions and will either throw away those rows or convert them to an external format, write them to a data set, or generate a corresponding
LOAD
statement matching the column positions for loading into archive table. SQL
DELETE
s of the same amount of data rows would be time consuming, and they generate recovery log records that can cause bottlenecks on highly loaded systems. DB2 keeps concurrent
transactions from writing over each other's log records with something called a "log latch." Massive
DELETE
processing can dominate the log latch and be the cause of a performance bottleneck on the entire system. Specifying LOG NO on the
REORG
significantly reduces logging and avoids this impact on system overhead. The discarded rows are efficiently removed and afterward the tablespace is completely organized.
The
WHEN
condition syntax used to qualify the rows to be
discarded is a simple subset of the SQL
WHERE
.
WHEN
conditions allow
AND
ing and
OR
ing of column and literal value comparisons. Precedence of the comparisons can be controlled with parentheses. Date and timestamp expressions are supported, but other expressions and subqueries are not. Supported comparisons include:
=
,
<>
,
<
,
>
,
<=
,
>=
,
^>
,
^<
,
IS NULL
,
IS NOT NULL
,
LIKE
,
NOT LIKE
,
BETWEEN
,
NOT BETWEEN
,
IN
, and
NOT IN
.
WHEN
selection conditions are restricted by what a stage 1 predicate can do and the
REORG
utility doesn't use any indexes to qualify rows because it's already performing a tablespace scan. (Note: You should only specify literal values in hexadecimal for ASCII or Unicode tables.)
Listing 1
shows an example of the
REORG DISCARD
approach used (in this case) to reorganize tablespace
DSN8D71A.DSN8S71E
and discard from the
DSN8710.EMP
table all records added in the last year when the value of the
WORKDEPT
column is
E11
or
D21
.
Listing 1
: Using
REORG DISCARD
to reorganize a tablespace.
TRANSFORM DATA WITH UNLOAD
As much as you may want your original application designs to last forever, you've probably discovered that applications must evolve to handle more than you ever anticipated. Once, you may have thought that a
SMALLINT
or
INTEGER
column could handle the quantity of objects you need to track, but now you need to use an
INTEGER
or
DECIMAL
column. Or perhaps you've decided that some column
would be better defined as a varying length character and you want to strip away trailing blanks to avoid padding its size. Or someone else has decided that the data should be stored as Unicode instead of EBCDIC.
The
UNLOAD
utility is one way you can accomplish these data transformations.
UNLOAD
converts numeric columns to other formats as long as there's no loss of precision (for example,
SMALLINT
can be unloaded as an
INTEGER
, but an
INTEGER
can't be unloaded as a
SMALLINT
). For character columns,
STRIP
can remove leading or trailing blanks or both leading and trailing blanks and
TRUNC
can be used to truncate a column to a specific length. When converting character data to EBCDIC, ASCII, or Unicode, you can also specify a particular CCSID. If you only need a subset of the data for some quick analysis or testing, you can reduce the quantity of unloaded rows by specifying a percentage using the
SAMPLE
parameter. You can specify the maximum number of rows to be unloaded using the
LIMIT
parameter. You can also qualify the subset with the
WHEN
conditions I described in the section on REORG.
TRANSFER AND TRANSFORM WITH CROSS LOADER
The DB2 Family Cross Loader was jointly designed by DB2 for OS/390 and z/OS and DB2 for Linux, Unix, and Windows development teams. Delivered first on DB2 for OS/390 and z/OS, it combines the power and performance of SQL function, distributed relational database architecture (DRDA) communications, and the IBM LOAD utility.
Transferring data from one location to another typically involves three
serial steps: unload, file transfer, and load. No step can start until the prior step completes. The Cross Loader is a single-step process that overlaps unload, transfer, and load processing. And it circumvents the operating system file-size limitations that exist on some platforms. (A table can hold more data than can be stored in a single unload file on some platforms.) In these cases, without Cross Loader, you'd manually unload portions into separate files and hope they all transfer to the target table.
Figure 1
shows the data sources possible through native DRDA servers and those accessible via DB2 DataJoiner or Relational Connect. Of course, the source can be the same DB2 as the target of the Cross Loader, which is particularly helpful if you need a high performance method of summarizing or restructuring data at a single location. The Cross Loader can convert the data types of columns (for example, changing from
INTEGER
to
FLOAT
) and change the data
encoding scheme (for example, from EBCDIC to Unicode). Because the data source is described using SQL, tables can be joined,
UNION
ed, and aggregated. These operations aren't possible with the
UNLOAD
utility.
Figure 1
: Data sources for the DB2 family Cross Loader.
Listing 2
shows an example of aggregating data from a data store at a
remote location. As part of copying the data from the location
WAREHOUSE
to
LOCAL
, let's say we want to summarize and scrub the data. This listing finds the maximum and minimum values for salary within job for those jobs where the maximum salary is greater than $50,000. Within the utility input stream, you'd declare a cursor to describe the data input. Next, you'd tell the
LOAD
utility to use the cursor as input with the
INCURSOR
parameter. In this case,
you're also getting an inline copy and collecting statistics inline.
Listing 2
: Aggregating data from a remote location using the Cross Loader.
NOT JUST FOR DBAS
Utilities aren't just for DBAs anymore. Programmers face tight schedules to deliver very large and highly available application systems. Taking advantage of utility functions as part of the overall system
design makes it easier to meet those challenges.
Jim Ruddy
is a senior software engineer in DB2 UDB for z/OS development at the IBM Silicon
Valley Laboratory in San Jose, Calif. He focuses on DB2 utility architecture and design. Previously, Jim was a senior principle scientist for Boeing.
You can reach him at
jaruddy@us.ibm.com
.
Comments?
Questions?
Give us your feedback or ask a question of the author.