Programmers Only: Programs vs. Utilities
Programs vs. Utilities
Before you write another INSERT or DELETE program, take a look at DB2 utilities.
By Jim Ruddy
Quarter 2, 2003

Printer-Friendly Version
Email this Story
Bookmark to del.ico.us
Digg It!

DB2 programmers: Are your DBAs holding out on you? If they haven't told you about utility functions you could use instead of maintaining Cobol or C programs, you should probably ask them to. Programmers Only columnist Bonnie Baker suggested that I put together a column about some topics typically addressed to DBAs that would also be useful to a programmer.

I've assembled a list of ways programmers can use DB2 utilities to lighten the programming load. (Bonnie Baker fans take heart — she'll return to the Programmers Only slot in the next issue.)

REPLACE INSERT PROGRAMS WITH ONLINE LOAD RESUME

Sometimes, DB2 for z/OS customers try (usually only once) to use the LOAD utility to add data to tables while SQL operations are underway. But they soon discover that regular LOAD drains all access to the tablespace. (DB2 keeps track of application interest in a tablespace or index with a mechanism called claim. Some utility functions may need to limit read and write access, so they drain an application's read and write claims. DB2 waits until all the applications have relinquished their claims and blocks any new claims. See Programmers Only, Quarter 4, 2002 for more information.) Although LOAD has better performance, none of the data in the entire tablespace is available for transaction INSERT , UPDATE , or DELETE while LOAD is running. So, customers turn to INSERT programs instead.

But INSERT programs cause contention with online transaction processing programs if they don't commit or don't commit frequently. They're also costly to design, especially if you want them to be restartable or support different input formats. So programmers often end up writing another INSERT program — and then another, and another. Soon, they find themselves wishing they could use the LOAD utility, even if they have to trade availability for performance.

DB2 UDB version 7 added the options SHRLEVEL NONE and SHRLEVEL CHANGE to LOAD RESUME YES to address some of these situations. LOAD RESUME YES SHRLEVEL NONE operates just the way regular LOAD RESUME YES did in prior releases. But online LOAD RESUME YES SHRLEVEL CHANGE will operate the way SQL INSERT programs do: It claims, as SQL transactions do, instead of using drains, as regular utilities do. It also uses Data Manager INSERT instead of Data Manager LOAD . And, because it's inserting, it will try to maintain the order of the table, also known as the cluster order, using free space within a page and free pages (which regular LOAD doesn't). Only LOG YES is allowed, because LOAD RESUME YES SHRLEVEL CHANGE is intended to run concurrently with SQL, and no copy is required after the LOAD completes.

You can define table triggers as FOR EACH STATEMENT or FOR EACH ROW ; however each row loaded is treated as a single statement. So, both BEFORE and AFTER INSERT triggers are fired before or after each row is loaded. In contrast, a multirow INSERT with an AFTER INSERT ... FOR EACH STATEMENT trigger would fire only once. If you don't have any FOR EACH STATEMENT triggers, then table triggers fire the same way for both INSERT s and the new LOAD RESUME YES SHRLEVEL CHANGE .

In order to avoid lock timeout, lock escalation, and exceeding the number of locks per user, the LOAD utility intelligently manages the commit scope by monitoring the time since the last commit and comparing that time to the system lock timeout value, the number of locks held on the tablespace, and the number of locks allowed for each user. The commit scope for each unit of recovery is then adapted to these variables so that the LOAD job doesn't interfere with or become a victim of these parameters. (For more information about the commit scope, see Programmers Only, Quarter 1, 2003 issue.)

The LOAD is restartable, and the utility correctly positions the next input record. And LOAD partition parallelism allows multiple table partitions to be loaded concurrently from a single LOAD job.

IBM benchmarks have found that LOAD RESUME YES SHRLEVEL CHANGE is about 25 percent faster in elapsed time and uses 30 percent less CPU time than a comparable INSERT program. On the other hand, it's about twice the elapsed time of LOAD RESUME YES SHRLEVEL NONE . You're trading performance for availability, but using LOAD is still much easier in maintenance terms than most customer-written INSERT programs.

REPLACE DELETE PROGRAMS WITH REORG DISCARD

Some applications periodically need to delete a significant percentage of data rows. Some companies in certain industries can't delete data for many years. In those circumstances, the data must be archived. In either case, reorganizing the tablespace to reclaim space and improve index access efficiency is usually necessary after the data is deleted.

REORG TABLESPACE SHRLEVEL NONE or REFERENCE (or CHANGE in v.8) deletes data during reorganization. The data to be deleted is qualified by the DISCARD FROM TABLE table name WHEN condition. As the data is unloaded, DB2 detects rows matching the specified discard conditions and will either throw away those rows or convert them to an external format, write them to a data set, or generate a corresponding LOAD statement matching the column positions for loading into archive table. SQL DELETE s of the same amount of data rows would be time consuming, and they generate recovery log records that can cause bottlenecks on highly loaded systems. DB2 keeps concurrent transactions from writing over each other's log records with something called a "log latch." Massive DELETE processing can dominate the log latch and be the cause of a performance bottleneck on the entire system. Specifying LOG NO on the REORG significantly reduces logging and avoids this impact on system overhead. The discarded rows are efficiently removed and afterward the tablespace is completely organized.

The WHEN condition syntax used to qualify the rows to be discarded is a simple subset of the SQL WHERE . WHEN conditions allow AND ing and OR ing of column and literal value comparisons. Precedence of the comparisons can be controlled with parentheses. Date and timestamp expressions are supported, but other expressions and subqueries are not. Supported comparisons include: = , <> , < , > , <= , >= , ^> , ^< , IS NULL , IS NOT NULL , LIKE , NOT LIKE , BETWEEN , NOT BETWEEN , IN , and NOT IN . WHEN selection conditions are restricted by what a stage 1 predicate can do and the REORG utility doesn't use any indexes to qualify rows because it's already performing a tablespace scan. (Note: You should only specify literal values in hexadecimal for ASCII or Unicode tables.)

Listing 1 shows an example of the REORG DISCARD approach used (in this case) to reorganize tablespace DSN8D71A.DSN8S71E and discard from the DSN8710.EMP table all records added in the last year when the value of the WORKDEPT column is E11 or D21 .

Listing 1 : Using REORG DISCARD to reorganize a tablespace.

TRANSFORM DATA WITH UNLOAD

As much as you may want your original application designs to last forever, you've probably discovered that applications must evolve to handle more than you ever anticipated. Once, you may have thought that a SMALLINT or INTEGER column could handle the quantity of objects you need to track, but now you need to use an INTEGER or DECIMAL column. Or perhaps you've decided that some column would be better defined as a varying length character and you want to strip away trailing blanks to avoid padding its size. Or someone else has decided that the data should be stored as Unicode instead of EBCDIC.

The UNLOAD utility is one way you can accomplish these data transformations. UNLOAD converts numeric columns to other formats as long as there's no loss of precision (for example, SMALLINT can be unloaded as an INTEGER , but an INTEGER can't be unloaded as a SMALLINT ). For character columns, STRIP can remove leading or trailing blanks or both leading and trailing blanks and TRUNC can be used to truncate a column to a specific length. When converting character data to EBCDIC, ASCII, or Unicode, you can also specify a particular CCSID. If you only need a subset of the data for some quick analysis or testing, you can reduce the quantity of unloaded rows by specifying a percentage using the SAMPLE parameter. You can specify the maximum number of rows to be unloaded using the LIMIT parameter. You can also qualify the subset with the WHEN conditions I described in the section on REORG.

TRANSFER AND TRANSFORM WITH CROSS LOADER

Resources

Programmers Only, Quarter 4, 2002

Programmers Only, Quarter 1, 2003

The DB2 Family Cross Loader was jointly designed by DB2 for OS/390 and z/OS and DB2 for Linux, Unix, and Windows development teams. Delivered first on DB2 for OS/390 and z/OS, it combines the power and performance of SQL function, distributed relational database architecture (DRDA) communications, and the IBM LOAD utility.

Transferring data from one location to another typically involves three serial steps: unload, file transfer, and load. No step can start until the prior step completes. The Cross Loader is a single-step process that overlaps unload, transfer, and load processing. And it circumvents the operating system file-size limitations that exist on some platforms. (A table can hold more data than can be stored in a single unload file on some platforms.) In these cases, without Cross Loader, you'd manually unload portions into separate files and hope they all transfer to the target table.

Figure 1 shows the data sources possible through native DRDA servers and those accessible via DB2 DataJoiner or Relational Connect. Of course, the source can be the same DB2 as the target of the Cross Loader, which is particularly helpful if you need a high performance method of summarizing or restructuring data at a single location. The Cross Loader can convert the data types of columns (for example, changing from INTEGER to FLOAT ) and change the data encoding scheme (for example, from EBCDIC to Unicode). Because the data source is described using SQL, tables can be joined, UNION ed, and aggregated. These operations aren't possible with the UNLOAD utility.

Figure 1 : Data sources for the DB2 family Cross Loader.

Listing 2 shows an example of aggregating data from a data store at a remote location. As part of copying the data from the location WAREHOUSE to LOCAL , let's say we want to summarize and scrub the data. This listing finds the maximum and minimum values for salary within job for those jobs where the maximum salary is greater than $50,000. Within the utility input stream, you'd declare a cursor to describe the data input. Next, you'd tell the LOAD utility to use the cursor as input with the INCURSOR parameter. In this case, you're also getting an inline copy and collecting statistics inline.

Listing 2 : Aggregating data from a remote location using the Cross Loader.

NOT JUST FOR DBAS

Utilities aren't just for DBAs anymore. Programmers face tight schedules to deliver very large and highly available application systems. Taking advantage of utility functions as part of the overall system design makes it easier to meet those challenges.


Jim Ruddy is a senior software engineer in DB2 UDB for z/OS development at the IBM Silicon Valley Laboratory in San Jose, Calif. He focuses on DB2 utility architecture and design. Previously, Jim was a senior principle scientist for Boeing. You can reach him at jaruddy@us.ibm.com .


Comments? Questions?

Give us your feedback or ask a question of the author.

Please enter your e-mail address below:

CAREER CENTER
Ready to take that job and shove it?
SEARCH JOBS
RECENT JOB POSTINGS
CAREER NEWS
10 Search Engines You Don't Know About
Go beyond Google and get vertical. These specialized search sites will help you find the business information you need -- fast.

Subscribe to the new digital version of IBM Database Magazine
New Digital Version

Sponsored links:



Subscribe to the IBM Database Magazine Newsletter

Email Address *
First Name
Last Name
HTML Preference
HTML Text
 

Fields with * are required.

 


:: IBM Database Magazine ::