datori

Database administration for fun and profit

Populate a DB2 database table with pseudo-random BLOBs

How do we quickly populate a database table with arbitrary LOB data? This question often comes up when we need to generate some data for performance testing or tuning. We can, of course, write a little utility in our favourite programming language, such as Java or Perl.

The Java approach can be found in DB2 documentation, for example, here. It shows a stored procedure, but we can see how this idea can be used in a standalone application as well. All we need to do now is write the code, compile it hoping there are no errors (as if!), and wait while a few hundred thousand of LOBs get inserted over a JDBC connection. Fun!

But is there a quicker way? How about three simple Bash commands? We can use one of the DB2 utilities, such as IMPORT or LOAD, and the way they handle LOB data to put arbitrary fragments of any file into a LOB column.

Suppose we have a simple table like this:

$db2 "create table blobtest (key int not null generated always as identity, bval blob (2 m))"

First, we create an operating system named pipe, also known as FIFO, for “first in – first out”. This will let us save on unnecessary I/O and wasted disk space that would be needed to store intermediate results otherwise.

$mkfifo /tmp/blobtest.csv

Make sure the pipe is readable by the DB2 utility. Now we can start the utility:

$db2 "load from /tmp/blobtest.csv of del modified by lobsinfile identitymissing method p (1) insert into blobtest (bval) nonrecoverable">load.log&

The pipe looks just like any other file to the LOAD utility, and it waits for something to start pouring out of it. The lobsinfile modifier tells the utility that it should expect LOB Location Specifiers (LLS) in the input data — more on this later. We will let DB2 generate the identity values, hence the modifier identitymissing.

LLS is a way to tell LOAD where to find LOB data. It has this format: “<file path>.<offset>.<length>/”, e.g. /tmp/lobsource.dta.0.100/ to indicate that the first 100 bytes of the file /tmp/lobsource.dta should be loaded into the particular LOB column. Notice also the trailing slash. LLSes are placed in the input file instead of the actual data for each row and LOB column. More on this in the manual.

Notice also the & at the end of the command line. This will send the LOAD command in the background, otherwise we would need to start a new shell session for the next step.

Now we can start sending input data into the pipe:

$for i in {1..700000} ; do echo "/tmp/lobsource.dta.$RANDOM.$(( $RANDOM * 10 ))/"; done>/tmp/blobtest.csv

The Bash $RANDOM built-in function generates a pseudo-random integer in the range between 0 and 32767. We use it to create LLSes pointing to pseudo-random fragments in a sufficiently large binary file called /tmp/lobsource.dta (any readable file will do). In our case the LLS offset values will vary from 0 to 32767 and the LOB length will be in the range between 0 and 327670, with 10 byte increments.

As a result of the LOAD command above, using the generated on the fly input file, we will populate the test table with 700,000 rows of arbitrary BLOBs. We can, of course, use the same approach to generate “random” CLOBs from a large text file.

Leave a Reply

Your email address will not be published. Required fields are marked *