datori

Database administration for fun and profit

A perl one-liner to extract all URLs from an HTML document

Just in case I ever forget how I did it… I was trying to download some 40 page PDF brochure from a government web site – I wanted to print it out and read it off-line. However, it was cleverly split into 20 different PDFs – no doubt for convenience. Instead of spending 20 minutes clicking on those various links and printing 20 document fragments, I chose to spend twice that time trying to automate the process. And here it is, in all its glory:

curl -s "http://www.datori.org" \
| perl -n -e 'chomp;s/.*?(?:(?i)href)="([^"]+)".*?(?:$|(?=(?i)href))/$1\n/xg and print'

The “thing” downloads the specified page and extracts all linked URLs from it, as indicated by the “href” tags. You’ve got to appreciate the enormity of perl…

Unix time from DB2

Here’s one way to obtain the Unix time value (the number of seconds since midnight on January 1, 1970) from a DB2 timestamp:

values(
  (days(current_timestamp-current_timezone) - days('1970-01-01') )*86400 + 
  midnight_seconds(current_timestamp - current_timezone)
)

This can be used in a query directly or wrapped into a simple SQL user-defined function.

Top 5 SQL statements

Or may be top 10. Or 3. Whatever the number, we are often looking for the worst offenders kicking up the server’s CPU utilisation or I/O wait time to the skies. DB2 built-in snapshot functions are a great help. Run “select * from table (snapshot_dyn_sql('YOURDB', -1)) t order by rows_read desc fetch first 5 rows only” and you will get a list of the queries retrieving the most data. The only thing is, snapshot functions provide, well, snapshots of the DB2 monitor data and are not by themselves suitable for the collection of historical data. Monitor counters can be reset without your knowledge, for example when the database is deactivated or when the SQL statement expires from the package cache.

To ensure continuity of the data collected by snapshot functions we can quickly create a table where we will store historical snapshot information:

create table DYNSQLDATA as (select * from table (snapshot_dyn_sql('YOURDB', -1)) t) with no data

We will then insert output of the snapshot function into this table at regular intervals:

insert into DYNSQLDATA select t.* from table (snapshot_dyn_sql('YOURDB', -1)) t

Now we can easily determine which statements demand most resources and analyze their historical performance. Read More

Automatic monitoring of the SSL certificate expiration date

This is not really about database administration, but one of the problems I often face is monitoring of the expiration of SSL certificates on my clients’ web servers. It usually takes some time to renew a certificate, and it helps to know in advance that I need to get the process started.Here’s a little script that checks the certificate on a given web server and sends a reminder if it is about to expire:

#!/bin/bash# checks the ssl certificate expiration date of a given host
# Usage: ./checksslcert.sh <hostname> [<port>]
# Port defaults to 443 if not specified
test -z "$1" && echo "Usage: $0 <hostname> [<port>]" && exit 0
tempstr=$(openssl s_client -connect $1:${2:-443} 2>/dev/null >$0.log)
test $? -gt 0 && echo "Error accessing SSL certificate on $1" && exit 1
exptime=$(date -d"${tempstr#*=}" +"%s")
expdays=$(((${exptime} - $(date +"%s"))/84400))
echo "SSL certificate on $1 expires in $expdays days"
test $expdays -lt 45 && echo "Do something!" | mailx -s "SSL certificate on $1 expires in $expdays days" admin@domain.com

Run it daily by cron and you will never miss the expiration date again. The script needs the GNU date utility and openssl to be installed. It has been tested under bash, but you can easily modify it to run under other shells.