datori

Database administration for fun and profit

Top 5 SQL statements

Or may be top 10. Or 3. Whatever the number, we are often looking for the worst offenders kicking up the server’s CPU utilisation or I/O wait time to the skies. DB2 built-in snapshot functions are a great help. Run “select * from table (snapshot_dyn_sql('YOURDB', -1)) t order by rows_read desc fetch first 5 rows only” and you will get a list of the queries retrieving the most data. The only thing is, snapshot functions provide, well, snapshots of the DB2 monitor data and are not by themselves suitable for the collection of historical data. Monitor counters can be reset without your knowledge, for example when the database is deactivated or when the SQL statement expires from the package cache.

To ensure continuity of the data collected by snapshot functions we can quickly create a table where we will store historical snapshot information:

create table DYNSQLDATA as (select * from table (snapshot_dyn_sql('YOURDB', -1)) t) with no data

We will then insert output of the snapshot function into this table at regular intervals:

insert into DYNSQLDATA select t.* from table (snapshot_dyn_sql('YOURDB', -1)) t

Now we can easily determine which statements demand most resources and analyze their historical performance. Something like this to see what were the most resource-intensive queries in the past hour:

select sql_text, rows_read, num_executions , total_exec_time
from dynsqldata
where stmt_text in (
   select distinct stmt_text
   from dynsqldata
   where snapshot_timestamp > current_timestamp - 1 hour
   order by rows_read desc fetch first 5 rows only)

Only… we cannot do that, really: DB2 stores the SQL statement text as a CLOB, which is subject to certain limitations when it comes to using this datatype in queries: CLOB’s cannot be used with the DISTINCT clause, in the ORDER BY or GROUP BY clauses, with the IN predicate, etc. In other words, we can keep the statement history but we cannot analyze it, comparing entries with one another, without jumping through some hoops.

We could, for example, try using the SUBSTR() function to extract the first 1012 bytes of each CLOB, which would allow us to sort through the statements, but it is obvious that this approach is unacceptable if we are dealing with longer SQL statements. (DB2 9 increases the limit on the length of the sort key to 32 KB, but it still may not be enough in some cases.)

Luckily, there is another way of sorting through long SQL statements – without actually comparing their texts. Instead, we will be comparing their hash values.

Hash function is an algorithm that transforms an arbitrary value, an SQL statement in our case, into a shorter fixed-length value, sort of a fingerprint. The same input will always produce the same output, and while there is a probability that two different inputs will result in the same output, it is small enough that we can ignore it for our purposes.

Our first step will be to implement a user-defined function for the computation of an MD5 hash for a CLOB argument. The computation itself is performed by a Java class MessageDigest, which is a part of the core Java runtime environment since version 1.4.2 and is practically guaranteed to be present wherever DB2 is installed. You can download the Java source from here; after compiling it (“javac Md5.java” should do it) and putting it into sqllib/function you are ready to create the UDF:

CREATE FUNCTION md5( clob )
 RETURNS char(32)
 LANGUAGE java
 PARAMETER STYLE db2general
 NO SQL
 FENCED THREADSAFE
 DETERMINISTIC
 RETURNS NULL ON NULL INPUT
 NO EXTERNAL ACTION
 EXTERNAL NAME 'Md5!md5'
;

Now all we need to is add a column for the hash value to the snapshot table that we created earlier:

alter table DYNSQLDATA add column stmt_md5 char (32)

Snapshots will be added to the table along with the statement hash values:

insert into DYNSQLDATA select t.*, md5(t.stmt_text) from  table (snapshot_dyn_sql('YOURDB', -1)) t

Once we have collected enough snapshots we can proceed with analyzing historical performance data, discovering trends, and so on. As an example, we could come up with something like the crazy query below, which is trying to find 5 queries that read most rows each time they are executed within the past 8 hours.

with stats (stmt_md5, snapshot_timestamp, rows_per_exec, maxrows_exec, avgrows_stmt,
  avgnum_stmt, avgnum_all, avgrows_all, rownum) as (
  select
    stmt_md5,
    snapshot_timestamp,
    case num_executions when 0 then 0 else rows_read/num_executions end rows_per_exec,
    case num_executions when 0 then 0 else max(rows_read/num_executions) over (partition by stmt_md5) end maxrows_exec,
    case num_executions when 0 then 0 else avg(rows_read/num_executions) over (partition by stmt_md5 order by snapshot_timestamp desc rows 30 preceding) end avgrows_stmt,
    avg(num_executions) over (partition by stmt_md5 order by snapshot_timestamp desc rows 30 preceding) avgnum_stmt,
    avg (num_executions) over () avgnum_all,
    case num_executions when 0 then 0 else avg(rows_read/num_executions) over () end avgrows_all,
    rownumber() over (partition by stmt_md5 order by snapshot_timestamp desc) rownum
  from
    DYNSQLDATA
  where
    snapshot_timestamp > (select max(snapshot_timestamp) from DYNSQLDATA) - 8 hours and
    num_executions > 0
) select num_executions, rows_read, pool_data_p_reads, stmt_text, n from (
  select d.num_executions, d.rows_read, d.pool_data_p_reads, d.stmt_text,
  rownumber() over (order by stats.avgnum_stmt desc) n
  from
    DYNSQLDATA d, stats
  where d.stmt_md5 = stats.stmt_md5 and d.snapshot_timestamp = stats.snapshot_timestamp
    and
    stats.rownum = 1
    and
    stats.avgrows_stmt > 3*stats.avgrows_all
    and
    avgnum_stmt > 100
) tt
where tt.n <=5
order by tt.n desc

6 Responses to Top 5 SQL statements

  1. Sanjay says:

    Thank you for putting together detail information. I found it is quite usuful, however, last query, with stats (stmt_md5, snapshot_timestamp, rows_per_exec, maxrows_exec, avgrows_stmt,
    query does NOT work, it gives syntex error. Could you pl. send by right syntex?

  2. nick says:

    Thanks for catching this – there was a typo. Don’t forget to replace DYNSQLDATA with the actual table name where you store _your_ snapshots.

  3. Pingback: Hash Function - dBforums

  4. rudy says:

    example of “top 5 articles” query: http://r937.com/fail.gif

    ;o)

  5. Pingback: db2 md5 yapmaca | gel keyfim gel

  6. Geovan Borges says:

    This helped me. I´m looking for compare CLOB stmt and found this site. Thanks!

Leave a Reply

Your email address will not be published. Required fields are marked *