Quiz Night

If this is the closing section of thetkprof output from the trace file of a single end-user session that has a performance problem, what’s the most obvious deduction you can make about the cause of the problem, and what sort of action would you take next ?


OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      296      0.06       0.10          0        651          0           0
Execute    440      1.39       2.24          7       4664          0         146
Fetch      345     29.38      48.27          0    1709081          0         346
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1081     30.84      50.63          7    1714396          0         492

Misses in library cache during parse: 5
Misses in library cache during execute: 7

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     497        0.00          0.00
  SQL*Net message from client                   496       27.03         50.35
  direct path read                                1        0.00          0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   128199      6.94      11.46          0       2740          0           0
Execute 2274845    371.25     605.60         30   10031162          0       68200
Fetch   2225314     10.94      18.17          5     879297          0      577755
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   4628358    389.14     635.23         35   10913199          0      645955

Misses in library cache during parse: 9701
Misses in library cache during execute: 134

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                                1        0.00          0.00
  latch: shared pool                              3        0.00          0.00

12666  user  SQL statements in session.
  495  internal SQL statements in session.
13161  SQL statements in session.
********************************************************************************
Trace file: {concealed file name}.trc
Trace file compatibility: 11.1.0.7
Sort options: default

       1  session in tracefile.
   12666  user  SQL statements in trace file.
     495  internal SQL statements in trace file.
   13161  SQL statements in trace file.
    5266  unique SQL statements in trace file.
 39046623  lines in trace file.
     742  elapsed seconds in trace file.


There’s no absoutely right answer to the last question, by the way – there are a couple of variations in approach that are likely to get to the same result in the same time, and some of those variations might have made looking at the tail end data the 2nd or 3rd step in the sequence.

Update

One of the skills of using your time effectively when trouble-shooting is the way you balance the time you spend noting the little details that might be relevant, and the time you then spend following up each detail to check for relevance and corroborating evidence. The amateurs might throw themselves into in-depth research on the first detail they notice; or if they’ve started with a short list of observations simply spend too much time on the first before moving on to the second.

An exercise like looking at this tail-end this tkprof file and talking about what you spot and what you might do is just a little exercise in how to pace yourself as you tackle a problem. So, from my perspective, here a couple of obvious starting points:

  • I said it was an end-user session complaining about performance – the last line of the file tells use that the elapsed time was 742 seconds, and the summary of recursive statement tells us there were 2.2 million executions. How can ANYTHING an end-user wants to do “quickly” require 3,000 executions per second of 12 minutes? I can ask that question because I know that “recursive” executions aren’t necessarily “sys-recursive”, SQL statements executed inside a PL/SQL block are also recursive – on top of that I can see, anyway, that there are 12,466 (probably all different) USER statements in the file – the user (or the user’s code) is doing something it almost certainly shouldn’t be doing. Those 2.2M executions are responsible for 10M buffer visits – does that suggest a lot of single row processing ?
  • Associated with the 13,161 statements in the trace file there are 9,700 misses in the library cache during parse – that means “hard” parsing, probably means the user is getting through about 800 “new” statements per minute – but there are only 5,266 unique statements so part of the problem is that some recently used statements (or child cursors, at least) are being flushed from the library cache: again that suggests that they’re doing something wrong.
  • As one of the commentators pointed out – a detail that I hadn’t noted initially – the number of “current” buffers is zero. This isn’t a mini-batch updating the database, it’s just a report (done badly).

What would I do next ? If it wasn’t already immediately obvious I’d look for the statements that were responible for the very large numbers of executions; a call to tkprof with sort=execnt as a parameter would push the high execution counts to the top of the file.  (The worst one had over 900,000 executions, the next 600,000). Then I’d do a grep, sed, sort with uniq -c to find the statement which (I’d guess – and there were 3 of them responsible for a few thousand variants each) were the generated texts using concatenation instead of bind variables. Then I’d go and find the owner of the code and sit down with them to work out how it should be re-engineered.