syscall

Investigating kernel dives using ftrace.

This blogpost is about using the linux ftrace kernel facility. If you are familiar with ftrace and specifically the function_graph tracer, you might already be aware of this functionality. This is Linux specific, and this facility is at least available in kernel 2.6.39 (Oracle’s UEK2 kernel).

The Oracle wait interface granularity of measurement

The intention of this blogpost is to show the Oracle wait time granularity and the Oracle database time measurement granularity. One of the reasons for doing this, is the Oracle database switched from using the function gettimeofday() up to version 11.2 to clock_gettime() to measure time.

This switch is understandable, as gettimeofday() is a best guess of the kernel of the wall clock time, while clock_gettime(CLOCK_MONOTONIC,…) is an monotonic increasing timer, which means it is more precise and does not have the option to drift backward, which gettimeofday() can do in certain circumstances, like time adjustments via NTP.

The first thing I wanted to proof, is the switch of the gettimeofday() call to the clock_gettime() call. This turned out not to be as simple as I thought.

Dang it, people, they're syscalls, not "waits"...

So many times, I see people get really confused about how to attack an Oracle performance problem, resulting in thoughts that look like this:

I don't understand why my program is so slow. The Oracle wait interface says it's just not waiting on anything. ?

The confusion begins with the name "wait event." I wish Oracle hadn't called them that. I wish instead of WAIT in the extended SQL trace output, they had used the token SYSCALL. Ok, that's seven bytes of trace data instead of just four, so maybe OS instead of WAIT. I wish that they had called v$session_wait either v$session_syscall or v$session_os .

Here's why. First, realize that an Oracle "wait event" is basically the instrumentation for one operating system subroutine call ("syscall"). For example, the Oracle event called db file sequential read: that's instrumentation for a pread call on our Linux box. On the same system, a db file scattered read covers a sequence of two syscalls: _llseek and readv (that's one reason why I said basically at the beginning of this paragraph). The event called enqueue: that's a semtimedop call.