Oakies Blog Aggregator

Dang it, people, they're syscalls, not "waits"...

So many times, I see people get really confused about how to attack an Oracle performance problem, resulting in thoughts that look like this:

I don't understand why my program is so slow. The Oracle wait interface says it's just not waiting on anything. ?

The confusion begins with the name "wait event." I wish Oracle hadn't called them that. I wish instead of WAIT in the extended SQL trace output, they had used the token SYSCALL. Ok, that's seven bytes of trace data instead of just four, so maybe OS instead of WAIT. I wish that they had called v$session_wait either v$session_syscall or v$session_os .

Here's why. First, realize that an Oracle "wait event" is basically the instrumentation for one operating system subroutine call ("syscall"). For example, the Oracle event called db file sequential read: that's instrumentation for a pread call on our Linux box. On the same system, a db file scattered read covers a sequence of two syscalls: _llseek and readv (that's one reason why I said basically at the beginning of this paragraph). The event called enqueue: that's a semtimedop call.

Second, the word wait is easy to misinterpret. To the Oracle kernel developer who wrote the word WAIT into the Oracle source code, the word connoted the duration that the code path he was writing would have to "wait" for some syscall to return. But to an end-user or performance analyst, the word wait has lots of other meanings, too, like (to name just two):

  1. How long the user has to wait for a task to complete (this is R in the R = S + W equation from queueing theory).
  2. How long the user's task queues for service on a specific resource (this is W in the R = S + W equation from queueing theory).

The problem is that, as obvious and useful as these two definitions seem, neither one of them means what the word wait means in an Oracle context, which is:

wait n. In an Oracle context, the approximate response time of what is usually a single operating system call (syscall) executed by an Oracle kernel process.

That's a problem. It's a big problem when people try to stick Oracle wait times into the W slot of mathematical queueing models. Because they're not W values; they're R values. (But they're not the same R values as in #1 above.)

But that's a digression from a much more important point: I think the word wait simply confuses people into thinking that response time is something different than what it really is. Response time is simply how long it takes to execute a given code path.

To understand response time, you have to understand code path.

This is actually the core tenet that divides people who "tune" into two categories: people who look at code path, and people who look at system resources.

Here's an example of what code path really looks like, for an Oracle process:

begin prepare (dbcall)
execute Oracle kernel code path (mostly CPU)
maybe make a syscall or two (e.g., "latch: library cache")
maybe even make recursive prepare, execute, or fetch calls (e.g., view resolution)
end prepare
maybe make a syscall or two (e.g., "SQL*Net message...")
begin execute (another dbcall)
execute Oracle kernel code path
maybe make some syscalls (e.g., "db file sequential read" for updates)
end execute
maybe make a syscall or two
begin fetch (another dbcall)
execute Oracle kernel code path (acquire latches, visit the buffer cache, ...)
maybe make some syscalls (e.g., "db file...read")
end fetch
make a syscall or two

The trick is, you can't see this whole picture when you look at v$whatever within Oracle. You have to look at a lot of v$whatevers and do a lot of work reconciling what you find, to come up with anything close to a coherent picture of your code path.

But when you look at the Oracle code path, do you see how the syscalls just kind of blend in with the dbcalls? It's because they're all calls, and they all take time. It's non-orthogonal thinking to call syscalls something other than what they really are: just subroutine calls to another layer in the software stack. Calling all syscalls waits diminishes the one distinction that I think really actually is important; that's the distinction between syscalls that occur within dbcalls and the syscalls that occur between dbcalls.

It's the reason I like extended SQL trace data so much: it lets me look at my code path without having to spend a bunch of extra time trying to compose several different perspectives of performance into a coherent view. The coherent view I want is right there in one place, laid out sequentially for me to look at, and that coherent view fits what the business needs to be looking at, as in...

Scene 1:

  • Business person: Our TPS Report is slow.
  • Oracle person: Yes, our system has a lot of waits. We're working on it.
  • (Later...) Oracle person: Great news! The problem with the waits has been solved.
  • Business person: Er, but the TSP Report is still slow.

Scene 2:

  • Business person: Our TPS Report is slow.
  • Oracle person: I'll look at it.
  • (Later...) Oracle person: I figured out your problem. The TPS Report was doing something stupid that it didn't need to do. It doesn't anymore.
  • Business person: Thanks; I noticed. It runs in, like, only a couple seconds now.

Seminar: Deploying Oracle on Amazon Web Services

We are organizing a Seminar about Oracle on Amazon Web Services. This event will be held on June 16th and 17th in The Netherlands. The exact location will be will be announced later and more information will follow later.

Seminar Description

If you say the words “Cloud Computing” frequently in meetings and at the workplace, your colleagues will realize that you are very intelligent, and you will be promoted.  In addition, cloud computing improves upon traditional hosted infrastructure in many ways.  Deployment of new host resources takes only minutes. Applications can scale up quickly without costly hardware upgrades.  In this one-day seminar we will learn the pros, cons and details of deploying Oracle databases on the Amazon Elastic Computing Cloud. We will also learn how to leverage Amazon’s Web Services to maximize performance and availability using features such as RMAN, Data Guard, ASM and RAC.

If you are interested in this seminar please sign up here for more information

Oracle on 32-bit Windows: ORA-4030 errors

Over the last year I have seen a number of customers that have problems with Oracle on windows. They are getting frequently an ORA-4030 error.  Now Oracle on windows is implemented differently than Oracle on UNIX. The big difference is that Oracle on Windows shares one process with all clients (threads in this case). This process has a limited process space, which is by default 2GB and can be enlarged to 3.5GB. So this process space has to accommodate all the Oracle sessions (threads). So if you have a quick look at the Process space it basically consists of 4 parts (there are probably more, but I am not an Windows geek/expert): 

  1. The Oracle Executable
  2. The Oracle SGA
  3. The Heap (for all threads/sessions)
  4. The Stack

This has a number of implications. You can’t really tune the SGA to a very large size; you can only support a certain number of users; the heap can easily get fragmented and that means that large allocations will fail and there are other related problems.

So what can you really do to make this work? Reduce the SGA_MAX_SIZE or even remove the parameter and rely on your db_block_buffers and shared_pool_size settings. Or you can reduce the number of sessions/threads that connect to Oracle. But the your best bet is to reduce the amount of memory needed for the SGA. Another option is to switch to 64 bit windows and 64 bit windows, that will also allow for a larger process space.

I will update later with some more info.

Throughput versus Response Time

I like Doug Burns's recent blog post called Time Matters: Throughput vs. Response Time. If you haven't read it, please do. The post and its comment thread are excellent.

The principle Doug has recognized is why the knee in the performance curve is defined as the traffic intensity (think utilization, or load) value at which, essentially, the ratio of response time divided by throughput is minimized. It's not just the place where response time is minimized (which, as Doug observed, is when there's no load at all except for you, ...which is awesome for you, but not so good for business).

I'd like to emphasize a couple of points. First, batch and interactive workloads have wholly different performance requirements, which several people have already noted in their comments to Doug's post. With batch work, people are normally concerned with maximizing throughput. With online work, individual people care more about their own response times than group throughput, although those people's managers probably care more about group throughput. The individual people probably care about group throughput too, but not so much that they're happy about staying late after work to provide it when their individual tasks run so slowly they can't finish them during the normal working day.

In addition to having different performance requirements, batch workload can often be scheduled differently, too. If you're lucky, you can schedule your batch workload deterministically. For example, maybe you can employ a batch workload manager that feeds workload to your system like a carefully timed IV drip, to keep your system's CPU utilization pegged at 100% without causing your CPU run-queue depth to exceed 1.0. But online workload is almost always nondeterministic, which is to say that it can't be scheduled at all. That's why you have to keep some spare un-utilized system capacity handy; otherwise, your system load goes out past the nasty knee in your performance curve, and your users' response times behave exponentially in response to microscopic changes in load, which results in much Pain and Suffering.

My second point is one that I find that a lot of people don't understand very well: Focusing on individual response time—as in profiling—for an individual business task is an essential element in a process to maximize throughput, too. There are good ways to make a task faster, and there are bad ways. Good ways eliminate unnecessary work from the task without causing negative side-effects for tasks you're not analyzing today. Bad ways accidentally degrade the performance of tasks other than the one(s) you're analyzing.

If you stick to the good ways, you don't end up with the see-saw effect that most people seem to think of when they hear "optimize one business task at a time." You know, the idea that tuning A breaks B; then tuning B breaks A again. If this is happening to you, then you're doing it wrong. Trying to respond to performance problems by making global parameter changes commonly causes the see-saw problem. But eliminating wasteful work creates collateral benefits that allow competing tasks on your system to run faster because the task you've optimized now uses fewer resources, giving everything else freer and clearer access to the resources they need, without having to queue so much for them.

Figuring out how to eliminate wasteful work is where the real fun begins. A lot of the tasks we see are fixable by just changing just a little bit of source code. I mean the 2,142,103-latch query that consumes only 9,098 latches after fixing; things like that. A lot more are fixable by simply collecting statistics correctly. Others require adjustments to an application's indexing strategy, which can seem tricky when you need to optimize across a collection of SQL statements (here comes the see-saw), but even that is pretty much a solved problem if you understand Tapio Lahdenmäki's work (except for the inevitable politics of change control).

Back to the idea of Doug's original post, I wholeheartedly agree that you want to optimize both throughput and response time. The business has to decide what mixture is right. And I believe it's crucial to focus on eliminating waste from each individual competing task if you're going to have any hope of optimizing anything, whether you care more about response time, or throughput.

Think about it this way... A task cannot run at its optimal speed unless it is efficient. You cannot know whether a task is efficient without measuring it. And I mean specifically and exactly it, not just part of "it" or "it" plus a bunch of other stuff surrounding it. That's what profiling is: the measurement of exactly one interesting task that allows you to determine exactly where that task spends its time, and thus whether that task is spending your system's time and resources efficiently.

You can improve a system without profiling, and maybe you can even optimize one without profiling. But you can't know whether a system is optimal without knowing whether its tasks are efficient, and you can't know whether a given task is efficient without profiling it.

When you don't know, you waste time and money. This is why I contend that the ability to profile a single task is absolutely vital to anyone wanting to optimize performance.

Oracle Open World 2001 in Berlin: The Truth (Finally)

It’s time that we admit it. We did horrible things at OOW in Berlin. We’ve not told anyone for all these years, but the pressure is building inside. So I’ve decided to come clean.

We had just started Miracle, so we were only about eight folks or so in total. So we decided to go to the conference in Berlin all of us. We rented two or three apartments and also invited our friends (customers) to stay with us.

We drove down there in a few cars and found out upon arrival that the apartments were empty except for the mattresses on the floor. Oh well, easier to find your way around.

I’m still not sure why Peter Gram or someone else decided to bring along our big office printer/scanner/copier, but the guys quickly set up the network, the printer and the laptops, and then we just sat around, worked on the laptops, drank beers and talked about all sorts of Oracle internals.

I went down to registration and got a badge, so that was good. Then someone (forget who) came up with the idea that we should simply copy my badge so the rest of the guys could get in for free.

It wasn’t because we didn’t have the money or anything. Oh no. It was just because it sounded stupid and a little risky. So that’s why you’ll find pictures here and there (including in my office) of the guys copying and modifying badges.

The biggest challenge was that the badges had an “Oracle-red” stripe at the bottom.

But Oracle Magazine had a special conference edition out which had a lot of “Oracle-red” on the front cover, so it was just a matter of using the scissors in the Swiss army knife.

It worked perfectly for the whole conference and we were very proud, of course.

It was also the conference where I was introduced to James Morle by Anjo Kolk, our old-time friend from Oracle. I had placed myself strategically in a café/bar between the two main halls in the conference center which meant that everybody came walking by sooner or later. So I met lots of old friends that way. And a new friend named James Morle, who was in need for an assignment – and we had a customer in Germany who badly need his skills, so he ended up working for Mobilcom for half a year or more.

So the next bad thing we did was to crash the Danish country dinner. Oracle Denmark might not have been too fond of us back then, because they thought we were too many who had left in one go. Nevertheless, we thought it was not exactly stylish of them not to invite us to the Danish country dinner – as the only Danish participants.

Our friend (and future customer) Ivan Bajon from Simcorp stayed with us in the apartments and he was invited to the country dinner. So we found out where it was, snooped around a little, and then simply climbed a rather high fence and gate-crashed the dinner.

That was fun. The Oracle folks there were visibly nervous when we suddenly stormed in, but what could they do in front of all the customers, who very well knew who we were? So we sat down at the tables and had a good evening with all the other Danes there.

We had lots of fun during those few days in Berlin, had many political debates and beers, and went home smiling but tired.

To my knowledge we’ve not faked badges or gate-crashed country dinners since.

There have been a few suggestions since then that the badges we copied were actually free to begin with, but that can't possible be. I strongly object to that idea.

RMOUG Training Days

RMOUG Training Days was once again a fantastic experience. It is an amazing value for the cost, and Denver is such a beautiful place to visit. The dehydration thing seriously effects me though ... you'd think I'd remember that and start chugging water before I'm 24 hours into it. Maybe next year I'll do better :)Highlights of the conference for me included:Tim Gorman's AWR session: I've used

Diving in Iceland, June 2009

It seems to everyone that I travel a lot. I guess I do compared to most people, but I enjoy traveling, seeing new places, new people, and old friends about as much as I enjoy anything. It’s usually part of my job anyway. So, with a once-in-a-lifetime chance to visit a place I’ve never been and may not have much reason or opportunity to visit again plus do some scuba diving, I couldn’t pass it up.

That’s right, in June 2009, I will visit Iceland and willfully plunge into the +2 C water that is the clearest body of water in the world. The reasons it is so clear have something to do with the fact that the water is the runoff from melting glaciers, filtered by volcanic rocks, and is very, very cold. It supports no wildlife (another reason it’s so clear/clean). Rumor has it that visibility is over 300 feet–that is something I really do have to see to believe.

The trip is being arranged by my friend Mogens Nørgaard who may very well be completely crazy. If you ever get a chance to meet and engage in conversation with him (a.k.a. “Moans Nogood”), do it. You won’t regret it, guaranteed.

The trip is highlighted on DIVE.is, Iceland’s (probably only) dive shop website. Oh, I forgot to mention that the lake bottom is where two tectonic plates (the North American and Eurasian plates, to be precise) meet up (!), so you’re essentially diving on or in one of the continental divides.

Of course, I’m very excited about this trip and hope that Ice, land can continue to function as their economic issues seem to be a little worse than everyone else’s. In the small world department, I have made contact with an Iceland native that I worked with back at Tandem (acquired by Compaq -> HP) in the late 90s. Hopefully, I can meet up with Leifur while I’m in the country. There are only about 300,000 people in the whole country, so he shouldn’t be *that* hard to find. On the other hand, it is possible that Leifur is like “John” is in the US. We’ll see.

forcedirectio: Another victim

I see it all the time: people using Best Practices and ending up in a big mess afterwards. This time it is the mount option forcedirectio. According to a NetApp best practice for Oracle one should always use forcedirectio for the File Systems that store the Oracle Files. So people migrating to these systems read the white papers and best practices and then run into performance problems. A quick diagnosis shows that it is all related to IO. Ofcourse the NAS is blamed and NetApp gets a bad reputation. It is not only NetApp, it is true for all vendors that advice you to use the forcedirectio.

What does forcedirectio do?

It basically by passes the File System buffer cache and because of that it is using a shorter and faster code path in the OS to get the I/O done. That it is ofcourse what we want, however you are now nolonger using the File System Buffer Cache. Depending on your OS and defaults, a large portion of your internal memory could be used as the FS Buffer Cache. So most DBAs don’t dare to set Oracle Buffer Caches bigger than 2-3 GB and don’t dare to use Raw Devices. So the FS cache is used very heavily. It is not uncommon to see that on Oracle Database uses 2 to 10 more caching in the FS than the Oracle Buffer Cache. I have seen a system that used 20 to 40 times more caching the FS than the Oracle Buffer Cache.

So just imagine what happens if one than bypasses the FS Buffer Cache</p />

    	  	<div class=

IO performance: Oracle 8/6130 SAN/Veritas Volume Manager.

I had a look a system here in the Netherlands for a company that was having severe I/O performance problems. They tried to switch to DirectIO (filesystemset_option=setall), but they discovered that the IO performance got worse. So they quickly turned it back to ASYNC. The System Admins told the DBAs several times that there were no OS or File System issues. However they did recently migrate from one SAN to another with Volumne Manager Mirrorring. And the local was remotely mirrored, supposedly in asynchronous mode.

So then I had a look at the system. The system has 24 CPUs, 57GB of internal memory and 14 Oracle databases running on it. The two most important databases did according to statspack around 1700 I/Os per second. So I did a quickscan of the Physical Reads per SQL statement and found that some of the statements were missing indexes (based on Tapios Rules of Engagement), I also noticed that the buffer caches were really small.

After adding some indexes and increasing the buffer cache(s) it was discovered that the writes were still a bit (understatement) slow. To get the focus of the database and more onto the system and OS, I decided to use the test program for writes from Jonathan Lewis. The tests performed were 1K, 10000 blocks sequential writes, 8K, 10000 blocks random writes and 16K, 10000 blocks sequential writes. The tests were run on different mounts points and the interesting things were observered. The tests also performed slow on the database mount points. So the database was no longer the root of problem, something else was. Now the system administrators had to pay attention</p />

    	  	<div class=

Google Sync for Windows Mobile for Contacts and Calendar

Just started to use this new Google feature and installed for my Google Apps domain (www.miraclebenelux.nl) and it works great. I can now sync my Contacts and Google Calendar automatically. It uses the Active Sync utility from Windows Mobile and you have to enable this feature for your Google Apps Domain.

Now the only thing to do is the Tasks list from the mail view and I am all set</p />

    	  	<div class=