Top 60 Oracle Blogs

Recent comments

August 2011

Are you getting the most out of your Exadata performance? Part 1

In almost all of the Exadata migration projects I’ve been part of, the client sees immediate speedup & performance increase when testing their workload on Exadata (of course, we’ve made sure that we do plan & execute the tasks right). However, my performance geek’s nature usually doesn’t allow to stop there and leave the client with just 2x or 3x performance increase. For data warehousing and reporting workloads, Exadata can do much better than just 2-3x performance increase!

Are you getting the most out of your Exadata performance? Part 1

In almost all of the Exadata migration projects I’ve been part of, the client sees immediate speedup & performance increase when testing their workload on Exadata (of course, we’ve made sure that we do plan & execute the tasks right). However, my performance geek’s nature usually doesn’t allow to stop there and leave the client with just 2x or 3x performance increase. For data warehousing and reporting workloads, Exadata can do much better than just 2-3x performance increase! 

This is why I will write this article series about Getting the Most out of your Exadata Performance. I will write a bunch of random articles, based on my experience and lessons learned – and some day I may consolidate it all into a more formal paper.

So, here’s the first article (PDF format).

Are you getting the most out of your Exadata performance? Part 1

In almost all of the Exadata migration projects I’ve been part of, the client sees immediate speedup & performance increase when testing their workload on Exadata (of course, we’ve made sure that we do plan & execute the tasks right). However, my performance geek’s nature usually doesn’t allow to stop there and leave the client with just 2x or 3x performance increase. For data warehousing and reporting workloads, Exadata can do much better than just 2-3x performance increase!

Oracle Linux 5 Update 7 Available…

Over the weekend Oracle Linux 5.7 was released, as noted here.

Upgrades follow the normal path. Do one of the following:

  • Boot from the DVD and upgrade that way.
  • Replace the “/etc/yum.repos.d/public-yum-el5.repo” file with the new version here, remembering to enable the appropriate entries.
  • Amend your existing “/etc/yum.repos.d/public-yum-el5.repo”, adding the following new entry.
    name=Oracle Linux $releasever - U7 - $basearch - base

If you picked either of the last two options, you can upgrade by issuing the “yum update” command when logged in as root.

I’ve upgraded a couple of VMs running Oracle and everything seems fine so far.



IOT Part 4 – Greatly Boosting Buffer Cache Efficiency

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
IOT5 – Primary Key issues..>

So far I have covered the basics of Index Organized Tables, created some examples and shown how IOTs can help reduce the cost of single row selects and then how they can greatly aid range scans. Follow the above links for details.

Now I’d like to show that the use of IOTs has the potential to make your block buffer cache (BBC) far more efficient. Going to disc is very,very slow compared to going to memory {NB solid state storage improves this situation but does not remove it}. The block buffer cache has always been critical to oracle SQL Select performance as it allows you to access data in memory rather than disc and in general the more block buffer cache you have the faster your system will be.
{I am of the opinion that the BBC is even more important now than ever. As hard discs get larger we are seeing fewer and fewer spindles per GB of storage and, in essence, disc storage is effectively getting slower – because more data is hosted on the same number of spindles and those spindles are not themselves getting faster – I digress, for more details see posts Big Discs are Bad and IOPs and Form Factors}

In the scenario I’ve covered in my previous posts on IOTs we have a system where child data is coming in for many parents every day for 100 days. With a heap table the data pours into the growing end of the table, usually a record or two per parent each day and no guarantee that if two records come in, they will be put into the same block.

So, when you select a child record for a parent you get the situation shown below:

For many systems, the Block Buffer Cache is holding a lot of data no queries asked for -collateral data

When oracle needs to collect a record from the table, it has to read the whole block. Oracle only reads in tablespace data in whole blocks. That record comes with many other records in it that you did not ask for or want. I refer to this as Collateral Data – innocent bystander data that has got pulled into the BBC just because it was in the same block as required data. The larger the block size, the more collateral data there is.

To get all the child records for the parent, you need to read all those individual table blocks holding one or two records of interest. For our 100 child records you will probably need to read in close to 100 table blocks. Your Block Buffer Cache is filling up with of blocks where only one row out of each block is “of interest”. If that is one row out of 80 in a block, you are effectively wasting 98.75% of the space that table takes up in the block buffer cache.

With an IOT the situation is very different. We have already seen in my previous post on reducing IO that for a range scan on the IOT, oracle does not need to go and collect records from blocks scattered throughout the table. It simply collects the IOT leaf blocks holding the relevant data. Not only does this require less IO, it also results in the fetched blocks mostly holding the required data. The percentage of collateral data is greatly reduced:

IOTs are a powerful tool in reducing collateral data and using the BBC more efficiently

Thus instead of 100 table blocks that mostly hold collateral data, you have 2 or 3 blocks holding mostly the data you are interested in. Your wastage, the collateral data, is about 33-50%. With my example tables from post IOT2, it is indeed an average of 2-3 IOT blocks holding all the data for a single parent and 100 heap table blocks holding the same data.

You can think of it another way.

With my IOT I use only 3% of the memory to cache a single set of records for a parent compared to that needed with a normal HEAP table.

Let’s extend that idea a little. Let’s say I have 100,000 customer and 5% of the customers are active.
Each customer has on average 500 * 200-byte activity records for 100K of data
Each 8K Heap Table block holds 40 records, a very inefficient IOT holds 20 records per block {I’ll go into the details of this in a later post, but I am being a little unfair on the IOT here}.
There is some grouping of records for the customer so that those 500 records are over 400 blocks.
The whole working set of 400 blocks * (100,000*0.05) customers *8K = 2,000,000K

Now replace the table with an IOT keyed on customer ID.
Those 400 records would be spread over (400/20)+2 blocks. The +2 is the start and end of the range.
That is 22 *(100,000*0.05) *8K = 110,000K

Both would also need the overhead of an index structure to be cached also, for the Heap table it is the Primary Key index, for the IOT it is the rest of the IOT structure.

So you would need 2GB or so of Block Buffer Cache dedicated to caching the working set held as a heap table and 110MB of Block Buffer Cache dedicated to caching the IOT equivalent.

How many of you spotted that the space needed to “cache the heap table working set” is actually twice the size of the table? Well, that is because with a 5% working set and 40 records per block, there is a fair chance that some of those Collateral Data records in each block are for the 5% working set. Over all, almost every block will hold data for two or three active users. This is one of the complications of working out how efficient you block buffer cache is likely to be and I’ll have to leave that to another post. In reality you would need to hold 99% of the heap table in memory to cache the whole working set, so still 1GB. The IOT is still far more efficient.

I’ll just finish by saying that on one project I worked on we would have needed several hundred GB of Block Buffer Cache to hold the working set of the main tables if held as heap tables. That volume of memory was simply not available. With IOTs this reduced to about 40GB. This was available. The majority of this working set was able to stay in their SGA and it meant that so long as the instance stayed up, that working set of data mostly stayed in cache. I was able to see on the live system that processing of the data for these active customers was mostly being supported by consistent gets and less than 1% disk reads.

So, by using IOTS I reduced consistent gets dramatically, made more efficient use of the block buffer cache and, as a result of that, reduced the number of physical reads needed to support the consistent gets.

A day at Cisco’s labs and an introduction to the UCS

I recently had the immense pleasure of visiting Cisco’s labs at Bedfont Lakes for a day of intensive information exchange about their UCS offering. To summarise the day: I was impressed. Even more so by the fact that there is more to come, I’m assuming a few more blogs posts about UCS will get published here after I had some time to benchmark it.

I knew about UCS from a presentation at the UKOUG user group, but it didn’t occur at the time which potential is behind the technology. This potential is something Cisco sadly fail to make clear on their website-which is very good once you understand the UCS concept as it gives you many details about the individual components.

I should stress that I am not paid or otherwise financially motivated to write this article, it’s pure interest in technology that made me write this blog post. A piece of good technology should be mentioned, and this is what I would like to do.

What is the UCS anyway?

When I mentioned to friends that I was going to see Cisco to have a look at their blade server offering I got strange looks. Indeed, Cisco hasn’t been known as a manufacturer of blades before, it’s only recently (in industry terms) that they entered the market. However instead of providing YABE (yet another blade enclosure), they engineered it quite nicely.

If you like, the UCS is an appliance-like environment you can use for all sorts of workloads. It can be fitted in a standard 42” Rack and currently consists of these components (brackets contain product designations for further reading):

  • Two (clustered) Fabric Interconnects (UCS 6120 or 6140 series) for 20 or 40 10G ports, with each port configurable as either uplinks into the core network or server links down to UCS chassis. These ports carry both Ethernet and FCoE traffic from the UCS chassis
  • Two Fabric Extenders (UCS 2100 series), which go into the blade enclosure and provide connectivity up to the Fabric Interconnects. Each UCS 2104 fabric extender (FEX) provides 40Gb bandwidth  to the Interconnect, controlled by QoS policies
  • Blade enclosures (UCS 5100 series), which contain 8 half-width or 4 full width blades
  • Different models of half-width and full-width UCS B-series blades providing up to 512G RAM and 7500 series Intel Xeon processors
  • 10GE Adapters which are Converged Network Adapters (CNA). In other words they can do Fibre Channel over Ethernet and non-storage Ethernet traffic

The Fabric Interconnects can take extension modules with Fibre Channel to link to a FC switch, there is no new technology introduced and existing arrays can be used. Also, existing fibre channel solutions can be used for backups.

Another of the interesting features is the management software, called UCS Manager. It’s integrated into the Fabric Interconnect using a few gigabyte of flash storage. Not only is it used to manage a huge number of blades, it can also stage firmware for each component. At a suitable time, the firmware can be upgraded in a rolling fashion except for the Fabric Interconnect (obviously), though the fabric interconnects can take advantage of the clustering functionality to ensure that complete firmware upgrades can be undertaken with a system-wide outage.

Fibre Channel over Ethernet

What I was very keen to learn about was the adoption of FCoE in UCS. Ever since it has been released, the UCS uses FCoE for storage inside the system. I can image that this must have been difficult to sell, since FCoE was a very young standard at the time, and still probably is.

For those of you who don’t know FCoE, it’s broadly speaking FC payloads in Ethernet frames. Since Ethernet was never designed to work like Fibre Channel, certain amendments had to be made to the 802.x standards. The so-modified Ethernet is often referred to as Data Centre Ethernet (DCE) or Converged Enhanced Ethernet (CEE). In a way, FCoE competes with established Fibre Channel and emerging ones such as iSCSI or even SRP for the future storage solution. History has shown that Ethernet is very resilient and versatile, it might well win the battle for unified connection-if implemented correctly. And by unified I mean network and storage traffic. I was told that the next generation UCS will not have dedicated Fibre Channel ports in the Fabric switches, all ports are unified. All you need is a small SFP to attach a fibre cable or 10G Ethernet.

The fabric Interconnects in the current version use traditional but aggregated 8G/s Fibre Channel to go to the storage.

Nice features

UCS introduces the idea of a service profile. This is probably the biggest differentiator between it and other blade solutions. A blade in the enclosure can take any role and configuration you assign to it. It took me a little while to understand this, but an analogy helped: think of a blade as something configurable similar to a VM: before you can put something on it, you first have to define it. Amongst the things you set are boot order (SAN boot is highly recommended, we’ll see why shortly), which VSAN to use, which VNICs to use in which VLAN etc. Instead of having to provide the same information over and over again, it’s possible to define pools and templates to draw this information from.

Technicalities set aside, once you define a service profile (let’s assume a RAC node for example), you assign this profile to a blade that’s in the enclosure. A few seconds later, you’ll see the blade boot from the storage array and you are done. If the SAN LUNs don’t contain a bootable operating system, you can us eth KVM to create one.

Another nice thing I think is the use of 10G Ethernet throughout. The two switches do not operate in Spanning Tree Mode, which would limit the uplink speed to 10G (one path)

There is obviously more, but I think this blog post has become longer than it should be. I might blog more about the system at a later stage, but not after going to add this final section:


The question that immediately springs to mind is: how does it compare to Exadata? Is it Exadata competition? Well, probably no. UCS is a blade system but it doesn’t feature Infiniband or zero copy/iDB protocol. It doesn’t come with its own more or less directly attached storage subsystem. It can’t do smart scans, create storage indexes, or do other cell offloading. It can’t do EHCC: all these are exclusive to Exadata.

This can be either good or bad for you, and the answer is of course “it depends”. If your workload is highly geared towards DSS and warehousing in general, and you have the requirement to go through TB of data quickly, then Exadata probably is the answer.

If you are in need of consolidating say your SPARC hardware on x86, a trend I see a lot, then you may not need Exadata, and in fact you might be better off waiting for Exadata to mature some more if you are really after it. UCS scores many points by not breaking completely with traditional data centre operations: you still use a storage array you connect the blades to. This makes provisioning of a test database as simple as cloning the production LUNs. Admittedly you get a lot of throughput from IPoIB as Exadata uses it, but I would doubt that an RMAN duplicate from active database is faster than creating clone on the storage array. UCS also allows you to use storage level replication such as SRDF of Continuous Access (or whichever name other vendor give it)


In summary, UCS is a well-engineered blade system using many interesting technologies. I am especially looking forward to FCoE multi-hop, which should be available with UCS 2. Imagine the I/O bandwidth you could get with all these aggregated links!


Am I an expert on UCS? Nope, not by a long. So it could be that certain things described in this blog post might be inaccurate. If you like, use the comment option below to make me aware of these!

Super 8…

I can see Super 8 getting really mixed reviews because it feels a little like The Goonies meet Cloverfield.

Personally, I liked it and could see past the odd cheesy bits, but I can also see why some would think it falls short. There is a very E.T. childlike feel to the film, which is hardly surprising as the lead characters are kids. There is also the Stand By Me coming of age aspect. All this is wrapped up with a bow made of some fantastic visual effects.

Cool film, but difficult to pinpoint the target audience they had in mind (in my opinion). I’m sure over the next few years I will watch and enjoy it several times, but I don’t think I will be buying it.



Multi-Column Joins, Expressions and 11g


I've already outlined in one of my previous posts that getting a reasonable cardinality estimate for multi-column joins can be tricky, in particular when dealing with correlated column values in the join columns.

Since Oracle 10g several "Multi-Column Join Cardinality" sanity checks have been introduced that prevent a multi-column join from producing too low join cardinalities - this is controlled via the "_optimizer_join_sel_sanity_check" internal parameter that defaults to true from 10g on.

I Do Believe In One-Size-Fits-All Solutions! Humor?

My recent posts about certain technical marketing motions about certain information technology has kept me awake at night. However, my readers span all time zones so I need to remain vigilant in handling email and blog comments. I need a one-size-fits-all solution. I think I’ll pop a couple of these:

Filed under: oracle

Something free and interesting sounding...

Tanel Poder is doing another 'secret' free webinar. I've known Tanel for quite a few years now and he is a pretty smart guy. The topics he is presenting on this time center around scanning of data - and all of them look very interesting. Some are Exadata related but most of them would be useful for anyone.