Search

Top 60 Oracle Blogs

Recent comments

Logical tuning

Here’s a model of a problem I solved quite recently at a client site. The client’s query was much more complex and the volume of data much larger, but this tiny, two table, example is sufficient to demonstrate the key principle. (Originally I thought I’d have to use three tables to model the problem, which is why you may find my choice of table names a little odd). I ran this example on 11.2.0.2 – which was the client version:

create table t2
as
select
	rownum					id2,
	trunc(dbms_random.value(0,1000))	n2a,
	trunc(dbms_random.value(0,1000))	n2b,
	lpad(rownum,6,'0')			vc2,
	lpad('x',100,'x')			padding
from
	all_objects
where
	rownum <= 5000;

alter table t2 add constraint t2_pk primary key(id2, n2a);

create table t3
as
select
	rownum					id3,
	trunc(dbms_random.value(0,1000))	n3a,
	trunc(dbms_random.value(0,1000))	n3b,
	lpad(rownum,6,'0')			vc3,
	lpad('x',100,'x')			padding
from
	all_objects
where
	rownum <= 5000;

alter table t3 add constraint t3_pk primary key(n3a, n3b, id3);

-- now collect stats on the table and execute this query (with autotrace enabled)

select
	*
from
	t2
where
	not exists (
		select	/*+ unnest */
			null
		from 	t3
		where	n3a = n2a
		and	n3b = n2b
		and	(id3 = id2 or id3 = id2 + 1000)
	)
;

select * from table(dbms_xplan.display);

You’ll note that I’ve included the /*+ unnest */ hint in the subquery because I want Oracle to run this as a hash anti-join; and it would appear to be legal (and simple) to do this given the various not null contraints and primary key information. In fact the plan uses a filter subquery:

------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost  |
------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |  5000 |   585K|  5015 |
|*  1 |  FILTER            |       |       |       |       |
|   2 |   TABLE ACCESS FULL| T2    |  5000 |   585K|    15 |
|*  3 |   INDEX RANGE SCAN | T3_PK |     1 |    12 |     2 |
------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( NOT EXISTS (SELECT /*+ UNNEST */ 0 FROM "T3" "T3" WHERE
              "N3B"=:B1 AND "N3A"=:B2 AND ("ID3"=:B3 OR "ID3"=:B4+1000)))
   3 - access("N3A"=:B1 AND "N3B"=:B2)
       filter("ID3"=:B1 OR "ID3"=:B2+1000)

I suspect that the optimizer code bypasses the anti-join because of the (carefully bracketed) disjunct (OR) predicate. On the client site this resulted in the subquery being executed 9 million times, reducing an intermediate data set from 9M rows to 2M rows at a cost of 27 million buffer visits and about 60 CPU seconds. Fortunately I was able to dredge up a little bit of propositional calculus and quote the following equivalence:

        not( A or B ) <=> (not A and not B)

Equally fortunately I didn’t have to worry about three-valued logic (all relevant columns were declared not null), so I was able to rewrite the query in the form:

select
	*
from
	t2
where
	not exists (
		select	null
		from 	t3
		where	n3a = n2a
		and	n3b = n2b
		and	id3 = id2
	)
and	not exists (
		select	null
		from 	t3
		where	n3a = n2a
		and	n3b = n2b
		and	id3 = id2 + 1000
	)
;

With this code Oracle did two unnests and converted to hash anti-joinsin both cases (at least, that’s what happened on the client site – my small sample switched to nested loop anti-joins):

-------------------------------------------------------------
| Id  | Operation           | Name  | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |    50 |  7200 |    15 |
|   1 |  NESTED LOOPS ANTI  |       |    50 |  7200 |    15 |
|   2 |   NESTED LOOPS ANTI |       |  4999 |   644K|    15 |
|   3 |    TABLE ACCESS FULL| T2    |  5000 |   585K|    15 |
|*  4 |    INDEX UNIQUE SCAN| T3_PK |     1 |    12 |       |
|*  5 |   INDEX UNIQUE SCAN | T3_PK |  5000 | 60000 |       |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("N3A"="N2A" AND "N3B"="N2B" AND "ID3"="ID2"+1000)
   5 - access("N3A"="N2A" AND "N3B"="N2B" AND "ID3"="ID2")

The change to hash anti-joins was a huge benefit (the nested loop anti-join would have improved things for the client to a degree, but there’s not really an enormouse difference in some cases between a filter subquery and an equivalent nested loop anti/semi-join). In this case the query run time dropped from 95 seconds to 27 seconds – all of it CPU time.

Update: a check of the 10053 trace file for 10.2.0.3 shows the following:

SU:   Checking validity of unnesting subquery SEL$2 (#2)
SU:     SU bypassed: Invalid correlated predicates.
SU:   Validity checks failed.

On the other hand, it is possible to get unnesting and a hash anti-join with the predicate: id3 between id2 and id2 + 1000. (But see comment 6 and its reply)