Index Hash

I’m afraid this is one of my bad puns again: an example of the optimizer making a real hash of the index hash join. I’m going to create a table with several indexes (some of them rather similar to each other) and execute a query that should do an index join between the two obvious indexes. To show how obvious the join should be I’m going to start with a few queries that show the cost of simple index fast full scans.

Here’s the data generating code:

create table t1
as
with generator as (
	select	--+ materialize
		rownum id
	from dual
	connect by
		level <= 10000
)
select
	rownum			id,
	mod(rownum-1,20)	flag,
	lpad(rownum,10,'0')	v10,
	lpad(rownum,20,'0')	v20,
	lpad(rownum,30,'0')	v30,
	rpad('x',100)		padding
from
	generator	v1,
	generator	v2
where
	rownum <= 100000
;

begin
	dbms_stats.gather_table_stats(
		ownname		 => user,
		tabname		 =>'T1',
		estimate_percent => 100,
		method_opt	 => 'for all columns size 1'
	);
end;
/

alter table t1 add constraint t1_pk primary key(id)
	using index (create index t1_pk on t1(id))
;

create index t1_flag on t1(flag);

create index t1_ia on t1(id, v20);
create index t1_ib on t1(id, v10);
create index t1_ic on t1(id, v30);

select
	index_name, leaf_blocks
from
	user_indexes
where
	table_name = 'T1'
order by
	index_name
;

/*  output from the query */
/*
INDEX_NAME           LEAF_BLOCKS
-------------------- -----------
T1_FLAG                      195
T1_IA                        515
T1_IB                        375
T1_IC                        657
T1_PK                        222

*/

Given the definitions of the primary key index and the three indexes that start with the ID column, their relative sizes shouldn’t surprise you. The cost of an index fast full scan on these indexes will depend on your parameter settings and values for system stats; here are the figures (from autotrace) from one system running 12.1, and the behaviour is consistent across several versions:


select /*+ index_ffs(t1 t1_pk) */ count(*) from t1;
select /*+ index_ffs(t1 t1_flag) */ count(*) from t1 where flag is not null;
select /*+ index_ffs(t1 t1_ia) */ count(*) from t1;
select /*+ index_ffs(t1 t1_ib) */ count(*) from t1;
select /*+ index_ffs(t1 t1_ic) */ count(*) from t1;

-- with autotrace results:

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |    63   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_PK |   100K|    63   (2)| 00:00:01 |
-----------------------------------------------------------------------

---------------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |     3 |    56   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE       |         |     1 |     3 |            |          |
|*  2 |   INDEX FAST FULL SCAN| T1_FLAG |   100K|   292K|    56   (2)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("FLAG" IS NOT NULL)

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |   142   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_IA |   100K|   142   (1)| 00:00:01 |
-----------------------------------------------------------------------

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |   104   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_IB |   100K|   104   (1)| 00:00:01 |
-----------------------------------------------------------------------

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |   181   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_IC |   100K|   181   (1)| 00:00:01 |
-----------------------------------------------------------------------
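As a quick sanity check on these figures, here is a minimal sketch that sets each fast full scan cost against the leaf_blocks value reported earlier. The costs are hard-coded from the autotrace output above, so they are specific to this one system, and the column alias cost_per_leaf_block is just an illustrative name. On the figures shown the ratio comes out at roughly 0.28 for every index.

```sql
with ffs_costs as (
	-- costs copied by hand from the autotrace output above (system-specific)
	select 'T1_PK'   index_name,  63 ffs_cost from dual union all
	select 'T1_FLAG' index_name,  56 ffs_cost from dual union all
	select 'T1_IA'   index_name, 142 ffs_cost from dual union all
	select 'T1_IB'   index_name, 104 ffs_cost from dual union all
	select 'T1_IC'   index_name, 181 ffs_cost from dual
)
select
	i.index_name,
	i.leaf_blocks,
	c.ffs_cost,
	round(c.ffs_cost / i.leaf_blocks, 3)	cost_per_leaf_block
from
	user_indexes	i,
	ffs_costs	c
where
	i.table_name = 'T1'
and	c.index_name = i.index_name
order by
	i.index_name
;
```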

If you compare the different costs of the fast full scans they’re consistent with the different sizes (leaf_blocks) of the indexes; so you might expect the following query to do either a tablescan or an index join between the t1_flag index and the t1_pk index (which is the smallest candidate index to find the id column):


select	sum(id)
from
	t1
where
	flag = 0
;

But here’s the plan I got:


--------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                  |     1 |     8 |   528   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE         |                  |     1 |     8 |            |          |
|*  2 |   VIEW                  | index$_join$_001 |  5000 | 40000 |   528   (1)| 00:00:01 |
|*  3 |    HASH JOIN            |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN    | T1_FLAG          |  5000 | 40000 |    10   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| T1_IA            |  5000 | 40000 |   646   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("FLAG"=0)
   3 - access(ROWID=ROWID)
   4 - access("FLAG"=0)

Four things to notice:

  1. The optimizer has picked the wrong index
  2. The cost of the fast full scan of t1_ia is 646 in this plan when (on its own) it was only 142
  3. The cost of the whole query (528) is less than the cost of one of its lines (646)
  4. The index chosen looks as if it might have been selected on the basis of alphabetical order

Oops.

Fortunately, of course, we can always add hints to get the right plan, so let’s try this. This time the plan is what I got by using explain plan followed by a call to dbms_xplan.display() with the ‘outline’ and ‘alias’ options:

explain plan for
select
	/*+
		qb_name(main)
		index_join(@main t1 t1_flag t1_pk)
	*/
	sum(id)
from
	t1
where
	flag = 0
;

select * from table(dbms_xplan.display(null,null,'outline alias'));
--------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                  |     1 |     8 |   528   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE         |                  |     1 |     8 |            |          |
|*  2 |   VIEW                  | index$_join$_001 |  5000 | 40000 |   528   (1)| 00:00:01 |
|*  3 |    HASH JOIN            |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN    | T1_FLAG          |  5000 | 40000 |    10   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| T1_IA            |  5000 | 40000 |   646   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - SEL$998059AF / T1@MAIN
   3 - SEL$998059AF
   4 - SEL$998059AF / indexjoin$_alias$_001@SEL$998059AF
   5 - SEL$998059AF / indexjoin$_alias$_002@SEL$998059AF

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX_JOIN(@"MAIN" "T1"@"MAIN" ("T1"."FLAG") ("T1"."ID" "T1"."V20"))
      OUTLINE(@"MAIN")
      OUTLINE_LEAF(@"MAIN")
      OUTLINE_LEAF(@"SEL$998059AF")
      ALL_ROWS
      DB_VERSION('12.1.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Ouch: the optimizer has ignored the hint and is still using the wrong index.

Here’s something really odd, though (and I’ll get around to looking at the 10053 trace eventually): let’s add an (undocumented) outline_leaf() hint to the query, a hint that is already in the Outline Data:

explain plan for
select
	/*+
		qb_name(main)
		outline_leaf(@main)
		index_join(@main t1 t1_flag t1_pk)
	*/
	sum(id)
from
	t1
where
	flag = 0
;

select * from table(dbms_xplan.display(null,null,'outline alias'));
--------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                  |     1 |     8 |   235   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE         |                  |     1 |     8 |            |          |
|*  2 |   VIEW                  | index$_join$_001 |  5000 | 40000 |   235   (1)| 00:00:01 |
|*  3 |    HASH JOIN            |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN    | T1_FLAG          |  5000 | 40000 |    10   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| T1_PK            |  5000 | 40000 |   280   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - MAIN
   2 - SEL$998059AF / T1@MAIN
   3 - SEL$998059AF
   4 - SEL$998059AF / indexjoin$_alias$_001@SEL$998059AF
   5 - SEL$998059AF / indexjoin$_alias$_002@SEL$998059AF

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      INDEX_JOIN(@"MAIN" "T1"@"MAIN" ("T1"."FLAG") ("T1"."ID"))
      OUTLINE(@"MAIN")
      OUTLINE_LEAF(@"MAIN")
      OUTLINE_LEAF(@"SEL$998059AF")
      ALL_ROWS
      DB_VERSION('12.1.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("FLAG"=0)
   3 - access(ROWID=ROWID)
   4 - access("FLAG"=0)

We get the plan we want, and at 235 it’s cheaper than the default one at 528. It does still suffer from the problem that the cost of the fast full scan is larger than it should be: 280 in this plan against 63 on its own (it seems to be the cost of an index fast full scan plus the cost of an index full scan). The cost of the whole plan (235) is also still less than the cost of that one line (280).
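The guess about the inflated fast full scan cost can be checked roughly with a little arithmetic. This is only a sketch: it uses leaf_blocks as a stand-in for the cost of an index full scan, which is my assumption and not something the plans report, and all the numbers are copied by hand from the output above. On those figures the sum is 657 against a reported 646 for t1_ia, and 285 against a reported 280 for t1_pk, so the guess is close but not exact.

```sql
select
	index_name,
	ffs_cost,
	leaf_blocks,				-- assumed proxy for the index full scan cost
	ffs_cost + leaf_blocks	estimated_cost,
	reported_cost
from	(
	-- standalone fast full scan cost, leaf_blocks, and cost shown in the index join plan
	select 'T1_IA' index_name, 142 ffs_cost, 515 leaf_blocks, 646 reported_cost from dual
	union all
	select 'T1_PK' index_name,  63 ffs_cost, 222 leaf_blocks, 280 reported_cost from dual
	)
order by
	index_name
;
```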

There have been a number of cases where I’ve thought that the optimizer hasn’t chosen an index join when it was a sensible choice – this is probably one of the reasons why.
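For anyone who wants to look at the 10053 trace mentioned earlier, here is a minimal sketch of one way to capture it for the hinted statement. The tracefile_identifier value is just an illustrative tag, and the final query assumes 11g or later and a session with the privilege to query v$diag_info.

```sql
-- illustrative tag only, to make the trace file easier to find
alter session set tracefile_identifier = 'index_join_10053';

-- switch on the optimizer (CBO) trace for this session
alter session set events '10053 trace name context forever, level 1';

explain plan for
select
	/*+
		qb_name(main)
		index_join(@main t1 t1_flag t1_pk)
	*/
	sum(id)
from
	t1
where
	flag = 0
;

-- switch the trace off again
alter session set events '10053 trace name context off';

-- report the name of this session's trace file (11g onwards)
select	value
from	v$diag_info
where	name = 'Default Trace File'
;
```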