Search

Top 60 Oracle Blogs

Recent comments

NULL predicate

People ask me from time to time if I’m going to write another book on the Cost Based Optimizer – and I think the answer has to be no because the product keeps growing so fast it’s not possible to keep up and because there are always more and more little details that might have been around for years and finally show up when someone asks me a question about some little oddity I’ve never noticed before.

The difficult with the “little oddities” is the amount of time you could spend trying to work out whether or not they matter and if it’s worth writing about them. Here’s a little example to show what I mean – first the data set:


rem
rem     Script:         null_filter.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Dec 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0
rem             12.1.0.2
rem

create table t1
nologging
as
select  *
from    all_objects
where   rownum <= 50000 -- > comment to avoid wordpress format issue
;

insert into t1 select * from t1;
insert into t1 select * from t1;
insert into t1 select * from t1;
commit;

create index t1_i1 on t1(object_type, data_object_id, object_id, created);

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                cascade     => true,
                method_opt  => 'for all columns size 1'
        );
end;
/

It’s a simple data set with a single index. The only significant thing about the index is that the second column (data_object_id) is frequently null. This leads to a little quirk in the execution plans for a very similar pair of statements:


set serveroutput off
alter session set statistics_level = all;

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id = 20002
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

select
        object_name, owner
from
        t1
where
        object_type = 'TABLE'
and     data_object_id is null
and     object_id = 20002
and     created > trunc(sysdate - 90)
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

How much difference would you expect in the execution plans for these two queries? There is, of course, the side effect of the “is null” predicate disabling the “implicit column group” that is the index distinct_keys value, but in this case I’ve got a range-based predicate on one of the columns so Oracle won’t be using the distinct_keys anyway.

Of course there’s the point that you can’t use the equality operator with null, you have to use “is null” – and that might make a difference, but how ? Here are the two execution plan:


----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |      1 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |      1 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID"=20002 AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |      1 |        |      0 |00:00:00.01 |       3 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |      1 |      1 |      0 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |      1 |      1 |      0 |00:00:00.01 |       3 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("OBJECT_TYPE"='TABLE' AND "DATA_OBJECT_ID" IS NULL AND "OBJECT_ID"=20002 AND
              "CREATED">TRUNC(SYSDATE@!-90))
       filter(("OBJECT_ID"=20002 AND "CREATED">TRUNC(SYSDATE@!-90)))

The query with the predicate “data_object_id is null” repeats the object_id and sysdate predicates as access predicates and filter predicates. This seems a little surprising and a potential performance threat. In the first query the run_time engine will hit the correct index leaf block in exactly the right place very efficiently and then walk along it supplying every rowid to the parent operator until it hits the end of the range.

With the “is null” plan the run-time engine will be checking the actual value of object_id and created for every index entry on the way – how much extra CPU will this use and, more importantly, might Oracle start with the first index entry where object_type = ‘TABLE’ and data_object_id is null and walk through every index entry that has that null checking for the correct object_id as it goes ?

That last question is the reason for running the query with rowsource execution stats enabled. The first query did a single physical read while the second didn’t have to, but the more important detail is that both queries did the same number of buffer gets – and there is, by the way, a set of eight rows where the object_id and data_object_id are  20,002, but they were created several years ago so the index range scan returns no rows in both cases.

Based on that comparison, how do we show that Oracle has not walked all the way from the first index entry where object_type = ‘TABLE’ and data_object_id is null checking every entry on the way or, to put it another way, has Oracle really managed to prune down the index range scan to the minimum “wedge” indicated by the presence of the predicates “OBJECT_ID”=20002 AND “CREATED”>TRUNC(SYSDATE@!-90) as access predicates?

Let’s just count the number of leaf blocks that might be relevant, using the sys_op_lbid() function (last seen here) that Oracle uses internally to count the number of leaf blocks in an index. First we get the index object_id, then we scan it to see how many leaf blocks hold entries that match our object_type and data_object_id predicates but appear in the index before our target value of 20,002:


column object_id new_value m_index_id

select
        object_id
from
        user_objects
where
        object_type = 'INDEX'
and     object_name = 'T1_I1'
;

select  distinct sys_op_lbid(&m_index_id, 'L', rowid)
from    t1
where   object_type    = 'TABLE'
and     data_object_id is null
and     object_id      < 20002
;


SYS_OP_LBID(159271
------------------
AAAm4nAAFAAACGDAAA
AAAm4nAAFAAACF9AAA
AAAm4nAAFAAACGCAAA
AAAm4nAAFAAACF/AAA
AAAm4nAAFAAACF+AAA
AAAm4nAAFAAACGFAAA
AAAm4nAAFAAACGEAAA
AAAm4nAAFAAACGGAAA

8 rows selected.


This tells us that there are 8 leaf blocks in the index that we would have to range through before we found object_id 20,002 and we would have seen 8 buffer gets, not 3 in the rowsource execution stats, if Oracle had not actually been clever with its access predicates and narrowed down the wedge of the index it was probing.

Bottom line: for a multi-column index there seems to be a difference in execution plans between “column is null” and “column = constant” when the column is one of the earlier columns in the index – but even though the “is null” option results in some access predicates re-appearing as filter predicates in the index range scan the extra workload is probably not significant – Oracle still uses the minimum number of index leaf blocks in the index range scan.