Lunchtime quiz

There was a question on OTN a few days ago asking the following question:

Here’s a query that ran okay on 11g, but crashed with Oracle error “ORA-01843: not a valid month” after upgrade to 12c; why ?

The generically correct answer, of course, is that the OP had been lucky (or unlucky, depending on your point of view) on 11g – and I’ll explain that answer in another blog posting.

That isn’t the point of this posting, though. This posting is a test of observation and deduction. One of the respondants in the thread had conveniently supplied a little bit of SQL that I copied and fiddled about with to demonstrate a point regarding CPU costing, but as I did so I thought I’d show you the following and ask a simple question.’


drop table T;

Create Table T
As
with
periods as (
                  Select 'January' period, 1 cal  From Dual
        union all Select 'February' period , 2 cal From Dual
        union all Select 'March' period , 3 cal From Dual
        union all Select 'April' period , 4 cal From Dual
        union all Select 'May'  period, 5 cal From Dual
        union all Select 'June' period, 6 cal From Dual
        union all Select 'July' period, 7 cal From Dual
        union all Select 'August' period, 8 cal From Dual
        union all Select 'September' period, 9 cal  From Dual
        union all Select 'October' period, 10 cal From Dual
        union all Select 'November' period, 11 cal From Dual
        Union All Select 'December' Period, 12 Cal From Dual
        Union All Select '13 Series' Period, Null Cal  From Dual
)
Select  Period,Cal
from periods;

prompt  ==================================
prompt  When we invoke below SQL it works.
prompt  ==================================

set autotrace on explain

select *
from    (
        select
                Period,
                Cal,
                to_date(Period || ', ' || 2014,'Month, YYYY') col1 ,
                to_date('November, 2014','Month, YYYY') col2
        From  T
        Where  Cal > 0
        )
;

prompt  ================================================
prompt  But when we add comparison operations , it fails
prompt  ================================================

select *
from    (
        select
                Period,
                Cal,
                to_date(Period || ', ' || 2014,'Month, YYYY')   col1,
                to_date('November, 2014','Month, YYYY')         col2
        From  T
        Where  Cal > 0
        )
where
        col1 >= col2
;

set autotrace off



All I’ve done is create a table then run and generate the execution plans for two queries – with a comment that if you try to run one query it will succeed but if you try to run the other it will fail (and raise ORA-01843). As far as the original supplier was concerned, both queries succeeded in 11g and the failure of the second one appeared only in 12c. In fact, for reasons that I won’t discuss here, it is POSSIBLE for the failure to appear in 11g as well, though not necessarily with this exact data set.

Here’s the COMPLETE output I got from running the code above on an 11.2.0.4 instance:



Table dropped.


Table created.

==================================
When we invoke below SQL it works.
==================================

PERIOD           CAL COL1      COL2
--------- ---------- --------- ---------
January            1 01-JAN-14 01-NOV-14
February           2 01-FEB-14 01-NOV-14
March              3 01-MAR-14 01-NOV-14
April              4 01-APR-14 01-NOV-14
May                5 01-MAY-14 01-NOV-14
June               6 01-JUN-14 01-NOV-14
July               7 01-JUL-14 01-NOV-14
August             8 01-AUG-14 01-NOV-14
September          9 01-SEP-14 01-NOV-14
October           10 01-OCT-14 01-NOV-14
November          11 01-NOV-14 01-NOV-14
December          12 01-DEC-14 01-NOV-14

12 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    12 |   228 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |    12 |   228 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CAL">0)

Note
-----
   - dynamic sampling used for this statement (level=2)

================================================
But when we add comparison operations , it fails
================================================

PERIOD           CAL COL1      COL2
--------- ---------- --------- ---------
November          11 01-NOV-14 01-NOV-14
December          12 01-DEC-14 01-NOV-14

2 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    19 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T    |     1 |    19 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("CAL">0 AND TO_DATE("PERIOD"||', '||'2014','Month,
              YYYY')>=TO_DATE(' 2014-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


So this is the question. What’s the anomaly in this output ?

Bonus question: What’s the explanation for the anomaly ?

Answers:

If I had asked why the query might, or might not, crash – the answer would be about the order of predicate evaluation, and simply collecting stats (or not) might have made a difference. Ever since “system stats”  and “CPU costing” appeared the optimizer has been able to change the order in which it applies filter predicates to a table (there’s a pdf of an article of mine from Oracle magazine in the 9i / 10g timeline linked at this URL) .  In this case, applying the “cal > 0″ predicate first luckily eliminates the rows that would fail the second predicate. Since the effect is driven by the optimizer’s stats this type of failure could occur ANY TIME you have a predicate that requires coercion between types to take place – which is one reason why you see the injunctions to use the correct data types; and why, if you need coercion to work around incorrect data types you have to consider writing your own functions to trap and resolve the necessary errors raised by Oracle’s implicit conversion mechanisms.

For a quick sketch of the optimizer strategy, the arithmetic is roughly:  predicate A costs c1 and predicate B costs c2; if I apply predicate A to every row I have to apply predicate B to only N surviving rows; if I apply predicate B to every row I have to apply predicate A to M surviving rows; which is smaller: (input_rows * c1 + N * c2) or (input_rows * c2 + M * c1).

The answer to the question I actually asked is this, though: I stressed the fact that this was the COMPLETE output because, as Narenda highlighted in comment 7 below –  the first query shows a note about dynamic sampling and the second query does not. This is a little surprising; we don’t have stats on the table, and the two queries are different so we have to optimizer both of them.  In 12c, of course, it’s possible that the optimizer may have done something clever with statistics feedback (formerly cardinality feedback) and created an SQL directive – but even then we should have seen a note about that.

For the bonus question: given the second output doesn’t report dynamic sampling we should be curious why not – did the optimizer simply decide not to try, did it try then decide not to use the results for some reason, or is there some other reason.  The obvious next step is to look at the 10053 (optimizer) trace – where you find that the optimizer DID do dynamic sampling or rather, it tried to do dynamic sampling but the query generated to take the sample failed with Oracle error ORA-01843, as suggested by Chinar Aliyev in comment 9  and expanded by Mohamed Houri in comment 11.

The irony of the sampling problem (hinted by Chinar Aliyev in comment 10) is that you could be in a position where you have a large table and oracle picks a small sample which happens to miss any of the problem rows and then return a sample that persuades the optimizer to pick an execution plan that is bound to find a problem row; alternatively the SQL used to generate the sample might apply the predicate in an order that manages to eliminate the problem rows, while the final plan derived after sampling persuades the optimizer to use the predicate in the order B, A.