Oakies Blog Aggregator

Exadata Workloads to Azure, Part II

In my last post, I discussed some of the unique challenges posed by migrating Oracle workloads from Exadata to Azure.  Engineered systems are not your everyday lift and shift, and they are rarely simple.

Although I covered some focus areas for success, I’d like to get into the philosophical migration questions around cell offloading and IO.  Cell information is referred to almost 350 times in the average Oracle 12c AWR report.  That’s a LOT of data to consider when migrating a workload to a server that won’t have cell nodes to OFFLOAD TO.

If cell nodes are creating a ton of different IO in Exadata and they don’t exist in Azure, will that work still require IO in Azure?

It’s not far off from the old saying, “If a tree falls in the forest and no one is around to hear it, does it make a sound?”

Why yes, yes it will require IO and a lot of it.

It’s one of the more interesting areas when calculating IO in preparation for a migration from Exadata to Azure.  Exadata offloads whenever it can.  Many Exadata specialists shrink the SGA with the purpose of forcing an offload, because Exadata was engineered to perform well when it does so.  Why perform a task on the database nodes when you can force an offload and put a cell node to work?  Sometimes this work is efficient and provides great value; other times, such as a “single cell block read”, it isn’t.  There’s also the question of the amount of IO produced and calculated by the offloading process itself.

The Power of the AWR Report

An AWR report breaks this information up into multiple parts, and with subsequent versions of Exadata, AWR has been enhanced to provide more information about which part of an offloaded process creates IO.

The reader of the AWR report should spend considerable time in the System Statistics section inspecting the cell IO breakdown.  This is very valuable for discovering not only whether the database workload is offloaded efficiently, but also how much of that offloaded work the migrated system will have to absorb.  (It’s an engineered system, not a miracle worker, so if the database design or code is not built to use Exadata features, don’t expect it to…)
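For a quick look outside of the AWR report itself, a query along these lines (my own sketch, not from the original post) lists the cumulative cell-related statistics for the instance; the System Statistics section of AWR reports interval deltas of these same statistic names:

-- Sketch only: list the cumulative cell-related statistics for the instance.
-- The AWR System Statistics section reports interval deltas of the same names.
select name, value
from   v$sysstat
where  name like 'cell%'
order by name;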

What Isn’t Coming to Azure

With the migration to Azure, there are certain features that won’t be coming along, and their absence may not impact cell IO estimates the way you might assume:

  • Secondary cell nodes (servers)
  • Flash cache
  • Flash log
  • RAC “shipping” between nodes
  • Smart scans
  • Storage Indexes
  • HCC, (Compression)

Each of these processes needs to be identified not only by its percentage of the IO workload, but also by an estimate of what the post-migration database will do without it.

Cell Nodes: Expect increases in scans on the database nodes, needed increases in buffer cache, and more activity between cluster nodes (if RAC is retained).  What we must take into consideration is how much IO is created just to produce an offload, and how much of that IO simply won’t occur once offloading is no longer an available solution.

Flash Cache: The removal of flash-level IO from the scenario has to be considered, but when sizing is being performed, you must determine how much of it was being used by the workload, and how often.

Flash Log: Expect greater latency on log writer (LGWR) writes.  These numbers often determine whether I’m going to implement an ultra disk solution to house the redo logs.

RAC Shipping: Without cell nodes directly returning results to a node, all results will need to be shipped between nodes, with all the work being performed on the database nodes.

Smart Scans: Expect an increase in IO if the buffer cache isn’t increased significantly, and without cell nodes, greater IO for those scans that still exist.

Storage Indexes: These indexes exist only in memory.  Unless equivalent indexes are manually created, poor performance and increased IO must be expected.  A full assessment of what storage indexes exist, and ensuring equivalents exist in the migrated system, is a must.

Hybrid Columnar Compression (HCC): This allows not only highly compressed storage, but also the ability to read from compressed blocks, saving considerable IO.
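For reference, and purely as an illustration of my own (the sales table below is hypothetical), HCC is declared per segment with a simple compression clause, but it only succeeds on storage that supports HCC, such as Exadata cells:

-- Sketch only: HCC is a segment-level clause. This fails with ORA-64307
-- on storage that does not support Hybrid Columnar Compression.
create table sales_archive
compress for query high       -- "compress for archive high" compresses even harder
as
select * from sales;          -- "sales" is a hypothetical source table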

Cell Data Breakdown

System statistics in the AWR report provide a detailed breakdown by individual statistic, rather than just by wait type or category.  One of the areas broken down in the report is cell usage, which is where we will focus for this post.

[Screenshot: cell data breakdown from the AWR System Statistics section - data masked]

**All data presented in this post has been masked.  Thanks for playing.

Let’s spend some time looking over some of these values and what they mean to someone migrating a database from Exadata onto Azure.

There’s math to all of this when calculating the IO to figure out what needs to be considered and what doesn’t.  Some of it is exact science, and some of it, with the bigger picture taken into consideration, is a bit of an art, too.

cell IO uncompressed bytes:  Total IO that was returned by smart scans/offloading, with decompression by the cell nodes after storage index filtering and other optimizations.  This is the natural number to use for IO considerations in a migration, but we need to keep a few things in mind:

  • With an expectation to raise the SGA, there will be fewer table scans, which will result in less IO from Exadata’s preference to offload.
  • The Exadata enhancement to the optimizer for offloading preference will no longer exist once migrated to Azure-  this means the optimizer will be more likely to create different execution plans that have different IO.
  • Without HCC, data will have to be read uncompressed unless the customer licenses for it outside of Exadata. This is one licensing push I make for a customer leaving Exadata and going to Azure: get HCC!

cell physical IO bytes eligible for predicate offload:  This is the data eligible for storage indexing and is important as we identify the amount of physical indexing that will need to be supplemented once this feature is no longer in place.  Don’t skip this and think the performance will just be fine.  This is akin to moving a database without 50% or more of the indexes.  Not a good idea for performance.

We can even estimate the efficiency of the smart scans by performing the following from the numbers above:

Smart Scan Efficiency = 100 - ((100 / cell physical IO bytes eligible for predicate offload) * cell IO uncompressed bytes)

Using the example above with my numbers in my screenshot, we’d get the following:

100-((100/3099793357070)* 2750799353303)

Resulting in a value from our example of 88.7%
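If you want to sanity-check this outside of an AWR report, something like the following (a sketch of my own, assuming the two inputs are the statistics named in the formula above) computes the same value from the cumulative instance statistics:

-- Sketch only, assuming the inputs named in the formula above.
-- nullif() avoids a divide-by-zero on a system that never offloads.
select round(
         100 - ( (100 / nullif(elig.value, 0)) * unc.value ), 1
       ) as smart_scan_efficiency_pct
from   v$sysstat elig,
       v$sysstat unc
where  elig.name = 'cell physical IO bytes eligible for predicate offload'
and    unc.name  = 'cell IO uncompressed bytes';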

cell physical IO bytes saved by storage index:  This is the amount of IO that would have been incurred if the storage indexes didn’t exist.  If the database were simply moved to Azure without identifying the storage indexes (columns, etc.) and recreating them as physical indexes on the new hardware, this IO would have to be added to the totals, and the corresponding latency expected.

If the indexes are to be created, and if it were done the way I would do it as a DBA, there would be a full justification for each index to begin with:

  • Is it an index to create a vertical selection of data to query from, creating significant IO?
  • Is there some optimization that can be done in the code or database design to eliminate the IO need/index?
  • Is there a feature in Azure that will benefit and improve the IO?

No matter what, these are in-memory indexes that will have to exist on disk in Azure, and that IO has to be added to the total.

cell logical write IO requests: This one can be misleading, as it also includes requests to flash cache.  As flash cache tends to be used more by OLTP-type workloads while offloading is used by OLAP/DSS, it’s important to know the workload type and consider how the different loads may react to a migration from Exadata.

The Winners Are

Calculating IO is not going to be an exact science when migrating off of an Exadata, but there is a calculation we can use to get pretty close using an AWR report with at least a one-week window:

IO Metrics Factor = a factor for the type of disk vs. the disk IO chosen for Azure, most often 2.5-3.5.

((Total Throughput MB/s -
((cell physical IO bytes eligible for predicate offload - cell partial write to flash cache - cell physical IO interconnect bytes + cell physical IO bytes saved by storage index)
/1024/1024))
* IO Metrics Factor) = MB Throughput to Run Oracle Database on Azure

Now keep in mind, I will have also submitted an optimization plan that will hopefully eliminate a significant amount of IO and other resource usage, but the above calculation is what I use to come up with my numbers when working with customers, at least initially.
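For reference, here is that arithmetic as a SQL*Plus sketch of my own; the substitution variables are hypothetical placeholders for the values you read out of the AWR report (none of them come from my masked screenshots):

-- Sketch of the arithmetic only; all substitution variables are
-- hypothetical placeholders for AWR interval values.
--   &total_throughput_mbps  total throughput (MB/s) from the IO profile
--   &eligible_bytes         cell physical IO bytes eligible for predicate offload
--   &flash_partial_bytes    cell partial write to flash cache
--   &interconnect_bytes     cell physical IO interconnect bytes
--   &storage_index_bytes    cell physical IO bytes saved by storage index
--   &io_metrics_factor      typically 2.5-3.5, depending on the Azure disk chosen
select ( &total_throughput_mbps
         - ( ( &eligible_bytes
               - &flash_partial_bytes
               - &interconnect_bytes
               + &storage_index_bytes ) / 1024 / 1024 )
       ) * &io_metrics_factor as azure_mb_throughput_estimate
from dual;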

As we’ve already discussed, Exadata is an engineered system and there are numerous features we need to take into consideration as part of the migration strategy that can impact the amount of IO necessary.  Cell nodes are just one part of this, and subsequent posts will dig into those other features.

 

 


Interval Partition(s)

A quirky little feature of interval partitioning showed up on Twitter today – a parallel insert that would only use a single PX slave to do the inserting. With 1.1 billion rows and the option for running parallel 32 this made the loading process rather slower than it ought to have been.

Fortunately it’s quite easy to model (and work around) the oddity. So here’s a small data set and an empty partitioned table to work with:


rem
rem     Script:         pt_int_load_anomaly.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

create table t1 
nologging 
as
select 
        ao.* 
from 
        all_Objects ao, 
        (select rownum id from dual connect by level <= 20)
;

create table pt1
partition  by range (object_id) interval (1000000) (
        partition p1 values less than (1)
)
as
select  * 
from    all_Objects
where   rownum = 0
/

I’ve created a table by copying all_objects 20 times which, for my little sandbox, has given me a total of about 1.2M rows. Then I’ve created an empty interval-partitioned clone of all_objects, with the first partition defined to hold all rows where the object_id is less than 1 (and there’s no object in the database that could match that criterion). I’ve defined the interval to be 1,000,000 and since the highest object_id in my database is about 90,000 the first partition that gets added to this table will be able to hold all the data from t1.

So now we try to do a parallel insert from t1 into pt1, and check the execution plan and parallel execution statistics:


set serveroutput off

insert /*+ append enable_parallel_dml parallel(6) */ into pt1 select * from t1;

select * from table(dbms_xplan.display_cursor);

start pq_tqstat

Note how I’ve used the hint /*+ enable_parallel_dml */ (possibly a 12c hint back-ported to 11.2.0.4) rather than using an “alter session”; it’s just a little convenience to be able to embed the directive in the SQL. The pq_tqstat script is one I published some time ago to report the contents of the session-specific dynamic performance view v$pq_tqstat immediately after running a parallel statement.
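I won’t reproduce the published script here, but a minimal approximation of the kind of query it runs against the view looks like this (my own sketch, not the published pq_tqstat script):

rem
rem     Sketch only - a minimal approximation of a pq_tqstat-style report,
rem     run immediately after the parallel statement in the same session.
rem

select
        dfo_number, tq_id, server_type, instance, process,
        num_rows, bytes,
        round(ratio_to_report(num_rows)
                over (partition by dfo_number, tq_id, server_type) * 100, 2)  row_share,
        round(ratio_to_report(bytes)
                over (partition by dfo_number, tq_id, server_type) * 100, 2)  data_share,
        waits, timeouts, avg_latency
from
        v$pq_tqstat
order by
        dfo_number, tq_id, server_type desc, process
;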

Here’s the plan:


SQL_ID  25hub68pf1z1s, child number 0
-------------------------------------
insert /*+ append enable_parallel_dml parallel(6) */ into pt1 select *
from t1

Plan hash value: 2888707464

-------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                            |          |       |       |   631 (100)|          |        |      |            |
|   1 |  PX COORDINATOR                             |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)                       | :TQ10001 |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,01 | P->S | QC (RAND)  |
|   3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED)| PT1      |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     OPTIMIZER STATISTICS GATHERING          |          |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX RECEIVE                             |          |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND RANDOM LOCAL                  | :TQ10000 |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,00 | P->P | RANDOM LOCA|
|   7 |        PX BLOCK ITERATOR                    |          |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,00 | PCWC |            |
|*  8 |         TABLE ACCESS FULL                   | T1       |  1235K|   159M|   631  (10)| 00:00:01 |  Q1,00 | PCWP |            |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access(:Z>=:Z AND :Z<=:Z)

Note
-----
   - Degree of Parallelism is 6 because of hint

The most important detail of this plan is that the PX slaves do the load as select (operation 3), then send a message to the query coordinator (PX send QC, operation 2) to tell it about the data load. They do not send their data to the QC for the QC to do the load.

So the plan says we will be doing parallel DML, but here’s what v$pq_tqstat tells us:


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P006                215880   34785363     17.47      16.86         16          0           0
                                             1 P007                202561   34436325     16.39      16.69         17          0           0
                                             1 P008                207519   34564496     16.79      16.75         17          0           0
                                             1 P009                208408   34594770     16.86      16.77         17          0           0
                                             1 P00A                198915   33529627     16.10      16.25         16          0           0
                                             1 P00B                202537   34430603     16.39      16.69         16          0           0
                      Consumer               1 P000                     0        144      0.00       0.00         51         47           0
                                             1 P001                     0        144      0.00       0.00         51         47           0
                                             1 P002               1235820  206340464    100.00     100.00         75         47           0
                                             1 P003                     0        144      0.00       0.00         51         47           0
                                             1 P004                     0        144      0.00       0.00       1138       1134           0
                                             1 P005                     0        144      0.00       0.00       1137       1133           0

                    1 Producer               1 P000                     0         24      0.00       5.91         51         42           0
                                             1 P001                     0         24      0.00       5.91         50         41           0
                                             1 P002                     2        286    100.00      70.44         58         14           0
                                             1 P003                     0         24      0.00       5.91         51         43           0
                                             1 P004                     0         24      0.00       5.91         51         42           0
                                             1 P005                     0         24      0.00       5.91         51         43           0
                      Consumer               1 QC                       2        406    100.00     100.00        311        179           0

19 rows selected.

The query did run parallel 6 as hinted – and 6 PX slaves scanned the t1 table; but they all sent all their data to one PX slave in the second slave set and that one PX slave did all the inserts. The plan was parallel, but the execution was effectively serial. (You’ll note there is something a little odd about the waits and timeouts for P004 and P005 when they are acting as consumers. I may worry about that later, but it could be a host-based side effect of running parallel 6 on a VM with 4 CPUs).

The serialization leads to two questions

  1. What went wrong?
  2. How do we work around this and make the insert “truly” parallel?

My answer to (1) is “I don’t know – but I’ll look at it if necessary” combined with the guess – it’s something to do with the table having only one partition at the outset and this has an unexpected side effect on the randomising function for the PX distribution.

My answer to (2) is “if I’m right about (1), why not try pre-defining two partitions, and I’ll even let both of them stay empty”.

So here’s my new definition for pt1:


create table pt1
partition  by range (object_id) interval (1000000) (
        partition p0 values less than (0),
        partition p1 values less than (1)
)
as
select  * 
from    all_Objects
where   rownum = 0
/

Re-running the test with the completely redundant, and permanently empty, p0 partition, the plan doesn’t change but the results from v$pq_tqstat change dramatically:


DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS           NUM_ROWS      BYTES ROW_SHARE DATA_SHARE      WAITS   TIMEOUTS AVG_LATENCY
---------- ---------- --------------- -------- --------------- ---------- ---------- --------- ---------- ---------- ---------- -----------
         1          0 Producer               1 P006                207897   34581153     16.82      16.76         23          4           0
                                             1 P007                215669   34786429     17.45      16.86         30          5           0
                                             1 P008                221474   36749626     17.92      17.81         28          5           0
                                             1 P009                204959   34497164     16.58      16.72         22          2           0
                                             1 P00A                177755   30141002     14.38      14.61         21          0           0
                                             1 P00B                208066   35585810     16.84      17.25         25          2           0
                      Consumer               1 P000                213129   35612973     17.25      17.26         82         57           0
                                             1 P001                200516   33570586     16.23      16.27         84         55           0
                                             1 P002                203395   33950449     16.46      16.45         83         56           0
                                             1 P003                205458   34235575     16.63      16.59         82         54           0
                                             1 P004                204111   33999932     16.52      16.48        581        555           0
                                             1 P005                209211   34971669     16.93      16.95        580        553           0

                    1 Producer               1 P000                     2        286     16.67      16.67        422        149           0
                                             1 P001                     2        286     16.67      16.67        398        130           0
                                             1 P002                     2        286     16.67      16.67        405        128           0
                                             1 P003                     2        286     16.67      16.67        437        161           0
                                             1 P004                     2        286     16.67      16.67        406        116           0
                                             1 P005                     2        286     16.67      16.67        440        148           0
                      Consumer               1 QC                      12       1716    100.00     100.00        242        111           0



19 rows selected.

Every consumer receives and inserts roughly 200,000 rows – it’s a totally fair parallel DML. Timings are pretty irrelevant for such a small data set, but the execution time did drop from 7 seconds to 4 seconds when parallelism was working “properly”.

I’ve tested this script on 12.2.0.1 and 19.3.0.0 – the same anomaly appears in both versions though it might be worth noting that the strange skew in the waits and timeouts doesn’t appear in 19.3.0.0.

Cloud Control 13.4 : Silent Installation and Silent Upgrade

A little over a week ago Enterprise Manager Cloud Control 13.4 was released. The following weekend I spent 3 days running builds constantly trying to get a clean install to work. Eventually I tweeted out in frustration and a friendly face at Oracle, who I’ve stalked on numerous occasions, put me in touch with the EM dev team.

Having had a quick look at my Vagrant build, they suggested I unset the CLASSPATH environment variable, and a working build was born. Thanks very much to the EM dev team! Without them I would have spent days looking at it and would probably still have failed.

Installation

The resulting Vagrant build and an article about the silent installation of Cloud Control 13.4 can be found here.

One thing that still irks me somewhat is the documentation about the adaptive optimizer parameters. The documentation says the following.

“If your Management Repository is using Oracle Database 12.2 or higher, none of these parameters need to be set.”

This is not true, and you always get this error message.

“ERROR:
The following prerequisite check failed because the Oracle Database, where the Management Repository will be configured, does not meet the configuration requirements. Fix the issue manually based on the recommendation offered for this prerequisite, and click OK. For more details, check the logs: /u01/app/oracle/middleware/cfgtoollogs/oui/emdbprereqs
Prereq Name Recommendation
Check if all adaptive features parameters are unset All adaptive features parameters should be unset for improved SQL performance”

I even tried a GUI installation, in case there was a difference between the GUI and silent installations. There wasn’t.

The workaround is to amend a bunch of underscore parameters that are only meant to be necessary when running a patched version of Oracle database 12.1 as the repository database.

alter system set "_optimizer_nlj_hj_adaptive_join"= FALSE scope=both sid='*';
alter system set "_optimizer_strans_adaptive_pruning" = FALSE scope=both sid='*';
alter system set "_px_adaptive_dist_method" = OFF scope=both sid='*';
alter system set "_sql_plan_directive_mgmt_control" = 0 scope=both sid='*';
alter system set "_optimizer_dsdir_usage_control" = 0 scope=both sid='*';
alter system set "_optimizer_use_feedback" = FALSE scope=both sid='*';
alter system set "_optimizer_gather_feedback" = FALSE scope=both sid='*';
alter system set "_optimizer_performance_feedback" = OFF scope=both sid='*';

It’s not a show stopper, so I can live with it, but it’s annoying, and the documentation should be altered to reflect the reality.

Upgrade

The next challenge was to work through an upgrade from a previous release. I worked through this using a starting point of 13.3. I already had a vagrant build for 13.3, but I made a few changes to bring it up to date, and add some more disk space. I also renamed the directory structure to make things a little neater.

The upgrade itself was very similar to that of the previous version. You can find the article about the silent upgrade to 13.4 and the Vagrant build I used to test the upgrade here.

Now remember, this is a simple upgrade of a totally clean 13.3 build to 13.4, so I’m not saying this is an exhaustive test, and I’m not saying this is proof it will work for you.

Next Steps

The next challenge will be to try a real upgrade at work. Work is crazy at the moment, so I’m not sure how long I will have to wait before doing this.

Most of our kit is VMware virtual machines running Oracle Linux, and the Cloud Control server is no exception, so I can get a backup of the whole VM before the upgrade, and just restore back to that in case of a disaster.

An ideal place to be is to have your build scripted, including the reconfiguration of all your targets. After a previous “issue”, I went through our existing config and built the EMCLI scripts to replace it all. I *think* I can rebuild everything from scratch if I need to. We do all new agent installations, target discovery and setup using EMCLI now, so I think all the retrofitted stuff will work too, but I have to admit I’m kind-of scared to try.

Join Elimination bug

It is possible to take subquery factoring (common table expressions / CTEs) too far. The most important purpose of factoring is to make a complex query easier to understand – especially if you can identify a messy piece of text that is used in more than one part of the query – but I have seen a couple of patterns appearing that make the SQL harder to read.

  • In one pattern each table is given its own factored subquery holding the non-join predicates (and, possibly, filter subqueries) and then the main query is just a simple join of the factored subqueries with nothing but join (ON) predicates.
  • In another pattern each factored subquery consists of the previous subquery with one more table added to it, so every subquery is no more than a “two-table” query block and the final subquery is a simple “select from last_factored_subquery”.

Neither of these patterns is helpful – but today’s blog note is not going to be about going to extremes with subquery factoring; instead it’s an example of a fairly reasonable use of subquery factoring that ran into a “wrong results” bug.

Consider a system that collects data from some type of meters. Here’s a two-table definition for meters and meter readings:


create table meters (
        meter_id        number,
        meter_type      varchar2(10),  -- references meter_types
        date_installed  date,
        padding         varchar2(100),
        constraint met_pk primary key(meter_id)
)
;

create table meter_readings (
        meter_id        number,
        date_read       date,
        reading         number(10,3),
        padding         varchar2(100),
        constraint      mrd_pk primary key(meter_id, date_read),
        constraint      mrd_fk_met foreign key (meter_id) references meters
)
;

insert into meters
select
        1e6 + rownum,
        case mod(rownum,3)
                when 0 then 'A'
                when 1 then 'B'
                       else 'C'
        end,
        trunc(sysdate) - mod(rownum,5),
        rpad('x',100,'x')
from
        dual 
connect by 
        level <= 10
;

execute dbms_stats.gather_table_stats(null,'meters')

insert into meter_readings 
select
        met.meter_id,
        met.date_installed - v.id + 2,
        dbms_random.value,
        rpad('x',100,'x')
from
        meters met,
        (select rownum id from dual connect by level <= 4) v
;

commit;

execute dbms_stats.gather_table_stats(null,'meter_readings')

I’ve picked the obvious primary keys for the two tables and defined the appropriate referential integrity constraint – which means the optimizer should be able to choose the best possible strategies for any query that joins the two tables.

I’ve created a very small data set – a few meters installed in the last few days, and a few readings per meter over the last few days. So let’s report the readings for the last 48 hours, and include in the output any meters that haven’t logged a reading in that interval.

Here’s the query I wrote, with its output, running on a 19.3 instance on 17th Feb 2020:


with mrd_cte as (
        select 
                meter_id, date_read, reading
        from 
                meter_readings
        where 
                date_read in (trunc(sysdate), trunc(sysdate)+1)
)
select
        met.meter_id, met.date_installed, mrd_cte.date_read, reading
from
        meters met
left join 
        mrd_cte
on      mrd_cte.meter_id = met.meter_id
;

  METER_ID DATE_INST DATE_READ    READING
---------- --------- --------- ----------
   1000001 16-FEB-20 17-FEB-20       .063
   1000002 15-FEB-20
   1000003 14-FEB-20
   1000004 13-FEB-20
   1000005 17-FEB-20 18-FEB-20        .37
   1000005 17-FEB-20 17-FEB-20       .824
   1000006 16-FEB-20 17-FEB-20       .069
   1000007 15-FEB-20
   1000008 14-FEB-20
   1000009 13-FEB-20
   1000010 17-FEB-20 17-FEB-20       .161
   1000010 17-FEB-20 18-FEB-20       .818

12 rows selected.

The query returns 12 rows – which SQL*Plus can report because it counts them as it fetches them so it can give you the total at the end of the query.

Of course, sometimes people write preliminary queries to find out how big the result set would be before they run the query to acquire the result set itself. In cases like that (where they’re just going to select a “count(*)”) the optimizer may choose a different execution path from the base query – perhaps finding a way to do an index-only execution, and maybe eliminating a few table joins from the query. So let’s execute a count of the above query:


rem
rem     Script:         join_elimination_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem

with mrd_cte as (
        select 
                meter_id, date_read, reading
        from 
                meter_readings
        where 
                date_read in (trunc(sysdate), trunc(sysdate)+1)
)
select count(*) from (
        select
                met.meter_id, met.date_installed, mrd_cte.date_read, mrd_cte.reading
        from
                meters met
        left join 
                mrd_cte
        on      mrd_cte.meter_id = met.meter_id
)
;

  COUNT(*)
----------
        10

1 row selected.

You’ll have to take my word for it, of course, but no-one else was using this database while I was running this test, and no-one else has access to the schema I was using anyway. Unfortunately when I count the 12 rows instead of reporting them Oracle thinks there are only 10 rows. Oops!

Step 1 in investigating the problem – check the execution plans to see if there are any differences in the structure of the plan, the use of predicates, or the outline information. I won’t bother with the plan for the base query because it was very obvious from the count query where the problem lay.


-------------------------------------------------------------------
| Id  | Operation        | Name   | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT |        |       |     1 (100)|          |
|   1 |  SORT AGGREGATE  |        |     1 |            |          |
|   2 |   INDEX FULL SCAN| MET_PK |    10 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$69B21C86")
      ELIMINATE_JOIN(@"SEL$00F67CF8" "METER_READINGS"@"SEL$1")
      OUTLINE(@"SEL$00F67CF8")
      MERGE(@"SEL$C43CA2CA" >"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$C43CA2CA")
      MERGE(@"SEL$D28F6BD4" >"SEL$E6E74641")
      OUTLINE(@"SEL$E6E74641")
      ANSI_REARCH(@"SEL$4")
      OUTLINE(@"SEL$D28F6BD4")
      MERGE(@"SEL$1" >"SEL$006708EA")
      OUTLINE(@"SEL$4")
      OUTLINE(@"SEL$006708EA")
      ANSI_REARCH(@"SEL$3")
      OUTLINE(@"SEL$1")
      OUTLINE(@"SEL$3")
      INDEX(@"SEL$69B21C86" "MET"@"SEL$3" ("METERS"."METER_ID"))
      END_OUTLINE_DATA
  */

This is the plan as pulled from memory by a call to dbms_xplan.display_cursor(). We note particularly the following: meter_readings doesn’t appear in the plan, there is no predicate section (and no asterisks against any of the operations that would tell us there ought to be some predicate information), and there’s a very revealing ELIMINATE_JOIN(@”SEL$00F67CF8″ “METER_READINGS”@”SEL$1”) in the outline information.

For some reason the optimizer has decided that it’s okay to remove meter_readings from the query (even though there may be many meter readings for each meter), so it was inevitable that it produced the wrong result.

Despite my opening note, this is not an issue with subquery factoring – it just looked that way when I started poking at the problem. In fact, if you rewrite the query using an inline view you get the same error, if you turn the inline view into a stored view you get the error, and if you turn the whole query into a simple (left) join with the date predicate as part of the ON clause you still get the error.

The problem lies somewhere in the join elimination transformation. If you go back to the outline information from the bad plan you’ll see the line: ELIMINATE_JOIN(@”SEL$00F67CF8″ “METER_READINGS”@”SEL$1”) – by changing this to NO_ELIMINATE_JOIN(…) and adding it to the main query block I got a suitable plan joining the two tables and producing the right result.
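To show what that looks like in practice, here’s roughly the hinted version of the count query – a sketch based on the outline above, so the query block and alias names are the ones reported for my cursor and may well differ on another system:

-- Sketch: block join elimination for meter_readings. The query block and
-- alias names come from the outline of my cursor and may differ elsewhere.
with mrd_cte as (
        select
                meter_id, date_read, reading
        from
                meter_readings
        where
                date_read in (trunc(sysdate), trunc(sysdate)+1)
)
select
        /*+ no_eliminate_join(@"SEL$00F67CF8" "METER_READINGS"@"SEL$1") */
        count(*)
from    (
        select
                met.meter_id, met.date_installed, mrd_cte.date_read, mrd_cte.reading
        from
                meters met
        left join
                mrd_cte
        on      mrd_cte.meter_id = met.meter_id
        )
;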

The problem appears in 12.2.0.1 and 19.5.0.0 (tested on livesql) – but does not appear in 12.1.0.2 or 11.2.0.4

There is a known bug associated with this problem:

Bug: 29182901
Abstract: WRONG COUNT WAS RETURNED WHEN _OPTIMIZER_ENHANCED_JOIN_ELIMINATION=TRUE
Query with Outer Join Returned a Wrong Result due to Join Elimination (Doc ID 29182901.8)

The bug is fixed in 20.1, with a fix that has been backported into the Jan 2020 patches for 19, 18, and 12.1

XS$NULL - Can we login to it and does it really have no privileges?

I have read online about XS$NULL over the years, and particularly the documentation that states that it has no privileges. The documentation states the following: An internal account that represents the absence of a user in a session. Because....[Read More]

Posted by Pete On 17/02/20 At 01:09 PM

MobaXterm 20.0 and KeePass 2.44

And in other news about things I’ve missed recently…

MobaXterm 20.0 was released a couple of days ago. It looks like they’ve switched across to the yearly naming like many other companies.

Video : Multitenant : Online Move of Datafiles in CDBs and PDBs

Today’s video is a quick look at online datafile moves in container databases (CDBs) and pluggable databases (PDBs).

If you’ve used this functionality in a non-CDB database, it’s going to look familiar, with a PDB-specific gotcha.

These articles discuss moving and renaming files.

I’ve added this to my Multitenant YouTube playlist.

The star of today’s video is Todd Trichler, but he’s having to share the limelight with the top of Roel Hartman‘s head, and brief clips of John King and Debra Lilley on the video screen behind him.

Cheers

Tim…


Video : Multitenant : Online Move of Datafiles in CDBs and PDBs was first posted on February 17, 2020 at 8:41 am.

SQLNET.EXPIRE_TIME and ENABLE=BROKEN

By Franck Pachot

Those parameters, SQLNET.EXPIRE_TIME in sqlnet.ora and ENABLE=BROKEN in a connection description, have existed for a long time but may have changed in behavior. They are both related to detecting dead TCP connections with keep-alive probes: the former from the server, and the latter from the client.

The change in 12c is described in the following MOS note: Oracle Net 12c: Changes to the Functionality of Dead Connection Detection (DCD) (Doc ID 1591874.1). Basically, instead of sending a TNS packet for the keep-alive, the server Dead Connection Detection now relies on the TCP keep-alive feature when available. The note mentions that it may be required to set (ENABLE=BROKEN) in the connection string “in some circumstances” - which is not very precise. This “ENABLE=BROKEN” was used in the past for transparent failover when we had no VIP (virtual IP), in order to detect a lost connection to the server.

I don’t like those statements like “on some platform”, “in some circumstances”, “with some drivers”, “it may be necessary”… so there’s only one solution: test it in your context.

My listener is on (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521))) and I will connect to it and keep my connection idle (no user call to the server). I trace the server (through the forks of the listener, found by pgrep with the name of the listener associated with this TCP address) and color it in green (GREP_COLORS='ms=01;32'):


pkill strace ; strace -fyye trace=socket,setsockopt -p $(pgrep -f "tnslsnr $(lsnrctl status "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))" | awk '/^Alias/{print $2}') ") 2>&1 | GREP_COLORS='ms=01;32' grep --color=auto -E '^|.*sock.*|^=*' &

I trace the client and color it in yellow (GREP_COLORS='ms=01;33'):


strace -fyye trace=socket,setsockopt sqlplus demo/demo@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))" <<&1 | GREP_COLORS='ms=01;33' grep --color=auto -E '^|.*sock.*|^=*'

I’m mainly interested in the setsockopt() here because this is how TCP keep-alive gets enabled.

(ENABLE=BROKEN) on the client

My first test is without enabling DCD on the server: I have nothing defined in sqlnet.ora on the server side. I connect from the client without mentioning “ENABLE=BROKEN”:

[Screenshot: strace output for the connection without ENABLE=BROKEN]
The server (green) has set SO_KEEPALIVE but not the client.

Now I run the same scenario but adding (ENABLE=BROKEN) in the description:


strace -fyye trace=socket,setsockopt sqlplus demo/demo@"(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))" <<&1 | GREP_COLORS='ms=01;33' grep --color=auto -E '^|.*sock.*|^=*'

[Screenshot: strace output for the connection with (ENABLE=BROKEN)]

The client (yellow) has now a call to set keep-alive:


setsockopt(9, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0

However, as I’ll show later, this uses the TCP defaults:


[oracle@db195 tmp]$ tail /proc/sys/net/ipv4/tcp_keepalive*
==> /proc/sys/net/ipv4/tcp_keepalive_intvl <== 
75
==> /proc/sys/net/ipv4/tcp_keepalive_probes <== 
9
==> /proc/sys/net/ipv4/tcp_keepalive_time <== 
7200

After 2 hours (7200 seconds) of idle connection, the client will send a probe 9 times, every 75 seconds. If you want to reduce that you must change it in the client’s system settings. If you don’t add “(ENABLE=BROKEN)”, the dead broken connection will not be detected before the next user call, after the default TCP timeout (15 minutes).

That’s only from the client when its connection to the server is lost.

SQLNET.EXPIRE_TIME on the server

On the server side, we have seen that SO_KEEPALIVE is set - using the TCP defaults. But, there, it may be important to detect dead connections faster because a session may hold some locks. You can (and should) set a lower value in sqlnet.ora with SQLNET.EXPIRE_TIME. Before 12c this parameter was used to send TNS packets as keep-alive probes, but now that SO_KEEPALIVE is set, this parameter controls the keep-alive idle time (using TCP_KEEPIDLE instead of the default /proc/sys/net/ipv4/tcp_keepalive_time).
Here is the same as my first test (without the client ENABLE=BROKEN) but after having set SQLNET.EXPIRE_TIME=42 in $ORACLE_HOME/network/admin/sqlnet.ora

Side note: I get the “do we need to restart the listener?” question very often about changes in sqlnet.ora, but the answer is clearly “no”. This file is read for each new connection to the database. The listener forks the server (aka shadow) process and that process reads sqlnet.ora, as we can see here when I “strace -f” the listener: it is the forked process that sets up the socket.

[Screenshot: strace output showing the forked server process setting the keep-alive socket options]

Here is the new setsockopt() from the server process:


[pid  5507] setsockopt(16127.0.0.1:31374]>, SOL_TCP, TCP_KEEPIDLE, [2520], 4) = 0
[pid  5507] setsockopt(16127.0.0.1:31374]>, SOL_TCP, TCP_KEEPINTVL, [6], 4) = 0
[pid  5507] setsockopt(16127.0.0.1:31374]>, SOL_TCP, TCP_KEEPCNT, [10], 4) = 0

This means that the server waits for 42 minutes of inactivity (the EXPIRE_TIME that I’ve set, here TCP_KEEPIDLE=2520 seconds) and then sends a probe. Without an answer (ACK) it re-probes every 6 seconds for one minute (the 6-second interval is defined by TCP_KEEPINTVL, and TCP_KEEPCNT sets the retries to 10 times). We control the idle time with SQLNET.EXPIRE_TIME and can then expect that a dead connection is closed after one additional minute of retries.

Here is a combination of SQLNET.EXPIRE_TIME (server detecting dead connection in 42+1 minute) and ENABLE=BROKEN (client detecting dead connection after the default of 2 hours):

[Screenshot: strace output combining SQLNET.EXPIRE_TIME on the server and ENABLE=BROKEN on the client]

tcpdump and iptable drop

The above, with strace, shows the translation of Oracle settings to Linux settings. Now I’ll translate to the actual behavior by tracing the TCP packets exchanged, with tcpdump:


sqlplus demo/demo@"(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))"
host cat $ORACLE_HOME/network/admin/sqlnet.ora
host sudo netstat -np  | grep sqlplus
host sudo netstat -np  | grep 36316
set time on escape on
host sudo tcpdump -vvnni lo port 36316 \&

“netstat -np | grep sqlplus” finds the client connection in order to get the port and “netstat -np | grep $port” shows both connections (“sqlplus” for the client and “oracleSID” for the server).

I have set SQLNET.EXPIRE_TIME=3 here and I can see that the server sends a 0-length packet every 3 minutes (connection at 14:43:39, then idle, 1st probe: 14:46:42, 2nd probe: 14:49:42…). And each time the client replies with an ACK, so the server knows that the connection is still alive:

Now I simulate a client that doesn’t answer, by blocking the input packets:


host sudo iptables -I INPUT 1 -p tcp --dport 36316 -j DROP
host sudo netstat -np  | grep 36316

Here I see the next probe 3 minutes after the last one (14:55:42) and then, as there is no reply, the 10 probes every 6 seconds:

At the end, I checked the TCP connections and the server one has disappeared. But the client side remains. That is exactly what DCD does: when a session is idle for a while it tests if the connection is dead and closes it to release all resources.
If I continue from there and try to run a query, the server cannot be reached and I’ll hang for the default TCP timeout of 15 minutes. If I try to cancel, I get “ORA-12152: TNS:unable to send break message” as it tries to send an out-of-bound break. SQLNET.EXPIRE_TIME is only for the server-side. The client detects nothing until it tries to send something.

For the next test, I remove my iptables rule to stop blocking the packets:


host sudo iptables -D INPUT 1

And I’m now running the same but with (ENABLE=BROKEN)


connect demo/demo@(DESCRIPTION=(ENABLE=BROKEN)(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=localhost)(PORT=1521)))
host sudo netstat -np  | grep sqlplus
host sudo netstat -np  | grep 37064
host sudo tcpdump -vvnni lo port 37064 \&
host sudo iptables -I INPUT 1 -p tcp --dport 37064 -j DROP
host sudo netstat -np  | grep 37064
host sudo iptables -D INPUT 1
host sudo netstat -np  | grep 37064

Here is the same as before: DCD after 3 minutes idle, and 10 probes that fail because I’ve blocked again with iptables:

As with the previous test, the server connection (the oracleSID) has been closed and only the client one remains. As I know that SO_KEEPALIVE has been enabled thanks to (ENABLE=BROKEN) the client will detect the closed connection:

17:52:48 is 2 hours after the last activity, and the client then probes 9 times, every 1’15”, according to the system defaults:


[oracle@db195 tmp]$ tail /proc/sys/net/ipv4/tcp_keepalive*
==> /proc/sys/net/ipv4/tcp_keepalive_intvl <==    TCP_KEEPINTVL
75
==> /proc/sys/net/ipv4/tcp_keepalive_probes <==     TCP_KEEPCNT
9
==> /proc/sys/net/ipv4/tcp_keepalive_time <==      TCP_KEEPIDLE

It was long (but you can change those defaults on the client) but finally, the client connection is cleared up (sqlplus not there in the last netstat).
Now, an attempt to run a user call fails immediately with the famous ORA-03113 because the client knows that the connection is closed:

[Screenshot: the next user call fails immediately with ORA-03113]

Just a little additional test to show ORA-03135. If the server detects and closes the dead connection, but before the dead connection is detected on the client, we have seen that we wait for a 15-minute timeout. But that’s because the iptables rule was still there to drop the packets. If I remove the rule before attempting a user call, the server can be reached (so no wait and timeout) and it detects immediately that there’s no endpoint anymore. This raises “connection lost contact”.

In summary:

  • On the server, the keep-alive is always enabled and SQLNET.EXPIRE_TIME is used to reduce the tcp_keepalive_time defined by the system, because it is probably too long.
  • On the client, the keep-alive is enabled only when (ENABLE=BROKEN) is in the connection description, and uses the tcp_keepalive_time from the system. Without it, the broken connection will be detected only when attempting a user call.

Setting SQLNET.EXPIRE_TIME to a few minutes (like 10) is a good idea because you don’t want to keep resources and locks on the server when a simple ping can confirm that the connection is lost and we have to roll back. If we don’t, then the dead connections may disappear only after 2 hours and 12 minutes (the idle time + the probes). On the client side, it is also a good idea to add (ENABLE=BROKEN) so that idle sessions that have lost contact have a chance to know it before trying to use them. This is a performance gain if it helps to avoid sending a “select 1 from dual” each time you grab a connection from the pool.

And, most important: the documentation is imprecise, which means that the behavior can change without notification. This is a test on specific OS, specific driver, specific version,… Do not take the results from this post, but now you know how to check in your environment.

This article, SQLNET.EXPIRE_TIME and ENABLE=BROKEN, first appeared on Blog dbi services.

Char problems

The semantics of comparing character columns of different types can lead to some confusion, so before I get into the main body of this note here’s a little test based on a table with one row:


create table t1(c2 char(2), c3 char(3), vc2 varchar2(2), vc3 varchar2(3));

insert into t1 values ('XX','XX','XX','XX');
commit;

select count(*) c2_c3   from t1 where c2 = c3;
select count(*) c2_vc3  from t1 where c2 = vc3;
select count(*) c3_vc2  from t1 where c3 = vc2;
select count(*) c3_vc3  from t1 where c3 = vc3;

I’ve inserted one row, using the same value for every single column; then I’ve been counting the row(s) where various pairs of columns match. Which (if any) of the four queries return the value 1, and which (if any) return the value zero?

To help you, here’s a quote from the relevant Oracle manual about character comparison semantics:

Blank-Padded and Nonpadded Comparison Semantics

With blank-padded semantics, if the two values have different lengths, then Oracle first adds blanks to the end of the shorter one so their lengths are equal. Oracle then compares the values character by character up to the first character that differs. The value with the greater character in the first differing position is considered greater. If two values have no differing characters, then they are considered equal. This rule means that two values are equal if they differ only in the number of trailing blanks. Oracle uses blank-padded comparison semantics only when both values in the comparison are either expressions of data type CHAR, NCHAR, text literals, or values returned by the USER function.

With nonpadded semantics, Oracle compares two values character by character up to the first character that differs. The value with the greater character in that position is considered greater. If two values of different length are identical up to the end of the shorter one, then the longer value is considered greater. If two values of equal length have no differing characters, then the values are considered equal. Oracle uses nonpadded comparison semantics whenever one or both values in the comparison have the data type VARCHAR2 or NVARCHAR2.

The first two queries return 1, the second two return zero.

  1. Query 1: c2 is blank padded to match c3 in length before the comparison, so the values are ‘XX {space}’
  2. Query 2: c2 is not padded, so the compared values are both ‘XX’
  3. Query 3: c3 is three characters long, vc2 is only 2 characters long and does not get padded to match c3
  4. Query 4: c3 is three characters long, vc3 is only 2 characters long and does not get padded to match c3

One interesting by-product of this example is this:

  • c3 = c2 and c2 = vc3, but c3 != vc3 – whatever happened to transitive closure!

So we come to the point of the article, which is this:

Be very careful about using char() (or nchar) types in your tables – especially if you’re thinking of using columns of type [n]char() in join predicates (or predicates that don’t start life as join predicates but become join predicates through transitive closure).

Here’s an interesting bug that has appeared (I think) as a side effect of the need for blank-padded semantics. We start with two tables that I’ll be joining with a hash join – one of them will be a small table that will be used as the “build” table, the other will be (faked to look like) a large table that will be used as the “probe” table.


rem
rem     Script:         bloom_prune_char_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jan 2020
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table test_probe(status char(3)) partition by list(status) (partition st_1 values('00','BI'));

create table test_build(status char(2)); 

insert into test_build values('00');
insert into test_probe values('00');
insert into test_build values('BI');
insert into test_probe values('BI');

commit;
 
prompt  =====================================
prompt  Fake large table stats for test_probe
prompt  =====================================

exec dbms_stats.set_table_stats(null,'test_probe',numrows=>2000000);

spool bloom_prune_char_bug
set linesize 156
set pagesize 60

set serveroutput off

select  /*+ 
                gather_plan_statistics 
        */
        * 
from 
        test_build b,
        test_probe a 
where 
        a.status = b.status
;

select * from table(dbms_xplan.display_cursor(null,null,'projection partition allstats last'))
/


The two tables have a pair of matching rows – so the query should return two rows. But it doesn’t – it returns no rows, and the clue about why not is in the execution plan (which I’ve pulled from memory with lots of extra bits and pieces). Here’s the output from running this script (from the query onwards) on an instance of 12.2.0.1:


no rows selected


PLAN_TABLE_OUTPUT
-----------------------------------------------------------------
SQL_ID  2295z4p6m4557, child number 0
-------------------------------------
select /*+   gather_plan_statistics  */  * from  test_build b,
test_probe a where  a.status = b.status

Plan hash value: 177769189

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name       | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |            |      1 |        |       |       |      0 |00:00:00.01 |       7 |       |       |          |
|*  1 |  HASH JOIN               |            |      1 |   2000K|       |       |      0 |00:00:00.01 |       7 |  2078K|  2078K|  766K (0)|
|   2 |   PART JOIN FILTER CREATE| :BF0000    |      1 |      2 |       |       |      2 |00:00:00.01 |       7 |       |       |          |
|   3 |    TABLE ACCESS FULL     | TEST_BUILD |      1 |      2 |       |       |      2 |00:00:00.01 |       7 |       |       |          |
|   4 |   PARTITION LIST SINGLE  |            |      1 |   2000K|KEY(AP)|KEY(AP)|      0 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS FULL     | TEST_PROBE |      0 |   2000K|     1 |     1 |      0 |00:00:00.01 |       0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."STATUS"="B"."STATUS")

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "B"."STATUS"[CHARACTER,2], "A"."STATUS"[CHARACTER,3]
   2 - INTERNAL_FUNCTION("B"."STATUS")[2], INTERNAL_FUNCTION("B"."STATUS")[2], "B"."STATUS"[CHARACTER,2]
   3 - "B"."STATUS"[CHARACTER,2]
   4 - (rowset=256) "A"."STATUS"[CHARACTER,3]
   5 - (rowset=256) "A"."STATUS"[CHARACTER,3]

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

The optimizer has used a Bloom filter to do partition pruning: operation 4 reports a "partition list single" operation using "and pruning" (the KEY(AP) in the Pstart/Pstop columns), but operation 5 reports zero starts. This is because the Bloom filter has been used to determine that there are no relevant partitions!

Looking down at the (rarely examined) projection information we can see why: operation 2 (the "part join filter create") has a strange "INTERNAL_FUNCTION" in its projection, and references B.STATUS as [CHARACTER,2]. It looks as if the Bloom filter that identifies partitions has been built using a char(2) value as the input to its hashing function, which is bad news when the resulting filter is then used to check hash values derived from the partition definition, where the column being hashed is a char(3).
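
As a quick stand-alone illustration (not part of the original test script) of why a hash built on the raw column value can go wrong here: under blank-padded comparison semantics a char(2) '00' and a char(3) '00 ' compare as equal even though the stored values differ by a trailing space, so a hash of the two-character value has no reason to match a hash of the three-character value.

rem
rem     Stand-alone illustration, not part of the original test script
rem
select  'equal under blank-padded semantics'    result
from    dual
where   cast('00' as char(2)) = cast('00' as char(3))
;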

If my thoughts about the mismatch between the way the Bloom filter is built and the way it is probed are correct, then a test that would help to confirm the hypothesis is to disable Bloom filter pruning, which you can only do by setting a hidden parameter (possibly through a hint or SQL Patch):

select 
        /*+ 
                gather_plan_statistics 
                opt_param('_bloom_pruning_enabled','false') 
        */  
        * 
from 
        test_build b,
        test_probe a 
where
        a.status = b.status;

select * from table(dbms_xplan.display_cursor(null,null,'projection partition allstats last'))
/


ST STA
-- ---
00 00
BI BI

2 rows selected.


PLAN_TABLE_OUTPUT
------------------------------------------------------------------------
SQL_ID  9whuurpkm3wpw, child number 0
-------------------------------------
select  /*+   gather_plan_statistics
opt_param('_bloom_pruning_enabled','false')   subquery_pruning(a)  */
* from  test_build b,  test_probe a where  a.status = b.status

Plan hash value: 787868928

------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name       | Starts | E-Rows | Pstart| Pstop | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |            |      1 |        |       |       |      2 |00:00:00.01 |      45 |       |       |          |
|*  1 |  HASH JOIN             |            |      1 |   2000K|       |       |      2 |00:00:00.01 |      45 |  2078K|  2078K|  866K (0)|
|   2 |   TABLE ACCESS FULL    | TEST_BUILD |      1 |      2 |       |       |      2 |00:00:00.01 |       7 |       |       |          |
|   3 |   PARTITION LIST SINGLE|            |      1 |   2000K|     1 |     1 |      2 |00:00:00.01 |      38 |       |       |          |
|   4 |    TABLE ACCESS FULL   | TEST_PROBE |      1 |   2000K|     1 |     1 |      2 |00:00:00.01 |      38 |       |       |          |
------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."STATUS"="B"."STATUS")

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - (#keys=1) "B"."STATUS"[CHARACTER,2], "A"."STATUS"[CHARACTER,3]
   2 - (rowset=256) "B"."STATUS"[CHARACTER,2]
   3 - (rowset=256) "A"."STATUS"[CHARACTER,3]
   4 - (rowset=256) "A"."STATUS"[CHARACTER,3]

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

Adding the hint opt_param('_bloom_pruning_enabled','false') to the query we get the right results and, of course, the execution plan no longer contains the operations that generate and use the Bloom filter that was probably causing the problem.

Conclusion

If you are going to use char() types in your tables, and if you are going to compare columns of type char(), make sure that the columns are defined to be exactly the same length, or include an explicit cast() to guarantee that the shorter column appears to be the same length as the longer column.
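
For example, here is a minimal sketch of the cast() approach against the test tables above (I haven't re-run it on the affected versions, so treat it as an illustration of the principle rather than a verified fix): padding the shorter char(2) column up to char(3) keeps both sides of the join, and anything derived from them, at a single length.

select  /*+ gather_plan_statistics */
        *
from
        test_build b,
        test_probe a
where
        a.status = cast(b.status as char(3))
;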

Footnote

This bug appeared in my MOS "hot topics" email a couple of days ago as:

Bug 27661222: WRONG RESULTS WITH PARTITION PRUNING COMPARING CHAR COLUMNS OF DIFFERENT LENGTH

Reading the bug note you will find the problem described as a bug in "AND pruning", with a workaround of setting the hidden parameter "_and_pruning_enabled" to false (possibly through the opt_param() hint). I suspect that the underlying problem may be the Bloom filter itself and that disabling Bloom filter pruning for the query may be a slightly less aggressive workaround.
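
For completeness, here is a sketch of the bug note's workaround expressed at statement level (untested here, and relying on opt_param() accepting this particular hidden parameter):

select  /*+
                gather_plan_statistics
                opt_param('_and_pruning_enabled','false')
        */
        *
from
        test_build b,
        test_probe a
where
        a.status = b.status
;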

The bug is reported as fixed in 20.1 – but you don’t need to upgrade just yet because, apart from the workarounds, there are various patches available back to 19.x and 12.2.

The sample script above is basically the example in the bug note with a few minor changes.

Update (March 2020)

Hot on the heels of the previous bug report, another one showed up today that relates to problems with char() types and comparisons. The title of the bug isn't an accurate description of the problem: Bug 29002488 : WRONG RESULT WITH A QUERY USING UNION ALL. However, the note includes a test case which started life as a join to a union all.
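
The bug note's test case isn't reproduced here, but purely as a hypothetical illustration of the shape described (a join to a union all comparing char() columns of different lengths) it would look something like this:

rem     Hypothetical illustration only - not the test case from the bug note

create table t_parent  (status char(2));
create table t_child_a (status char(3));
create table t_child_b (status char(3));

insert into t_parent  values ('00');
insert into t_child_a values ('00');
insert into t_child_b values ('BI');
commit;

select
        p.status, u.status
from
        t_parent p,
        (
        select status from t_child_a
        union all
        select status from t_child_b
        ) u
where
        u.status = p.status
;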

The bug is reported as fixed in 20.1, but there are patches and backports into various versions of 12.2, 18c and 19c.

Update (March 2020 again)

Another related bug has shown up in my "hot topics" email: Bug 28910498 : WRONG RESULT WITH JOIN TABLES WITH CHAR COLUMNS OF DIFFERENT LENGTH. Again fixed in 20.1; apparently introduced in 12.2.0.1.

In this case the workarounds are to set optimizer_features_enable back to 12.1.0.2 or 11.2.0.4, or to add the hint /*+ no_push_pred */ to the (invisible) query. The last workaround suggests, however, that the problem may be older than 12.2.0.1, and that setting OFE simply disables an enhancement to predicate pushing in 12.2 which allows the problem to become visible for the user's query.
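
Since the query itself isn't published I can only sketch the general form of those workarounds rather than reproduce the test case:

rem     General form of the workarounds only - the bug's test query is not available.
rem     At statement level the equivalents would be the /*+ no_push_pred */ hint, or
rem     /*+ optimizer_features_enable('12.1.0.2') */, added to the affected query.

alter session set optimizer_features_enable = '12.1.0.2';
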

Oracle Database 20c : Cloud Preview, Docs and Desupport

A little while ago Dominic Giles tweeted about the release of an Oracle Database 20c preview on Oracle Cloud and the Oracle Database 20c documentation. Some lucky people have already deployed the 20c preview.