Oakies Blog Aggregator

Validate FK

A comment arrived yesterday on an earlier posting (about an enhancement to the truncate command in 12c) that raised the topic of what Oracle might do to validate a foreign key constraint. Despite being sure I had the answer written down somewhere (maybe on a client site or in a report to a client) I couldn’t find anything I’d published about it, so I ran up a quick demo script to show that all Oracle does is construct a simple SQL statement that will check the data – and then do whatever the optimizer does to produce the fastest possible plan.

Here’s the script – with a few variations to show what happens if you start tweaking features to change the plan.

rem
rem     Script:         validate_fk.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table parent 
as 
select  * 
from    all_Objects 
where   rownum <= 10000 -- > comment to avoid wordpress format issue
;

alter table parent add constraint par_pk primary key(object_id);

execute dbms_stats.gather_table_stats(null, 'parent', cascade=>true)


create table child 
as 
select par.* 
from    (select rownum from dual connect by level <= 10) v1, --> comment to avoid wordpress format issue
        parent par
; 

alter table child add constraint chi_fk_par foreign key(object_id) references parent enable novalidate; 
create index chi_fk_par on child(object_id); 
execute dbms_stats.gather_table_stats(null, 'child', cascade=>true)


-- alter table child modify object_id null;
-- alter table child parallel(degree 8);
-- alter session set "_fast_full_scan_enabled" = FALSE;
-- alter session set "_optimizer_outer_to_anti_enabled" = false;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 12';
alter table child modify constraint chi_fk_par validate;
alter session set events '10046 trace name context off';

All I’ve done is create a parent table with a primary key, and a child table with 10 rows per parent. I’ve created a foreign key constraint on the child table, enabled it (so future data will be checked) but not validated it (so there’s no enforced guarantee that the existing data is correct). Then I’ve issued a command to validate the foreign key.
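
As a quick cross-check (not part of the original script) you can watch the constraint state change in user_constraints – it should move from ENABLED / NOT VALIDATED to ENABLED / VALIDATED once the validate command completes:

select constraint_name, status, validated
from   user_constraints
where  table_name = 'CHILD'
and    constraint_name = 'CHI_FK_PAR'
;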

The flush of the buffer cache is to allow me to see the I/O that takes place, and will also (usually) let me see if there are any strange effects due to any recursive SQL that Oracle runs. As you can see I’ve also got a few commented commands that can produce some variations in behaviour.

Here’s the critical content from the output of the trace file summary from tkprof (in versions from 11.2.0.4 to 12.2.0.1):


select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3
from
 "TEST_USER"."CHILD" A , "TEST_USER"."PARENT" B where( "A"."OBJECT_ID" is not
  null) and( "B"."OBJECT_ID" (+)= "A"."OBJECT_ID") and( "B"."OBJECT_ID" is
  null)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      0.01       0.02        241        373          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3      0.02       0.02        241        373          0           0

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  NESTED LOOPS ANTI (cr=373 pr=241 pw=0 time=21779 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000   INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=17753 us starts=1 cost=32 size=1700000 card=100000)(object id 104840)
     10000      10000      10000   INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=4494 us starts=10000 cost=0 size=49995 card=9999)(object id 104838)

As you can see, Oracle writes SQL for an outer join with an “is null” predicate on the outer table – which the optimizer converts to an anti-join, running a nested loop in this case. It’s an interesting little oddity that the code includes the predicate "A"."OBJECT_ID" is not null given that the column is declared as not null – but this is presumably a developer deciding to re-use code even though it then includes a redundant predicate (which is effectively zero cost, since the optimizer can use transitive closure to eliminate it).
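
If you wanted to run an equivalent check by hand, a simple anti-join does the same job – a sketch of my own, not the statement Oracle actually issues:

select  c.rowid
from    child c
where   c.object_id is not null
and     not exists (
                select  null
                from    parent p
                where   p.object_id = c.object_id
        )
;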

Given that Oracle has converted an outer join to an anti-join I obviously had to check what would happen if I disabled this conversion by altering the "_optimizer_outer_to_anti_enabled" parameter to false. The optimizer obeyed the session setting with the following plan in the trace:

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  FILTER  (cr=373 pr=241 pw=0 time=226926 us starts=1)
    100000     100000     100000   NESTED LOOPS OUTER (cr=373 pr=241 pw=0 time=177182 us starts=1 cost=70 size=22000 card=1000)
    100000     100000     100000    INDEX FAST FULL SCAN CHI_FK_PAR (cr=224 pr=219 pw=0 time=40811 us starts=1 cost=32 size=1700000 card=100000)(object id 104848)
    100000     100000     100000    INDEX UNIQUE SCAN PAR_PK (cr=149 pr=22 pw=0 time=119363 us starts=100000 cost=0 size=49995 card=9999)(object id 104846)

The significant difference is in the CPU usage, of course, and to a degree the magnitude of the change is dictated by the pattern and distribution of the data. The number of CR gets hasn’t changed even though the number of index probes jumps from 10,000 to 100,000, because Oracle will have pinned index blocks (there’s a very old article on my old website if you want to read more about buffer pins).

The original question was about the effect of a session setting that disabled index fast full scans, and followed up with a question on parallelism. After seeing the effect of changing one optimizer parameter at the session level you probably won’t be surprised by the following two results: first, when the only change I make is the session setting of the "_fast_full_scan_enabled" parameter; then, when the only change is the declared parallelism of the child table.

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  MERGE JOIN ANTI (cr=240 pr=240 pw=0 time=120163 us starts=1 cost=247 size=22000 card=1000)
    100000     100000     100000   INDEX FULL SCAN CHI_FK_PAR (cr=218 pr=218 pw=0 time=20314 us starts=1 cost=222 size=1700000 card=100000)(object id 104852)
    100000     100000     100000   SORT UNIQUE (cr=22 pr=22 pw=0 time=81402 us starts=100000 cost=25 size=50000 card=10000)
     10000      10000      10000    INDEX FULL SCAN PAR_PK (cr=22 pr=22 pw=0 time=1185 us starts=1 cost=22 size=50000 card=10000)(object id 104850)


Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  PX COORDINATOR  (cr=15 pr=2 pw=0 time=722483 us starts=1)
         0          0          0   PX SEND QC (RANDOM) :TQ10001 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0    HASH JOIN ANTI BUFFERED (cr=0 pr=0 pw=0 time=0 us starts=0 cost=37 size=22000 card=1000)
         0          0          0     PX BLOCK ITERATOR (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0      TABLE ACCESS FULL CHILD (cr=0 pr=0 pw=0 time=0 us starts=0 cost=32 size=1700000 card=100000)
         0          0          0     PX RECEIVE  (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0      PX SEND BROADCAST :TQ10000 (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)
         0          0          0       PX SELECTOR  (cr=0 pr=0 pw=0 time=0 us starts=0)
         0          0          0        INDEX FAST FULL SCAN PAR_PK (cr=0 pr=0 pw=0 time=0 us starts=0 cost=4 size=50000 card=10000)(object id 104854)

In the first case my version of Oracle has switched to a merge anti-join with an index full scan (not FAST full scan). It’s interesting to note that the merge join anti hasn’t been as clever as the nested loop anti in avoiding probes of the second data source as it walks the foreign key index (note how starts=100000 in the SORT UNIQUE line).

In the second case all the work was done by the parallel query slaves – and the PX SELECTOR line tells you that this plan must have come from 12c. As you can see we’re still doing an anti-join, but this time we do a parallel tablescan of the child table, because we haven’t enabled the index for parallel execution. If we had altered the index to parallel(degree 8) as well we would have seen a parallel index fast full scan instead of the parallel tablescan – see the sketch below.
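
The extra command for that variation is just the index-level equivalent of the (commented) table command in the script – a sketch, assuming the same parallel_clause syntax:

alter index chi_fk_par parallel(degree 8);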

Bottom line: the SQL executed to validate a foreign key constraint is essentially a join between the parent and child tables, and Oracle will simply optimize that statement to the best of its ability based on the current session settings. If you want to test on a clone (or accurate model) of the tables you may find that you can create an SQL patch that works – even though the necessary SQL will be optimised as SYS. (So far I’ve only tried this with a couple of variants of the parallel() hint on 12.2.0.1.)
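
For example, something along these lines on 12.2 – a sketch only: dbms_sqldiag.create_sql_patch is the 12.2 public API, the hint text and patch name here are arbitrary choices of mine, and the sql_text has to match the recursive statement exactly, whitespace and all:

declare
        v_patch varchar2(128);
begin
        v_patch := dbms_sqldiag.create_sql_patch(
                sql_text  => q'[select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3 ...]',  -- paste the full recursive text, copied exactly
                hint_text => 'parallel(a@sel$1 8)',     -- assumed hint and query block name
                name      => 'validate_fk_patch'        -- arbitrary name
        );
end;
/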

Footnote

If you were wondering what the three bind variables in the query were, this is the relevant extract from the 10046 trace file with bind variable tracing enabled:

 Bind#0
  oacdty=01 mxl=32(09) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c38  bln=32  avl=09  flg=05
  value="TEST_USER"
 Bind#1
  oacdty=01 mxl=32(05) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359c00  bln=32  avl=05  flg=05
  value="CHILD"
 Bind#2
  oacdty=01 mxl=32(10) mxlc=00 mal=00 scl=00 pre=00
  oacflg=18 fl2=0001 frm=01 csi=873 siz=32 off=0
  kxsbbbfp=7f3fbd359bc8  bln=32  avl=10  flg=05
  value="CHI_FK_PAR"

The values are the owner, table, and constraint names. (Though you have to modify the code a little to show that the last one is the constraint name and not the index name).

Data Guard: always set db_create_file_dest on the standby

The file name convert parameters are not dynamic and require a restart of the instance; an enhancement request was filed in 2011. I mentioned recently on Twitter that this can be annoying with Active Data Guard when a file is created on the primary server on a path that has no file name conversion for the standby. However, Ian Baugaard mentioned that there is a workaround for this specific case because db_create_file_dest is dynamic:

I’ve quickly created an 18c Data Guard configuration on the Oracle Cloud DBaaS to test it, and here it is.

In the primary database and the standby database, here are the datafiles:

RMAN> report schema;
 
Report of database schema for database with db_unique_name ORCL_01
 
List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    830      SYSTEM               YES     /u02/app/oracle/oradata/ORCL/system01.dbf
3    510      SYSAUX               NO      /u02/app/oracle/oradata/ORCL/sysaux01.dbf
4    60       UNDOTBS1             YES     /u02/app/oracle/oradata/ORCL/undotbs01.dbf
5    340      PDB$SEED:SYSTEM      NO      /u02/app/oracle/oradata/ORCL/pdbseed/system01.dbf
6    620      PDB$SEED:SYSAUX      NO      /u02/app/oracle/oradata/ORCL/pdbseed/sysaux01.dbf
7    5        USERS                NO      /u02/app/oracle/oradata/ORCL/users01.dbf
8    200      PDB$SEED:UNDOTBS1    NO      /u02/app/oracle/oradata/ORCL/pdbseed/undotbs01.dbf
12   340      PDB1:SYSTEM          YES     /u02/app/oracle/oradata/ORCL/PDB1/system01.dbf
13   620      PDB1:SYSAUX          NO      /u02/app/oracle/oradata/ORCL/PDB1/sysaux01.dbf
14   200      PDB1:UNDOTBS1        YES     /u02/app/oracle/oradata/ORCL/PDB1/undotbs01.dbf
15   50       PDB1:USERS           NO      /u02/app/oracle/oradata/ORCL/PDB1/PDB1_users01.dbf
 
List of Temporary Files
=======================
File Size(MB) Tablespace           Maxsize(MB) Tempfile Name
---- -------- -------------------- ----------- --------------------
1    33       TEMP                 32767       /u04/app/oracle/oradata/temp/temp01.dbf
2    62       PDB$SEED:TEMP        32767       /u04/app/oracle/oradata/temp/pdbseed_temp012018-02-08_13-49-27-256-PM.dbf
4    62       PDB1:TEMP            32767       /u04/app/oracle/oradata/temp/temp012018-02-08_13-49-27-256-PM.dbf

The properties of the standby database define no DbFileNameConvert because the directory structure is supposed to be the same:

DGMGRL> show configuration
 
Configuration - fsc
 
Protection Mode: MaxPerformance
Members:
ORCL_01 - Primary database
ORCL_02 - Physical standby database
 
Fast-Start Failover: DISABLED
 
Configuration Status:
SUCCESS (status updated 45 seconds ago)
 
 
DGMGRL> show database verbose 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 0 seconds ago)
Apply Lag: 0 seconds (computed 0 seconds ago)
Average Apply Rate: 15.00 KByte/s
Active Apply Rate: 532.00 KByte/s
Maximum Apply Rate: 535.00 KByte/s
Real Time Query: ON
Instance(s):
ORCL
 
Properties:
DGConnectIdentifier = 'ORCL_02'
...
DbFileNameConvert = ''
LogFileNameConvert = 'dummy, dummy'
...
 
Log file locations:
Alert log : /u01/app/oracle/diag/rdbms/orcl_02/ORCL/trace/alert_ORCL.log
Data Guard Broker log : /u01/app/oracle/diag/rdbms/orcl_02/ORCL/trace/drcORCL.log
 
Database Status:
SUCCESS

You can see that Oracle defines a dummy log file name convert. This is a good idea to avoid some RMAN duplicate issues.

On the standby server, I have no db_create_file_dest defined:

SQL> show parameter create%dest
 
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_create_file_dest string
db_create_online_log_dest_1 string .
db_create_online_log_dest_2 string
db_create_online_log_dest_3 string
db_create_online_log_dest_4 string
db_create_online_log_dest_5 string

Note that the Oracle Cloud DBaaS defines it. I’ve reset it for the purpose of this demo.
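 
For reference, the parameter can be cleared dynamically – this is an assumed command for the reset, it is not shown in the original demo:

SQL> alter system set db_create_file_dest='' scope=both;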

New filesystem on Primary server only

I create a new filesystem on the primary server:

[root@DG-dg01 opc]# mkdir /DATA ; chown oracle:dba /DATA

I create a datafile on this new filesystem:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> create tablespace FRANCK datafile '/DATA/franck.dbf' size 100M;
Tablespace created.

The apply is stuck:

DGMGRL> show database 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 2 seconds ago)
Apply Lag: 11 seconds (computed 2 seconds ago)
Average Apply Rate: 16.00 KByte/s
Real Time Query: OFF
Instance(s):
ORCL
 
Database Error(s):
ORA-16766: Redo Apply is stopped
 
Database Status:
ERROR

The standby alert.log shows the error about the impossibility to create the datafile:

2018-07-06T08:04:59.077730+00:00
Errors in file /u01/app/oracle/diag/rdbms/orcl_02/ORCL/trace/ORCL_pr00_29393.trc:
ORA-01274: cannot add data file that was originally created as '/DATA/franck.dbf'
2018-07-06T08:04:59.111881+00:00
Background Media Recovery process shutdown (ORCL)

db_file_name_convert

The first idea is to set db_file_name_convert. However, this requires an instance restart, which means downtime when you have sessions running on the Active Data Guard standby:

DGMGRL> edit database 'ORCL_02' set property DbFileNameConvert='/DATA,/u02/app/oracle/oradata/ORCL';
Warning: ORA-16675: database instance restart required for property value modification to take effect
 
Property "dbfilenameconvert" updated
 
DGMGRL> show database 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 2 seconds ago)
Apply Lag: 3 minutes 32 seconds (computed 2 seconds ago)
Average Apply Rate: 16.00 KByte/s
Real Time Query: OFF
Instance(s):
ORCL
Warning: ORA-16675: database instance restart required for property value modification to
take effect
Warning: ORA-16714: the value of property DbFileNameConvert is inconsistent with the member setting
 
Database Error(s):
ORA-16766: Redo Apply is stopped
 
Database Warning(s):
ORA-16853: apply lag has exceeded specified threshold
 
Database Status:
ERROR

db_create_file_dest

The solution is to set db_create_file_dest which, on the standby, has a higher priority than the convert parameter:

SQL> alter system set db_create_file_dest='/u02/app/oracle/oradata';
System altered.

I restart the apply:

DGMGRL> edit database 'ORCL_02' set state=apply-on;
Succeeded.

No need to restart, and future datafiles will be created there. However, it is too late for this one, as it has already been recorded as UNNAMED in the controlfile:

ORA-01186: file 18 failed verification tests
ORA-01157: cannot identify/lock data file 18 - see DBWR trace file
ORA-01111: name for data file 18 is unknown - rename to correct file
ORA-01110: data file 18: '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018'
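
A quick way to list any such files from the controlfile – my own check, not part of the original demo:

SQL> select file#, name from v$datafile where name like '%UNNAMED%';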

Manual CREATE DATAFILE

Then I must create it manually, but I cannot do that while standby_file_management is set to auto:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter database create datafile '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018' as '/u02/app/oracle/oradata/ORCL/franck.dbf';
alter database create datafile '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018' as '/u02/app/oracle/oradata/ORCL/franck.dbf'
*
ERROR at line 1:
ORA-01275: Operation CREATE DATAFILE is not allowed if standby file management
is automatic.

This can be changed dynamically:

DGMGRL> edit database 'ORCL_02' set property StandbyFileManagement=manual;
Property "standbyfilemanagement" updated

And then the creation is possible:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter database create datafile '/u01/app/oracle/product/18.0.0/dbhome_1/dbs/UNNAMED00018' as new;
Database altered.

You can see that, because I have defined db_create_file_dest, I don’t need to name the datafile: the ‘as new’ keyword creates it as an OMF file in that destination.

Now I can start the apply and it will resolve the gap:

DGMGRL> edit database 'ORCL_02' set state=apply-on;
Succeeded.
 
DGMGRL> show database 'ORCL_02';
 
Database - ORCL_02
 
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds (computed 3 seconds ago)
Apply Lag: 0 seconds (computed 3 seconds ago)
Average Apply Rate: 22.00 KByte/s
Real Time Query: ON
Instance(s):
ORCL
Warning: ORA-16675: database instance restart required for property value modification to take effect
Warning: ORA-16714: the value of property DbFileNameConvert is inconsistent with the member setting
 
Database Status:
WARNING

Do not forget to set standby_file_management back to auto:

DGMGRL> edit database 'ORCL_02' set property StandbyFileManagement=auto;
Property "standbyfilemanagement" updated

So, now that db_create_file_dest is set, new datafiles will be created automatically as OMF (Oracle Managed Files), without caring about file name conversion:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter tablespace FRANCK add datafile '/DATA/franck2.dbf' size 100M;
Tablespace altered.

This is confirmed from the standby alert.log:

(4):Datafile 19 added to flashback set
(4):Successfully added datafile 19 to media recovery
(4):Datafile #19: '/u02/app/oracle/oradata/ORCL_02/7050211FE75F26FAE05392781D0AADAA/datafile/o1_mf_franck_fmybw332_.dbf'

Conclusion

Always define db_create_file_dest on the standby database so that datafiles can always be created: better to have them in the wrong place than to stop the apply. And anyway, if you don’t like the OMF names, and you are at least on 12c Enterprise Edition, you can change their location later with online datafile move:

SQL> alter session set container=PDB1;
Session altered.
 
SQL> alter database move datafile '/u02/app/oracle/oradata/ORCL_02/7050211FE75F26FAE05392781D0AADAA/datafile/o1_mf_franck_fmybw332_.dbf' to '/u02/app/oracle/oradata/ORCL/franck2.dbf';
Database altered.

 


Historic Stats

If you want to examine historic object stats Oracle gives you a few procedures in the dbms_stats package to compare sets of stats captured at two different time periods, but there’s no view that you can query to get an idea of how a table’s stats have changed over time. This is a problem that can be addressed when you discover two things:

  • There are views to report pending table, index, column and histogram stats.
  • Pending stats are stored as “historic” stats with a future date.

Once you’ve spotted the second detail, you can acquire the SQL to generate the pending stats views:


SQL> select view_name from dba_views where view_name like 'DBA%PENDING%STAT%';

VIEW_NAME
--------------------------------------------------------------------------------------------------------------------------------
DBA_TAB_PENDING_STATS
DBA_IND_PENDING_STATS
DBA_COL_PENDING_STATS
DBA_TAB_HISTGRM_PENDING_STATS


SQL> set long 20000
SQL> set pagesize 0
SQL> select dbms_metadata.get_ddl('VIEW','DBA_TAB_PENDING_STATS','SYS') from dual;

  CREATE OR REPLACE FORCE NONEDITIONABLE VIEW "SYS"."DBA_TAB_PENDING_STATS" ("OW
NER", "TABLE_NAME", "PARTITION_NAME", "SUBPARTITION_NAME", "NUM_ROWS", "BLOCKS",
 "AVG_ROW_LEN", "IM_IMCU_COUNT", "IM_BLOCK_COUNT", "SCAN_RATE", "SAMPLE_SIZE", "
LAST_ANALYZED") AS
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
	 h.im_imcu_count, h.im_block_count, h.scanrate, h.samplesize, h.analyzet
ime
  from	 sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime > systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
	 h.avgrln, h.im_imcu_count, h.im_block_count, h.scanrate, h.samplesize,
	 h.analyzetime
  from	 sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime > systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
	 h.blkcnt, h.avgrln, h.im_imcu_count, h.im_block_count, h.scanrate,
	 h.samplesize, h.analyzetime
  from	sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
	sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
	tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime > systimestamp

Notice the critical predicate repeated across the UNION ALL: “and h.savtime > systimestamp” – all we have to do is change that to “less than or equal to” (or just delete it if we’re not fussy about reporting pending stats along with historic) then add a few columns reporting the available statistics and we can create a view that we can query for the historic stats.


rem
rem     Script:         optstat_table_history.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem
rem     Notes
rem     Have to be connected as SYS, or have directly
rem     granted privileges on the sys tables to do this
rem

create or replace force view sys.jpl_tab_history_stats (
        owner, table_name, partition_name, subpartition_name,
        num_rows, blocks, avg_row_len, sample_size, last_analyzed
)
as
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
         h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
         h.avgrln, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
         h.blkcnt, h.avgrln, h.samplesize, h.analyzetime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime <= systimestamp
;

drop public synonym jpl_tab_history_stats;
create  public synonym jpl_tab_history_stats for sys.jpl_tab_history_stats;
grant select on jpl_tab_history_stats to public;

Now you can query historic stats for any schema and table.

If you don’t want to create a view in the sys schema (and don’t want to create a permanent object at all) you can always use subquery factoring as an easy way of editing the metadata into a suitable query. You have to be connected as a user with privileges (that could be through a role) to view the relevant sys tables, though.


with jpl_tab_history (
        owner, table_name, partition_name, subpartition_name,
        num_rows, blocks, avg_row_len, sample_size, last_analyzed
) as (
  select u.name, o.name, null, null, h.rowcnt, h.blkcnt, h.avgrln,
         h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.rowcnt, h.blkcnt,
         h.avgrln, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
    and  h.savtime <= systimestamp
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.rowcnt,
         h.blkcnt, h.avgrln, h.samplesize, h.analyzetime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
    and h.savtime <= systimestamp
)
select
        table_name, blocks, num_rows, avg_row_len, sample_size, last_analyzed
from
        jpl_tab_history
where
        owner = 'TEST_USER'
and     table_name = 'T1'
order by
        last_analyzed
;

TABLE_NAME               BLOCKS   NUM_ROWS AVG_ROW_LEN SAMPLE_SIZE LAST_ANAL
-------------------- ---------- ---------- ----------- ----------- ---------
T1                          188      10000         120	     10000 04-JUL-18
T1                          322      18192         118	     18192 06-JUL-18

Matching code for indexes, columns and histograms is left as an exercise for the interested reader – though the sketch below may serve as a starting point for the index case.
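
A minimal sketch – assuming sys.wri$_optstat_ind_history uses the same column names that DBA_IND_PENDING_STATS exposes (pull the view DDL with dbms_metadata, as above, to check):

with jpl_ind_history (
        owner, index_name, num_rows, leaf_blocks, distinct_keys,
        clustering_factor, blevel, sample_size, last_analyzed
) as (
  select u.name, o.name, h.rowcnt, h.leafcnt, h.distkey,
         h.clufac, h.blevel, h.samplesize, h.analyzetime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 1 and o.owner# = u.user#
    and  h.savtime <= systimestamp
)
select index_name, num_rows, leaf_blocks, clustering_factor, last_analyzed
from   jpl_ind_history
where  owner = 'TEST_USER'
order by last_analyzed
;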

PostgreSQL Active Session History (ash): welcome to the pg_active_session_history view (part of the pgsentinel extension)

Why active session history?

What if you could record and query a history of the active sessions? Wouldn’t that be useful for performance tuning activities?

With active session history in place you can look at the “near past” database activity. You could answer questions like:

  • What wait event types were taking most time?
  • What wait events were taking most time?
  • Which application name was taking most time?
  • What was a session doing?
  • What does a SQL statement wait for?
  • How many sessions were running on CPU?
  • Which database was taking most time?
  • Which backend type was taking most time?
  • On which wait event was the session waiting?
  • And so on….

What does it look like?

Let’s have a look at the pg_active_session_history view (more details on how to create it later on):

postgres@pgu:~$ /usr/local/pgsql/bin/psql -c "\d pg_active_session_history"
                   View "public.pg_active_session_history"
      Column      |           Type           | Collation | Nullable | Default
------------------+--------------------------+-----------+----------+---------
 ash_time         | timestamp with time zone |           |          |
 datid            | oid                      |           |          |
 datname          | text                     |           |          |
 pid              | integer                  |           |          |
 usesysid         | oid                      |           |          |
 usename          | text                     |           |          |
 application_name | text                     |           |          |
 client_addr      | text                     |           |          |
 client_hostname  | text                     |           |          |
 client_port      | integer                  |           |          |
 backend_start    | timestamp with time zone |           |          |
 xact_start       | timestamp with time zone |           |          |
 query_start      | timestamp with time zone |           |          |
 state_change     | timestamp with time zone |           |          |
 wait_event_type  | text                     |           |          |
 wait_event       | text                     |           |          |
 state            | text                     |           |          |
 backend_xid      | xid                      |           |          |
 backend_xmin     | xid                      |           |          |
 top_level_query  | text                     |           |          |
 query            | text                     |           |          |
 queryid          | bigint                   |           |          |
 backend_type     | text                     |           |          |

You can see it as a sampling of pg_stat_activity (taken at one-second intervals) that provides some additional information:

  • ash_time: the sampling time
  • top_level_query: the top level statement (in case PL/pgSQL is used)
  • query: the statement being executed (not normalised, as it is in pg_stat_statements, means you see the values)
  • queryid: the queryid of the statement (the one coming from pg_stat_statements)

Thanks to the queryid field you will be able to link the session activity with the SQL activity (addressing the point reported by Franck Pachot in this blog post: PGIO, PG_STAT_ACTIVITY and PG_STAT_STATEMENTS).
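
For instance, once the extension has been recording for a while, a query like the following ties the sampled activity back to the normalised statement text – a sketch, assuming pg_stat_statements is installed in the same database:

-- top statement / wait event combinations over the last 5 minutes
-- (one sample per second, so count(*) approximates seconds of active time)
select s.query,
       a.wait_event_type,
       a.wait_event,
       count(*) as seconds
from   pg_active_session_history a
join   pg_stat_statements s using (queryid)
where  a.ash_time > now() - interval '5 minutes'
group by s.query, a.wait_event_type, a.wait_event
order by seconds desc
limit 10;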

Installation

pgsentinel uses the pg_stat_statements extension (officially bundled with PostgreSQL) for tracking which queries get executed in your database.

So, add the following entries to your postgresql.conf:

shared_preload_libraries = 'pg_stat_statements,pgsentinel'

# Increase the max size of the query strings Postgres records
track_activity_query_size = 2048

# Track statements generated by stored procedures as well
pg_stat_statements.track = all

Restart the PostgreSQL daemon and create the extension:

postgres@pgu:~$ /usr/local/pgsql/bin/psql -c "create extension pgsentinel;"
CREATE EXTENSION

Now, the pg_active_session_history view has been created and the activity is being recorded in it.

Let’s see the extension in action during a two-minute pgio run:

Remarks

  • No objects are created in the PostgreSQL instance (except the view). The data queried through the pg_active_session_history view is held entirely in memory
  • You are able to choose the maximum number of records; once it is reached, the data rotates (the oldest records are aged out)

Next Steps

  • The library and the source code will be available soon in the pgsentinel GitHub repository (under the GNU Affero General Public License v3.0): please do contribute!
  • Once the source code is published, more information on how to create the pgsentinel library will be available
  • Once the source code is published, more information on how to define the maximum of records will be available
  • The extension is named “pgsentinel” because more cool stuff will be added to it in the near future

Conclusion

At pgsentinel we really love performance tuning. We will do our best to contribute valuable performance tuning tools to the PostgreSQL community: stay tuned!

My tmux scripts to script demos

When I did my first demo using tmux, I wrote the following blog post to explain it: https://blog.dbi-services.com/using-tmux-for-semi-interactive-demos/. Since then I’ve done all my demos with this technique. My scripts and configuration files have evolved, and I have now uploaded them to GitHub to share them: https://github.com/FranckPachot/scripts/tree/master/tmux-demo

The README.md explains the pre-requisites (a recent version of tmux), how to download the scripts quickly (I use this to get everything ready on a new Cloud host), and how to run them. The ‘demo’ alias starts tmux attached to the same session (so that it is visible on the laptop and on the beamer at the same time). The .vimrc defines the macros that send lines from the script file to tmux as send-keys commands.

If you have questions, the best place to ask them is on Twitter.

Data Modeling, Dates and DAX

Presenting data in a format that eases visualization is a requirement for any BI product.  Power BI provides much of this with Data Analysis Expressions (DAX).  As a DBA, I admit to cringing every time a reference was made to how similar it is to functions in Excel or other non-database platforms.  I’m a DBA and I naturally see data at a much larger, more complex level.  I love the simplicity of DAX, which granted me the ability to acquire basic skills using it in just a day, but considering Power BI’s ability to pull from multiple data sources, including SQL Server, Oracle, PostgreSQL and even JSON files, the comparison to Excel left me feeling, well, ‘meh.’

18c: Order by in WITH clause is not preserved

For a previous post I ran, on 18c, a script of mine that captures the V$MYSTAT delta values between two queries. This script (new version available on GitHub) generates the queries to store some values and subtract them on the next execution. But I had to fix it for 18c, because I relied on an ORDER BY in a CTE, and that ordering is lost in 18c.
The idea was to get the statistic names in a Common Table Expression (CTE):

with stats as (
select rownum n,stat_id,name from (select stat_id,name from v$statname where name in (&names) order by stat_id)
)

and query it from different parts of the UNION ALL which generates the script:

select 'select ' from dual
union all
select ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"' from stats
union all
select ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"' from stats
union all

The rownum in the WITH clause is used later to add the comma for all rows except the first one – using decode(n,1,' ',',').

But this relied on two assumptions:

  1. the WITH keeps the rows ordered
  2. the UNION ALL keeps the rows ordered

Of course, it would be better to add a number on each part and an ORDER BY at the end of the query. But for better readability, I didn’t.
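
For the record, here’s a sketch of that safer version (my suggestion, not the published script): number each branch, wrap the UNION ALL in an inline view, and put the ORDER BY at the top level, where the ordering is guaranteed:

with stats as (
 select rownum n, stat_id, name
 from  (select stat_id, name from v$statname where name in (&names) order by stat_id)
)
select txt
from (
        select 1 part, 0 n, 'select ' txt from dual
        union all
        select 2, n, ' '||decode(n,1,' ',',')||'"CUR'||stat_id||'" - '||'&'||'LAG'||stat_id||' "DIF'||stat_id||'"' from stats
        union all
        select 3, n, ' '||',nvl("CUR'||stat_id||'",0) "CUR'||stat_id||'"' from stats
)
order by part, n
;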

However, the first assumption has been wrong since 12.2, where the optimizer introduced In-Memory Cursor Duration Temp Tables. In the query above, the CTE is materialized because it is used multiple times, and this optimization keeps it in memory. But it seems that this structure does not keep the rows ordered.

Here is an example where I have a 4 rows table:

SQL> create table DEMO as select rownum id from xmltable('1 to 4');
Table DEMO created.
 
SQL> select * from DEMO order by id;
 
ID
----------
1
2
3
4

When I put the same query in a WITH clause, with its ORDER BY, and query it from two branches of a UNION ALL, the rows are not ordered anymore:

SQL> with cte as ( select * from DEMO order by id)
select rownum,id from cte
union all
select rownum,id from cte ;
 
    ROWNUM         ID
---------- ----------
         1          4
         2          3
         3          2
         4          1
         1          4
         2          3
         3          2
         4          1

The execution plan shows the materialization of the CTE result, and mentions that it is an In-Memory Cursor Duration Temp Table (CURSOR DURATION MEMORY)

SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9uwc06ana6trn, child number 0
-------------------------------------
with cte as ( select * from DEMO order by id) select rownum,id from
cte union all select rownum,id from cte
 
Plan hash value: 4025392480
 
-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                           |       |       |     4 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION                 |                           |       |       |            |          |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)  | SYS_TEMP_0FD9D6640_23D74B |       |       |            |          |
|   3 |    SORT ORDER BY                           |                           |     4 |    12 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL                      | DEMO                      |     4 |    12 |     3   (0)| 00:00:01 |
|   5 |  UNION-ALL                                 |                           |       |       |            |          |
|   6 |   COUNT                                    |                           |       |       |            |          |
|   7 |    VIEW                                    |                           |     4 |    52 |     2   (0)| 00:00:01 |
|   8 |     TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6640_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
|   9 |   COUNT                                    |                           |       |       |            |          |
|  10 |    VIEW                                    |                           |     4 |    52 |     2   (0)| 00:00:01 |
|  11 |     TABLE ACCESS FULL                      | SYS_TEMP_0FD9D6640_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - STRDEF[22], STRDEF[22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[120], SYSDEF[0]
3 - (#keys=1) "DEMO"."ID"[NUMBER,22]
4 - (rowset=256) "DEMO"."ID"[NUMBER,22]
5 - STRDEF[22], STRDEF[22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - "C0"[NUMBER,22]
9 - "ID"[NUMBER,22], ROWNUM[8]
10 - "ID"[NUMBER,22]
11 - "C0"[NUMBER,22]

/*+ inline */

If the CTE is not materialized, the rows are ordered, but the table is read twice:


SQL> with cte as ( select /*+ inline */ * from DEMO order by id)
2 select rownum,id from cte
3 union all
4 select rownum,id from cte ;
 
    ROWNUM         ID
---------- ----------
         1          1
         2          2
         3          3
         4          4
         1          1
         2          2
         3          3
         4          4
 
8 rows selected.
 
SQL>
SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 0dad0vwg3k0th, child number 0
-------------------------------------
with cte as ( select /*+ inline */ * from DEMO order by id) select
rownum,id from cte union all select rownum,id from cte
 
Plan hash value: 2913170750
 
---------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |       |       |     8 (100)|          |
|   1 |  UNION-ALL              |      |       |       |            |          |
|   2 |   COUNT                 |      |       |       |            |          |
|   3 |    VIEW                 |      |     4 |    52 |     4  (25)| 00:00:01 |
|   4 |     SORT ORDER BY       |      |     4 |    12 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL  | DEMO |     4 |    12 |     3   (0)| 00:00:01 |
|   6 |   COUNT                 |      |       |       |            |          |
|   7 |    VIEW                 |      |     4 |    52 |     4  (25)| 00:00:01 |
|   8 |     SORT ORDER BY       |      |     4 |    12 |     4  (25)| 00:00:01 |
|   9 |      TABLE ACCESS FULL  | DEMO |     4 |    12 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - STRDEF[22], STRDEF[22]
2 - "ID"[NUMBER,22], ROWNUM[8]
3 - "ID"[NUMBER,22]
4 - (#keys=1) "DEMO"."ID"[NUMBER,22]
5 - (rowset=256) "DEMO"."ID"[NUMBER,22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - (#keys=1) "DEMO"."ID"[NUMBER,22]
9 - (rowset=256) "DEMO"."ID"[NUMBER,22]

"_in_memory_cdt"=off

If the CTE is materialized, but the new feature that keeps it in memory is disabled, the rows are ordered (though probably only by chance – there is no guarantee):


SQL> alter session set "_in_memory_cdt"=off;
 
Session altered.
 
SQL>
SQL> with cte as ( select /*+ materialize */ * from DEMO order by id)
2 select rownum,id from cte
3 union all
4 select rownum,id from cte ;
 
    ROWNUM         ID
---------- ----------
         1          1
         2          2
         3          3
         4          4
         1          1
         2          2
         3          3
         4          4
 
8 rows selected.
 
SQL> select * from dbms_xplan.display_cursor(format=>'+projection');
 
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 9uwc06ana6trn, child number 1
-------------------------------------
with cte as ( select * from DEMO order by id) select rownum,id from
cte union all select rownum,id from cte
 
Plan hash value: 4025392480
 
----------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name                      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                           |       |       |     4 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION  |                           |       |       |            |          |
|   2 |   LOAD AS SELECT            | SYS_TEMP_0FD9D664D_23D74B |       |       |            |          |
|   3 |    SORT ORDER BY            |                           |     4 |    12 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL       | DEMO                      |     4 |    12 |     3   (0)| 00:00:01 |
|   5 |  UNION-ALL                  |                           |       |       |            |          |
|   6 |   COUNT                     |                           |       |       |            |          |
|   7 |    VIEW                     |                           |     4 |    52 |     2   (0)| 00:00:01 |
|   8 |     TABLE ACCESS FULL       | SYS_TEMP_0FD9D664D_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
|   9 |   COUNT                     |                           |       |       |            |          |
|  10 |    VIEW                     |                           |     4 |    52 |     2   (0)| 00:00:01 |
|  11 |     TABLE ACCESS FULL       | SYS_TEMP_0FD9D664D_23D74B |     4 |    12 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------
 
Column Projection Information (identified by operation id):
-----------------------------------------------------------
 
1 - STRDEF[22], STRDEF[22]
2 - SYSDEF[4], SYSDEF[0], SYSDEF[1], SYSDEF[120], SYSDEF[0]
3 - (#keys=1) "DEMO"."ID"[NUMBER,22]
4 - (rowset=256) "DEMO"."ID"[NUMBER,22]
5 - STRDEF[22], STRDEF[22]
6 - "ID"[NUMBER,22], ROWNUM[8]
7 - "ID"[NUMBER,22]
8 - "C0"[NUMBER,22]
9 - "ID"[NUMBER,22], ROWNUM[8]
10 - "ID"[NUMBER,22]
11 - "C0"[NUMBER,22]

So what?

We should never rely on the preservation of row order except when it is explicitly documented (like rownum over a subquery with an order by). This In-Memory Cursor Duration Temporary table also works in parallel and on RAC, so it’s easy to understand why the rows are not read in the same order as they were stored. This is always a problem with migrations: when something worked by chance in one version and not in a newer one, people usually blame the migration rather than recognize an old bug in the application, just because it used to work.

 


#Exasol Fail-Safety explained

The building blocks of an Exasol cluster are commodity Intel servers, e.g. a Dell PowerEdge R740 with 96 GB RAM, 12 x 1.2 TB SAS hot-plug hard drives and 2 x 10Gb Ethernet cards for the private network. That’s sufficient to deliver outstanding performance combined with high availability. The picture below shows a 4+1 cluster, one of our most popular configurations:


Exasol 4+1 Cluster: Shared Nothing Architecture

Each active node hosts one database instance that works on its part of the database (A, B, C, D) in an MPP way. The instances communicate over the private network. Optionally, the private network can be separated into one database network and one storage network; in that case, the instances communicate over the database network. Notice that the instances access their part of the database directly on their local hard drives: they do not need the private network (or the storage network) for that. The reserve node becomes relevant only if one of the active nodes fails. The local hard drives are set up in RAID 1 pairs, so single disk failures can be tolerated without losing database availability. Not shown is the license node, which is required to boot the cluster initially; after that, it is no longer required to keep the cluster running.

If data volumes with redundancy 2 are in use – which is the most common case – then each node holds a copy of the data operated on by a neighbor node:


Exasol 4+1 Cluster: Redundancy 2

If a master segment like A is modified, the slave segment (A’) is synchronized accordingly over the private network (or the storage network, if one is configured).

Availability comes at a price: the raw disk capacity is halved by the RAID 1 mirroring and halved again by redundancy 2, so you are left with roughly a quarter of your raw disk capacity (slightly less, since the Linux OS and the database software also require a small amount of disk space). With the example hardware above, the 12 x 1.2 TB = 14.4 TB of raw disk per node yield about 3.6 TB of net database capacity per node. But since we are running on commodity hardware – no storage servers, no SAN, no SSDs required etc. – this is actually a very competitive price.

Now what if one node fails?


Exasol 4+1 Cluster: Node failure

ExaClusterOS – Exasol’s clusterware – will detect the node failure within seconds and shut down all remaining database instances in order to preserve a consistent state of the database. It then restarts them on the three still-available nodes and on the reserve node, which now becomes an active node too. The database becomes available again, with node n15 immediately working with segment B’.

The downtime caused by the node failure is typically below 30 seconds. The restart of the database starts a countdown called Restore Delay, which defaults to 10 minutes. If the failed node becomes available again within that time, we just re-synchronize the segments (A’ and B in the example), which can be done quickly. The instance on n15 will then work with segment B as a master segment until the database is manually restarted; after that restart, n15 becomes a reserve node again and n12 is active, with an instance running there.

If the failed node doesn’t come back within Restore Delay:


Exasol 4+1 Cluster: Restore Delay is over

We will then create new segments on node n15: A’ is copied from n11 and B is copied from n13. This activity is time-consuming and puts a significant load on the private network, which is why configuring a dedicated storage network may be beneficial to avoid a drop in performance during that period. A new reserve node should now be added to the cluster to replace the crashed n12.

Complex materialized views and fast refresh

Just a quick discovery that came across the AskTOM “desk” recently. We have an outstanding bug in some instances of fast refresh materialized views when the definition of the materialized view references a standard view.

Here’s a simple demo of the issue – I’ll use a simplified version of the EMP and DEPT tables, linked by a foreign key in the usual way:


SQL> create table dept(dept_id number(10) primary key, dname varchar2(20));
 
Table created.
 
SQL> create table emp(empid number(20) primary key, ename varchar2(20), sal number(10,2), dept_id number(10) references dept(dept_id));
 
Table created.
 
SQL> insert into dept values(10,'IT');
 
1 row created.
 
SQL> insert into dept values(20,'HR');
 
1 row created.
 
SQL> insert into dept values(30,'MAT');
 
1 row created.
 
SQL> insert into emp values(1,'MIKE',20000,10);
 
1 row created.
 
SQL> insert into emp values(2,'JOHN',30000,20);
 
1 row created.
 
SQL> insert into emp values(3,'SUE',20000,20);
 
1 row created.
 
SQL> insert into emp values(4,'TOM',40000,30);
 
1 row created.
 
SQL>
SQL> commit;
 
Commit complete.

One of the cool things with materialized views is that even with complicated SQL definitions (such as joins), the materialized view can still be fast refreshable as long as the materialized view logs and database constraints are correctly defined.


SQL> create materialized view log on dept
  2  with rowid, primary key, sequence
  3  including new values;
 
Materialized view log created.
 
SQL>
SQL> create materialized view log on emp
  2  with rowid, primary key, sequence
  3  including new values;
 
Materialized view log created.
 
SQL> create materialized view mv
  2  --build deferred
  3  refresh fast on demand
  4  with primary key
  5  enable query rewrite
  6  as
  7  select a.rowid erowid,b.rowid drowid ,b.dname, a.*
  8  from emp a, dept b
  9  where a.dept_id=b.dept_id;
 
Materialized view created.
 

Note: See the documentation for DBMS_MVIEW.EXPLAIN_MVIEW for how to check on the refresh characteristics of a materialized view (or potential materialized view).
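
A minimal sketch of that check – assuming you have first created MV_CAPABILITIES_TABLE in your schema by running $ORACLE_HOME/rdbms/admin/utlxmv.sql:

exec dbms_mview.explain_mview('MV')

select capability_name, possible, msgtxt
from   mv_capabilities_table
where  capability_name like 'REFRESH_FAST%';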

Now I’ll repeat the same experiment, but I’ll wrap the SQL that joins EMP and DEPT in a standard view called VW. Since a view is just stored SQL text, and the previous usage of the same SQL worked fine, we’d expect no difference in functionality.  However, the results do not meet that expectation.


SQL> drop materialized view mv ;
 
Materialized view dropped.
 
SQL>
SQL> create or replace view vw
  2  as
  3  select a.rowid erowid,b.rowid drowid ,b.dname, a.*
  4  from emp a, dept b
  5  where a.dept_id=b.dept_id;
 
View created.
 
SQL>
SQL> create materialized view mv
  2  --build deferred
  3  refresh fast on demand
  4  with primary key
  5  enable query rewrite
  6  as
  7  select * from vw;
select * from vw
              *
ERROR at line 7:
ORA-12015: cannot create a fast refresh materialized view from a complex query

This is not a flaw in our DDL – it is a bug in the database that will be fixed in due course. So if you have standard views being used within your materialized view definitions, and you are getting unexpected restrictions on whether the materialized views can be fast refreshed, try a simple workaround of in-lining the view text directly.  You might have hit this bug.

Index Column Order – Impact On Index Branch Blocks Part II (The Weeping Song)

In Part I, I discussed how the order of columns in an index makes no real difference to the effectiveness of the index if all columns are referenced via equality predicates. If the leading column has a high number of distinct values, it might result in less necessary data within index branches as less data […]