Search

Top 60 Oracle Blogs

Recent comments

Oakies Blog Aggregator

RMOUG, Feb. 20th, 2018 #td18

So its the first day of RMOUG Training Days 2018 in Westminster, CO, (just NW of Denver, between Denver and Boulder) and we’re enjoying the winter weather after spring temperatures on Sunday, which dropped almost 50F degrees in a single day.  The change in temperatures was accompanied by about 8 inches of snow for Denver and most of those in summer gear were faced with only one choice….

Actually they were left with two-  They *could* go for a swim in the snow or they could register for the workshops at RMOUG Training Days, braving the roads, (which due to our lacking humidity weren’t as bad as they could be…)

We had a great group of instructors this year for our four offered sessions, Mike Dietrich, Kent Graziano, John King, and Peter Koletzke who offered some incredible workshop/deed dive sessions to kick off the event.

After lunch we’ll begin the one house sessions with our fantastic presenters and the schedule can be found here!

With that, I’ll get back to performing my duties of welcoming folks, social media and being the loveliest User Group president in the land… </p />
</p></div>

    	  	<div class=

A look into into Oracle redo, part 4: the log writer null write

This is the fourth blogpost on a series of blogposts about how the Oracle database handles redo. The previous blogpost talked about the work cycle of the log writer: https://fritshoogland.wordpress.com/2018/02/12/a-look-into-oracle-redo-part-3-the-log-writer-work-cycle-overview/. This posts is looking into the execution of the kcrfw_redo_write_driver function executed in the ksbcti.

Now that we are familiar with how the logwriter works in general, we need to take a closer look to the kcrfw_redo_write_driver function. First let me be clear: the kcrfw_redo_write_driver function inside ksbcti is called every time the logwriter process times out on its semaphore, which is every 3 seconds, so this means it is called too when nothing is written in the public redo strands. In fact, when the log writer times out on semtimedop (which means it is not (yet) semctl’ed or semop’ed), it doesn’t know if there are entries in the public redo strands at that point in time.

There is a lot of things that go on in the kcrfw_redo_write_driver function. The following tracing are small snippets of execution from the function, not the every detail in this function.

The first thing that is of interest is the logwriter obtaining a LWN (log write number) number and SCN.

 | | > kcsnew3(0x600113b8, 0x7ffc3dd10d28, ...)
 | | | > kcsnew8(0x600113b8, 0x7ffc3dd10b70, ...)
 | | | | > kslgetl(0x60049800, 0x1, ...)
 | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | > kslfre(0x60049800, 0x1, ...)
 | | | | < kslfre+0x0000000001e2 returns: 0
 | | | < kcsnew8+0x000000000117 returns: 0

Please mind at this time it’s unknown if there is anything to write in the public redo strands.

So the kcsnew3 function is executed, with 0x600113b8 as first argument. That is the address of kcsgscn_, the instance global SCN. The kcsnew3 function calls the kcsnew8 function, which calls kslgetl for latch address 0x60049800. That address belongs to the ‘lgwr LWN SCN’ latch in my instance. In the function call overview there is nothing more to see. If see look at the pina memory trace, we see a lot more; this is what is executed in between kslgetl and kslfre:

kcsnew8+107:0x0000000060016274(fixed sga|var:kcrfsg_+76 ):W:4:0x1/1()
kcsnew8+118:0x0000000060016270(fixed sga|var:kcrfsg_+72 ):R:4:0xc5/197()
kcsnew8+118:0x0000000060016270(fixed sga|var:kcrfsg_+72 ):W:4:0xc6/198()
kcsnew8+125:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):R:8:0x762b7e/7744382()
kcsnew8+131:0x00007fffa1071ae0():W:4:0/0()
kcsnew8+140:0x00007fffa1071b12():W:1:0/0()
kcsnew8+149:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):R:8:0x762b7e/7744382()
kcsnew8+149:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):W:8:0x762b7f/7744383()
kcsnew8+165:0x00007fffa1071ae0():R:4:0/0()
kcsnew8+184:0x00007f520ba8be08():R:8:0x600113b8/1610683320(fixed sga|var:kcsgscn_+0 )
kcsnew8+193:0x00007f520ba8ae90():R:8:0x7a823620/2055353888(shared pool|permanent memor,duration 1,cls perm+769 )
kcsnew8+212:0x000000007a823e40(shared pool|permanent memor,duration 1,cls perm+7695936 ):R:8:0x719beed8/190604(shared pool|ksu:stats_freel,duration 1,cls freeabl+24 )
kcsnew8+219:0x00000000719bf790(shared pool|ksu:stats_freel,duration 1,cls freeabl+2256 ):R:8:0xc6/198()
kcsnew8+219:0x00000000719bf790(shared pool|ksu:stats_freel,duration 1,cls freeabl+2256 ):W:8:0xc7/199()
kcsnew8+223:0x00007fffa1071bf0():W:8:0x762b7f/7744383()
kcsnew8+232:0x0000000060016260(fixed sga|var:kcrfsg_+56 shared pool|x_kcrfws.lwn_scn+0 ):W:8:0x762b7f/7744383()
kcsnew8+238:0x0000000060016274(fixed sga|var:kcrfsg_+76 ):W:4:0/0()
kcsnew8+246:0x0000000060016288(fixed sga|var:kcrfsg_+96 shared pool|pointer:lgwr lwn scn latch+0 ):R:8:0x60049800/1610913792(fixed sga|(parent)latch:lgwr LWN SCN+0 fixed sga|var:lwn_scn_lat_+0 )
kcsnew8+250:0x00007fffa1071ac8():W:8:0x10e3ed3f/283372863()

What is visible here, is that at kcrfsg_ offset 76 ‘1’ is written. This is quite probably a flag indicating the LWN related entries are being modified (this is a guess, of course the LWN SCN latch essentially performs the same function). Especially since this is set to ‘0’ at the end if the function (kcsnew8+238). At kcrfsg_ offset 72 a number is read and increased by 1. This is most likely the LWN number. If you take a look at the number of gets for the ‘lgwr LWN’ you will see the number of gets is very close (albeit not equal) to the LWN number.

Then the LWN SCN is fetched from kcsgscn_; it first is read, and increased by one to 7744383 and written back in kcsgscn_ (get and advance). The SCN from kcsgscn_ then is written in the kcrfsg_ struct at offset 56 a little further as well, after which the earlier set ‘1’ is reset to ‘0’ at offset 76. This SCN is visible in x$kcrfws as LWN SCN.

The next thing to look at is the function kcrfw_gather_lwn a little further:

 | | > kcrfw_gather_lwn(0x7ffc3dd10d68, 0x77eced90, ...)
 | | | > kslgetl(0x76fe0c10, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | | > kcrfw_gather_strand(0x7ffc3dd10d68, 0, ...)
 | | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | | > kslfre(0x76fe0c10, 0x118b2, ...)
 | | | < kslfre+0x0000000001e2 returns: 0
 | | | > kslgetl(0x76fe0cb0, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | | > kcrfw_gather_strand(0x7ffc3dd10d68, 0x1, ...)
 | | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | | > kslfre(0x76fe0cb0, 0x19, ...)
 | | | < kslfre+0x0000000001e2 returns: 0
 | | < kcrfw_gather_lwn+0x00000000065c returns: 0xffffffff

In the kcrfw_gather_lwn function first latch address x76fe0c10 is taken in immediate mode, which is the redo allocation child #1, and then the function kcrfw_gather_strand is called, which returns 0, after which the latch is freed. This pattern repeats itself with a different latch address, 0x76fe0cb0, which is redo allocation child #2. This gives a hint the two public strands are active. By looking at the memory accesses trace again, a fixed sga variable is used to find out how many strands are active:

7038:kcrfw_gather_lwn+76:0x00000000600169b8(Fixed Size(pgsz:2048k)|+92600 fixed sga|var:kcrf_max_strands_+0 ):R:4:0x2/2()

A little further information is obtained from kcrfsg_ about the redo allocation latch and the KCRFA structure address:

7052:kcrfw_gather_lwn+223:0x0000000060016570(fixed sga|var:kcrfsg_+840 ):R:8:0x76fe0c10/1996360720(shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
7053:kcrfw_gather_lwn+230:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )

Then the redo allocation latch child #1 is obtained via kslgetl, after which kcrfw_gather_lwn calls kcrfw_gather_strand:

kcrfw_gather_strand+2:0x00007fffa1071b00():W:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+15:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+22:0x00007fffa1071af8():W:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+29:0x00007fffa1071af0():W:8:0x76fdf2b0/1996354224(shared pool|permanent memor,duration 1,cls perm+16114352 )
kcrfw_gather_strand+36:0x0000000076fdf458(shared pool|KCRFA+208 shared pool|LAST_BUF_WRITTEN+0 shared pool|permanent memor,duration 1,cls perm+16114776 ):R:4:0x65e/1630()
kcrfw_gather_strand+53:0x0000000076fdf480(shared pool|KCRFA+248 shared pool|TOTAL_BUFS_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114816 ):R:4:0x20000/131072()
kcrfw_gather_strand+70:0x00007fffa1071de8():R:8:0x762b7f/7744383()
kcrfw_gather_strand+79:0x0000000076fdf390(shared pool|KCRFA+8 shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x762b7c/7744380()
kcrfw_gather_strand+86:0x0000000076fdf3a8(shared pool|KCRFA+32 shared pool|NEXT_BUF_NUM_KCRFA_CLN+0 shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x65e/1630()
kcrfw_gather_strand+121:0x0000000076fdf45c(shared pool|KCRFA+212 shared pool|LAST_BUF_GATHERED_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114780 ):R:4:0x65e/1630()
kcrfw_gather_strand+169:0x00007fffa1071bf8():W:4:0x65e/1630()
kcrfw_gather_strand+176:0x00007fffa1071bfc():W:4:0/0()
kcrfw_gather_strand+182:0x00007fffa1071af0():R:8:0x76fdf2b0/1996354224(shared pool|permanent memor,duration 1,cls perm+16114352 )
kcrfw_gather_strand+186:0x00007fffa1071af8():R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+193:0x00007fffa1071b00():R:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+194:0x00007fffa1071b08():R:8:0x31f0f00/52367104()

In kcrfw_gather_strand, we see the sequence we have seen before: at offset 0 of kcrfsg_, the KCRFA struct address is looked up, after which the function investigates addresses in the KCRFA struct for the first public redo strand only. I’ve annotated these in the pinatrace snippet above. At offset 36 in the function, the last buffer written to the online redologfile is obtained from offset 208 in KCRFA, which is public strand buffer 1630. A little further, at offset 86 and 121 in the function kcrfw_gather_strand, the most recently used public strand buffer that is written to is obtained at KCRFA offset 32 (NEXT_BUF_NUM_KCRFA_CLN), and the highest buffer number that has been investigated (gathered) by the logwriter process at KCRFA offset 212 (LAST_BUF_GATHERED_KCRFA). In all these cases, the public redo strand buffer number is 1630. This means no redo has been written into the first public redo strand.
After reading the KCRFA information for the first public redo strand, the latch for this struct is freed.

A little further, the latch is obtained that protects the second public strand, and kcrfw_gather_strand is executed again to read the KCRFA information for this strand:

kcrfw_gather_strand+2:0x00007fffa1071b00():W:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+15:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+22:0x00007fffa1071af8():W:8:0x76fdf4b0/1996354736(shared pool|KCRFA+296 shared pool|permanent memor,duration 1,cls perm+16114864 )
kcrfw_gather_strand+29:0x00007fffa1071af0():W:8:0x76fdf1f0/1996354032(shared pool|permanent memor,duration 1,cls perm+16114160 )
kcrfw_gather_strand+36:0x0000000076fdf580(shared pool|KCRFA+504 shared pool|permanent memor,duration 1,cls perm+16115072 ):R:4:0x1d/29()
kcrfw_gather_strand+53:0x0000000076fdf5a8(shared pool|KCRFA+544 shared pool|permanent memor,duration 1,cls perm+16115112 ):R:4:0x20000/131072()
kcrfw_gather_strand+70:0x00007fffa1071de8():R:8:0x762b7f/7744383()
kcrfw_gather_strand+79:0x0000000076fdf4b8(shared pool|KCRFA+304 shared pool|permanent memor,duration 1,cls perm+16114872 ):R:8:0x762ae7/7744231()
kcrfw_gather_strand+86:0x0000000076fdf4d0(shared pool|KCRFA+328 shared pool|permanent memor,duration 1,cls perm+16114896 ):R:4:0x1d/29()
kcrfw_gather_strand+121:0x0000000076fdf584(shared pool|KCRFA+508 shared pool|permanent memor,duration 1,cls perm+16115076 ):R:4:0x1d/29()
kcrfw_gather_strand+169:0x00007fffa1071bf8():W:4:0x1d/29()
kcrfw_gather_strand+176:0x00007fffa1071bfc():W:4:0/0()
kcrfw_gather_strand+182:0x00007fffa1071af0():R:8:0x76fdf1f0/1996354032(shared pool|permanent memor,duration 1,cls perm+16114160 )
kcrfw_gather_strand+186:0x00007fffa1071af8():R:8:0x76fdf4b0/1996354736(shared pool|KCRFA+296 shared pool|permanent memor,duration 1,cls perm+16114864 )
kcrfw_gather_strand+193:0x00007fffa1071b00():R:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+194:0x00007fffa1071b08():R:8:0x31f0f00/52367104()

This is exactly the same function, just reading the same information for a different strand, which means at a different offset in KCRFRA. If you subtract the size of the the information for a single strand which we calculated to be 296 from the KCRFA offsets (see blogpost number 2), you get the same numbers as above (504-296=208=LAST_BUF_WRITTEN). Once you understand what is read, it should be fairly easy to see nothing has been written into the second strand too. So at this point, the logwriter knows there is nothing to write.

What does the logwriter do when there is nothing to write? It executes a ‘null write’:

 | | > kcrfw_do_null_write(0, 0, ...)
 | | | > kcrfw_slave_phase_batchdo(0, 0, ...)
 | | | | > kcrfw_slave_phase_enter(0, 0x40, ...)
 | | | | < kcrfw_slave_phase_enter+0x000000000449 returns: 0
 | | | <> kcrfw_slave_phase_exit(0, 0x40, ...)
 | | | < kcrfw_slave_phase_exit+0x00000000035a returns: 0
 | | | > kcrfw_post(0, 0, ...)
 | | | | > kcrfw_slave_single_getactivegroup(0, 0, ...)
 | | | | < kcrfw_slave_single_getactivegroup+0x000000000047 returns: 0x769a5620
 | | | | > kspGetInstType(0x1, 0x1, ...)
 | | | | | > vsnffe_internal(0x19, 0x1, ...)
 | | | | | | > vsnfprd(0x19, 0x1, ...)
 | | | | | | < vsnfprd+0x00000000000f returns: 0x8
 | | | | | | > kfIsASMOn(0x19, 0x1, ...)
 | | | | | | <> kfOsmInstanceSafe(0x19, 0x1, ...)
 | | | | | | < kfOsmInstanceSafe+0x000000000031 returns: 0
 | | | | | < vsnffe_internal+0x0000000000a7 returns: 0
 | | | | | > kspges(0x115, 0x1, ...)
 | | | | | < kspges+0x00000000010f returns: 0
 | | | | < kspGetInstType+0x0000000000b1 returns: 0x1
 | | | | > kcrfw_slave_phase_enter(0x1, 0x40, ...)
 | | | | < kcrfw_slave_phase_enter+0x00000000006f returns: 0x40
 | | | | > kcscu8(0x60016290, 0x7ffc3dd10a98, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | > kcsaj8(0x60016290, 0x7ffc3dd10a38, ...)
 | | | | < kcsaj8+0x0000000000dc returns: 0x1
 | | | | > kcrfw_slave_phase_exit(0x1, 0x40, ...)
 | | | | < kcrfw_slave_phase_exit+0x00000000008e returns: 0
 | | | | > kslpsemf(0x97, 0, ...)
 | | | | | > ksl_postm_init(0x7ffc3dd08730, 0x7ffc3dd10750, ...)
 | | | | | < ksl_postm_init+0x00000000002b returns: 0
 | | | | < kslpsemf+0x0000000006b5 returns: 0x1f
 | | | | > kcrfw_slave_barrier_nonmasterwait(0x769a5628, 0x4, ...)
 | | | | < kcrfw_slave_barrier_nonmasterwait+0x000000000035 returns: 0x600161a0
 | | | < kcrfw_post+0x000000000c1c returns: 0xd3
 | | < kcrfw_do_null_write+0x0000000000b2 returns: 0xd3

This means there is housekeeping to do, despite no public redo strand buffers needing writing. The most crucial things it does that I can detect, is updating the on disk SCN in the kcrfsg_ struct with the already set LWN SCN and posting any outstanding processes that are sleeping on a semaphore via post/wait. These are done in the kcrfw_post function, which probably means (kernel cache redo file write) ‘post write’, however there was nothing to write (null write), since there was nothing in the public redo strands. If you look to the above function trace, you see inside kcrfw_do_null_write, kcrfw_post is called.

Inside kcrfw_post, kcscu8 (kernel cache service get current SCN) is called with 0x60016290 as first argument, which is the memory address of offset 104 in the kcrfsg_ struct in the fixed SGA, which is externalised as ON_DISK_SCN in X$KCRFWS, which is done to read the current value of the on disk SCN:

kcscu8+2:0x00007fffa10718a0():W:8:0x7fffa1071bb0/140735894985648()
kcscu8+13:0x00007fffa1071898():W:8:0x77eced90/2012016016(Variable Size(pgsz:2048k)|+384626064 shared pool|permanent memor,duration 1,cls perm+2451496 )
kcscu8+20:0x00007fffa1071890():W:8:0x7f520ba98f40/139990359707456()
kcscu8+27:0x00000000600162a0(Fixed Size(pgsz:2048k)|+90784 fixed sga|var:kcrfsg_+120 ):R:4:0/0()
kcscu8+31:0x00000000600162a4(Fixed Size(pgsz:2048k)|+90788 fixed sga|var:kcrfsg_+124 ):R:4:0/0()
kcscu8+39:0x0000000060016290(Fixed Size(pgsz:2048k)|+90768 fixed sga|var:kcrfsg_+104 shared pool|x_kcrfws.on_disk_scn+0 ):R:8:0x762b7e/7744382()
kcscu8+43:0x00000000600162a0(Fixed Size(pgsz:2048k)|+90784 fixed sga|var:kcrfsg_+120 ):R:4:0/0()
kcscu8+56:0x00007fffa1071b18():W:8:0x762b7e/7744382()
kcscu8+59:0x00007fffa1071898():R:8:0x77eced90/2012016016(Variable Size(pgsz:2048k)|+384626064 shared pool|permanent memor,duration 1,cls perm+2451496 )
kcscu8+63:0x00007fffa1071890():R:8:0x7f520ba98f40/139990359707456()
kcscu8+70:0x00007fffa10718a0():R:8:0x7fffa1071bb0/140735894985648()
kcscu8+71:0x00007fffa10718a8():R:8:0x3202150/52437328()

The next function that is called is kcsaj8 (kernel cache service adjust SCN), with 0x60016290 as first argument, indicating it is going to change the on disk SCN:

kcsaj8+105:0x0000000060016290(Fixed Size(pgsz:2048k)|+90768 fixed sga|var:kcrfsg_+104 shared pool|x_kcrfws.on_disk_scn+0 ):R:8:0x762b7e/7744382()
kcsaj8+105:0x0000000060016290(Fixed Size(pgsz:2048k)|+90768 fixed sga|var:kcrfsg_+104 shared pool|x_kcrfws.on_disk_scn+0 ):W:8:0x762b7f/7744383()
(these are only the relevant lines from the memory trace for the function kcsaj8)

What this means is that the LWN SCN that was obtained at te beginning of the write cycle using the kcsnew3 function, now is written into the on disk SCN too, despite anything actually being written to disk. The obvious question at this point is ‘if the on disk SCN is increased without anything actually written, would there be a SCN value that actually tracks the SCN of truly written (“on disk”) redo? To spoil the next blogpost about actual redo writing, this is administered in a SCN value called the ‘real redo SCN’.

After kcrfw_post returns, the execution returns from kcrfw_do_null_write and kcrfw_redo_write_driver, ending the ‘null write’.

Taking Notes – 2

[Originally written August 2015, but not previously published]

If I’m taking notes in a presentation that you’re giving there are essentially four possible reasons:

  • You’ve said something interesting that I didn’t know and I’m going to check it and think about the consequences
  • You’ve said something that I knew but you’ve said it in a way that made me think of some possible consequences that I need to check
  • You’ve said something that I think is wrong or out of date and I need to check it
  • You’ve said something that has given me a brilliant idea for solving a problem I’ve had to work around in the past and I need to work out the details

Any which way, if I’m taking notes it means I’ve probably just added a few more hours of work to my todo list.

Footnote

“Checking” can include:

  • having a chat
  • reading the manuals
  • finding a recent Oracle white-paper
  • searching MoS
  • building some models

Philosophy

Here’s a note I’ve just re-discovered – at the time I was probably planning to extend it into a longer article but I’ve decided to publish the condensed form straight away.

In a question to the Oak Table a couple of years ago (May 2015) Cary Millsap asked the following:

If you had an opportunity to tell a wide audience of system owners, users, managers, project leaders, system architects, DBAs, and developers “The most important things you should know about Oracle” what would you tell them?

I imagine that since then Cary has probably discussed the pros and cons of some of the resulting thoughts in one of his excellent presentations on how to do the right things, but this was my quick response:

If I had to address them all at once it would be time to go more philosophical than technical.

The single most important point: Oracle is a very large, complex, and flexible product. It doesn’t matter where you are approaching it from you will not have enough information on your own to make best use of it. You have to talk to your peer group to get alternative ideas, and you have to talk to the people at least one step either side of you on the technology chain (dev to dba, dba to sysadmin, Architect to dev, dba to auditor etc.) to understand options and consequences. Create 4 or 5 scenarios of how your system should behave and then get other people – and not just your peer group – to identify their advantages and threats.

Assumptions

As the years roll on I’ve found it harder and harder to supply quick answers to “simple” questions on the Oracle-L list server and OTN/ODC forum because things are constantly changing and an answer that may have been right the last time I checked could now be wrong. A simple example of the consequences of change showed up recently on the OTN/ODC forum where one reply to a question started:

Just why do you need distinct in a subquery??? That’s the first thing that appears really shocking to me. If it’s a simple in (select …) adding a distinct to the subquery would just impose a sort unique(as you can see in the explain plan), which may be quite costly.

Three question-marks is already tip-toeing its way to the Pratchett limit – but “really shocking” ? It’s bad enough that the comment goes operatic, but going operatic in order to introduce an error pushes the thing into tragedy (or possibly comedy – or maybe both). To make the self-inflicted injury worse, there were two execution plans supplied in the original post anyway of which only one showed any attempt to achieve uniqueness.

Bottom line – when you’re about to correct someone for doing something that is “obviously” wrong, be a little bit kind about it and then be kind to yourself and do a quick sanity check that your attempt at correction is itself correct. A good guideline would be to ask yourself: “How do I know what I know – and am I about to make myself look like an idiot.”

Check It

Question: Does a  “distinct” in a subquery impose a sort (or hash) unique ?

Answer: No – a uniqueness operation may appear, but it’s not guaranteed to appear.

Here’s a quick example which does not result in any attempt at imposing uniqueness (running 11.2.0.4):


drop table t2 purge;
drop table t1 purge;
create table t1 as select * from all_objects where rownum  <= 100;
create table t2 as select * from all_objects where rownum <= 100;

create index t1_i1 on t1(owner);
create index t2_i2 on t2(object_type);

set autotrace traceonly explain

select  * 
from    t1 
where   owner = 'OUTLN' 
and     object_name in (
                select distinct object_name 
                from   t2 
                where  object_type = 'TABLE'
        )
;


Execution Plan
----------------------------------------------------------
Plan hash value: 3169044451

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     3 |   558 |     4   (0)| 00:00:01 |
|*  1 |  HASH JOIN SEMI              |       |     3 |   558 |     4   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T2    |    12 |   336 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   5 - access("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)


There’s no sign of a sort unique or hash unique. The optimizer has decided that the IN subquery can be transformed into an EXISTS subquery, which can then be transformed into a semi-join.

I can think of three other execution plan strategies that might have appeared depending on the data, indexing, and statstics:

a) Transform the IN subquery to an EXISTS subquery then operate as a filter subquery (with no uniqueness imposed):


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |   158 |     5   (0)| 00:00:01 |
|*  1 |  FILTER                      |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     3 |   474 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     3 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |    28 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | T2_I2 |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

b) Simple unnest with sort/hash unique and join


---------------------------------------------------------------------------
| Id  | Operation                      | Name     | Rows  | Bytes | Cost  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |          |     1 |   175 |     9 |
|*  1 |  HASH JOIN                     |          |     1 |   175 |     9 |
|   2 |   TABLE ACCESS BY INDEX ROWID  | T1       |     4 |   632 |     1 |
|*  3 |    INDEX RANGE SCAN            | T1_I1    |     2 |       |     1 |
|   4 |   VIEW                         | VW_NSO_1 |     4 |    68 |     7 |
|   5 |    SORT UNIQUE                 |          |     4 |   112 |     7 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2       |     4 |   112 |     1 |
|*  7 |      INDEX RANGE SCAN          | T2_I2    |     2 |       |     1 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("OBJECT_NAME"="OBJECT_NAME")
   3 - access("OWNER"='OUTLN')
   7 - access("OBJECT_TYPE"='TABLE')

For this data set I actually had to take the optimizer_features_enable back to ‘8.1.7’ to get this plan – but you can see that there’s a SORT UNIQUE at operation 5, but that would have been there whether or not the DISTINCT keyword had appeared in the SQL. Effectively the query has been transformed to:

select  t1.*
from    (
                select  distinct t2.object_name object_name
                from    t2
                where   t2.object_type='TABLE'
        )
        vw_nso_1,
        t1
where   t1.owner = 'OUTLN'
and     t1.object_name = vw_nso_1.object_name
/

c) Unnest, then “place group by” so that the distinct is applied after the join

--------------------------------------------------------------------------------------------
| Id  | Operation                      | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |           |     3 |   474 |     5  (20)| 00:00:01 |
|   1 |  VIEW                          | VM_NWVW_1 |     3 |   474 |     5  (20)| 00:00:01 |
|   2 |   HASH UNIQUE                  |           |     3 |   594 |     5  (20)| 00:00:01 |
|*  3 |    HASH JOIN                   |           |     3 |   594 |     4   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1        |     3 |   510 |     2   (0)| 00:00:01 |
|*  5 |      INDEX RANGE SCAN          | T1_I1     |     3 |       |     1   (0)| 00:00:01 |
|   6 |     TABLE ACCESS BY INDEX ROWID| T2        |    12 |   336 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | T2_I2     |    12 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."OBJECT_NAME"="T2"."OBJECT_NAME")
   5 - access("T1"."OWNER"='OUTLN')
   7 - access("T2"."OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic sampling used for this statement (level=2)

Again, the plan would be the same whether or not the original subquery had a redundant DISTINCT. (Again, with this data set, I had to be a bit devious to get this lan).

The things you think you know may have been true 10 years ago – but maybe they’re not true any longer, or maybe they’re still true on your version of the database but not every  version of the database. So I often end up looking at a question, thinking the poster’s claim can’t be right, and then working out and modelling the circumstances that might make the poster’s observations appear (and learning something new).

Remember: “I’ve never seen it before” doesn’t mean “It doesn’t happen”.

 

You may still need gcc when patching Oracle Database 12.2

I have previously written about changes in the Oracle 12.2 preinstall RPM and how gcc is no longer part of the dependencies list. As was pointed out to me, this shouldn’t be necessary anymore, according to the 12.2 Linux Database Installation Guide. Check the blue note for a statement indicating that gcc and gcc-c++ aren’t needed for Grid Infrastructure, nor for the RDBMS software.

I have applied patch 27100009 (January 2018 Release Update 12.2.0.1.180116) on my 2 node RAC system in the lab, and found out that this is partially true :) You may or may not encounter this issue in your environment, see below.

Updating the preinstall RPM

Just to be sure I didn’t miss any changes to the preinstall RPM I pulled the latest one from Oracle and checked it’s requirements for gcc.

[oracle@rac122pri1 ~]$ rpm -qi oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64
Name        : oracle-database-server-12cR2-preinstall
Version     : 1.0
Release     : 3.el7
Architecture: x86_64
Install Date: Fri 19 Jan 2018 03:13:18 PM GMT
Group       : Test Environment/Libraries
Size        : 56561
License     : GPLv2
Signature   : RSA/SHA256, Mon 10 Jul 2017 11:27:07 AM BST, Key ID 72f97b74ec551f03
Source RPM  : oracle-database-server-12cR2-preinstall-1.0-3.el7.src.rpm
Build Date  : Mon 10 Jul 2017 11:26:59 AM BST
Build Host  : x86-ol7-builder-02.us.oracle.com
Relocations : (not relocatable)
Vendor      : Oracle
Summary     : Sets the system for Oracle Database single instance and Real Application
              Cluster install for Oracle Linux 7
Description :
The Oracle Preinstallation RPM package installs software packages and sets system parameters 
required for Oracle Database single instance and Oracle Real Application Clusters installations 
for Oracle Linux Release 7
Files affected: /etc/sysctl.conf, /boot/grub/menu.lst OR /boot/grub2/grub.cfg
Files added: /etc/security/limits.d/oracle-database-server-12cR2-preinstall.conf
[oracle@rac122pri1 ~]$ rpm -q --requires oracle-database-server-12cR2-preinstall-1.0-3.el7.x86_64 | grep -i gcc
libgcc
[oracle@rac122pri1 ~]$ 

So gcc is definitely not part of the dependent packages, and my minimum install doesn’t pull gcc (or gcc-c++ for that matter):

[oracle@rac122pri1 ~]$ rpm -qa | grep gcc
libgcc-4.8.5-16.el7_4.1.x86_64
[oracle@rac122pri1 ~]$ 

This is a 2 node cluster, consisting of nodes rac122pri1 and rac122pri2. Both use Oracle Linux 7.4 with UEK4 patched to February 12th 2018. As the names I picked for my cluster nodes suggest, this is a system running RAC 12.2.

IMPORTANT note: just because I hit the issue doesn’t mean there’s an issue for everyone. My lab environment is that – my lab environment. I just wanted to point out a way to investigate the problem and how I resolved it in my lab environment. Your mileage might vary.

Applying the January 2018 RU

I always like to be current when it comes to Oracle, and in that spirit decided to apply the January 2018 RU to my cluster after having made sure that I have a working backup. My rule is that there is no patching without backups, ever. And by backups I mean backups of the entire stack ;)

Since this was a new installation in my lab I thought I’d give opatchauto another chance and run with it. What could possibly go wrong?

Node 1 went without any issues, and opatchauto reported that it had applied the patches it wanted to apply. I don’t have the screen output anymore, however here’s the summary after the patch completed.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/oracle/product/12.2.0.1/dbhome_1
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)

OPatch succeeded.

[oracle@rac122pri1 ~]$ opatch lspatches -oh /u01/app/12.2.0.1/grid/
27335416;OCW JAN 2018 RELEASE UPDATE 12.2.0.1.180116 (27335416)
27144050;Tomcat Release Update 12.2.0.1.0(ID:171023.0830) (27144050)
27128906;ACFS Release Update : 12.2.0.1.0 (27128906)
27105253;Database Release Update : 12.2.0.1.180116 (27105253)
26839277;DBWLM RELEASE UPDATE 12.2.0.1.0(ID:170913) (26839277)

OPatch succeeded.
[oracle@rac122pri1 ~]$

Did you notice the “Tomcat Release Update 12.2.0.1.0” in the Grid Home? So much to investigate!

Patching node 2

After the patch completed on node 1 I continued with node 2. At first, everything looked quite normal, and opatch did it’s thing. I have taken a habit to always apply anything that depends on the network connection to be up in a screen (1) session. That protects the work I’m doing from intermittent network trouble, and allows me to keep a view on how things are progressing.

After a few minutes of work, opatchauto reported this (formatted for readability):

Start applying binary patch on home /u01/app/12.2.0.1/grid
Failed while applying binary patches on home /u01/app/12.2.0.1/grid

Execution of [OPatchAutoBinaryAction] patch action failed, check log for more details. Failures:
Patch Target : rac122pri2->/u01/app/12.2.0.1/grid Type[crs]
Details: [
---------------------------Patching Failed---------------------------------
Command execution failed during patching in home: /u01/app/12.2.0.1/grid, host: rac122pri2.
Command failed:  /u01/app/12.2.0.1/grid/OPatch/opatchauto  apply /u01/stage/27100009 -oh /u01/app/12.2.0.1/grid
    -target_type cluster -binary -invPtrLoc /u01/app/12.2.0.1/grid/oraInst.loc 
    -jre /u01/app/12.2.0.1/grid/OPatch/jre 
    -persistresult /u01/app/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_rac122pri2_crs.ser 
    -analyzedresult /u01/app/12.2.0.1/grid/OPatch/auto/dbsessioninfo/sessionresult_analyze_rac122pri2_crs.ser
Command failure output:
==Following patches FAILED in apply:

Patch: /u01/stage/27100009/27335416
Log: /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/core/opatch/opatch2018-01-22_09-37-57AM_1.log
Reason: Failed during Patching: oracle.opatch.opatchsdk.OPatchException: Re-link fails on target "install_srvm".

Not exactly what you want to see during patching. But there’s no need to panic (just yet ;) The error message points me to a log file. Opening the log file it quickly become apparent why the issue occurred:

[22-Jan-2018 09:40:30] [INFO]       [OPSR-TIME] Finished applying patch "27335416" to local system
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading raw inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all components from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] Loaded all one offs from inventory. Heap memory in use: 314 (MB)
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Raw inventory loaded successfully
[22-Jan-2018 09:40:31] [INFO]       [OPSR-TIME] Loading cooked inventory
[22-Jan-2018 09:40:31] [INFO]       [OPSR-MEMORY] : Loading cooked one offs. Heap memory used 314 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-MEMORY] : Loaded cooked oneoffs. Heap memory used : 363 (MB)
[22-Jan-2018 09:40:32] [INFO]       [OPSR-TIME] Cooked inventory loaded successfully
[22-Jan-2018 09:40:32] [INFO]       OUI-67050:Running make for target install_srvm
[22-Jan-2018 09:40:32] [INFO]       Start invoking 'make' at Mon Jan 22 09:40:32 GMT 2018Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [INFO]       Finish invoking 'make' at Mon Jan 22 09:40:32 GMT 2018
[22-Jan-2018 09:40:32] [WARNING]    OUI-67200:Make failed to invoke "/usr/bin/make -f ins_srvm.mk install_srvm 
                                    ORACLE_HOME=/u01/app/12.2.0.1/grid"....'/bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/12.2.0.1/grid/rdbms/lib/config.o] Error 127
                                    '
[22-Jan-2018 09:40:32] [INFO]       Stack Description: java.lang.RuntimeException: /bin/sh: /usr/bin/gcc: No such file or directory
                                    make: *** [/u01/app/12.2.0.1/grid/rdbms/lib/config.o] Error 127
[22-Jan-2018 09:40:32] [INFO]       StackTrace: oracle.opatch.MakeAction.apply(MakeAction.java:534)

This is actually quite simple: for some reason opatch wants to use gcc, and doesn’t find it. After installing gcc and dependencies, I resumed opatchauto and finished patching successfully. I haven’t understood why Oracle wants to use gcc on node 2 after it didn’t require it on node 1.

Just thought I’d pass this on in case you hit the same problem. Happy patching!

European Indexing Internals Seminar Events: Please Help Me Select (Station To Station)

I’m currently determining which European countries to initially target for a series of my 2 day Oracle Indexing Internals and Best Practices seminars tentatively scheduled for the late May / early June time frame. The cost for the 2 day seminar is to be confirmed but will be approximately 1200 Euros (plus any local taxes), […]

Execution plans on LiveSQL

To protect the integrity of people’s data, and isolate sessions on LiveSQL, we lock down the environment.  Clearly if you are doing some testing with sensitive data, you don’t want an anonymous member of the user community mining V$SQL to see what commands you have been running.  Conversely, we want to allow people to perform most of the tasks that would do on a standard database installation without having to install or configure anything.  That’s the great thing about LiveSQL.

So for that reason, you don’t get access to all of the features of DBMS_XPLAN.  But since we do provide access to a limited set of V$ view access, you can get execution plan details by going back to first principles.  Here’s some scripts you can use


--
-- First run your query
--
select /*+ gather_plan_statistics */ .... (my_query)

--
--  Then locate the SQL_ID/CHILD_NUMBER for your just executed query
--
select * from v$sqlstats where sql_text like '...';


--
-- Plug them into the queries below.  
--
-- The first one gets the execution plan, the second one get the runtime statistics
--
with plan_table as 
(
select * from v$sql_plan_statistics_all
where sql_id = "my sql id"
and child_number = "my child"
)
select id "Id",
  rpad(
  lpad(' ',2*level) || operation || ' ' || options , 40 ) "Operation",
  object_name "Name",
  cardinality "Rows",
  bytes "Bytes", 
  cost "Cost",
  rtrim(to_char(trunc(time/3600),'fm00')||':'||
  to_char(mod(trunc(time/60),1),'fm00')||':'||
  to_char(trunc(mod(time,60)),'fm00'),':')     "Time"
from plan_table
connect by prior id = parent_id 
start with id = 0
union all
select null, null,null, null,null, null,null from dual
union all
select null, lpad(id,4) ||'-filter '||FILTER_PREDICATES,null, null,null, null,null 
from plan_table where FILTER_PREDICATES is not null
union all
select null, lpad(id,4) ||'-access '||ACCESS_PREDICATES,null, null,null, null,null 
from plan_table where ACCESS_PREDICATES is not null
order by 1 nulls last, 2 nulls first;



with plan_table as 
(
select * from v$sql_plan_statistics_all
where sql_id = 'd09nv17gy4j3z'
and child_number = 0
)
select id "Id",
  rpad(
  lpad(' ',2*level) || operation || ' ' || options , 40 ) "Operation",
  object_name "Name",
  last_starts "Starts",
  cardinality "E-Rows", 
  LAST_OUTPUT_ROWS "A-Rows",
  rtrim(to_char(trunc(time/3600),'fm00')||':'||
  to_char(mod(trunc(time/60),1),'fm00')||':'||
  to_char(trunc(mod(time,60)),'fm00'),':')     "Time",
  LAST_CR_BUFFER_GETS+LAST_CU_BUFFER_GETS "Buffers"
from plan_table p
connect by prior id = parent_id 
start with id = 0
union all
select null, null,null, null,null, null,null,null from dual
union all
select null, lpad(id,4) ||'-filter '||FILTER_PREDICATES,null, null,null, null,null,null 
from plan_table where FILTER_PREDICATES is not null
union all
select null, lpad(id,4) ||'-access '||ACCESS_PREDICATES,null, null,null, null,null,null 
from plan_table where ACCESS_PREDICATES is not null
order by 1 nulls last, 2 nulls first;


18c Read Only Oracle Home

Capture18c000This is the big new feature of Oracle 18c about database software installation. Something that was needed for decades for the ease of software deployment. Piet de Visser raised this to Oracle a long time ago, and we were talking about that recently when discussing this new excitement to deploy software in Docker containers. Docker containers are by definition immutable images. You need a Read Only Oracle Home, all the immutable files (configuration, logs, database) being in an external volume. Then, to upgrade the software, you just open this volume with an image of the new database version.

roohctl

In 12.2 you may have seen a ‘roohctl’ script in ORACLE_HOME/bin. The help explains that ‘rooh’ stands for Read-Only Oracle Home:

[oracle@VM122 ~]$ roohctl -help
Usage: roohctl [] [ ]
Following are the possible flags:
-help
 
Following are the possible commands:
-enable Enable Read-only Oracle Home
-disable Disable Read-only Oracle Home

Note that in 18c the help does not show ‘-disable’ even if it is accepted….
So in 12cR2 you were able to run ‘roohctl -enable’ but the only thing it did was changing the Yes/No flag in orabasetab:

cat $ORACLE_HOME/install/orabasetab
#orabasetab file is used to track Oracle Home associated with Oracle Base
/u01/app/oracle/product/12.2.0/dbhome_1:/u01/app/oracle:OraDB12Home1:Y:

Oracle 18

Here is an Oracle 18 that I re-installed (as an Oracle Home Clone) with the following:

runInstaller -clone ORACLE_HOME=/u01/app/oracle/product/181 ORACLE_HOME_NAME=O181 ORACLE_BASE=/u00/app/oracle

My idea is to be able to easily differentiate the different paths (ORACLE_HOME under /u01 and ORACLE_BASE under /u00)

The $ORACLE_HOME/install/orabasetab records the ORACLE_HOME, ORACLE_BASE and ORACLE_HOME_NAME:

[oracle@VM181 18c]$ cat $ORACLE_HOME/install/orabasetab
#orabasetab file is used to track Oracle Home associated with Oracle Base
/u01/app/oracle/product/181:/u01/app/oracle:O181:N:

ORACLE_HOME: This may seem useless because this file is under ORACLE_HOME, so if you read it you are supposed to know the ORACLE_HOME. However, you may find it from different paths (symbolic links, /../.) and this is a good way to normalize it.

ORACLE_BASE: This will be used to externalize the mutable files outside of the ORACLE_HOME

ORACLE_HOME_NAME: is the name of Oracle Home that you provide when installing and you can find in the Oracle Inventory.

The last field is ‘N’ when the mutable files are under ORACLE_HOME and ‘Y’ when they are externalized to have an immutable Read Only Oracle Home.

We are not supposed to use this file directly. It is modified by runInstaller and roohctl. And it is read by orabasehome and orabaseconfig

orabasehome and orabaseconfig

We have two new location name derived from the orabasetab content.

One is the ‘Oracle Base Config’ which is mostly there to find the configuration files (.ora, .dat) in the /dbs subdirectory. With Read Only Oracle Home, this is set to the ORACLE_BASE:

[oracle@VM181 18c]$ orabaseconfig
/u00/app/oracle

Most of the files in /dbs have the ORACLE_SID in their name, which is unique in the host, and this is why they can all go into the same directory. However, I would prefer a subdirectory per database. When you move a database from one system to another, it is easier to move a directory. You can do per-file symbolic links but be sure to maintain them as they may be re-created as files.

The other is the ‘Oracle Base Home’ which is mostly there for the /network subdirectory (with the SQL*Net configuration files, logs and trace) and the /assistant (DBCA templates) and /install ones. With Read Only Oracle Home, this goes to a /homes subdirectory of ORACLE_BASE

[oracle@VM181 18c]$ orabasehome
/u00/app/oracle/homes/O181

As you see, there is an additional subdirectory with the name of the Oracle Home. In my opinion, it is not a good idea to put sqlnet.ora, tnsnames.ora and listener.ora here. It is better to have one common TNS_ADMIN. However, because the default was one directory per Oracle Home, the Read Only Oracle Home feature had to keep this possibility. In 12.2 an ORACLE_HOME/env.ora was introduced to set TNS_ADMIN in a consistent way.

With Read Only Oracle Home enabled, I strace-ed a ‘startup’ to show which files are read:

[oracle@VM181 18c]$ ORACLE_SID=CDB18 strace -e trace=file -f sqlplus / as sysdba <<&1 | grep /u00
...
open("/u00/app/oracle/homes/O181/network/admin/oraaccess.xml", O_RDONLY) = -1 ENOENT (No such file or directory)
access("/u00/app/oracle/homes/O181/network/admin/oraaccess.xml", F_OK) = -1 ENOENT (No such file or directory)
open("/u00/app/oracle/homes/O181/network/admin/oraaccess.xml", O_RDONLY) = -1 ENOENT (No such file or directory)
access("/u00/app/oracle/homes/O181/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
...
access("/u00/app/oracle/homes/O181/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u00/app/oracle/homes/O181/network/admin/intchg.ora", F_OK) = -1 ENOENT (No such file or directory)
access("/u00/app/oracle/homes/O181/network/admin/tnsnav.ora", F_OK) = -1 ENOENT (No such file or directory)
...
open("/u00/app/oracle/dbs/cm_CDB18.dat", O_RDONLY|O_SYNC) = -1 ENOENT (No such file or directory)
[pid 15339] access("/u00/app/oracle/homes/O181/network/admin/sqlnet.ora", F_OK) = -1 ENOENT (No such file or directory)
...
[pid 15339] stat("/u00/app/oracle/dbs/spfileCDB18.ora", 0x7ffe6a5785b8) = -1 ENOENT (No such file or directory)
[pid 15339] open("/u00/app/oracle/dbs", O_RDONLY) = 8
[pid 15339] stat("/u00/app/oracle/dbs/spfileCDB18.ora", 0x7ffe6a578010) = -1 ENOENT (No such file or directory)
[pid 15339] stat("/u00/app/oracle/homes/O181/dbs/spfile.ora", 0x7ffe6a5785b8) = -1 ENOENT (No such file or directory)
[pid 15339] open("/u00/app/oracle/homes/O181/dbs", O_RDONLY) = 8
[pid 15339] stat("/u00/app/oracle/homes/O181/dbs/spfile.ora", 0x7ffe6a578010) = -1 ENOENT (No such file or directory)
[pid 15339] access("/u00/app/oracle/dbs/initCDB18.ora", F_OK) = -1 ENOENT (No such file or directory)
[pid 15339] open("/u00/app/oracle/dbs/initCDB18.ora", O_RDONLY) = -1 ENOENT (No such file or directory)
LRM-00109: could not open parameter file '/u00/app/oracle/dbs/initCDB18.ora'

The files were not there as I’ve not created any database here. The goal is to show that there is no attempt to read any configuration file under ORACLE_HOME.

You can also see that DBCA will search for templates in this new directory:

Capture18c003

I mentioned network and assistant subdirectories. But it concerns all directories where the instance can write files:

[oracle@VM181 18c]$ du $ORACLE_BASE/homes
4 /u01/app/oracle/homes/O181/assistants/dbca/templates
8 /u01/app/oracle/homes/O181/assistants/dbca
12 /u01/app/oracle/homes/O181/assistants
4 /u01/app/oracle/homes/O181/network/trace
4 /u01/app/oracle/homes/O181/network/admin
4 /u01/app/oracle/homes/O181/network/log
16 /u01/app/oracle/homes/O181/network
4 /u01/app/oracle/homes/O181/dbs
4 /u01/app/oracle/homes/O181/install
64 /u01/app/oracle/homes/O181/rdbms/log
72 /u01/app/oracle/homes/O181/rdbms/audit
140 /u01/app/oracle/homes/O181/rdbms
180 /u01/app/oracle/homes/O181
184 /u01/app/oracle/homes

You may wonder why we see a /dbs subdirectory here as the instance configuration files are in the common /u01/app/oracle/dbs. The /dbs is also the current working directory for oracle processes. And this one will be set to ORACLE_BASE/homes/oracle_home_name/dbs.

We can also see /rdbms/log here. I opened a bug 2 years ago about SBTIO.LOG not going to the right place under ADR_HOME, but going to ORACLE_HOME/rdbms/log (Bug 23559013 USER_DUMP_DEST VALUE NOT IGNORED EVEN THOUGH DIAGNOSTIC_DEST IS SET). I’ve no idea about the status of the bug, but at least this will not go to Oracle Home anymore. Even if you don’t really have the need to have a Read Only Oracle Home, this feature is a good way to ensure that it will not grow and fill the filesystem.

Enable Read Only Oracle Home

You enable this feature with ‘roohctl -enable’ after software installation and before any creation of databases or listeners:

[oracle@VM181 18c]$ roohctl -enable
Enabling Read-Only Oracle home.
Update orabasetab file to enable Read-Only Oracle home.
Orabasetab file has been updated successfully.
Create bootstrap directories for Read-Only Oracle home.
Bootstrap directories have been created successfully.
Bootstrap files have been processed successfully.
Read-Only Oracle home has been enabled successfully.
Check the log file /u01/app/oracle/cfgtoollogs/roohctl/roohctl-180217PM111551.log.

If the utility tool finds an existing database or listener related to this Oracle Home, it will return this kind of error:

Cannot enable Read-Only Oracle home in a configured Oracle home.
The Oracle Home is configured with databases 'CDB18'.
The Oracle Home is configured with listeners 'LISTENER'.

There is an undocumented ‘-force’ parameter to add to ‘roohctl -enable’ which can proceed anyway, but it will not move the configuration files.

I have not tested all possibilities because the General Availability of 18c is currently limited to Exadata and Oracle Public Cloud. But it seems that this roohctl will work the same on Windows (with /database instead of /dbs and with registry settings instead of orabasetab) and with Grid Infrastructure (there’s a -nodeList argument).

I mentioned above that the ORACLE_HOME/install goes to $(orabasehome)/install. I don’t know which files go there when ROOH is enabled. The orabasetab remains under ORACLE_HOME, of course. And some logs, such as re-running root.sh, still go to ORACLE_HOME/install:

[oracle@VM181 ~]$ sudo $ORACLE_HOME/root.sh
Check /u01/app/oracle/product/181/install/root_VM181_2018-02-18_19-06-23-833474515.log for the output of root script

This looks strange, but remember that the idea of a Read Only Oracle Home is to ship it after all changes are done. If you have something to change (patch, re-link, …) that will go to another Oracle Home. Maybe cloned from the other, then made Read Only after the changes.

?/

Do you use the question mark as a shortcut to ORACLE_HOME? This does not change and remains the ORACLE_HOME:

[oracle@VM181 ~]$ sqlplus / as sysdba
 
SQL*Plus: Release 18.0.0.0.0 Production on Sun Feb 18 20:26:33 2018
Version 18.1.0.0.0
Copyright (c) 1982, 2017, Oracle. All rights reserved.
Connected to an idle instance.
SQL> start ?
SP2-0310: unable to open file "/u01/app/oracle/product/181.sql"
SQL> exit
Disconnected

This is ok as I mostly use it to read files from the software distribution (such as ?/rdbms/admin/awrrpt)

If you use it in database configuration files, then be careful. Here I have enabled ROOH and defined a pfile mentioning the spfile with the ‘?’ shortcut

[oracle@VM181 ~]$ orabaseconfig
/u00/app/oracle
[oracle@VM181 ~]$ cat $(orabaseconfig)/dbs/init$ORACLE_SID.ora
spfile=?/dbs/xxxx

However, the ‘?’ is resolved to ORACLE_HOME and not Oracle Base Config:

[oracle@VM181 ~]$ strace -f sqlplus / as sysdba <<&1 | grep xxx
[pid 1898] read(10, "spfile=?/dbs/xxx\n", 4096) = 17
[pid 1898] stat("/u01/app/oracle/product/181/dbs/xxx", 0x7ffc5ac1c758) = -1 ENOENT (No such file or directory)
[pid 1898] stat("/u01/app/oracle/product/181/dbs/xxx", 0x7ffc5ac1c1b0) = -1 ENOENT (No such file or directory)
ORA-01565: error in identifying file '?/dbs/xxx'

So what?

Having a read-only Oracle Home, or at least be sure that you have no files written into it, is a good idea. Easier to manage space. Easier to deploy by cloning Oracle Home. Good practice to separate big software directory from small configuration files. And to have the current working directory outside of that. Having configuration files at the same place as the software is always a bad idea (and reminds me the .ini files in C:\WIN directory a long time ago). So, even if it is not enabled by default, Read Only Oracle Home is the way to go.

I think the risks are very limited once tested, as it is just changing the directories and any problem can be worked around with symbolic links on directories. However, this may change some habits and scripts. Not finding the right configuration file in a stressful situation may be annoying.

So, don’t wait, and even in 12c, you can change your habits and replace all references to ${ORACLE_HOME}/dbs by $(orabaseconfig)/dbs and other ${ORACLE_HOME} to $(orabasehome). In 12c they will go to the same ORACLE_HOME. And they you will be ready to enable ROOH in 18c.

 

Cet article 18c Read Only Oracle Home est apparu en premier sur Blog dbi services.

Update to RANDOM_NINJA. Enables you to create even more random production like test data using pl/sql.

Had a chance to finish the data generators that I had on my list for release 1.4 of
RANDOM_NINJA
. New data domain additions include random science data, random game data, investment data and medical data.

Read below for a full list of the current functionality. More will be added for release 1.5.