Oakies Blog Aggregator

Redo Dumps

A thread started on the Oracle-L list-server a few days ago asking for help analysing a problem where a simple “insert values()” (that handled millions of rows per day) was running very slowly. There are many reasons why this might happen, ranging from the trivial (someone has locked the table in exclusive mode), through the slightly subtle (we’re trying to insert a row that collides on a uniqueness constraint with an uncommitted insert from another session) to the subtle (Oracle has to read through the undo to check current versions of blocks against read-consistent versions) ending up at the esoteric (the ASSM space management blocks are completely messed up again).

A 10046 trace of a session doing an insert showed only that there was a lot of time spent on single block reads. Since this was an Exadata system the waits were reported as “cell single block physical read”; unfortunately the parameters to this wait event are “cellhash#”, “diskhash#”, and “bytes”, so we don’t see the file_id and block_id that would be very helpful for a case like this. The only information we got from the trace file was that the object_id was that of the table where rows were being inserted.

Before digging into exotic debugging methods, the OP supplied us with a 1-second session report from Tanel Poder’s snapper script:


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  SID @INST, USERNAME  , TYPE, STATISTIC                                                 ,         DELTA, HDELTA/SEC,    %TIME, GRAPH       , NUM_WAITS,  WAITS/SEC,   AVERAGES
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   1070  @2, SYSADMIN, STAT, session logical reads                                     ,         13865,      7.73k,         ,             ,          ,           ,      14.1k total buffer visits
>   1070  @2, SYSADMIN, STAT, user I/O wait time                                        ,           141,      78.65,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, non-idle wait time                                        ,           141,      78.65,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, non-idle wait count                                       ,         12230,      6.82k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total IO requests                           ,          6112,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read requests optimized                          ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total bytes optimized                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read total bytes                                 ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, cell physical IO interconnect bytes                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, gcs messages sent                                         ,             3,       1.67,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets                                             ,         13860,      7.73k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets from cache                                  ,         13860,      7.73k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block gets from cache (fastpath)                       ,          7737,      4.32k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets                                           ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets from cache                                ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets pin                                       ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, consistent gets pin (fastpath)                            ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, logical read bytes from cache                             ,     113541120,     63.34M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical reads                                            ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical reads cache                                      ,          6111,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, physical read IO requests                                 ,          6112,      3.41k,         ,             ,          ,           ,      8.19k bytes per request
>   1070  @2, SYSADMIN, STAT, physical read bytes                                       ,      50069504,     27.93M,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, db block changes                                          ,            11,       6.14,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, free buffer requested                                     ,          6112,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, hot buffers moved to head of LRU                          ,           958,     534.39,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, free buffer inspected                                     ,          6144,      3.43k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, shared hash latch upgrades - no wait                      ,             7,        3.9,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, blocks decrypted                                          ,          6110,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo entries                                              ,          6120,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo size                                                 ,        465504,    259.67k,         ,             ,          ,           ,          ~ bytes per user commit
>   1070  @2, SYSADMIN, STAT, redo entries for lost write detection                     ,          6110,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo size for lost write detection                        ,        464756,    259.25k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, redo subscn max counts                                    ,             7,        3.9,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, file io wait time                                         ,       1408659,    785.78k,         ,             ,          ,           ,   230.47us bad guess of IO wait time per IO request
>   1070  @2, SYSADMIN, STAT, gc current blocks received                                ,             3,       1.67,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, gc local grants                                           ,          6116,      3.41k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM cbk:blocks examined                                  ,         12366,       6.9k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:L1 bitmaps examined                              ,          2478,      1.38k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:L2 bitmaps examined                              ,             1,        .56,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, ASSM gsp:reject db                                        ,         12388,      6.91k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, STAT, buffer is pinned count                                    ,           230,      128.3,         ,             ,          ,           ,       1.63 % buffer gets avoided thanks to buffer pin caching
>   1070  @2, SYSADMIN, STAT, cell flash cache read hits                                ,          6723,      3.75k,         ,             ,          ,           ,          ~ per execution
>   1070  @2, SYSADMIN, TIME, background cpu time                                       ,        365192,   203.71ms,    20.4%, [##        ],          ,           ,
>   1070  @2, SYSADMIN, TIME, background elapsed time                                   ,       1273623,   710.45ms,    71.0%, [########  ],          ,           ,      28.95 % unaccounted time
>   1070  @2, SYSADMIN, WAIT, gc current block busy                                     ,           629,   350.87us,      .0%, [          ],         3,       1.67,   209.67us average wait
>   1070  @2, SYSADMIN, WAIT, cell single block physical read                           ,       1557638,   868.88ms,    86.9%, [WWWWWWWWW ],      6746,      3.76k,    230.9us average wait
>  
> --  End of Stats snap 1, end=2019-05-18 12:58:58, seconds=1.8

My first step was simply to read down the list (using a very small font to get the entire width on screen without wrapping) to see if anything stood out as unusual. The report showed two things I rarely see in the session stats:


blocks decrypted                                          ,          6110
redo entries for lost write detection                     ,          6110

These stats tell me that there are two “uncommon” features enabled: db_lost_write_protect and block level encryption. (So, whatever else is going on, it’s just possible that mixing in two rarely used – and therefore less frequently tested – features is confusing the issue.)
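
Before going any further you could confirm for yourself that these two features are enabled. Here’s a quick sanity check of my own (not part of the OP’s output) – the first query reports the lost write protection setting, the second lists any encrypted tablespaces:

select name, value
from   v$parameter
where  name = 'db_lost_write_protect'
;

select *
from   v$encrypted_tablespaces
;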

Lost write protection means Oracle writes a “block read record” (BRR) to the redo log every time it reads a block from disc, so I decided to follow up the 6,110 figure to see what other stats reported similar values.


physical read total IO requests                           ,          6112
physical read requests optimized                          ,          6111
physical reads                                            ,          6111
physical reads cache                                      ,          6111
physical read IO requests                                 ,          6112
free buffer requested                                     ,          6112
redo entries                                              ,          6120
redo entries for lost write detection                     ,          6110
gc local grants                                           ,          6116
cell flash cache read hits                                ,          6723

There’s nothing particularly surprising here – basically we see all the blocks being read as single block reads, into cache. All the necessary global cache (gc) grants are local so it’s possible the table of interest has been remastered to this node. The value for “cell flash cache read hits” looks a little odd as the cache is hit more frequently than blocks are read – but dynamic performance views are not read-consistent and this session is hammering away like crazy, so this might just be a side effect of the time it takes to gather the data for the report.

We can chase the redo a little further – the number of redo entries is slightly larger than the number of blocks read, so (even though small inconsistencies are not necessarily meaningful) this might tell us something:


redo entries                                              ,          6120
redo size                                                 ,        465504
redo entries for lost write detection                     ,          6110
redo size for lost write detection                        ,        464756 
db block changes                                          ,            11

The number of “redo entries” that were NOT for lost write detection is 10 (i.e. 6,120 – 6,110), totalling 748 bytes (465,504 – 464,756) – not a lot, so indicative of “non-user” activity. The number of “db block changes” is 11 (close enough to 10), and generally it’s changes to db blocks that require redo to be generated. The final significant number is the one that isn’t there – there’s no undo generated, so no user-change to data. This system is working like crazy achieving absolutely nothing at this point.

The next point to ponder is what sort of work it is doing – so let’s check how the physical reads turn into buffer gets.


session logical reads                                     ,         13865
db block gets                                             ,         13860
db block gets from cache                                  ,         13860
db block gets from cache (fastpath)                       ,          7737
consistent gets                                           ,             1
consistent gets from cache                                ,             1
consistent gets pin                                       ,             1
consistent gets pin (fastpath)                            ,             1
hot buffers moved to head of LRU                          ,           958

buffer is pinned count                                    ,           230 

The unusual thing you notice with these figures is that virtually every buffer get is a current get. We’ve also got a number of blocks pinned – this might just be the segment header block, or the segment header and level 2 bitmap block that we keep revisiting. Finally we can see a lot of hot buffers being moved to the head of the LRU; since our session has been doing a lot of work for a long time it seems likely that those buffers are ones that our session is keeping hot – and for a big insert that shouldn’t really be happening unless, perhaps, we were managing to do a lot of maintenance of (well-clustered) indexes.

I’ve isolated the (new in 12.2) “ASSM gsp (get space)” statistics from this output – they’re all about handling blocks, but I wanted to look at them without being distracted by other stats.


ASSM cbk:blocks examined                                  ,         12366 
ASSM gsp:L1 bitmaps examined                              ,          2478 
ASSM gsp:L2 bitmaps examined                              ,             1 
ASSM gsp:reject db                                        ,         12388

We can see that we’ve examined 2,478 “level 1” bitmap blocks. A level 1 block holds the basic “bitmap” that records the state of a number of data blocks (typically 128 blocks once the object gets very large) so our session has worked its way through 2,478 maps trying to find a data block that it could use to insert a row. The “reject db” statistic tells us about data blocks that have been examined and rejected (presumably because the row we want to insert is too large to fit, or maybe because there are no free ITL (interested transaction list) entries available in the block). So we seem to be spending all our time searching for somewhere to insert rows. This shouldn’t really be happening – it’s a type of problem that Oracle has been worrying away at for quite some time: how do you avoid “losing” space by updating bitmap blocks too soon on inserts without going to the opposite extreme and leaving bitmap blocks that claim the space is free when it’s in use by uncommitted transactions.

Note to self: I don’t know how we managed to reject more blocks (12,388) than we examined (12,366), but possibly it’s just one of those timing glitches (the error is less than one fifth of one percent), or possibly the reject count includes some of the L1 bitmap blocks.
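
As an aside – if you want to watch these ASSM statistics for your own session without running snapper, a simple join between v$mystat and v$statname (this is just my quick check, not from the original thread) will report them:

select
        sn.name, ms.value
from
        v$statname  sn,
        v$mystat    ms
where
        ms.statistic# = sn.statistic#
and     sn.name like 'ASSM%'
and     ms.value != 0
;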

Clearly there’s something funny going on with space management – and we need to look at a few blocks that are exhibiting the problems. But how do we find a few suitable blocks? And that’s where, finally, we get to the title of the piece.

We are in the lucky position of having “lost write protection” enabled – so the redo log file will hold lots of “block read records”. We can’t get the file and block addresses we need from the “cell physical read” wait events so let’s ask the redo log to supply them. We just have to pick a log file (online or archived) and tell Oracle to dump some of it – and we can probably get away with a fairly small dump since we want just a single type of redo record over a short period of time. Here’s an example showing the format of two slightly different commands you could execute:


alter system dump logfile '/u01/app/oracle/oradata/orcl12c/redo03.log'
        rba min 2781    1
        rba max 2781    1000
        layer 23 opcode 2
;

alter system dump redo 
        scn min 19859991 scn max 19964263 
        layer 23 opcode 2
;

The first command dumps a log file by name – but you may have to fiddle around a bit to find the name of an archived log file, because if you choose this option you need to know the sequence number (sequence# in v$log_history) of the file if you want to restrict the size of the dump. The second command simply dumps redo for (in this example) an SCN range – it’s easy to query v$log_history to find dates, times, and SCN ranges, and Oracle will work out for itself which file(s) it has to access. In both cases I’ve restricted the dump to just those redo records that contain change vectors of type BRR (block read record), which is what the “layer 23 opcode 2” line is about.
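
If you want to identify a suitable SCN range, a query like the following (my own sketch – adjust the time window to taste) against v$log_history will give you dates, times and SCN ranges to choose from:

select
        thread#, sequence#, first_time, first_change#, next_change#
from
        v$log_history
where
        first_time > sysdate - 1/24
order by
        first_time
;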

Here’s an example of a redo record that contains nothing but a single BRR. (It’s from a single-block read, a multi-block read would produce a redo record with multiple change vectors, one vector for each block read.)


REDO RECORD - Thread:1 RBA: 0x000add.00000019.01b0 LEN: 0x004c VLD: 0x10 CON_UID: 2846920952
SCN: 0x00000000025a7c13 SUBSCN:  1 05/23/2019 10:42:51
CHANGE #1 CON_ID:3 TYP:2 CLS:6 AFN:9 DBA:0x00407930 OBJ:40 SCN:0x00000000001a1e2a SEQ:2 OP:23.2 ENC:0 RBL:0 FLG:0x0000
 Block Read - afn: 9 rdba: 0x00407930 BFT:(1024,4225328) non-BFT:(1,31024)
              scn: 0x00000000001a1e2a seq: 0x02
              flags: 0x00000006 ( dlog ckval )
              where: qeilwh05: qeilbk

If you’re wondering about the two interpretations of the rdba (relative datablock address), one is for BFTs (big file tablespaces) and one for non-BFTs. The other thing you’ll notice about the interpretations is that neither file number (1024 or 1) matches the afn (absolute file number). In smaller, non-CDB databases you will probably find that the afn matches the file number in the non-BFT interpretation, but I happen to be testing on a PDB and the first file in my SYSTEM tablespace happens to be the 9th file created in the CDB – connecting as SYS in my PDB I can compare the absolute and “relative” file number very easily:


SQL> select file#, rfile#, name from v$datafile;

     FILE#     RFILE# NAME
---------- ---------- ----------------------------------------------------------------
         9          1 /u01/app/oracle/oradata/orcl12c/orcl/system01.dbf
        10          4 /u01/app/oracle/oradata/orcl12c/orcl/sysaux01.dbf
        11          9 /u01/app/oracle/oradata/orcl12c/orcl/undotbs01.dbf
        12         12 /u01/app/oracle/oradata/orcl12c/orcl/users01.dbf
        13         13 /u01/app/oracle/oradata/orcl12c/orcl/APEX_1991375173370654.dbf
        14         14 /u01/app/oracle/oradata/orcl12c/orcl/APEX_1993195660370985.dbf
        22         22 /u01/app/oracle/oradata/orcl12c/orcl/test_8k_assm.dbf
        23         23 /u01/app/oracle/oradata/orcl12c/orcl/test_8k.dbf

8 rows selected.

For bigfile tablespaces the “relative” file number is a complete fake and simply reports 1024 – you’re only allowed one file in a bigfile tablespace, so there is no “relativity” involved. (Unless you’re working at CERN and storing data about particle collisions in the LHC.)

The key point to remember when reading BRRs then, is that you should take the file number from the afn and the block number from the (appropriate) interpretation of the rdba. For the example above I would issue: “alter database dump datafile 9 block 31024;”
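
If you’d rather not do the hex arithmetic by hand, dbms_utility can decode the rdba for you – here’s a quick demonstration (my own, using the rdba from the example above) which returns file 1, block 31024, matching the non-BFT interpretation in the BRR:

select
        dbms_utility.data_block_address_file(to_number('00407930','XXXXXXXX'))  file#,
        dbms_utility.data_block_address_block(to_number('00407930','XXXXXXXX')) block#
from
        dual
;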

Finally

The originator of the thread hasn’t yet made any public response to the suggestion of dumping and reviewing blocks – possibly they’ve started a private conversation with Stefan Koehler who had suggested a strategy that examined function calls rather than block contents – so we’re unable to do any further analysis on what’s going on behind the scenes.

What we would be looking for is any indication that Oracle is repeatedly re-reading the same bitmap blocks and the same data blocks (by a simple check of block addresses); and if that is the case we would want to get some clue about why that might be happening by examining the contents of the data blocks that are subject to repeated reads without changing their status in the bitmap from “space available” to “full”.  As it is we just have to wait for the OP to tell us if they’ve made any further progress.


PeopleTools Table Reference Generator

Like many other PeopleSoft professionals, I spend a lot of time looking at the PeopleTools tables because they contain meta-data about the PeopleSoft application. Much of the application is stored in PeopleTools tables. Some provide information about the Data Model. Many of my utility scripts reference the PeopleTools tables, some of them update them too. Therefore, it is very helpful to be able to understand what is in these tables. In PeopleSoft for the Oracle DBA, I discussed some tables that are of regular interest. I included the tables that correspond to the database catalogue and that are used during the PeopleSoft login procedure. The tables that are maintained by the process scheduler are valuable because they contain information about who ran what process when, and how long they ran for.
I am not the only person to have started to document the PeopleTools tables on their website or blog, but most people have picked a few tables that are of particular interest; I wanted to produce a complete reference.  However, with over 4,000 PeopleTools tables and views, it is simply not viable to document even a significant number of them manually.  So I wrote some SQL and PL/SQL to dynamically generate a page for each PeopleTools table and view and put the generated pages on my website.  If you use Google to search for the name of a PeopleTools table you will probably find the reference page.
I have now revisited that code and made a number of changes and improvements.

  • I have used a later version of PeopleTools to generate the reference on my website.  The list of PeopleTools tables is no longer defined in PeopleTools by object security, so I have used an independent and somewhat broader definition: Table or View records that are either owned by PPT or whose SQLTABLENAME is the same as the record name.
  • There is much less reliance on static pages.  There are now only 3 such pages; everything else is generated.  Instead, additional data is loaded from static scripts into the PLAN_TABLE, which should always be present in an Oracle database, so it doesn't have to be created.  It should be a global temporary table, so there is no problem with debris being left behind or interference with other processes.  That data is then combined with data in the PeopleTools tables during the generation process.
  • The utility has been rewritten as a PL/SQL package that should be created in the PeopleSoft Owner ID schema (usually SYSADM).
  • The generated HTML is simply tidier, and more consistent.

The source is available on GitHub at https://github.com/davidkurtz/PTRef, so you can download and generate your own reference on your own current version of PeopleTools. An example of the generated output can be found on my website.
The idea of the original PeopleTools tables reference was that people could contribute additional descriptions and information that were not in the PeopleTools tables.  That can still happen, by making changes to the scripts that load the additional data and uploading new versions to GitHub.

CPU percent

A recent post on the ODC General Database forum asked for an explanation of the AWR report values “%Total CPU” and “%Busy CPU” under the “Instance CPU” label, and how the “%Busy CPU” could be greater than 100%.  Here’s a text reproduction of the relevant sample supplied:

Host CPU

CPUs  Cores  Sockets  Load Average Begin  Load Average End  %User  %System  %WIO  %Idle
   2      2        1                0.30              1.23   10.7      5.6   5.3   77.7

Instance CPU

%Total CPU  %Busy CPU  %DB Time waiting for CPU (Resource Manager)
      29.8      133.8                                          0.0

The answer is probably “It’s 12.1 and it’s a programmer error”.

  • Note that the Host CPU %Idle is not consistent with the three usage figures:  10.7 + 5.6 + 5.3 = 21.6 whereas 100 – 77.7 = 22.3.
  • So let’s run with 22.3% and see what else we can notice: 29.8 / 22.3 = 1.3363 – that’s pretty close (when expressed as a percentage) to 133.8%

Hypothesis:

Someone did the division the wrong way round when trying to work out the percentage of the host’s non-idle CPU that could be attributed to the instance. In this example the “%Busy CPU” should actually report 100 * 22.3 / 29.8 = 74.8%

Note – the difference between 133.8 and 133.63 can be attributed to the fact that the various figures reported in this bit of the AWR are rounded to the nearest 1 decimal place.

Note 2 – I don’t think this error is present in 11.2.0.4 or 12.2.0.1

Learning Kubernetes: persistent storage with Minikube

As part of my talk at (the absolutely amazing) Riga Dev Days 2019 I deployed Oracle Restful Data Services (ORDS) in Minikube as my application’s endpoint. I’ll blog about deploying ORDS 19 in Docker and running it on Kubernetes later, but before I can do so I want to touch briefly on persistent storage in Minikube, because I saw it as a prerequisite.

In a previous post I wrote about using Minikube as a runtime environment for learning Kubernetes. This post uses the same versions as before – Docker 18.09 and Kubernetes 1.14 are invoked by Minikube 1.1.0.

Quick & dirty intro to storage concepts for Kubernetes

Container storage is – at least to my understanding – ephemeral. So, in other words, if you delete the container, and create a new one, locally stored data is gone. In “standalone” Docker, you can use so-called volumes to store things more permanently. A similar concept exists in Kubernetes in form of a persistent volume (PV) and a corresponding persistent volume claim (PVC).
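
For comparison, this is roughly what the “standalone” Docker workflow looks like (a generic sketch of my own, nothing to do with the ORDS example): create a named volume, mount it into containers, and the data survives the containers that wrote it:

$ docker volume create mydata
$ docker run --rm -v mydata:/data ubuntu bash -c "echo hello > /data/greeting"
$ docker run --rm -v mydata:/data ubuntu cat /data/greeting
hello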

In a nutshell, and greatly simplified, a persistent volume is a piece of persistent storage pre-allocated on a Kubernetes cluster that can be “claimed”. Hence the names …

There is a lot more to say about persistent data in Kubernetes, and others have already done that, so you won’t find an explanation of storage concepts in Kubernetes here. Please head over to the documentation set for more details.

Be careful though, as the use of persistent volumes and persistent volume claims is quite an in-depth topic, especially outside the developer-scoped Minikube. I specifically won’t touch on the subject of persistent storage outside of Minikube, I’d rather save that for a later post.

Persistent Volumes for my configuration files

ORDS requires you to run an installation routine first, in the course of which it creates a directory containing its runtime configuration. The shell script I’m using to initialise my container checks for the existence of the configuration directory and skips the installation step if it finds one, proceeding straight to starting Tomcat. This is primarily done to speed up the startup process: if I were not to use the PV/PVC combination, the pods in my deployment would have to run the installation each time they started, something I wanted to avoid.

A less complex example please

I didn’t want to make things more complicated than necessary, so I’ll use a much simpler example before writing up my ORDS deployment. My example is based on the official Ubuntu image; I’m writing a “heartbeat” file into the mounted volume to see if I can find it on the underlying infrastructure.

The Minikube documentation informs us that Minikube preserves data in /data and a few other locations. Anything you try to put elsewhere will be lost. With that piece of information at hand I proceeded with the storage creation.
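
You can verify that location from your workstation before creating anything – minikube ssh accepts a command to run inside the VM:

$ minikube ssh "ls -ld /data"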

Experienced Minikube users might point out at this stage that pre-creating a volume isn’t needed as Minikube supports dynamic volume creation. From what I can tell that’s correct, but not what I chose to do.

Creating a persistent volume

Based on the previously mentioned documentation I created a persistent volume and accompanying volume claim like this:

$ cat persistent-volumes.yml 

kind: PersistentVolume
apiVersion: v1
metadata:
  name: research-vol
  labels:
    type: local
spec:
  capacity:
    storage: 2Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/research-vol"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: research-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ""
  volumeName: research-vol
  resources:
    requests:
      storage: 1Gi

This approach is very specific to Minikube because I’m using persistent volumes of type HostPath.

Translated to plain English this means that I’m creating a 2 GB volume pointing to /data/research-vol on my Minikube system, and asking for 1 GB in my persistent volume claim. The access mode (ReadWriteOnce) restricts concurrent mounting at the node level, not the pod level – which explains why I could successfully write to a single PVC from multiple pods on my single-node Minikube cluster, as you can see in a bit. In my opinion the documentation wasn’t particularly clear on the subject.
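
Creating both objects and checking that the claim has been bound to the volume takes two commands (the file name follows the listing above):

$ kubectl apply -f persistent-volumes.yml
$ kubectl get pv,pvc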

Build the Docker image

Before I can deploy my example application to Minikube, I need to build a Docker image first. This is really boring, but I’ll show it here for the sake of completeness:

$ cat Dockerfile 
FROM ubuntu
COPY run.sh /usr/local/bin
RUN chmod +x /usr/local/bin/run.sh
ENTRYPOINT ["/usr/local/bin/run.sh"]

The shell script I’m running is shown here:

 $ cat run.sh 
 #!/usr/bin/env bash

 set -euxo pipefail

 if [[ ! -d $VOLUME ]]; then
   /bin/echo ERR: cannot find the volume $VOLUME, exiting
   exit 1
 fi

 while true; do
   /usr/bin/touch ${VOLUME}/$(hostname)-$(date +%Y%m%d-%H%M%S)
   /bin/sleep 10
 done

As you can see it is just an infinite loop writing heartbeat files into the directory indicated by the VOLUME environment variable. The image is available to Minikube after building it – refer to my previous post on how to build Docker images for Minikube, or have a look at the documentation. I built the image as “research:v2”.
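
In case you don’t want to chase the reference: the short version is to point your local Docker client at Minikube’s Docker daemon before building, roughly like this (assuming you’re in the directory containing the Dockerfile):

$ eval $(minikube docker-env)
$ docker build -t research:v2 .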

Deploy

With the foundation laid, I can move on to defining the deployment. I’ll do that again with the help of a YAML file:

 $ cat research-deployment.yml 
 apiVersion: extensions/v1beta1
 kind: Deployment
 metadata:
   labels:
     app: research
   name: research
 spec:
   replicas: 2
   selector:
     matchLabels:
       app: research
   template:
     metadata:
       labels:
         app: research
     spec:
       containers:
       - image: research:v2
         env:
         - name: VOLUME
           value: "/var/log/heartbeat"
         name: research
         volumeMounts:
         - mountPath: "/var/log/heartbeat"
           name: research-vol
       volumes:
       - name: research-vol
         persistentVolumeClaim: 
           claimName: research-pvc
       restartPolicy: Always

If you haven’t previously worked with Kubernetes this might look daunting, but it isn’t actually. I’m asking for the deployment of 2 copies (“replicas”) of my research:v2 image. I am passing an environment variable – VOLUME – to the container image. It contains the path to the persistent volume’s mount point. I’m also mounting a PV named research-vol as /var/log/heartbeat/ in each container. This volume in the container scope is based on the definition found in the volumes: section of the YAML file. It’s important to match the information in volumes and volumeMounts.

Running

After a quick kubectl apply -f research-deployment.yml I have 2 pods happily writing to the persistent volume.

 $ kubectl get pods
 NAME                       READY   STATUS    RESTARTS   AGE
 research-f6668c975-98684   1/1     Running   0          6m22s
 research-f6668c975-z6pll   1/1     Running   0          6m22s

The directory exists as specified on the Minikube system. If you used the VirtualBox driver, use minikube ssh to access the VM and change to the PV’s directory:

# pwd
/data/research-vol
# ls -l *research-f6668c975-z6pll* | wc -l
53
# ls -l *research-f6668c975-98684* | wc -l
53

As you can see, both pods are able to write to the directory. The volume’s contents were preserved when I deleted the deployment:

$ kubectl delete deploy/research
deployment.extensions "research" deleted
$ kubectl get pods
No resources found.

This demonstrates that both pods from my deployment are gone. What does it look like back on the Minikube VM?

# ls -l *research-f6668c975-z6pll* | wc -l
63
# ls -l *research-f6668c975-98684* | wc -l
63

You can see that files are still present. The persistent volume lives up to its name.

Next I wanted to see if (new) Pods pick up what’s in the PV. After a short modification to the shell script and a quick docker build followed by a modification of the deployment to use research:v3, the container prints the number of files:

#!/usr/bin/env bash

set -euxo pipefail

if [[ ! -d $VOLUME ]]; then
  /bin/echo ERR: cannot find the volume $VOLUME, exiting
  exit 1
fi

/bin/echo INFO: found $(ls -l $VOLUME/* | wc -l) files in the volume

while true; do
  /usr/bin/touch ${VOLUME}/$(hostname)-$(date +%Y%m%d-%H%M%S)
  /bin/sleep 10
done

As proven by the logs:

 $ kubectl get pods
 NAME                        READY   STATUS        RESTARTS   AGE
 research-556cc9989c-bwzkg   1/1     Running       0          13s
 research-556cc9989c-chg76   1/1     Running       0          13s

 $ kubectl logs research-556cc9989c-bwzkg
 [[ ! -d /var/log/heartbeat ]]
 ++ wc -l
 ++ ls -l /var/log/heartbeat/research-7d4c7c8dc8-5xrtr-20190606-105014 
 ( a lot of output not shown )
 /bin/echo INFO: found 166 files in the volume
 INFO: found 166 files in the volume
 true
 ++ hostname
 ++ date +%Y%m%d-%H%M%S
 /usr/bin/touch /var/log/heartbeat/research-556cc9989c-bwzkg-20190606-105331
 /bin/sleep 10
 true
 ++ hostname
 ++ date +%Y%m%d-%H%M%S
 /usr/bin/touch /var/log/heartbeat/research-556cc9989c-bwzkg-20190606-105341
 /bin/sleep 10 

It appears as if I successfully created and used persistent volumes in Minikube.

PS360 enhancement: Added report of DDL models

I have written several blogs and presentations recently about how and how not to collect statistics in PeopleSoft.

  • Managing Cost-Based Optimizer Statistics for PeopleSoft recommends
    • If you are going to continue to use DBMS_STATS in the DDL model then
      • Do not specify ESTIMATE_PERCENT because it disables the hash-based number-of-distinct-values calculation, forcing it to go back to the COUNT(DISTINCT ...) method that requires a sort, and may not produce accurate values because it only samples data.
      • Do not specify METHOD_OPT='FOR ALL INDEXED COLUMNS SIZE 1' because this will not collect histograms on indexed columns, and will not update column statistics on unindexed columns.
      • Do specify FORCE=>TRUE so that you can lock and delete statistics on temporary records (a sketch of a call that follows these recommendations appears after this list).
    • However, the alternative is to use the GFCSTATS11 package to collect these statistics.  This package is controlled by a metadata table, so you can define statistics collection behaviour for specific records.
  • How Not to Collect Optimizer Statistics in an Application Engine Program
    • This blog explains why you should not explicitly code DBMS_STATS calls into Application Engine programs.
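
Putting the DBMS_STATS recommendations together, a DDL-model-style call might look something like the following sketch (the schema and record name are placeholders, not real objects):

begin
  dbms_stats.gather_table_stats(
    ownname    => 'SYSADM',           -- PeopleSoft owner ID
    tabname    => 'PS_EXAMPLE_TBL',   -- hypothetical record
    -- estimate_percent omitted deliberately: the default AUTO_SAMPLE_SIZE
    -- preserves the hash-based number-of-distinct-values calculation
    method_opt => 'FOR ALL COLUMNS SIZE AUTO',
    force      => TRUE                -- permits collection on locked (temporary) records
  );
end;
/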

This has prompted me to add a new report to the PS360 utility that simply reports the various DDL models for Oracle.  Thus during a health check, I can see how statistics are collected during batch processes.

PS360 can be download from https://github.com/davidkurtz/ps360.

Scalar Subquery Costing

A question came up on the Oracle-L list-server a few days ago about how Oracle calculates costs for a scalar subquery in the select list. The question included an example to illustrate the point. I’ve reproduced the test below, with the output from an 18.3 test system. The numbers don’t match the numbers produced in the original posting but they are consistent with the general appearance.

rem
rem     Script:         ssq_costing.sql
rem     Author:         Jonathan Lewis
rem     Dated:          May 2019
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t_1k ( n1 integer ) ;
create table t_100k ( n1 integer ) ;

insert into t_1k
  select
         level
    from dual
    connect by level <= 1e3;

insert into t_100k
  select level
    from dual
    connect by level <= 1e5;

commit ;

begin
  dbms_stats.gather_table_stats ( null, 'T_1K') ;
  dbms_stats.gather_table_stats ( null, 'T_100K') ;
end ;
/

explain plan for
select 
        /*+ qb_name(QB_MAIN) */
        (
        select /*+ qb_name(QB_SUBQ) */ count(*)
        from t_1k
        where t_1k.n1 = t_100k.n1
        )
from t_100k
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |   100K|   488K|  1533K  (2)| 00:01:00 |
|   1 |  SORT AGGREGATE    |        |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL| T_1K   |     1 |     4 |    17   (0)| 00:00:01 |
|   3 |  TABLE ACCESS FULL | T_100K |   100K|   488K|    36   (9)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T_1K"."N1"=:B1)

The key point to note is this – the scalar subquery has to execute 100,000 times because that’s the number of rows in the driving table. The cost for executing the scalar subquery once is 17 – so the total cost of the query should be 1,700,036 – not 1,533K (and for execution plans the K means x1000, not x1024). There’s always room for rounding errors, of course, but a check of the 10053 (CBO trace) file shows the numbers to be 17.216612 for the t_1k tablescan, 36.356072 for the t_100K tablescan, and 1533646.216412 for the whole query. So how is Oracle managing to get a cost that looks lower than it ought to be?

There’s plenty of scope for experimenting to see how the numbers change – and my first thought was simply to see what happens as you change the number of distinct values in the t_100K.n1 column. It would be rather tedious to go through the process of modifying the data a few hundred times to see what happens, so I took advantage of the get_column_stats() and set_column_stats() procedures in the dbms_stats package to create a PL/SQL loop that faked a number of different scenarios that lied about the actual table data.


delete from plan_table;
commit;

declare

        srec                    dbms_stats.statrec;
        n_array                 dbms_stats.numarray;

        m_distcnt               number;
        m_density               number;
        m_nullcnt               number;
        m_avgclen               number;


begin

        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 't_100k',
                colname         => 'n1', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => srec,
                avgclen         => m_avgclen
        ); 

        for i in 1 .. 20 loop

                m_distcnt := 1000 * i;
                m_density := 1/m_distcnt;

                dbms_stats.set_column_stats(
                        ownname         => user,
                        tabname         => 't_100k',
                        colname         => 'n1', 
                        distcnt         => m_distcnt,
                        density         => m_density,
                        nullcnt         => m_nullcnt,
                        srec            => srec,
                        avgclen         => m_avgclen
                ); 


        execute immediate
        '
                explain plan set statement_id = ''' || m_distcnt || 
        '''
                for
                select
                        /*+ qb_name(QB_MAIN) */
                        (
                        select /*+ qb_name(QB_SUBQ) */ count(*)
                        from t_1k
                        where t_1k.n1 = t_100k.n1
                        )
                from t_100k
        ';
        
        end loop;       

end;
/

The code is straightforward. I’ve declared a few variables to hold the column stats from the t_100k.n1 column, called get_column_stats(), then looped 20 times through a process that changes the number of distinct values (and the corresponding density) recorded in the column stats, then used execute immediate to call “explain plan” for the original query.

You’ll notice I’ve given each plan a separate statement_id that corresponds to the num_distinct that generated the plan. In the code above I’ve changed the num_distinct from 1,000 to 20,000 in steps of 1,000.

Once the PL/SQL block ends I’ll have a plan table with 20 execution plans stored in it and, rather than reporting those plans with calls to dbms_xplan.display(), I’m going to be selective about which rows and columns I report.

select
        statement_id, 
        io_cost,
        io_cost - lag(io_cost,1) over (order by to_number(statement_id)) io_diff,
        cpu_cost,
        cpu_cost - lag(cpu_cost,1) over (order by to_number(statement_id)) cpu_diff,
        cost
from 
        plan_table
where 
        id = 0
order by 
        to_number(statement_id)
;

I’ve picked id = 0 (the top line of the plan) for each statement_id and I’ve reported the cost column, which is made up of the io_cost column plus a scaled down value of the cpu_cost column. I’ve also used the analytic lag() function to calculate how much the io_cost and cpu_cost changed from the previous statement_id. Here are my results from 18c:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
1000                                17033            1099838920                 17253
2000                                34033      17000 2182897480 1083058560      34470
3000                                51033      17000 3265956040 1083058560      51686
4000                                68033      17000 4349014600 1083058560      68903
5000                                85033      17000 5432073160 1083058560      86119
6000                               102033      17000 6515131720 1083058560     103336
7000                               119033      17000 7598190280 1083058560     120553
8000                               136033      17000 8681248840 1083058560     137769
9000                               153033      17000 9764307400 1083058560     154986
10000                              170033      17000 1.0847E+10 1083058560     172202
11000                              197670      27637 1.2608E+10 1760725019     200191
12000                              338341     140671 2.1570E+10 8962036084     342655
13000                              457370     119029 2.9153E+10 7583261303     463200
14000                              559395     102025 3.5653E+10 6499938259     566525
15000                              647816      88421 4.1287E+10 5633279824     656073
16000                              725185      77369 4.6216E+10 4929119846     734428
17000                              793452      68267 5.0565E+10 4349223394     803565
18000                              854133      60681 5.4431E+10 3865976350     865019
19000                              908427      54294 5.7890E+10 3459031472     920005
20000                              957292      48865 6.1003E+10 3113128324     969492

The first pattern that hits the eye is the constant change of 17,000 in the io_cost in the first few lines of the output. For “small” numbers of distinct values the (IO) cost of the query is (33 + 17 * num_distinct) – for example, 33 + 17 * 1,000 = 17,033, the first line of the output. In other words, the arithmetic seems to assume that it will execute the subquery once for each distinct value and then cache the results so that repeated executions for any given value will not be needed. This looks as if the optimizer is trying to match its arithmetic to the “scalar subquery caching” mechanism.

But things change somewhere between 10,000 and 11,000 distinct values. The point comes where adding one more distinct value causes a much bigger jump in cost than 17, and that’s because Oracle assumes it’s reached a point where there’s a value that it won’t have room for in the cache and will have to re-run the subquery multiple times for that value as it scans the rest of the table. Let’s find the exact break point where that happens.

Changing my PL/SQL loop so that we calculate m_distcnt as “10910 + i”, this is the output from the final query:


-- m_distcnt := 10910 + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
10911                              185520            1.1834E+10                187887
10912                              185537         17 1.1835E+10    1083059     187904
10913                              185554         17 1.1836E+10    1083058     187921
10914                              185571         17 1.1837E+10    1083059     187938
10915                              185588         17 1.1838E+10    1083058     187956
10916                              185605         17 1.1839E+10    1083059     187973
10917                              185622         17 1.1841E+10    1083059     187990
10918                              185639         17 1.1842E+10    1083058     188007
10919                              185656         17 1.1843E+10    1083059     188025
10920                              185673         17 1.1844E+10    1083058     188042
10921                              185690         17 1.1845E+10    1083059     188059
10922                              185707         17 1.1846E+10    1083058     188076
10923                              185770         63 1.1850E+10    4027171     188140
10924                              185926        156 1.1860E+10    9914184     188298
10925                              186081        155 1.1870E+10    9912370     188455
10926                              186237        156 1.1880E+10    9910555     188613
10927                              186393        156 1.1890E+10    9908741     188770
10928                              186548        155 1.1900E+10    9906928     188928
10929                              186703        155 1.1909E+10    9905114     189085
10930                              186859        156 1.1919E+10    9903302     189243

If we have 10,922 distinct values in the column the optimizer calculates as if it will be able to cache them all; but if we have 10,923 distinct values the optimizer thinks that there’s going to be one value where it can’t cache the result and will have to run the subquery more than once.

Before looking at this in more detail let’s go to the other interesting point – when does the cost stop changing? We can see the cost increasing as the number of distinct values grows, but we saw at the start that the cost didn’t get as large as we expected, so there must be a point where it stops increasing before it “ought” to.

I’ll jump straight to the answer: here’s the output from the test when I start num_distinct off at slightly less than half the number of rows in the table:


 -- m_distcnt := (50000 - 10) + i;

STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
49991                             1514281            9.6488E+10               1533579
49992                             1514288          7 9.6489E+10     473357    1533586
49993                             1514296          8 9.6489E+10     473337    1533594
49994                             1514303          7 9.6490E+10     473319    1533601
49995                             1514311          8 9.6490E+10     473299    1533609
49996                             1514318          7 9.6491E+10     473281    1533616
49997                             1514325          7 9.6491E+10     473262    1533624
49998                             1514333          8 9.6492E+10     473243    1533631
49999                             1514340          7 9.6492E+10     473224    1533639
50000                             1514348          8 9.6493E+10     473205    1533646
50001                             1514348          0 9.6493E+10          0    1533646
50002                             1514348          0 9.6493E+10          0    1533646
50003                             1514348          0 9.6493E+10          0    1533646
50004                             1514348          0 9.6493E+10          0    1533646
50005                             1514348          0 9.6493E+10          0    1533646
50006                             1514348          0 9.6493E+10          0    1533646
50007                             1514348          0 9.6493E+10          0    1533646
50008                             1514348          0 9.6493E+10          0    1533646
50009                             1514348          0 9.6493E+10          0    1533646
50010                             1514348          0 9.6493E+10          0    1533646

The cost just stops changing when num_distinct = half the rows in the table.

Formulae

During the course of these experiments I had been exchanging email messages with Nenad Noveljic via the Oracle-L list-server (full monthly archive here) and he came up with the suggestion of a three-part formula that assumed a cache size and gave a cost of:

  • “tablescan cost + num_distinct * subquery unit cost” for values of num_distinct up to the cache size;
  • then, for values of num_distinct greater than the cache size and up to half the number of rows in the table, added a marginal cost representing the probability that some values would not be cached;
  • then for values of num_distinct greater than half the number of rows in the table reported the cost associated with num_distinct = half the number of rows in the table.

Hence:

  • for 1 <= num_distinct <= 10,922, cost = (33 + num_distinct * 17)
  • for 10,923 <= num_distinct <= 50,000, cost = (33 + 10,922 * 17) + (1 – 10,922/num_distinct) * 100,000 * 17
  • for 50,000 <= num_distinct <= 100,000, cost = cost(50,000).

The middle line needs a little explanation: (1 – 10,922/num_distinct) is the probability that a value will not be in the cache; this has to be multiplied by 100,000 to give the expected number of rows that will not be cached, and then by 17 as the cost of running the subquery for each of those rows.

The middle line can be re-arranged as 33 + 17 * (10,922 + (1 – 10,922/num_distinct) * 100,000)

Tweaking

At this point I could modify my code loop to report the calculated value for the cost and compare it with the actual cost to show you that the two values didn’t quite match. Instead I’ll jump forward a little bit to a correction that needs to be made to the formula above. It revolves around how Oracle determines the cache size. There’s a hidden parameter (which I mentioned in CBO Fundamentals) that controls scalar subquery caching. In the book I think I only referenced it in the context of subqueries in the “where” clause. The parameter is “_query_execution_cache_max_size” and has a default value of 131072 (power(2,17)) – so when I found that the initial formula didn’t quite work I made the following observation:

  • 131072 / 10922 = 12.00073
  • 131072 / 12 = 10922.666…

So I put 10922.66667 into the formula to see if that would improve things.

For the code change I added a variable m_cost to the PL/SQL block, and set it inside the loop as follows:

m_cost := round(33 + 17 * (10922.66667 + 100000 * (1 - (10922.66667 / m_distcnt))));

Then in the “execute immediate” I changed the “explain plan” line to read:

explain plan set statement_id = ''' || lpad(m_distcnt,7) || ' - ' || lpad(m_cost,8) ||

This allowed me to show the formula’s prediction of (IO)cost in final output, and here’s what I got for values of num_distinct in the region of 10,922:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  10911 -   183901                 185520            1.1834E+10                187887
  10912 -   184057                 185537         17 1.1835E+10    1083059     187904
  10913 -   184212                 185554         17 1.1836E+10    1083058     187921
  10914 -   184368                 185571         17 1.1837E+10    1083059     187938
  10915 -   184524                 185588         17 1.1838E+10    1083058     187956
  10916 -   184680                 185605         17 1.1839E+10    1083059     187973
  10917 -   184836                 185622         17 1.1841E+10    1083059     187990
  10918 -   184992                 185639         17 1.1842E+10    1083058     188007
  10919 -   185147                 185656         17 1.1843E+10    1083059     188025
  10920 -   185303                 185673         17 1.1844E+10    1083058     188042
  10921 -   185459                 185690         17 1.1845E+10    1083059     188059
  10922 -   185615                 185707         17 1.1846E+10    1083058     188076
  10923 -   185770                 185770         63 1.1850E+10    4027171     188140
  10924 -   185926                 185926        156 1.1860E+10    9914184     188298
  10925 -   186081                 186081        155 1.1870E+10    9912370     188455
  10926 -   186237                 186237        156 1.1880E+10    9910555     188613
  10927 -   186393                 186393        156 1.1890E+10    9908741     188770
  10928 -   186548                 186548        155 1.1900E+10    9906928     188928
  10929 -   186703                 186703        155 1.1909E+10    9905114     189085
  10930 -   186859                 186859        156 1.1919E+10    9903302     189243

The formula is only supposed to work in the range 10,923 – 50,000, so the first few results don’t match; but in the range 10,923 to 10,930 the match is exact. Then, in the region of 50,000 we get:


STATEMENT_ID                      IO_COST    IO_DIFF   CPU_COST   CPU_DIFF       COST
------------------------------ ---------- ---------- ---------- ---------- ----------
  49991 -  1514281                1514281            9.6488E+10               1533579
  49992 -  1514288                1514288          7 9.6489E+10     473357    1533586
  49993 -  1514296                1514296          8 9.6489E+10     473337    1533594
  49994 -  1514303                1514303          7 9.6490E+10     473319    1533601
  49995 -  1514311                1514311          8 9.6490E+10     473299    1533609
  49996 -  1514318                1514318          7 9.6491E+10     473281    1533616
  49997 -  1514325                1514325          7 9.6491E+10     473262    1533624
  49998 -  1514333                1514333          8 9.6492E+10     473243    1533631
  49999 -  1514340                1514340          7 9.6492E+10     473224    1533639
  50000 -  1514348                1514348          8 9.6493E+10     473205    1533646
  50001 -  1514355                1514348          0 9.6493E+10          0    1533646
  50002 -  1514363                1514348          0 9.6493E+10          0    1533646
  50003 -  1514370                1514348          0 9.6493E+10          0    1533646
  50004 -  1514377                1514348          0 9.6493E+10          0    1533646
  50005 -  1514385                1514348          0 9.6493E+10          0    1533646
  50006 -  1514392                1514348          0 9.6493E+10          0    1533646
  50007 -  1514400                1514348          0 9.6493E+10          0    1533646
  50008 -  1514407                1514348          0 9.6493E+10          0    1533646
  50009 -  1514415                1514348          0 9.6493E+10          0    1533646
  50010 -  1514422                1514348          0 9.6493E+10          0    1533646

Again, the formula applies only in the range up to 50,000 (half the rows in the table) – and the match is perfect in that range.
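Pulling the pieces together: below the breakpoint the actual IO_COST in the output is exactly 33 + 17 * num_distinct (check the 10,911 row: 33 + 17 * 10,911 = 185,520), and above 50,000 the cost is flat, so the whole model for this data set can be sketched as a single function. The function name is mine, and the constants (33, 17, 100,000, 50,000) are specific to these tables:

create or replace function scalar_subq_io_cost(p_distcnt in number)
return number
is
        c_entries constant number := 131072 / 12;      -- 10,922.666... cache entries for this data set
        n         number := least(p_distcnt, 50000);   -- the cost stops changing at half the row count
begin
        if n <= c_entries then
                -- every distinct value stays cached: one subquery execution per distinct value
                return round(33 + 17 * n);
        else
                -- beyond the cache limit: add the expected cache misses across the 100,000 rows
                return round(33 + 17 * (c_entries + 100000 * (1 - c_entries / n)));
        end if;
end;
/

A quick check: select scalar_subq_io_cost(10923) from dual returns 185,770, matching the output above.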

Next steps

The work so far gives us some idea of the algorithm that the optimizer is using to derive a cost, but this is just one scenario and there are plenty of extra questions we might ask. The most pressing one: what is the significance of the number 12 in the calculation 131,072/12? From previous experience I guessed that it was related to the lengths of the input and output values of the scalar subquery – as in “value X for n1 returns value Y for count(*)”.

To pursue this idea I recreated the data sets using varchar2(10) as the definition of n1 and lpad(rownum,10) as the value – the “breakpoint” dropped from 10,922 down to 5,461. Checking the arithmetic: 131,072 / 5,461 = 24.001456, then 131,072 / 24 = 5461.333… And that’s the number that made the formula work perfectly for the modified data set.

Then I used set_column_stats() to hack the avg_col_len of t_100K.n1 to 15, and the breakpoint dropped to 4,096. Again we do the arithmetic: 131,072 / 4,096 = 32 (and this time we don’t need the reverse step since the first result is already integral).
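For reference, the hack is a single dbms_stats call – a minimal sketch, assuming the table is in the current schema and only the average column length needs to change:

begin
        dbms_stats.set_column_stats(
                ownname => user,
                tabname => 't_100k',
                colname => 'n1',
                avgclen => 15
        );
end;
/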

Checking the original data set, where n1 was numeric, the avg_col_len was 5, so we have three reference points:

  • avg_col_len = 5: “cache unit size” = 12
  • avg_col_len = 11: cache unit size = 24 (don’t forget the avg_col_len includes the length byte, so our padded varchar2(10) has a length of 11)
  • avg_col_len = 15: cache unit size = 32

There’s an obvious pattern here: “cache unit size” = (2 x avg_col_len + 2). Since I hadn’t been changing the t_1K.n1 column at the same time, that really does look like a deliberate factor of 2. (I’d initially thought that the 12 might be affected by the lengths of both columns in the predicate – but that doesn’t seem to be the case.)

The scientific method says I should now make a prediction based on my hypothesis – so I set the avg_col_len for t_100K.n1 to 23 and predicted that the breakpoint would be at 2,730 – and it was. (131,072 / (2 * 23 + 2) = 2730.6666…)
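All four reference points (including the prediction) can be tabulated with one query – floor() because the breakpoint is the largest number of distinct values that still fits the cache:

select
        avg_col_len,
        floor(131072 / (2 * avg_col_len + 2))   as breakpoint
from    (
        select  column_value as avg_col_len
        from    table(sys.odcinumberlist(5, 11, 15, 23))
        )
;

        -- 5 -> 10922, 11 -> 5461, 15 -> 4096, 23 -> 2730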

The next question, of course, is: where does the “spare 2” come from? Trying to minimize the change in the code, I modified my subquery to select sum(to_number(n1)) rather than count(*), and then avg(to_number(n1)) – remember I had changed n1 to a varchar2(10) that looked like a number left-padded with spaces. In every variant of the tests I’d done so far, all I had to do to get an exact match between the basic formula and the optimizer’s cost calculation was to use “2 * avg_col_len + 22” as the cache unit size – and 22 is the nominal maximum length of an internally stored numeric column.

Bottom line: the cache unit size seems to be related to the input and output values, but I don’t know why there’s a factor of 2 applied to the input column length, and I don’t know why the length of count(*) is deemed to be 2 when other derived numeric outputs use the more intuitive 22 for their length.
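So, as a working hypothesis only, the number of cache entries reduces to one expression – the output lengths of 2 for count(*) and 22 for other numeric results are taken from the tests above:

select
        floor(131072 / (2 * :avg_col_len + :output_len))  as cache_entries
from    dual
;

Bind :output_len as 2 for count(*), or 22 for a numeric result such as sum() or avg().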

tl;dr

The total cost calculation for a scalar subquery in the select list is largely affected by:

  • a fixed cache size (131,072 bytes) possibly set by hidden parameter _query_execution_cache_max_size
  • the avg_col_len of the input (correlating) column(s) from the driving table
  • the nominal length of the output (select list) of the subquery

There is an unexplained factor of 2 used with the avg_col_len of the input, and a slightly surprising value of 2 if the output is simply count(*).

If the number N of distinct values for the driving column(s) is less than the number of possible cache entries, the effect of the scalar subquery is to add N times the estimated cost of executing the subquery once. As the number of distinct values for the driving column(s) goes above the limit, the incremental effect of the subquery is based on the expected number of times an input value will not be cached – with C cache entries and N distinct values that’s roughly rows * (1 – C/N), which is exactly the 100,000 * (1 – 10,922.667/N) term in the formula above (at N = 20,000, for example, that’s about 45,387 expected misses). When the number of distinct values in the driving column(s) exceeds half the number of rows in the driving table the cost stops increasing – there is no obvious reason why the algorithm does this.

There are many more cases that I could investigate at this point – but I think this model is enough as an indication of the general method. If you come across a variation where you actually need to work out how the optimizer derived a cost, this framework will probably be enough to get you started in the right direction.

 

PFCLATK - Audit Trail Toolkit - Checksums

We have a toolkit called PFCLATK that is used in customer engagements to assist our customers to create comprehensive and useful audit trails for their databases. The toolkit is used in consulting engagements at the moment but will be adding....[Read More]

Posted by Pete On 06/06/19 At 03:08 PM

Hibernate for Oracle DBAs

Warning: any smart developer may feel sick when reading this ;)

I am not a developer, but I like to discuss with developers: share my side of the IT (the database that we want rock stable and durable) and listen to their side (the application that they want easy to maintain and evolve). And, as I like to understand what I’m talking about, I often need to test some snippets.

Many DBAs complain about Hibernate when they come upon the queries generated by a wrong mapping. They think it was designed to be bad (who would do that?). And they are convinced that JDBC and SQL are sufficient to build applications. Actually, many DBAs I have seen are persuaded that they understand everything about coding because they have written some ugly Perl scripts to automate their job, and that anything that goes beyond that exists only to break the database.

I didn’t go this way. As I like to understand before building my opinion, I read “Hibernate in Action” and tested some object-to-relational mapping. I’m talking about the Hibernate 3 days here – I found those tests in a folder from 2008. I re-used them today for a short test, and that’s the reason for this post.

Here I am showing how I’m doing those small tests with Hibernate. I’m a DBA and I cannot have an Eclipse environment taking all my screens, hiding those database top activity charts. And all my RAM is already eaten by SQL Developer and Chrome Grid Control windows, no room for Eclipse. And anyway, those mouse-focused IDEs are not friends with my carpal tunnel. I like the keyboard and tty.

So, for simple tests, I need simple things which can be reduced to a command line and one file that I can open with vi. The goal of this post is to show how easy it is to test some Hibernate behaviour this way. Of course, any real developer will vomit when looking at this… don’t forget this is about short tests only.

Libraries

So, no Maven for me. I download the whole Hibernate .zip and build a CLASSPATH from everything I find in the lib/required folder.

wget https://netix.dl.sourceforge.net/project/hibernate/hibernate-orm/5.4.3.Final/hibernate-release-5.4.3.Final.zip
unzip hibernate-release-5.4.3.Final.zip
for i in hibernate-release-5.4.3.Final/lib/required/*.jar
do
  CLASSPATH="${CLASSPATH}:$i"
done
export CLASSPATH=.:$ORACLE_HOME/jdbc/lib/ojdbc8.jar:$CLASSPATH

You can see that I’ve added the Oracle JDBC as I’ll connect to an Oracle database that I have locally (I use Oracle Cloud DBaaS here).

Compile

No Ant here. I compile the .java files I have in my folder (I don’t use packages and subfolders for simple tests). Note that, from a past admiration for makefiles, I add just enough intelligence (a “test -nt”) to recompile only when the .java source is newer than the .class file.

for i in *.java
do
  if [ $i -nt $(basename $i .java).class ]
  then
    $ORACLE_HOME/jdk/bin/javac $i
  fi
done

ORM Mapping

My goal was to quickly test the following mapping from @vlad_mihalcea:

The best way to map a Composite Primary Key with JPA and Hibernate - Vlad Mihalcea

So what do I have in those .java files? Testing Hibernate needs many classes, and in Java each public class goes to its own file. But did I say that I want to open only one file? I use nested classes.

import java.io.*;
import java.sql.*;
import java.util.*;
import oracle.jdbc.*;
import org.hibernate.*;
import org.hibernate.cfg.*;
import javax.persistence.*;

public class Franck {

    @Entity(name = "Company")
    @Table(name = "company")
    public static class Company {   // static: JPA entities must be top-level or static nested classes

        @Id
        @GeneratedValue(strategy = GenerationType.IDENTITY)
        private Long id;

        private String name;

        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Company)) return false;
            Company company = (Company) o;
            return Objects.equals(getName(), company.getName());
        }

        @Override
        public int hashCode() {
            return Objects.hash(getName());
        }
    }

    @Embeddable
    public static class EmployeeId implements Serializable {

        @ManyToOne
        @JoinColumn(name = "company_id")
        private Company company;

        @Column(name = "employee_number")
        private Long employeeNumber;

        public EmployeeId() {
        }

        public EmployeeId(Company company, Long employeeId) {
            this.company = company;
            this.employeeNumber = employeeId;
        }

        public Company getCompany() { return company; }
        public Long getEmployeeNumber() { return employeeNumber; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EmployeeId)) return false;
            EmployeeId that = (EmployeeId) o;
            return Objects.equals(getCompany(), that.getCompany()) &&
                   Objects.equals(getEmployeeNumber(), that.getEmployeeNumber());
        }

        @Override
        public int hashCode() {
            return Objects.hash(getCompany(), getEmployeeNumber());
        }
    }

    @Entity(name = "Employee")
    @Table(name = "employee")
    public static class Employee {

        @EmbeddedId
        private EmployeeId id;

        private String name;

        public EmployeeId getId() { return id; }
        public void setId(EmployeeId id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) throws SQLException {
        SessionFactory sf = new Configuration()
            .addAnnotatedClass(Employee.class)
            .addAnnotatedClass(Company.class)
            .setProperty("hibernate.connection.url", "jdbc:oracle:thin:@//localhost/PDB1")
            .setProperty("hibernate.connection.driver_class", "oracle.jdbc.driver.OracleDriver")
            .setProperty("hibernate.connection.username", "demo")
            .setProperty("hibernate.connection.password", "demo")
            .setProperty("hibernate.format_sql", "true")
            .setProperty("hibernate.show_sql", "true")
            .setProperty("hibernate.hbm2ddl.auto", "create")   // create the schema at startup
            .buildSessionFactory();
        sf.close();   // close the factory so the JVM exits once the DDL has run
    }
}

The main class name (here Franck) matches the file name (Franck.java) and all my entities are static nested classes. After compilation here are my files:

-rwxr--r--. 1 oracle 4931 Jun  5 20:18 Franck.java
-rw-r--r--. 1 oracle 1304 Jun  5 20:19 Franck$Company.class
-rw-r--r--. 1 oracle 1398 Jun  5 20:19 Franck$EmployeeId.class
-rw-r--r--. 1 oracle  922 Jun  5 20:19 Franck$Employee.class
-rw-r--r--. 1 oracle 1557 Jun  5 20:19 Franck.class

Execution

$ORACLE_HOME/jdk/bin/java Franck

This runs main() and, thanks to the hibernate.show_sql and hbm2ddl.auto=create settings, prints the create table statements that Hibernate generates for the mapping.

That’s all I need to verify what my annotations generate with the Oracle 12c Dialect. Ugly one-file code, but sufficient for this goal.

Java as a Shell

Ok, now that I think any real developer has stopped reading, I can confess that I add the following at the top of my .java file:

/*TAG-FOR-SHELL 2>/dev/null
CLASSPATH=.
# Oracle JDBC
CLASSPATH=${CLASSPATH}:$ORACLE_HOME/jdbc/lib/ojdbc8.jar
# Download Hibernate
[ -f /var/tmp/hibernate.zip ] || wget -O /var/tmp/hibernate.zip https://netix.dl.sourceforge.net/project/hibernate/hibernate-orm/5.4.3.Final/hibernate-release-5.4.3.Final.zip
# Unzip Hibernate
[ -d /var/tmp/hibernate*?/lib/required ] || unzip -d /var/tmp /var/tmp/hibernate.zip
# add libs to CLASSPATH
for l in /var/tmp/hibernate*?/lib/required/*.jar ; do CLASSPATH="${CLASSPATH}:$l" ; done ; export CLASSPATH
# compile all Java
for s in $(find . -name "*.java"); do
  s="$(basename $s .java)"
  [ $s.java -nt $s.class ] && {
    echo "Compiling $s..." >&2
    $ORACLE_HOME/jdk/bin/javac $s.java || exit 1
  }
done
# execute
$ORACLE_HOME/jdk/bin/java $(basename $0 .java)
exit
*/

And then I “chmod u+x” this .java file and run it as a shell script. The shell part is enclosed in a Java comment so that the file still compiles as Java source code. The script gets the libraries if they are not there already, builds the CLASSPATH, compiles whatever is new in the directory, and runs it as a Java program. All that with a simple:

./Franck.java

Any comments welcome on Twitter: @FranckPachot

Kscope Sunday will be awesome

Yeah yeah I know. What kind of doofus cut-pastes a giant image into their blog post? That would be ….. me :)

But the reality is, there is just soooooo much going on this year on the Sunday before the “official” start of the Kscope conference that it was too hard to condense it into a few sentences. So I just dumped the image from the website, but you can read about it here.

It’s my first Kscope and it will be a blast to be running a day of high quality but dirt cheap learning with Maria, Jeff, Steven and Blaine.

And we’ll even have some fun and games. How good is your knowledge of the Oracle Database and the Oracle community? Our Sunday quiz will let you shine! (Well, to be honest, we’ve designed the quiz so that anyone in the community has a chance to win, so come along to have some fun and maybe pick up some prizes as well!)

See you in Seattle!

 


So long …

… and thanks for any Douglas Adams references.

I decided on Friday that after more than 31 years using the Oracle software and more than 37 years of being self-employed in the computer industry – not to mention a further 8 years being an enthusiastic amateur – the time had finally come for a change of pace.  As from 1st June 2019 I am retired.

I have a couple of long-term commitments that I’ll still be doing a little work for over the next few months, and I wouldn’t be surprised if someone occasionally tempts me into doing a few days solving interesting problems. I’ll still be answering questions on the Oracle Developer Community forum and (if the email gets through) on the Oracle-L listserver, and I’ll still be writing blog notes about Oracle and submitting abstracts to conferences for a few years. Who knows, I may even get round to writing another book – after all, I’ve already downloaded Oracle 19 so I’ll still be experimenting, investigating, and trying to break things. Some people retire to do the Times crossword or Sudoku; my retirement pastime is going to be playing with Oracle.

But now I’ve got 3 musical instruments to re-learn, a garden to re-engineer, apple trees to train, and a lot of rowing [U.S. “crew”] time to catch up on so that I can get fit enough to get back to playing squash (which might be an interesting experience now that my eyesight has been fixed). And for relaxation I’ll be sitting in the garden watching the birds, drinking fine wine and reading a good book.

Update (6th June 2019)

Thanks for all the appreciative comments. It’s nice to have a record of the fact that I have made a difference for so many people.