Oakies Blog Aggregator

Ansible tips’n’tricks: even more output options

In my last post I wrote about the “debug” option to format Ansible output differently. I came across this setting simply by searching the usual developer forums for an alternative Ansible output option.

Finding out about the “debug” option made me curious, especially since there didn’t seem to be much documentation about additional alternatives. Or so I thought before writing this post – there actually is, as you will see later. To recap what I had so far: I had noticed “skippy” in my distribution’s /etc/ansible/ansible.cfg (although it is commented out), I had found the “debug” option via my favourite search engine, and there is the “default” option as well.

There surely had to be more …

This wasn’t quite good enough for me and I started to wonder if there were more of these callbacks. Here is my Ansible version in case some of these callbacks might be quite recent:

[martin@controller environment]$ ansible --version | head -n 1
ansible 2.6.4

My first idea was to perform a file system search using find – I assumed that looking for “debug” or “default” would result in too many irrelevant hits, but “skippy” sounded like a unique enough search term.

$ find / -iname "*skippy*" 2> /dev/null
/usr/lib/python2.7/site-packages/ansible/plugins/callback/skippy.py
/usr/lib/python2.7/site-packages/ansible/plugins/callback/skippy.pyc
/usr/lib/python2.7/site-packages/ansible/plugins/callback/skippy.pyo
[martin@linuxdev ~]$ 

This looks like a hit! I opened the first file, skippy.py, and found a documentation section:

DOCUMENTATION = '''
    callback: skippy
    callback_type: stdout
    requirements:
      - set as main display callback
    short_description: Ansible screen output that ignores skipped status
    version_added: "2.0"
    extends_documentation_fragment:
      - default_callback
    description:
        - This callback does the same as the default except it does not output skipped host/task/item status

So there you go, more information about the purpose of this callback! Are there any others that might be useful candidates for stdout_callback? Possibly! Let’s see (still in the callback directory)…

$ grep -i 'type: stdout' *.py
actionable.py:    type: stdout
debug.py:    type: stdout
default.py:    type: stdout
dense.py:type: stdout
full_skip.py:    type: stdout
json.py:    type: stdout
minimal.py:    type: stdout
null.py:    callback_type: stdout
oneline.py:    type: stdout
selective.py:    callback_type: stdout
skippy.py:    callback_type: stdout
stderr.py:    callback_type: stdout
unixy.py:    type: stdout
yaml.py:    type: stdout

Having unearthed this list I eventually found the relevant part of the documentation set! So instead of showing you the DOCUMENTATION variable for each of these callbacks, I’ll point you to the official Ansible 2.6 documentation:

https://docs.ansible.com/ansible/2.6/plugins/callback.html#plugin-list
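
If you want to try one of these, the stdout callback can be switched either permanently in ansible.cfg or, for a one-off run, via an environment variable. A minimal sketch – “yaml” is just an example callback, and site.yml is a placeholder playbook name:

# one-off: override the stdout callback for a single playbook run
ANSIBLE_STDOUT_CALLBACK=yaml ansible-playbook site.yml

# or set it permanently in the [defaults] section of ansible.cfg:
# stdout_callback = yaml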

Happy scripting!

Installing an #Exasol 6.1 Cluster on VirtualBox

After installing the latest VirtualBox version, download an ISO file with the latest Exasol version. The machine hosting VirtualBox should have at least 16 GB RAM and 80 GB free disk space in order to run a 2+1 cluster with three data nodes and one license server. I’m doing it on my Windows 10 notebook.

[Screenshot: https://uhesse.files.wordpress.com/2018/10/createls.png]

1.5 GB RAM is sufficient for the License Server and it needs only one disk

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_disk.png]

Insert the ISO file as a virtual CD

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_iso.png]

Make sure the License Server boots from CD first

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_boot.png]

Configure the private Network for the License Server

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_private_network.png]

Configure the public network for the License Server

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_public_network.png]

Now power on the virtual machine just created and configured. It should come up with this screen and you type in install

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install1.png]

Tab to OK and press return

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install2.png]

Tab to OK and press return

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install3.png]

Tab to OK and press return

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install4.png]

Choose a password. I used exasol12

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install5.png]

Enter the public IP of the License Server. My VirtualBox Host-Only Ethernet Adapter is configured with 192.168.43.1 – therefore I use 192.168.43.10. It should also work with the VirtualBox standard setting; in that case use 192.168.56.10. When in doubt, call ipconfig from the command shell.

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install6.png]

Tab to OK and press return

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install7.png]

The installation, started from the last screen, took me about five minutes. Now type local and wait for the License Server to boot from disk. You may remove the virtual CD from it afterwards.

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install8.png]

Do yourself a favor and pause for five minutes after the machine has booted from disk before you try to connect to EXAoperation. I’m using Chrome because it didn’t work well with Firefox:

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install9.png]

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install10.png]

The default password of the user admin is admin

[Screenshot: https://uhesse.files.wordpress.com/2018/10/ls_install11.png]

Congratulations when you see the screen below! That was already the hardest part.

ODC Appreciation Day: Reduce CPU usage by running the business logic in the Oracle Database

A new blog post on the Databases at CERN blog to think about:

Where to run business logic: in the database or another tier?

What language for coding business logic: SQL, PL/SQL, JavaScript?

Guess how to reduce licensing costs? https://db-blog.web.cern.ch/blog/franck-pachot/2018-10-odc-appreciation-day-reduce-cpu-usage-running-business-logic-oracle


ODC Appreciation Day–LOB compression

LOBs tend to be large. Well duh…it’s right there in the name! “Large Object”. So one of the cool things I like with the SECUREFILE option in recent releases of Oracle Database is the ability to compress LOBs. Here’s a quick demo of that in action:


SQL> create table t1
  2  ( x int,
  3    c clob
  4  )
  5  lob (c) store as securefile (
  6    enable      storage in row
  7    nocompress);

Table created.

SQL>
SQL> create table t2
  2  ( x int,
  3    c clob
  4  )
  5  lob (c) store as securefile (
  6    enable      storage in row
  7    compress);

Table created.

SQL>
SQL> insert into t1
  2  select rownum, rpad(owner,6000,owner||','||object_type)
  3  from   dba_objects;

82902 rows created.

SQL>
SQL> insert into t2
  2  select * from t1;

82902 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select sum(bytes) from user_segments
  2  where
  3     segment_name = 'T1' or
  4     segment_name = ( select segment_name from user_lobs
  5                      where table_name = 'T1' );

SUM(BYTES)
----------
1548943360

1 row selected.

SQL>
SQL> select sum(bytes) from user_segments
  2  where
  3     segment_name = 'T2' or
  4     segment_name = ( select segment_name from user_lobs
  5                      where table_name = 'T2' );

SUM(BYTES)
----------
  19005440

1 row selected.

Now as you can see, the compression achieved here is pretty spectacular, but a careful eye will quickly spot why this is the case – the data is a lot of repeated pairings of the owner and object_type values from DBA_OBJECTS.

Just to balance the ledger, I’ll repeat the example using totally random strings:


SQL> create table t1
  2  ( x int,
  3    c clob
  4  )
  5  lob (c) store as securefile (
  6    enable      storage in row
  7    nocompress);

Table created.

SQL>
SQL> create table t2
  2  ( x int,
  3    c clob
  4  )
  5  lob (c) store as securefile (
  6    enable      storage in row
  7    compress);

Table created.

SQL>
SQL> insert into t1
  2  select rownum, dbms_random.string('x',12000)
  3  from   dual
  4  connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL>
SQL> insert into t2
  2  select * from t1;

10000 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> select sum(bytes) from user_segments
  2  where
  3     segment_name = 'T1' or
  4     segment_name = ( select segment_name from user_lobs
  5                      where table_name = 'T1' );

SUM(BYTES)
----------
  93126656

1 row selected.

SQL>
SQL> select sum(bytes) from user_segments
  2  where
  3     segment_name = 'T2' or
  4     segment_name = ( select segment_name from user_lobs
  5                      where table_name = 'T2' );

SUM(BYTES)
----------
  84017152

1 row selected.

 

As you can see, even in this case, we still managed to get some benefit from the compression, although (as expected) the gains are not as dramatic. The benefit that you get from your data will probably fall somewhere in between these values. Also note that there is an overhead to storing the metadata for any LOB, and you would expect the amount of metadata to be larger for a compressed LOB. So compressing tiny LOBs may yield an actual growth in segment size – so always perform some measurements on realistic data before embarking down this path.
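
To put rough numbers on that, here’s a quick back-of-the-envelope calculation using the sum(bytes) figures reported by the two tests above (awk used purely as a calculator):

awk 'BEGIN {
  printf "repetitive data : %.1fx smaller\n", 1548943360 / 19005440
  printf "random data     : %.2fx smaller\n", 93126656  / 84017152
}'

That works out to roughly 81.5x for the repetitive data and only about 1.11x for the random strings.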

And finally, in the vein of “there is no such thing as a free lunch”, compressing anything, LOBs or otherwise, takes processing power. So always keep in mind the trade-off between less storage and faster access for smaller LOBs on the one hand, and the processing cost of compressing them in the first place on the other.

Thanks ODC!

Hybrid Fake

Oracle 12c introduced the “Hybrid” histogram – a nice addition to the available options and one that (ignoring the bug for which a patch has been created) supplies the optimizer with better information about the data than the equivalent height-balanced histogram. There is still a problem, though, in the trade-off between accuracy and speed: just as it does with height-balanced histograms when using auto_sample_size Oracle samples (typically) about 5,500 rows to create a hybrid histogram, and the SQL it uses to generate the necessary summary is essentially an aggregation of the sample, so either you have a small sample with the risk of lower accuracy or a large sample with an increase in workload. This being the case it’s worth knowing how to create a hybrid histogram using the dbms_stats.set_column_stats() API.

It’s fairly easy to identify the cases where a hybrid histogram could be helpful.  You have a large volume of data spread over a large number (more than 2048) of distinct values, but a few values (typically less than 250) which are responsible for a significant fraction of the data. You would like to tell Oracle about the special “extreme” cases so that the optimizer can take defensive action if you query for one of those values, but at the same time you would like to give Oracle a picture of the way the rest of the data is distributed. This is similar in some respects to the Top-N (a.k.a. Top-Frequency) histogram, which says to Oracle “We have a small number of popular values, and some odds and ends on the side that are pretty ignorable”; the critical difference is that you need the hybrid histogram when it’s not safe to “ignore” the odds and ends.

Here’s an example of creating some data and then generating a completely artificial hybrid histogram. The code demonstrates 3 points – the principal feature of creating hybrid histograms and a couple of generic details about Oracle’s histograms:

  • The main point is that Oracle 12c introduces a new numeric array in the dbms_stats.statrec structure. This allows each row (bucket) in a histogram to hold a second statistic about the bucket so we can now store a frequency figure for the bucket as a whole, and a “repeat-count” figure for the highest value in the bucket. (Warning – there is a counter-intuitive conflict between the name of the new structure and the way it is used for hybrid histograms).
  • As a side point I’ve included a code variation that shows you the remarkable similarity between generating a Frequency histogram and a Hybrid histogram.
  • As a second side point I have also highlighted the effect you see in the dba_tab_histograms view when your popular values are “too similar” to each other – i.e. when they match on the first 6 characters.

We start by creating a table as a copy of the view all_objects – then we’re going to create a hybrid histogram on the object_type column that looks nothing like the  data. The histogram will say:

  • for every 15,000 rows (where the column is not null)
    • 5,000 will have values less than or equal to ‘C’, of which 3,000 will have the value ‘C’
    • The next 2,000 (i.e. running total 7,000) will have values greater than ‘C’ and up to ‘PPPPPP1’, but ‘PPPPPP1’ itself is not a popular value
    • The next 2,000 (i.e. running total 9,000) will have values greater than ‘PPPPPP1’ and up to ‘PPPPPP2’, but ‘PPPPPP2’ itself is not a popular value
    • The next 2,000 (i.e. running total 11,000) will have values greater than ‘PPPPPP2’ and up to ‘PPPPPP3’, but ‘PPPPPP3’ itself is not a popular value
    • The last 4,000 (i.e. running total 15,000) will have values greater than ‘PPPPPP3’ and up to ‘X’ of which 3,000 will have the value ‘X’

Note particularly that the “how many rows hold the endpoint value” are stored in the statrec.bkvals array – just as they would be for a frequency histogram – and the cumulative count of rows is stored in the statrec.rpcnts structure. All we have to do to create a frequency histogram instead of a hybrid histogram is to store zeros in the statrec.rpcnts structure, or leave it uninitialized.

You’ll notice that since I’m creating a histogram on a character column I’ve used an array of type dbms_stats.chararray to hold the list of values (in ascending order) that I want the histogram to describe.


rem
rem     Script:         12c_hybrid_histogram_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2018
rem 

create table t1
as
select * from all_objects
;

begin
        dbms_stats.gather_table_stats(
                ownname         => null,
                tabname         => 't1',
                method_opt      => 'for all columns size 1'
        );
end;
/

declare
                c_array         dbms_stats.chararray;
                m_rec           dbms_stats.statrec;
                m_distcnt       number;
                m_density       number;
                m_nullcnt       number;
                m_avgclen       number;

begin
        dbms_stats.get_column_stats(
                ownname         => user,
                tabname         => 'T1',
                colname         => 'OBJECT_TYPE', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => m_rec,
                avgclen         => m_avgclen
        );

        m_rec.epc    := 5;

        c_array      := dbms_stats.chararray( 'C',  'PPPPPP1',  'PPPPPP2',  'PPPPPP3',   'X');
        m_rec.bkvals := dbms_stats.numarray (3000,          1,          1,          1,  3000);

        m_rec.rpcnts := dbms_stats.numarray (5000,       7000,       9000,      11000, 15000);
--      m_rec.rpcnts := dbms_stats.numarray (0000,       0000,       0000,       0000, 00000);

        dbms_stats.prepare_column_values(m_rec, c_array);

        dbms_stats.set_column_stats(
                ownname         => user,
                tabname         => 'T1',
                colname         => 'OBJECT_TYPE', 
                distcnt         => m_distcnt,
                density         => m_density,
                nullcnt         => m_nullcnt,
                srec            => m_rec,
                avgclen         => m_avgclen
        ); 
end;
/

That’s it – it’s remarkably simple. To show the effect of running this code I can report the content of user_tab_histograms for the column. I’ve actually run the code and queried the results twice; first for the case where I created the hybrid histogram and then after modifying the PL/SQL block to set the rpcnts array to zeros to create a frequency histogram.


column endpoint_actual_value format a22
column endpoint_value        format 999,999,999,999,999,999,999,999,999,999,999,999

select
        endpoint_number, endpoint_value, endpoint_actual_value, endpoint_repeat_count
from
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = 'OBJECT_TYPE'
order by
        endpoint_value
;

With non-zero rpcnts (hybrid histogram)
=======================================
ENDPOINT_NUMBER                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
--------------- ------------------------------------------------ ---------------------- ---------------------
           3000  347,883,889,521,833,000,000,000,000,000,000,000 C                                       3000
           7000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP1                                    1
           9000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP2                                    1
          11000  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP3                                    1
          15000  456,922,123,551,065,000,000,000,000,000,000,000 X                                       3000


With rpcnts set to zero (frequency histogram)
=============================================
ENDPOINT_NUMBER                                   ENDPOINT_VALUE ENDPOINT_ACTUAL_VALUE  ENDPOINT_REPEAT_COUNT
--------------- ------------------------------------------------ ---------------------- ---------------------
           3000  347,883,889,521,833,000,000,000,000,000,000,000 C                                          0
           3001  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP1                                    0
           3002  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP2                                    0
           3003  417,012,704,559,973,000,000,000,000,000,000,000 PPPPPP3                                    0
           6003  456,922,123,551,065,000,000,000,000,000,000,000 X                                          0

I made a comment earlier on that the naming and use of the rpcnts structure was somewhat counter-intuitive. As you can see in the results above, when I created the hybrid histogram the values I stored in the rpcnts structure are not the values reported as the “repeat count”, the numbers reported as the “repeat count” are from the bkvals (bucket values).  As far as I’m concerned this means I have to go back to my basic examples every time I want to fake a histogram because I’m never too sure which arrays I should populate with what values – and whether I should use absolute or cumulative values.

One last minor point: you’ll see that the endpoint_actual_value has been populated in this example. This is because (with Oracle’s interesting transformation from character to numeric) the three ‘PPPPPPx’ character values turn into the same number – so Oracle stores the first 64 bytes (or 32 for versions of Oracle prior to 12c) of the actual value.

 

Eye, Eye, Cap’n

By the time you read this I will have had the lenses in both my eyes replaced, so I won’t be staring at a computer screen for a while – and that means, in particular, I won’t be doing any further investigation into join cardinality for a while.

For those who might otherwise feel deprived of my exquisite prose I have, however, prepared a couple of lightweight articles for automatic launching over the next few days. But please don’t expect any prompt follow-ups if you add comments or send email over the next couple of weeks.

OpenWorld Tuesday … yep, still screwed

Well…I’m only planning day 2 of OpenWorld and already I’ve pretty much given up on being able to see all the talks that I want to see. The challenge for me will be trying to coax those people that are inside Oracle to give me a synopsis of their talk after the event. That is probably my best hope here.

In any event, if you’re a database person like me, here are my tips for what I’ll be trying to sneak into on Tuesday.

Tuesday morning

An Insider’s Guide to Oracle Autonomous OLTP Database Cloud [TRN3979]
Maria Colgan, Master Product Manager, Oracle
Robert Greene, Senior Director, Product Management, Oracle
Tuesday, Oct 23, 11:15 a.m. – 12:00 p.m. | Moscone West – Room 3003

which overlaps with

Finally: DevOps for Databases [TRN4088]
Gerald Venzl, Senior Principal Product Manager, Oracle
Kris Rice, Senior Director, Oracle
Tuesday, Oct 23, 11:15 a.m. – 12:00 p.m. | Moscone West – Room 3009

which overlaps with

Oracle Real Application Clusters 18c Internals [TRN4022]
Anil Nair, Senior Principal Product Manager, Oracle
Tuesday, Oct 23, 11:15 a.m. – 12:00 p.m. | Moscone West – Room 3007

which overlaps with

Upgrade to Oracle Database 18c: Live and Uncensored [TRN4029]
Kamran Aghayev, Dba team head, Azercell telecom
Roy Swonger, Vice President, Database Upgrade & Utilities, Oracle
Mike Dietrich, Master Product Manager – Database Upgrades and Migrations, Oracle
Tuesday, Oct 23, 11:15 a.m. – 12:00 p.m. | Moscone West – Room 3004

which overlaps with

DBAs Versus Autonomous Databases [PRO1072]
Julian Dontcheff, Global Database Lead, Accenture Oy
Tuesday, Oct 23, 11:15 a.m. – 12:00 p.m. | Moscone West – Room 3002

which overlaps with

Cool New Features for Developers in Oracle Database 18c and Oracle Database 12c [DEV6243]
Timothy Hall, DBA, Developer, Author, Trainer, oracle-base.com
Code One Tracks: Database, Big Data, and Data Science
Session Type: Developer Session
Tuesday, Oct 23, 11:30 a.m. – 12:15 p.m. | Moscone West – Room 2003


Tuesday lunchtime

Zero Downtime Upgrade of Core Banking Systems with Edition-Based Redefinition [CAS4042]
Martin Buechi, Systems Architect, Avaloq Evolution AG
Bryn Llewellyn, Distinguished Product Manager, Database Division, Oracle
Nicolas Blanc, Software Development Engineer, Avaloq Evolution AG
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3005

which overlaps with

Oracle Database In-Memory on Exadata: A Potent Combination [PRO4016]
Gurmeet Goindi, Technical Product Strategist, Oracle
Shasank Chavan, Vice President, In-Memory Technologies, Oracle
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3008

which overlaps with

Oracle Sharding: Geo-Distributed, Scalable, Multimodel Cloud-Native DBMS [PRO4037]
Mark Dilman, Director, Software Development, Oracle
Srinagesh Battula, Sr. Principal Product Manager, Oracle
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3007

which overlaps with

Oracle Database 18c: New Features for Agile Development [TIP2903]
Erik Benner, VP Enterprise Transformation, Mythics, Inc.
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3001

which overlaps with

Python and Oracle Database on the Table [TIP4076]
Christopher Jones, Senior Principal Product Manager, Oracle
Anthony Tuininga, Software Developer 5, Oracle
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3009

which overlaps with

Real-World Performance with Oracle Autonomous Data Warehouse Cloud [TRN4025]
John Clarke, Software Development Director, Real World Performance, Oracle
Steven Yang, Oracle
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3004

which overlaps with

Oracle Database: What’s New and Coming Next [PRO5512]
Dominic Giles, Master Product Manager, Oracle
Penny Avril, VP Oracle Database Product Management, Oracle
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 3003

which overlaps with

Modern (NoSQL) Database Application Development Everywhere [DEV5614]
Beda Hammerschmidt, Consulting Member of Technical Staff, Oracle
Vikas Arora, Senior Director, Oracle
Maxim Orgiyan, PMTS, Oracle
Code One Tracks: Database, Big Data, and Data Science, Development Tools
Session Type: Developer Session
Tuesday, Oct 23, 12:30 p.m. – 1:15 p.m. | Moscone West – Room 2003

Tuesday early afternoon

Python and Oracle Database 18c: Scripting for the Future [HOL6329]
Christopher Jones, Senior Principal Product Manager, Oracle
Anthony Tuininga, Software Developer 5, Oracle
Blaine Carter, Oracle Developer Advocate for Open Source, Oracle
Tuesday, Oct 23, 2:15 p.m. – 3:15 p.m. | Marriott Marquis (Yerba Buena Level) – Salon 3/4


Tuesday mid afternoon

Oracle REST Data Services and REST APIs for Your Oracle Database [PRO4063]
Colm Divilly, Consulting Member of Technical Staff, Oracle
Tuesday, Oct 23, 3:45 p.m. – 4:30 p.m. | Moscone West – Room 3009

which overlaps with

Oracle JavaScript Extension Toolkit Composite Components Bring Agility to App Development [TIP2364]
Lucas Jellema, CTO, AMIS Services BV
Tuesday, Oct 23, 3:45 p.m. – 4:30 p.m. | Moscone West – Room 302

which overlaps with

Even the ACEs make Mistakes – What did they learn? [TIP1989]
Debra Lilley, VP Certus Cloud Services & Oracle Alliance, Certus Solutions an Accenture Company
Ralf Koelling, Senior Consultant, CGI Deutschland Ltd. & Co. KG
Fiona Martin, “Director, Strategic Alliances -Oracle”, KPMG LLP
Tuesday, Oct 23, 3:45 p.m. – 4:30 p.m. | Moscone West – Room 3001

which overlaps with

DevOps Tools for Database Developers [DEV5055]
Blaine Carter, Oracle Developer Advocate for Open Source, Oracle
Code One Tracks: DevOps and Pipelines, Development Tools
Session Type: Developer Session
Tuesday, Oct 23, 4:00 p.m. – 4:45 p.m. | Moscone West – Room 2010

Tuesday late afternoon

An Automatic Cloud Document Store Using Oracle Text [PRO4008]
Roger Ford, Product Manager, Oracle
Tuesday, Oct 23, 4:45 p.m. – 5:30 p.m. | Moscone West – Room 3005

which overlaps with

Multitenant: Best Practices for High Availability [PRO4004]
Patrick Wheeler, Senior Director, Product Management, Oracle Database, Oracle
Giridhar Ravipati, Consulting Member of Technical Staff, Oracle
Tuesday, Oct 23, 4:45 p.m. – 5:30 p.m. | Moscone West – Room 3004

which overlaps with

All You Need to Know About Backup and Recovery [TIP3266]
Francisco Munoz Alvarez, Director of Innovation, Red Stack Tech Ltd
Tuesday, Oct 23, 4:45 p.m. – 5:30 p.m. | Moscone West – Room 3001

which overlaps with

Driving Hands-Free with Oracle Autonomous OLTP Database Cloud [TRN3980]
Maria Colgan, Master Product Manager, Oracle
Robert Greene, Senior Director, Product Management, Oracle
Tuesday, Oct 23, 4:45 p.m. – 5:30 p.m. | Moscone West – Room 3003

which overlaps with

Saving Your Database from Memory Loss: Oracle Database In-Memory Analytics Improvements [TRN1263]
James Czuprynski, Advanced Technical Consultant Oracle Solutions, ViON Corporation
Tuesday, Oct 23, 4:45 p.m. – 5:30 p.m. | Moscone West – Room 3006

 

Tuesday evening

Autonomous and Beyond: Security in the Age of the Autonomous Database [PRM4108]
Russ Lowenthal, Director, Product Management, Oracle
Tuesday, Oct 23, 5:45 p.m. – 6:30 p.m. | Moscone West – Room 3006

which overlaps with

Oracle Autonomous Data Warehouse Cloud: What Every DBA Should Know [PRO4050]
George Lumpkin, Vice President, Product Management, Oracle
Tuesday, Oct 23, 5:45 p.m. – 6:30 p.m. | Moscone West – Room 3005

which overlaps with

Real-World DBA Best Practices for the Cloud [TIP2048]
Arup Nanda, VP, Data Services, Priceline
Tuesday, Oct 23, 5:45 p.m. – 6:30 p.m. | Moscone West – Room 3002

which overlaps with

How to Migrate to Oracle Autonomous Database Cloud [TRN4034]
Julian Dontcheff, Global Database Lead, Accenture Oy
Roy Swonger, Vice President, Database Upgrade & Utilities, Oracle
William Beauregard, Oracle
Tuesday, Oct 23, 5:45 p.m. – 6:30 p.m. | Moscone West – Room 3004

No /proc/diskstats Does Not Track **Your** Physical I/O Requests

You have applications that scan disk using large sequential reads so you take a peek at /proc/diskstats (field #4 on modern Linux distributions) before and after your test in order to tally up the number of reads your application performed. That’s ok. That’s also a good way to get erroneous data.

Your application makes calls for I/O transfers of a particular size. The device drivers for your storage might not be able to accommodate your transfer request in a single DMA and will therefore “chop it up” into multiple transfers. This is quite common with Fibre Channel device drivers where, for example, I/O requests larger than, say, 256KB get rendered into multiple 256KB transfers in the kernel.

This is not a new phenomenon. However, folks may not naturally expect how stats in /proc/diskstats reflect this phenomenon.

The following screen shot shows a simple shell script I wrote to illustrate the point I’m making. As the screen shot shows, the script executes a single dd(1) command – 1,000 reads with direct I/O – for each of a range of I/O sizes from 4KB to 1024KB.

 

[Screenshot: https://kevinclosson.files.wordpress.com/2018/10/b3.png]

First, I executed the script by pointing it to a file in an XFS file system on an NVMe drive. As the next screenshot shows, diskstats accurately tallied the application reads until the I/O sizes were larger than 256KB. We can see that the device is only taking DMA requests of up to 256KB. When the script was conducting 512KB and 1024KB I/O requests the count of reads doubled and quadrupled, respectively, as per /proc/diskstats.

[Screenshot: https://kevinclosson.files.wordpress.com/2018/10/b1.png]

I’m sure readers are wondering if the file system (XFS) might be muddying the water. It is not. The following screen shot shows the script scanning the device directly; the resultant diskstats data remained the same as when I scanned a file on that device.

 

[Screenshot: https://kevinclosson.files.wordpress.com/2018/10/b2.png]
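
If you want to reproduce the effect yourself, here’s a minimal sketch of the same before/after bookkeeping (this is not the author’s script – it assumes a device named nvme0n1, needs root to read the raw device, and relies on field 4 of /proc/diskstats being the count of completed reads):

dev=nvme0n1
for bs in 4k 64k 256k 512k 1024k
do
  before=$(awk -v d=$dev '$3 == d {print $4}' /proc/diskstats)
  dd if=/dev/$dev of=/dev/null bs=$bs count=1000 iflag=direct status=none
  after=$(awk -v d=$dev '$3 == d {print $4}' /proc/diskstats)
  echo "$bs: $((after - before)) kernel reads for 1000 application reads"
done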

Summary

This was not meant to be a life-changing blog post. The main point I’m sharing is that it’s important to not confuse your I/O requests with the I/O requests of the kernel. You can scan with whatever read size you’d like. However, the kernel has to abide by the transfer size limit announced by the device driver. To that end, you have your reads and the kernel has its reads. Just remember that /proc/diskstats is tracking the kernel’s read requests–not yours.
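
If you are curious about the limit the kernel is working to for a particular device, the block layer exposes it in sysfs (again assuming a device called nvme0n1 – substitute your own):

# current and hardware maximum transfer size per request, in kilobytes
cat /sys/block/nvme0n1/queue/max_sectors_kb
cat /sys/block/nvme0n1/queue/max_hw_sectors_kb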

Join Cardinality – 3

In the previous posting I listed the order of precision of histograms as:

  • Frequency
  • Top-Frequency
  • Hybrid
  • Height-balanced
  • None

Having covered the Frequency/Frequency join (for a single column, no nulls, equijoin) in the previous posting I’ve decided to work down the list and address Frequency/Top-Frequency in this posting. It gets a little harder to generate data as we move to the less precise histograms since we need to have skew, we want some gaps, and (for Top-Frequency) we need to have some data that can be “ignored”. On the plus side, though, I want to keep the output of any queries I run fairly short, so I’m going to stick with a small number of buckets, which means the “small” volume of “ignorable” data (the “spare” bucket) can be relatively large. Here’s the code I used to generate data for my investigation – 100 rows for the table with a frequency histogram and 800 rows for the table with a top-frequency.


rem
rem     Script:         freq_hist_join_05.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Oct 2018
rem     Purpose:        
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

execute dbms_random.seed(0)

create table t1 (
        id              number(6),
        n04             number(6),
        n05             number(6),
        n20             number(6),
        j1              number(6)
)
;

create table t2(
        id              number(8,0),
        n20             number(6,0),
        n30             number(6,0),
        n50             number(6,0),
        j2              number(6,0)      
)
;

insert into t1
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   4) + 1                    n04,
        mod(rownum,   5) + 1                    n05,
        mod(rownum,  20) + 1                    n20,
        trunc(2.5 * trunc(sqrt(v1.id*v2.id)))   j1
from
        generator       v1,
        generator       v2
where
        v1.id <= 10 -- > comment to avoid WordPress format issue
and     v2.id <= 10 -- > comment to avoid WordPress format issue
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        mod(rownum,   20) + 1                   n20,
        mod(rownum,   30) + 1                   n30,
        mod(rownum,   50) + 1                   n50,
        28 - round(abs(7*dbms_random.normal))        j2      
from
        generator       v1
where
        rownum <= 800 -- > comment to avoid WordPress format issue
;

begin
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1 for columns j1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname          => null,
                tabname          => 'T2',
                method_opt       => 'for all columns size 1 for columns j2 size 16'
        );
end;
/

In this example I’ve used the sqrt() function and the dbms_random.normal() function to generate the data. The scaling and truncating I’ve done on the results has given me two sets of data which have a nice skew, some gaps, but different patterns (though both have a small number of small values and a larger number of larger values). The data from dbms_random.normal() will produce 22 distinct values, so I’ve requested a histogram with 16 buckets and checked that this will produce a Top-Frequency histogram. (If I want a Hybrid histogram – for the next thrilling installment in the series – I’ll just reduce the number of buckets slightly).

Here are the resulting stats, preceded by the code that reported them:


select  table_name, column_name, histogram, num_distinct, num_buckets, density
from    user_tab_cols
where   table_name in ('T1','T2')
and     column_name in ('J1','J2')
order by table_name
;

select  table_name, num_rows
from    user_tables
where   table_name in ('T1','T2')
order by table_name
;

break on table_name skip 1 on report skip 1

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select f1.* from f1
union all
select f2.* from f2
order by 1,2
;


TABLE_NAME           COLUMN_NAME          HISTOGRAM       NUM_DISTINCT NUM_BUCKETS    DENSITY
-------------------- -------------------- --------------- ------------ ----------- ----------
T1                   J1                   FREQUENCY                 10          10       .005
T2                   J2                   TOP-FREQUENCY             22          16    .000625

TABLE_NAME             NUM_ROWS
-------------------- ----------
T1                          100
T2                          800

TABLE_NAME                VALUE ROW_OR_BUCKET_COUNT ENDPOINT_NUMBER
-------------------- ---------- ------------------- ---------------
T1                            2                   5               5
                              5                  15              20
                              7                  15              35
                             10                  17              52
                             12                  13              65
                             15                  13              78
                             17                  11              89
                             20                   7              96
                             22                   3              99
                             25                   1             100

T2                            1                   1               1
                             13                  14              15
                             15                  11              26
                             16                  22              48
                             17                  34              82
                             18                  31             113
                             19                  36             149
                             20                  57             206
                             21                  44             250
                             22                  45             295
                             23                  72             367
                             24                  70             437
                             25                  87             524
                             26                 109             633
                             27                  96             729
                             28                  41             770

Table t1 reports 100 rows, 10 distinct values and a Frequency histogram with 10 buckets.
Table t2 reports 800 rows, 22 distinct values and a Top-Frequency histogram with 16 buckets.

Things we notice from the histograms are: t1 has a range from 2 to 25, while t2 has a range from 1 to 28. We also notice that the highest endpoint_number for t2 is only 770 out of a possible 800 – we’ve “lost” 30 rows. We don’t really care what they are for the purposes of the arithmetic, but if we did a quick “select j2, count(*)” query we’d see that we had lost the following:


SQL> select j2, count(*) from t2 group by j2 order by count(*), j2;

        J2   COUNT(*)
---------- ----------
         1          1
         9          1  *
         8          3  *
        11          4  *
        10          5  *
        12          8  *
        14          9  *
        15         11
...

The values marked with an asterisk are the six values that didn’t make it into the histogram – between them they account for exactly the 30 missing rows. The reason why the total number of rows accounted for is less than the total number of rows in the table comes in two parts. The Top-Frequency histogram is designed to hold the Top N most popular entries in the table, so there will be some entries that don’t make an appearance in the histogram despite contributing rows to the total table count; the number of “lost” rows can then be increased because the Top N popular values may not include the column low and high values, and these two values must appear in the histogram. Looking at the output above we can see that we could have reported 14 as the 16th most popular value; instead we have to record 1, losing a further 9 rows and regaining 1.

Let’s test the pure join query on the two tables to see what the optimizer is predicting as the join cardinality, and then try to re-create that cardinality from the histogram data:


alter session set statistics_level = all;
alter session set events '10053 trace name context forever';
alter session set tracefile_identifier='BASELINE';

select
        count(*) 
from
        t1, t2
where
        t1.j1 = t2.j2
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
alter session set events '10053 trace name context off';


-----------------------------------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      41 |       |       |          |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      41 |       |       |          |
|*  2 |   HASH JOIN         |      |      1 |   1608 |   1327 |00:00:00.01 |      41 |  2545K|  2545K| 1355K (0)|
|   3 |    TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       7 |       |       |          |
|   4 |    TABLE ACCESS FULL| T2   |      1 |    800 |    800 |00:00:00.01 |       7 |       |       |          |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."J1"="T2"."J2")

Our target is to work out how we can query the histogram data in a way that gets the result 1,608. Ideally we’ll also think of a rationale for justifying our method, and then we’ll apply the same method with 15 buckets and 17 buckets, and with a couple of variations to the data (e.g. update all rows where j1 = 25 to set j1 = 28), to see if the method still gets the right result.

All we did with the frequency/frequency join was to join the two histograms on matching values, multiply the frequencies on each resulting row, then sum down the set, and this automatically eliminated rows which were outside the “highest low” and “lowest high” (i.e. we only examined rows where the histograms overlapped). We might hope that things shouldn’t be too different when one of the histograms is a top-frequency histogram.

There is an important difference, though, between frequency and top-frequency histograms – in the latter case there are values in the table which will not be in the histogram, so we ought to make some allowance for these (even though it’s only “one bucket’s worth”). It’s possible that some of these values might match values in the frequency histogram so we need to include a mechanism for adding in a factor to allow for them. So as a first step let’s work out the “average number of rows per value” for the missing values.

We have 22 distinct values and 16 end points so there are 6 missing values. We have 800 rows in the table but only 770 rows reported in the histogram so there are 30 missing rows. So let’s say the missing values have an average cardinality of 30/6 = 5 (and we might extend that to say they have an average selectivity of 5/800 = 0.00625).

Let’s bring that value into the query we wrote for the frequency/frequency case by using an outer join (which I’ll write as an “ANSI” Full Outer Join) with a predicate in place that restricts the result to just the overlapping range, which is [2,25], the “higher low value” and “lower high value” across the two histograms. Here’s some code – with an odd little detail included:


column product format 999,999,999.99
compute sum of product on report

compute sum of t1_count on report
compute sum of t1_value on report
compute sum of t2_count on report
compute sum of t2_value on report

with f1 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T1' 
and     column_name = 'J1'
order by 
        endpoint_value
),
f2 as (
select 
        table_name,
        endpoint_value                                                            value, 
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) row_or_bucket_count,
        endpoint_number
from 
        user_tab_histograms 
where 
        table_name  = 'T2' 
and     column_name = 'J2'
order by 
        endpoint_value
)
select
        f1.value f1_value,
        f2.value f2_value,
        nvl(f1.row_or_bucket_count,0.00) t1_count, 
        nvl(f2.row_or_bucket_count,800*0.00625) t2_count,
        nvl(f1.row_or_bucket_count,0.00) * 
        nvl(f2.row_or_bucket_count,800*0.006250) product
from
        f1
full outer join
        f2
on
        f2.value = f1.value
where
        coalesce(f1.value, f2.value) between 2 and 25
order by
        coalesce(f1.value, f2.value)
;

I’ve included an nvl() on the counts from the top-frequency histogram that converts nulls (i.e. the rows preserved from the frequency-histogram side of the join) into the average frequency we’ve just calculated, using the “num_rows * selectivity” representation. The odd little detail that I commented on above does something similar for the rows preserved from the top-frequency side, because this first guess at the calculation was wrong and needed an adjustment which I’m anticipating. Here are the results I got with this code:

  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13          0         14             .00
        15         15         13         11          143.00
                   16          0         22             .00
        17         17         11         34          374.00
                   18          0         31             .00
                   19          0         36             .00
        20         20          7         57          399.00
                   21          0         44             .00
        22         22          3         45          135.00
                   23          0         72             .00
                   24          0         70             .00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233        100        548        1,463.00

The figure is too low, so there has to be an adjustment. What if the code is allowing for the “maybe there are other values” algorithm that the optimizer uses with frequency histograms? If you’ve gathered a frequency histogram on a column but query it with a value that isn’t in the histogram then Oracle applies an algorithm that looks like: “if you’re asking for something that isn’t in the histogram I’ll assume that there must be some data there and use a frequency that’s half the lowest frequency I have recorded” **Important footnote. The value 25 appears once in our histogram so let’s include a fudge-factor of 0.5 (i.e. half a row) in the nvl() expression for the t1 frequencies and see what happens. This is what the new results look like:


  T1_VALUE   T2_VALUE   T1_COUNT   T2_COUNT         PRODUCT
---------- ---------- ---------- ---------- ---------------
         2                     5          5           25.00
         5                    15          5           75.00
         7                    15          5           75.00
        10                    17          5           85.00
        12                    13          5           65.00
                   13         .5         14            7.00
        15         15         13         11          143.00
                   16         .5         22           11.00
        17         17         11         34          374.00
                   18         .5         31           15.50
                   19         .5         36           18.00
        20         20          7         57          399.00
                   21         .5         44           22.00
        22         22          3         45          135.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          1         87           87.00
---------- ---------- ---------- ---------- ---------------
       135        233      103.5        548        1,607.50

Since we were looking for 1,608 I’m going to call that a success. I can check precision, of course, by looking at the 10053 trace file. Extracting a few critical lines:

egrep -e"Density" -e"Join Card" orcl12c_ora_6520_BASELINE.trc

    AvgLen: 3 NDV: 22 Nulls: 0 Density: 0.006250 Min: 1.000000 Max: 28.000000
    AvgLen: 3 NDV: 10 Nulls: 0 Density: 0.005000 Min: 2.000000 Max: 25.000000

Join Card:  1607.500000 = outer (100.000000) * inner (800.000000) * sel (0.020094)

The “Density” lines come from the column statistics – note the 0.00625 that matches the “average selectivity” I derived from the top-frequency figures. You might also note that the “half the least frequent value” could be derived from the t1.j1 density (0.005) * t1.num_rows (100).

The “Join Card” line is exactly what it says – the join cardinality calculation, showing that the plan’s prediction of 1,608 rows was actually a rounded 1,607.5.

There is one more important thing to check before I start tweaking the data to see if there are any other factors involved. Is the 0.5 I stuck into the query really the value of “half the least common frequency”, or is it a fixed value in all cases? A nice easy way of testing this is to update the t1 table to change one row from 22 to 25 (22 will still be present in the table and histogram before and after this test, so it’s a minimal and safe change). Making this change and re-running the calculation query leaving the 0.5 unchanged gives the following:


update t1 set j1 = 25 where j1 = 22 and rownum = 1;

...

                   21         .5         44           22.00
        22         22          2         45           90.00
                   23         .5         72           36.00
                   24         .5         70           35.00
        25         25          2         87          174.00
                      ---------- ---------- ---------------
sum                        103.5        548        1,649.50

Without reporting all the details:

  • the estimate in the plan went up from 1,608 to 1,794
  • leaving 0.5 in the query the derived result was 1,649.5 (last few lines of output above)
  • changing the 0.5 to 1.0 the derived result was 1,794.0

Conclusion – the “fudge factor” is consistent with the model the optimizer uses with frequency histogram calculations. The optimizer models “missing” rows in the join calculation as “half the number of rows for the least frequently occurring value” **Important footnote

Filter Predicates:

After a dozen tests varying the number of buckets in the top-frequency histogram (and checking it really was still a top-frequency histogram), and tweaking the t1 (frequency histogram) data to use values on the boundaries of, or outside, the range of the t2 (top-frequency) data, I concluded that my approach was probably correct. Outer join the two histograms, restrict to the overlap, supply the “num_rows * density” figure on the top-frequency side, and “half the lowest frequency” **Important footnote on the frequency side, and the query produces the same result as the optimizer for the pure join cardinality.

So the next step is to check what happens when you add filter predicates on one, or both, sides. I listed a fragment of code earlier on to execute the pure join and count the number of rows it produced, enabling the 10053 trace and pulling the actual plan from memory at the same time. I repeated this code with 3 variations and checked the “Join Card” lines from the resulting trace files:


select count(*) from  t1, t2 where  t1.j1 = t2.j2
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2
select count(*) from  t1, t2 where  t1.j1 = t2.j2                and t2.n30 = 25
select count(*) from  t1, t2 where  t1.j1 = t2.j2 and t1.n04 = 2 and t2.n30 = 25

egrep -e"Join Card" orcl12c_ora_10447*.trc

orcl12c_ora_10447_BASELINE.trc:Join Card:  1607.500000 = outer (800.000000) * inner (100.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ1.trc:Join Card:  401.875000 = outer (800.000000) * inner (25.000000) * sel (0.020094)
orcl12c_ora_10447_FILTERJ2.trc:Join Card:  53.583333 = outer (100.000000) * inner (26.666667) * sel (0.020094)
orcl12c_ora_10447_FILTJ1J2.trc:Join Card:  13.395833 = outer (26.666667) * inner (25.000000) * sel (0.020094)

As you can see in all 4 cases, Oracle reports an inner and outer cardinality estimate and a join selectivity. The join selectivity remains unchanged throughout; it’s the value we can derive from our pure join test (0.020094 = 1607.5 / (100 * 800)). All that changes is that the individual table predicates are applied to the base tables before the join selectivity is applied to the product of the filtered base table cardinalities:

  • Column n04 has 4 distinct values in 100 rows – filter cardinality = 100/4 = 25
  • Column n30 has 30 distinct values in 800 rows – filter cardinality = 800/30 = 26.66666…
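
Putting those pieces together reproduces all four “Join Card” figures. Here’s a quick sanity check that simply re-does the arithmetic from the trace lines (awk as a calculator, nothing Oracle-specific):

awk 'BEGIN {
  sel = 1607.5 / (100 * 800)                          # pure join selectivity = 0.020094
  printf "baseline           : %10.6f\n", 100     * 800      * sel
  printf "filter t1.n04 = 2  : %10.6f\n", (100/4) * 800      * sel
  printf "filter t2.n30 = 25 : %10.6f\n", 100     * (800/30) * sel
  printf "both filters       : %10.6f\n", (100/4) * (800/30) * sel
}'

This gives 1607.5, 401.875, 53.583333 and 13.395833 respectively – matching the four trace lines above.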

Conclusion

For a single column equijoin on columns with no nulls where one column has a frequency histogram and the other has a top-frequency histogram the optimizer calculates the “pure” join cardinality using the overlapping range of column values and two approximating frequencies, then derives the filtered cardinality by applying the base table filters, calculates the cardinality of the cartesian join of the filtered data sets, then multiplies by the pure join selectivity.

 

 

**Important Footnote  Until Chinar Aliyev questioned what I had written, I had never noticed that the “half the lowest frequency” that I describe at various points in the arithmetic was anything other than a fixed fudge factor. In fact, in perfect symmetry with the expression used for the average selectivity in the top-frequency part of the calculation, this “fudge factor” is simply “num_rows * column_density” for the column with the frequency histogram. (Whether the “half the lowest frequency” drops out as a side effect of the density calculation, or whether the column density is derived from half the lowest frequency is another matter.)

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part II (Aladdin Sane)

In Part I of this series, I highlighted how a Non-Partitioned Global Index on a Partitioned Table is able to effectively perform “Partition Pruning” by reading only the associated index entries to access just the table blocks of interest from relevant table partitions when the table partitioned keys are specified in an SQL Predicate. Understanding […]