Oakies Blog Aggregator

Nologging

Bobby Durrett recently published a note about estimating the volume of non-logged blocks written by an instance with the aim of getting some idea of the extra redo that would be generated if a database were switched to “force logging”.

Since my most recent blog notes have included various extracts and summaries from the symbolic dumps of redo logs it occurred to me that another strategy for generating the same information would be to dump the redo generated by Oracle when it wanted to log some information about non-logged blocks. This may sound like a contradiction, of course, but it’s the difference between data and meta-data: if Oracle wants to write data blocks to disc without logging their contents it needs to write a note into the redo log saying “there is no log of the contents of these blocks”.

In terms of redo op codes this is done through “layer 19”, the set of op codes relating to direct path loads, with op code 19.2 being the specific “invalidate range” one that we are (probably) interested in.

So here’s a little demo of extracting the information we need:

rem
rem     Script:         nologging_writes.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem
rem     Last tested 
rem             12.2.0.1
rem

column c_scn new_value m_scn_1
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

create table t1 nologging
as
select  * 
from    all_objects
where   rownum <= 10000
/

column c_scn new_value m_scn_2
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

alter session set tracefile_identifier = 'TABLE';
alter system dump redo scn min &m_scn_1 scn max &m_scn_2 layer 19;


create index t1_i1
on t1(object_name, owner, object_id)
pctfree 80
nologging
/

column c_scn new_value m_scn_3
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

alter session set tracefile_identifier = 'INDEX';
alter system dump redo scn min &m_scn_2 scn max &m_scn_3 layer 19;


insert /*+ append */ into t1
select * from t1
/

column c_scn new_value m_scn_4
select to_char(current_scn,'999999999999999999999999') c_scn from v$database;

alter session set tracefile_identifier = 'APPEND';
alter system dump redo scn min &m_scn_3 scn max &m_scn_4 layer 19;


I’ve executed a “create table nologging”, a “create index nologging”, then an “insert /*+ append */” into the nologging table. I’ve captured the current SCN before and after each call, added an individual identifier to the tracefile name for each call, then dumped the redo between each pair of SCNs, restricting the dump to layer 19. (I could have been more restrictive and said “layer 19 opcode 2”, but there is an opcode 19.4 which might also be relevant – though I don’t know when it might appear.)
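
For reference, the more restrictive form of the dump call mentioned above simply adds the opcode qualifier; for the table creation step it would look like this (the tracefile identifier is just another arbitrary label):

alter session set tracefile_identifier = 'TABLE_19_2';
alter system dump redo scn min &m_scn_1 scn max &m_scn_2 layer 19 opcode 2;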

Here’s the list of trace files I generated, plus a couple of extras that appeared around the same time:

 ls -ltr *.trc | tail -6
-rw-r----- 1 oracle oinstall 361355 Sep 12 19:44 orcl12c_ora_23630.trc
-rw-r----- 1 oracle oinstall   5208 Sep 12 19:44 orcl12c_ora_23630_TABLE.trc
-rw-r----- 1 oracle oinstall  27434 Sep 12 19:44 orcl12c_ora_23630_INDEX.trc
-rw-r----- 1 oracle oinstall   2528 Sep 12 19:44 orcl12c_ora_23630_APPEND.trc
-rw-r----- 1 oracle oinstall 162633 Sep 12 19:45 orcl12c_mmon_3042.trc
-rw-r----- 1 oracle oinstall 162478 Sep 12 19:45 orcl12c_gen0_2989.trc


And having identified the trace files we can now extract the block invalidation records (I’ve inserted blank lines to separate the results from the three separate files):

grep OP orcl12c_ora_23630_*.trc

orcl12c_ora_23630_APPEND.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058001bd BLKS:0x0043 OBJ:125947 SCN:0x00000b860da6e1a5 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_APPEND.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800482 BLKS:0x006e OBJ:125947 SCN:0x00000b860da6e1a5 SEQ:1 OP:19.2 ENC:0 FLG:0x0000

orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000c4 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e162 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000c8 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e162 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
...
...     70 lines deleted
...
orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800424 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e181 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_INDEX.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800428 BLKS:0x0004 OBJ:125948 SCN:0x00000b860da6e181 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_INDEX.trc:CHANGE #1 MEDIA RECOVERY MARKER CON_ID:3 SCN:0x0000000000000000 SEQ:0 OP:18.3 ENC:0 FLG:0x0000

orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800103 BLKS:0x000d OBJ:125947 SCN:0x00000b860da6e13e SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800111 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e140 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800121 BLKS:0x0007 OBJ:125947 SCN:0x00000b860da6e141 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800268 BLKS:0x0008 OBJ:125947 SCN:0x00000b860da6e142 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800271 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e144 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800081 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e146 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800091 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e148 SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000a1 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e14a SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x058000b1 BLKS:0x000f OBJ:125947 SCN:0x00000b860da6e14c SEQ:1 OP:19.2 ENC:0 FLG:0x0000
orcl12c_ora_23630_TABLE.trc:CHANGE #1 INVLD CON_ID:3 AFN:22 DBA:0x05800182 BLKS:0x003b OBJ:125947 SCN:0x00000b860da6e14c SEQ:1 OP:19.2 ENC:0 FLG:0x0000

Each line records the number of blocks (BLKS:) allocated and, as you can see, the APPEND trace shows much larger allocations than the TABLE trace (except for the last one). That’s because the tablespace is locally managed with system allocated extents, and the first few invalidation records for the table creation are in the initial 8 block (64KB) extents; by the time we get to the last few blocks of the initial table creation we’ve just allocated the first 128 block (1MB) extent, which is why the last invalidation record for the table can cover so many more blocks than the first few.

It is interesting to note, though, that the invalidation records for the INDEX trace are all small, typically 4 blocks, sometimes 3, even when we’ve obviously got to a point where we’re allocating from extents of 128 blocks.

I believe that somewhere I have a note explaining that the invalidation records always identified batches of 5 blocks in older versions of Oracle – but that may simply have been a consequence of the way that freelist management used to work (allocating 5 blocks at a time from the segment to the master freelist).

Although we could simply list all the invalidation records and sum the block counts manually, we could be a little smarter with our code, summing them with awk, for example.

grep -n "OP:19.2" orcl12c_ora_23630_TABLE.trc |
     sed 's/.*BLKS://' |
     sed 's/ .*$//'  |
     awk '{m = m + strtonum($0) ; printf("%8i %8i \n",strtonum($0),m)}'
 
      13       13 
      15       28 
       7       35 
       8       43 
      15       58 
      15       73 
      15       88 
      15      103 
      15      118 
      59      177 

It’s left as an exercise to the Unix enthusiast to work out how to take the base tracefile name, extract all the sets of data, cater for the odd 18.3 records (whose presence I didn’t request), report any lines for 19.x opcodes other than 19.2, and sum BLKS separately by TABLE, INDEX, and APPEND.
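
If you want a starting point for that exercise, here’s a rough sketch (it assumes the tracefile names used in this example, sums only the 19.2 BLKS values for each file, and simply prints any other layer 19 records it finds; the 18.3 markers are ignored because the grep only picks up layer 19):

for f in orcl12c_ora_23630_TABLE.trc orcl12c_ora_23630_INDEX.trc orcl12c_ora_23630_APPEND.trc
do
        # report any layer 19 change vectors that are not the expected 19.2
        grep "OP:19\." $f | grep -v "OP:19.2"
        # sum the (hex) BLKS: values of the 19.2 invalidation records
        grep "OP:19.2" $f |
                sed 's/.*BLKS://' |
                sed 's/ .*$//'    |
                awk -v file="$f" '{m = m + strtonum($0)} END {printf("%-40s %8i blocks\n", file, m)}'
done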

Once you’ve summed the number of blocks across all the invalidation records (and assuming you’re using the standard 8KB block size) the increase in the volume of redo generated if you alter the database to force logging will be (8KB + a little bit) * number of blocks. The “little bit” will be close to 44 bytes.
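
As a quick worked example using the TABLE figures above, the 177 blocks reported for the initial table creation translate into roughly 1.4MB of extra redo:

-- 177 invalidated blocks * (8,192 byte block + roughly 44 bytes of overhead)
select 177 * (8192 + 44) estimated_extra_redo_bytes from dual;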

If you’ve set your database up to use multiple block sizes you’ll have to aggregate the invalidation records by the AFN (absolute file number) entry, check which files use which block size, and multiply up accordingly. And if you’re using a pluggable database (as I was) inside a container database you might also want to filter the redo dump by CON_ID.

If you do set the database to force logging and repeat the search for layer 19 in the redo  you’ll find that each individual data block written using a direct path write generates its own redo record, which will have length “data block size + 44” bytes and hold a single change vector of type 19.1 (Direct Loader block redo entry).

Footnote

It’s worth mentioning that the dump of redo will go back into the archived redo logs in order to cover the entire range requested by the SCN min/max values; so it would be perfectly feasible (though possibly a little time- and I/O-consuming) to run the report across a full 24 hour window.
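
As a sketch of how that might be done (assuming the SCN/timestamp mapping available to timestamp_to_scn() still covers the period you’re interested in), you could derive the min and max SCNs from timestamps and restrict the dump to the opcode 2 records:

column scn_lo new_value m_scn_lo
column scn_hi new_value m_scn_hi

select to_char(timestamp_to_scn(sysdate - 1),'999999999999999999999999') scn_lo from dual;
select to_char(timestamp_to_scn(sysdate    ),'999999999999999999999999') scn_hi from dual;

alter session set tracefile_identifier = 'LAST_24_HOURS';
alter system dump redo scn min &m_scn_lo scn max &m_scn_hi layer 19 opcode 2;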

Dead Connection Detection (DCD) and the Oracle database

Dead Connection Detection is a useful feature of the Oracle database: it allows for the cleanup of “dead” sessions so they don’t linger around consuming memory and other system resources. The idea is simple: if the database detects that a client process is no longer connected to its server process, it cleans up. This can happen in many ways; in most cases this kind of problem is triggered by an end user.

A dead connection shouldn’t be confused with an idle connection: an idle connection still maintains the network link between client and server process, except that there is no activity. Idle connections aren’t maintained/controlled via DCD; there are other tools in the database handling such cases.

As a by-product, DCD can also help with overly eager firewalls forcibly removing seemingly idle network connections. I found several existing posts on the subject, and the references therein, very useful.

With Oracle 12c, Oracle changed the way DCD works: instead of relying on its network layer, the functionality is pushed into the TCP stack on platforms that support it. This change in behaviour is also explained in an Oracle white paper from 2014.

For the first part of this post I set sqlnet.expire_time to 1 as per a MOS note I found; your value is probably different. The parameter is documented in the Net*8 reference; please use it to work out what the best value is for you. As others have pointed out, this parameter has to go into the RDBMS home, more specifically $ORACLE_HOME/network/admin/sqlnet.ora. I am using Oracle 19.4 on Oracle Linux 7.7 for this blog post; I have seen the same behaviour in 12.1.0.2 in my lab as well.
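
For reference, the whole configuration for this part of the post boils down to a single line in that file (the value of 1 minute is simply what I used here following the MOS note; your value is probably different):

# $ORACLE_HOME/network/admin/sqlnet.ora in the RDBMS home
SQLNET.EXPIRE_TIME = 1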

Background

In addition to the proven methods of checking whether TCP_KEEPALIVE is enabled for a given session I wanted to show another one. Using the ss(8) utility it is possible to show socket options. I also tried lsof but on my system I couldn’t get it to print the options:

SQL> select spid from v$process where addr = (select paddr from v$session where username = 'MARTIN');

SPID
------------------------
13656

SQL> exit 

...

[root@server2 ~]# lsof -nP -p 13656 -T f
lsof: unsupported TCP/TPI info selection: f
lsof 4.87

Although the man-page for lsof reads:

       -T [t]   controls the reporting of some TCP/TPI information,  also  reported  by  net‐
                stat(1),  following  the network addresses.  In normal output the information
                appears in parentheses, each item except TCP or TPI state name identified  by
                a keyword, followed by `=', separated from others by a single space:

                     
                     <TCP or TPI state name>
                     QR=<read queue length>
                     QS=<send queue length>
                     SO=<socket options and values>
                     SS=<socket states>
                     TF=<TCP flags and values>
                     WR=<window read length>
                     WW=<window write length>

                Not all values are reported for all UNIX dialects.  Items values (when avail‐
                able) are reported after the item name and '='.

                When the field output mode is in effect  (See  OUTPUT  FOR  OTHER  PROGRAMS.)
                each item appears as a field with a `T' leading character.

                -T with no following key characters disables TCP/TPI information reporting.

                -T with following characters selects the reporting of specific TCP/TPI infor‐
                mation:

                     f    selects reporting of socket options,
                          states and values, and TCP flags and
                          values.

So let’s try something else: ss(8) – another utility to investigate sockets

Revealing socket options

I have used ss(8) in the past when I didn’t have netstat available, which is more and more common now that netstat is deprecated and its designated successor is ss :)

As far as I know you can’t limit ss to show information just for a PID, so I use grep to limit the output. The output is in fact very wide, which is why this might not look pretty on the blog, depending on whether the renderer decides to wrap output or not.

[root@server2 ~]# ss -nop | egrep 'NetidState|13656'
NetidState Recv-Q Send-Q                                    Local Address:Port                                      Peer Address:Port                                                                                                           
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,4.412ms,0)        
[root@server2 ~]# 

I used the following options:

  • -n for “Do not try to resolve service names”
  • -o for “Show timer information” and finally
  • -p to “Show process using socket”.

The main option here is -o. As per the man page:

       -o, --options
              Show timer information. For tcp protocol, the output format is:

               timer:(<timer_name>,<expire_time>,<retrans>)

               <timer_name>
                      the name of the timer, there are five kind of timer names:

                      on: means one of these timers: tcp retrans timer, tcp early retrans timer and tail loss probe timer
                      keepalive: tcp keep alive timer
                      timewait: timewait stage timer
                      persist: zero window probe timer
                      unknown: none of the above timers

               <expire_time>
                      how long time the timer will expire

               <retrans>
                      how many times the retransmission occurs

With a little shell loop I can show how that timer is decrementing:

[root@server2 ~]# for i in $(seq 1 5); do ss -nop | grep 13656; sleep 1 ; done
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,20sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,19sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,18sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,17sec,0)          
tcp  ESTAB 0      0                               [::ffff:192.168.100.22]:1521                           [::ffff:192.168.100.21]:25780                           users:(("oracle_13656_si",pid=13656,fd=18)) timer:(keepalive,16sec,0)          
[root@server2 ~]# 
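
A small refinement if you don’t want to grep through the full listing on a busy system: ss can filter by connection state and port, so (with the listener on port 1521 as in my example) something like this should narrow the output down to the database connections:

[root@server2 ~]# ss -nop state established '( sport = :1521 )'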

Summary

Using the ss utility it is possible to check if a keepalive timer is implemented as a means to support DCD with 12.1 and later releases. Invoking ss(8) hasn’t caused any problems on my system, but as with every such tool you need to ensure it’s safe to use before attempting to look at an important system.

UKOUG 2019

At the UKOUG TechFest 2019 event in Brighton this year I’ll be chairing the CBO Panel session on Monday 2nd December at noon. The panellists will be Maria Colgan, Nigel Bayliss, Christian Antognini and Richard Foote.

It will be possible to propose questions on the day (written or spoken), but if you have a question that you’d like to give the panellists a little warning about then you can send it in ahead of time.

Tweeting is particularly good (and I’ll copy any tweeted questions to this note) because 280 characters is long enough to be an interesting question but short enough for people to identify the key point.

 

Installing an #Exasol 2+0 Cluster on Hyper-V

After installing Hyper-V, you need to download an ISO file with the Exasol software. The 2+0 cluster consisting of two data nodes and one license server needs a host machine with at least 8 GB memory and 60 GB free disk space. I do it on my Windows 10 notebook with 16 GB memory. This is for educational purposes of course and not suitable for production use.

2+0 means two active nodes and no reserve node. A reserve node can be added later, expanding the environment to a 2+1 cluster.

We’ll start by adding two virtual switches so that we can connect the network adapters of the VMs to them later.

Create Hyper-V Switches for Private and Public Network of the cluster

Click the Virtual Switch Manager in Hyper-V Manager:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper1.png]

 

Select Private and click Create Virtual Switch:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper2.png]

Name the Private switch edu-cluster-interconnect:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper3.png]

Create another Virtual Switch of type Internal and name it edu-public-network:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper4.png]

Go to Network and Sharing Center and click on the edu-public-network there, then on Properties:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper5.png]

Then select TCP/IPv4 and click again Properties:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper6.png]

Select Use the following IP address and type in 192.168.43.1 and the Subnet mask 255.255.255.0

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper7.png]

The IP above can be different as long as it doesn’t conflict with other IPs used in your virtual environment. I find it convenient to let it end with the digit 1, though.

Create the License Server

In Hyper-V Manager, click New then Virtual Machine:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper8.png]

Do not use Quick Create.

Give it the name ls and click Next:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper9.png]

It must be Generation 1:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper10.png]

Give it 1500 MB memory if your host has >= 16 GB memory, else give it 1100 MB. De-select Use Dynamic Memory:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper11.png]

Do not yet connect it to a network:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper12.png]

Give it a disk of 100 GB maximum size:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper13.png]

Select Install operating system later:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper14.png]

After creating ls, click on Settings:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper15.png]

Add a Legacy Network Adapter:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper16.png]

We need Legacy Adapters in the VMs because the data nodes will later need to boot from the network, and that requires Legacy Adapters. Connect it to the Virtual Switch edu-cluster-interconnect and click Apply.

Connect a normal (not legacy) Network Adapter to the switch edu-public-network:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper17.png]

Confirm the boot order is CD first for ls:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper18.png]

Now power on the virtual machine just created and configured. It should come up with this screen and you type in install

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper19-1.png]

There’s no need for device encryption on an educational system. Tab to OK and press return:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper20-1.png]

Confirm the LS number as 10. Tab to OK and press return:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper21.png]

Confirm to install EXAClusterOS on first boot. Tab to OK and press return:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper22.png]

Enter exasol12 as the password for the maintenance user:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper23.png]

Check with ipconfig (using the cmd shell) for the Ethernet adapter vEthernet (edu-public-network), then replace the last digit 1 with 10 for the IP of the public interface. It should be 192.168.43.10

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper24.png]

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper25.png]

Check the network configuration, comparing it to the Networking tab in Hyper-V Manager:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper26.png]

Private must match edu-cluster-interconnect and Public must match edu-public-network.

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper27.png]

If it doesn’t, click on Configure in the Advanced Network configuration to change it accordingly:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper28.png]

Confirm the (corrected) network configuration. Tab to OK and press return:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper29.png]

This triggers the installation of various components on the License Server and takes some time, depending on the hardware resources of your machine. The critical resource is CPU, so try to avoid running anything else on your machine that consumes CPU during the install.

If the License Server reboots during that process and presents the choice to install, update or boot from local disk, type in local. You may then remove the virtual CD from the CD drive of the VM.

When prompted for login on the node n0010, you may log in as the OS user maintenance with the password exasol12:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper30.png]

Refresh until it shows EXAoperation status: Online

This may take several minutes depending on your hardware.

Connect to EXAoperation

Now open a Chrome browser (preferred) with the https URL of your License Server’s IP:

https://192.168.43.10

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper31.png]

User Name is admin, Password is admin

Check Network Configuration

The network configuration should look like this:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper32.png]

If it doesn’t, click on Edit and modify it to match the settings above. You may change the Cluster Name to something else. Notice that the value None is not allowed for NTP Servers and you have to remove it. Then click Apply.

Add Monitoring Service

To be able to follow the node’s installation, it’s useful to add a monitoring service now. Click on Services at the left and then on the Add button:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/monitoringservice.png]

Select all cluster services with the Log Priority Information to get the most verbose output, then click Add.

Add first data node

In Hyper-V Manager, create another VM with the name n11:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper33.png]

Specify Generation 1 on the next page.

Give it 2500 MB memory if your host has >= 16 GB and 2200 MB otherwise, and de-select Dynamic Memory:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper34.png]

Give it a 100 GB max. sized disk and name it n11sda:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper35.png]

Select install OS later and finish the creation dialog. Then click on settings. Add a legacy network adapter to n11 and connect it to the switch edu-cluster-interconnect:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper36.png]

Connect another normal network adapter to the switch edu-public-network.

Configure the boot order for n11 with legacy network adapter first:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper37.png]

Add a second hard drive to n11:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper38.png]

Select New Virtual Harddisk of type VHDX, dynamically expanding up to 100 GB max. and name it n11sdb:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper39.png]

We need to know the MAC addresses of the data nodes upfront to enter them in EXAoperation, so the data nodes can connect to the license server to get booted over the network. The MAC addresses below can be different, as long as you know they do not conflict with other machines in your virtual environment.

Select Advanced Features of the legacy network adapter and configure it with a static MAC address of 08:00:27:A9:D1:22

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper40.png]

Select Advanced Features of the normal network adapter and configure a static MAC address of  08:00:27:92:10:6B

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper41.png]

Do not yet power on the new VM. Instead, click on Add in the EXAoperation Nodes screen:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper42.png]

Configure the added node:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper43.png]

Click on Add, then click on the new node to configure it further:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper44.png]

Scroll down on the node’s detail page and click on Disks:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper45.png]

Click on Edit on the EXACluster Disk Information for node n0011 screen:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper46.png]

Add the device /dev/sda with the other values as shown in the picture:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper47.png]

Do this accordingly for the other three storage partitions. Make sure to reduce the size to 30 GB for the Data partition and assign /dev/sdb for the Storage partition. It should look like this in the end:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper48.png]

Now power on the n11 VM. It should come up booting from network and start installing. Check that the install starts normally without errors:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper49.png]

If you see errors in the log at this point, turn off the VM and check the configuration as described above, especially the disk layout. After having corrected it, power the VM on again. Make sure the state of the node shows as Unknown to install (not Unknown Active) here. If it doesn’t, execute the action Set install flag for the node to change it to the state Unknown to install.

While the install of n11 is ongoing, add the second data node as a Hyper-V VM just like the n11 VM before. Give it the name n12 with all the same properties as n11, except these MAC addresses:

Legacy network adapter (Private Network) 08:00:27:AA:61:8D

Normal network adapter (Public Network)  08:00:27:A3:AB:46

After having created n12 in Hyper-V, go to EXAoperation and click on the n11 node:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper50.png]

Scroll down on the node’s detail page and click on Copy Node.

Change the node numbers to 12 and the MAC addresses as above listed:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper51.png]

Click on Copy Node again to finish the creation of n12.

Click on the Nodes screen:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper52.png]

The install of n11 is still ongoing and n12 is now ready to start in Hyper-V. Power it on.

It should come up and install like n11. The install ends with the nodes prompting for login and a message like Boot process finished after xxx seconds in the logservice.

The install consumes a lot of CPU resources. Try to avoid running anything else on the machine while it is ongoing. It may take quite a long while to complete, depending on the power of your local machine.

It looks like this in the end:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper53.png]

Now select the two nodes and execute the action Set active flag for them:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper54.png]

Their state should now change to Running Active.

On the EXAStorage screen, click on Startup Storage Service:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper55.png]

Now select the two data nodes and click on Add Unused Disks:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper56.png]

Add a data volume

On the EXAoperation Storage screen, click on Add Volume:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper57.png]

Specify a Redundancy of 2, add user admin to Allowed Users, Volume Size should be 8 GB, add n11 and n12 to the Nodes List, Number of Master Nodes is 2 and Disk is d03_storage:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper58.png]

The new volume should then look like this:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper59.png]

Create a database

On the EXASolution screen, click on Add:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper60.png]

On the Add Database screen, specify 2 Active Nodes, assign the two data nodes and click Add

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper61.png]

The new database shows a Status of Not created. Click on the database link:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper62.png]

Select the Action Create and click Submit:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper63.png]

The Action should change to Startup, n11 and n12 appear as Selected. Click Submit again:

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper64.png]

The State changes to Starting for n11 and n12. Click on EXASolution on the Services menu on the left:

The new database should now show as Online.

[Screenshot: https://uhesse.files.wordpress.com/2019/09/hyper65.png]

Congratulations on going the whole way; it was a bit long at times, wasn’t it? But now you have your own Exasol cluster sandbox on Hyper-V – have fun with it!

Going forward, I find it convenient to create Hyper-V checkpoints with the database and storage service shut down and with the cluster services on the nodes shut down as well. That way I make sure nothing can change on the disks of the data nodes when I take the checkpoint. I can then just turn off the VMs and apply these checkpoints afterwards. Starting the VMs then saves me the time they would otherwise need to go through the normal boot process. My next article will explain how to add a reserve node to this cluster, bringing it from 2+0 to 2+1. Hope you find this mini-series useful!

Presenting at UKOUG Techfest19 Conference in Brighton, UK

I’m very excited to be attending my 3rd UKOUG Conference, this year re-badged as Techfest19. The fact it’s being held in Brighton is a little disconcerting for a Crystal Palace fan, but really looking forward nonetheless to what has always been one of the very best Oracle conferences on the yearly calendar. I have a […]

19c High-Frequency statistics gathering and Real-Time Statistics

These are the two exciting new features for optimizer statistics that arrived in 19c, the latest release of the 12cR2 family. Less exciting is that we are not allowed to use them on any platform other than Exadata:

https://apex.oracle.com/database-features/

But let’s cross our fingers and hope that they will be released more widely in the future, because they solve real-life problems such as out-of-range queries. Here is a little example involving both of them. A table starts empty and grows during the day. Relying only on the statistics gathered during the maintenance window will give bad estimates, and dynamic sampling may not sample the right blocks.

A little example

Here is a little example where I insert one row every second in a DEMO table and look at the statistics.

Initialization

First I initialize the example by creating the table, gathering stats on it and setting the global preference AUTO_TASK_STATUS to ON. It is not obvious from the name, but in the context of DBMS_STATS this Auto Task is the “high-frequency” one, running outside of the maintenance window, every 15 minutes by default, as opposed to the Auto Job that runs during the maintenance window every 4 hours.

spool hi-freq-stat.log
set echo on time on
whenever sqlerror exit failure
alter session set nls_date_format='dd-MON-yyyy hh24:mi:ss';
create table DEMO(d date,n number(*,2),h varchar2(2),m varchar2(2));
exec dbms_stats.gather_table_stats(user,'DEMO');
show parameter exadata
exec dbms_stats.set_global_prefs('auto_task_status','on');

This is a lab where I simulate Exadata features. High-Frequency Automatic Optimizer Statistics Collection is available only on Exadata.

Run every second

Every second I run this transaction to add one row. The D column will contain SYSDATE at the time of insert:

insert into DEMO (D, N, H, M) values (sysdate
,sysdate-trunc(sysdate,'HH24')
,to_char(sysdate,'HH24')
,to_char(sysdate,'MI'));
select count(*),min(d),max(d) from DEMO;
commit;

Then I query the column statistics for this D column:

exec dbms_stats.flush_database_monitoring_info;
with function d(r raw) return date as o date;
begin dbms_stats.convert_raw_value(r,o); return o; end;
select column_name,d(low_value),d(high_value),num_distinct,notes
from dba_tab_col_statistics where owner=user and table_name='DEMO'
and column_name='D'
/

As an example, here is the 10000th iteration 2 hours 46 minutes after the initialization:

The min value is from 20:22:16 and the max value is this insert at 23:18:19

Without those 19c features, the statistics would have stayed at 0 distinct values, and null low/high, as they were gathered when the table was empty.

However, I have two sets of statistics here (visible in the dictionary after flushing database monitoring info):

  • Real-Time Statistics (NOTES=STATS_ON_CONVENTIONAL_DML), which are updated when DML occurs and can be used as dynamic statistics without the need to sample. The number of distinct values is not known there (it is not updated on the fly because it would be expensive to work out whether new values are within the existing set of values or not). The high/low values are accurate: in this example less than 2 minutes stale (23:16:42 is the high value known at the 23:19:18 point in time).
  • The regular statistics (no NOTES), which are not as accurate but not too stale either: nearly 15 minutes stale (23:02:06 is the high value known at the 23:19:18 point in time). And they are full statistics (gathered on the table with Auto Sample Size): the number of distinct values is the one we had at 23:02:06, when the statistics were gathered by the “high-frequency” task.
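
As a quick additional check (not part of the original test case), the NOTES column is also exposed at the table level, so you can see the conventional-DML row next to the regular one in DBA_TAB_STATISTICS:

select num_rows, blocks, notes, last_analyzed
from   dba_tab_statistics
where  owner = user and table_name = 'DEMO'
/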

Cleanup

To clean up the test I drop the table and set auto_task_status back to the default (off):

set termout off
drop table DEMO;
set termout on
exec dbms_stats.set_global_prefs('auto_task_status','off');

High-Frequency statistics gathering task result

I have run this every second for 10 hours, piping the following into SQL*Plus:

for i in {1..36000} ; do echo @ /tmp/sql2.sql ; done

And I awk’d the spool to get information about the distinct statistics we had during that time (regular statistics only) and the first time they were known:

awk '
#ignore real-time stats here
/STATS_ON_CONVENTIONAL_DML/{next}
#store update time
/> commit;/{time=$1}
# stats on column inserted with SYSDATE
/^D/{if (logtime[$0]==""){logtime[$0]=time}}
END{for (i in logtime){print i,logtime[i]}}
' hi-freq-stat.log | sort -u

As the high value is the SYSDATE at the time of insert, this shows the staleness:

Here, I started my script at 20:22:16 with an empty table and gathered the statistics, which then showed 0 rows and null low/high values. Then one row was inserted each second. The statistics stayed the same until 20:31:30, where they show 523 rows. The high value here is from 20:31:29, when the high-frequency task ran. Those stats were used by the optimizer until 20:46:38, when the task ran again.

All those task and job executions are logged and visible from the same view:

select * from DBA_AUTO_STAT_EXECUTIONS order by OPID;

The detail for this specific table comes from DBA_OPTSTAT_OPERATION_TASKS:

select target,start_time,end_time,notes 
from DBA_OPTSTAT_OPERATION_TASKS
where target like '%DEMO%' order by OPID desc

We see the runs every 15 minutes from 20:22:16, then 20:31:30, then 20:46:38… until 23:02:06

Then I was still inserting at the same rate but the task, still running every 15 minutes, gathered statistics on this table only every 30 minutes: 23:02:06, then 23:32:12… and we see the latest ones here every 60 minutes only.

What do you think happened for the high-frequency job not gathering statistics on this table 15 minutes after the 23:02:06 run?

Let’s look at the numbers:

  • I insert one row per second
  • The task runs every 15 minutes

This means that when the task runs, I have inserted 900 rows. I didn’t change the default STALE_PERCENT, which is 10%. And when do those 900 rows reach the threshold of 10%? When the table has more than 9000 rows.
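
If you want to double-check the threshold in use (a quick sanity check rather than part of the demo), the table-level preference can be queried with dbms_stats.get_prefs:

select dbms_stats.get_prefs('STALE_PERCENT',user,'DEMO') stale_percent from dual;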

And now look at the log at 23:02:06 before the task has run:

The statistics show 8222 rows (from DBA_TAB_STATISTICS, but as I know they are all unique I can infer the number from NUM_DISTINCT in DBA_TAB_COL_STATISTICS), and so the 900 recorded modifications amount to 11%. This is higher than 10%, so the table statistics are stale and the next run of the task re-gathered them:

A side note: the “STATS_ON_CONVENTIONAL_DML” disappeared because statistics were just gathered. But that’s for later…

Now that the table is known to have 9078 rows, when will it be stale again? 15 minutes later we will have inserted 900 rows, but 900/9078 = 9.9% and that’s under the 10% threshold. This is why the next run of the task did not gather statistics again. But after another 15 minutes, 1800 rows have been inserted and that’s 19.8% staleness.
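
The modification counters that feed this staleness calculation are visible, after flushing database monitoring info as above, in USER_TAB_MODIFICATIONS; here is a quick way to watch them (again, not part of the original script):

exec dbms_stats.flush_database_monitoring_info;
select inserts, updates, deletes, timestamp
from   user_tab_modifications
where  table_name = 'DEMO'
/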

You see the picture: this high-frequency task takes care of the very volatile tables. It runs every 15 minutes (the default AUTO_TASK_INTERVAL) and spends no more than 1 hour (the default AUTO_TASK_MAX_RUN_TIME).
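
Both of those defaults are global DBMS_STATS preferences, so presumably they can be changed in the same way as AUTO_TASK_STATUS; the calls would look like this (shown with the default values of 900 seconds and 3600 seconds):

exec dbms_stats.set_global_prefs('AUTO_TASK_INTERVAL','900');
exec dbms_stats.set_global_prefs('AUTO_TASK_MAX_RUN_TIME','3600');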

Real-Time statistics

And to go further, Real-Time statistics not only count the modifications to evaluate the staleness but also store information about the low and high values.

Here is my awk script now showing the distinct information from the real-time statistics:

awk '
# this time keep only the real-time stats
#store update time
/> commit;/{time=$1}
# stats on column inserted with SYSDATE
/^D.*STATS_ON_CONVENTIONAL_DML/{if (logtime[$0]==""){logtime[$0]=time}}
END{for (i in logtime){print i,logtime[i]}}
' hi-freq-stat.log | sort -u

and an excerpt around the time where the high-frequency task ran only every 15 or 30 minutes:

While I was inserting one row every second, the staleness here is only about 2 minutes. This reduces even further the risk of out-of-range queries.

Basically, the high-frequency task:

  • gathers statistics in the same way as in the maintenance window job (stale tables first) but more frequently and using fewer resources

and real-time statistics:

  • adds some more fresh information, not from gathering on the table but inferred from the DML that was monitored.

A little remark, though: there’s no magic to make new query executions pick up the new statistics immediately. The high-frequency task just calls dbms_stats with the default rolling window invalidation.

Announcement: Australia/NZ “Let’s Talk Database” Events October 2019 !!

I’m very excited to announce the next series of Oracle “Let’s Talk Database” events to be run throughout Australia and New Zealand in October 2019. I’ll be discussing two exciting topics this series, “Oracle Database 19c New Features” and “Oracle Exadata X8“. As always, these sessions run between 9am-1pm, include a networking lunch and are free, […]

Oracle Database 19c Automatic Indexing: Default Index Column Order Part II (Future Legend)

In Part I, we explored some options that Oracle might adopt when ordering the columns within an Automatic Index by default, in the absence of other factors where there is only the one SQL statement to be concerned with. A point worth making is that if all columns of an index are specified within SQL […]

Update restarts

Somewhere I think I’ve published a note about an anomaly that’s been bugging me since at least Oracle 10g – but if it is somewhere on the Internet it’s hiding itself very well and I can’t find it, though I have managed to find a few scripts on my laptop that make a casual reference to the side effects of the problem. [Ed: a tweet from Timur Ahkmadeev has identified a conversation in the comments on an older post that could explain why I thought I’d written something about the issue.]

Anyway, I’ve decided to create some new code and write the article (all over again, maybe). The problem is a strange overhead that can appear when you do a simple but large update driving off a tablescan.

If you just want the conclusion and don’t feel like reading the detail, it’s this: a big update may roll itself back before it gets to the end of the tablescan and restart “invisibly”, doing a “select for update” (2nd tablescan, one row at a time) followed by the update (3rd tablescan, by array), massively increasing the redo and undo generated.

Here’s the script I’ve created to demonstrate the issue. As its header suggests, the results I’ll be showing come from 12.1.0.2.


rem
rem     Script:         update_restart.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2019
rem     Purpose:        
rem
rem     Last tested 
rem             12.1.0.2
rem

create table t1 
pctfree 80 
nologging
as
with g as (
        select rownum id 
        from dual 
        connect by level <= 500 -- > comment to avoid wordpress format issue
)
select
        rownum id, 
        lpad(rownum,10) v1
from    g,g
/

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
end;
/

select  blocks, num_rows
from    user_tables 
where   table_name = 'T1'
/

I've created a table with 250,000 rows (500 * 500) and gathered stats on it (the table was 3,345 blocks), as gathering stats tends to sort out any block cleanout effects that might otherwise confuse the numbers. Having done that I'm going to do an update of every row in the table:


alter system switch logfile;
execute snap_my_stats.start_snap

update t1 set id = id;

execute snap_my_stats.end_snap
execute dump_log

The update doesn’t actually change the column value (and in 12.2 there’s an interesting change to the Oracle code that comes into play in that case) but it will still generate redo and undo for every row in the table, and update ITL entries and do all the rest of the work we normally associate with an update.

Before the update I’ve issued a “switch logfile” and after the update I’ve called a procedure I wrote to do a symbolic dump of the current log file (“alter system dump logfile”). The calls to the snap_my_stats package allow me to see the changes in the session activity stats due to the update. A simple check of these statistics can often give you a good idea of what’s going on. So you might ask yourself what you might expect to see in the stats from this update. (As Isaac Asimov once said: The most exciting phrase to hear in science, the one that heralds new discoveries, is not ‘Eureka!’, but ‘That’s funny …’)
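
(For reference, dump_log need be nothing more than a thin wrapper around the "alter system dump logfile" command for a member of the current online log group. A minimal sketch of such a wrapper – not necessarily the exact code used here, and it needs direct grants on v_$log and v_$logfile plus the alter system privilege – might look like this:)

create or replace procedure dump_log
as
        v_member        v$logfile.member%type;
begin
        -- pick one member of the current online redo log group
        select  lf.member
        into    v_member
        from    v$log l, v$logfile lf
        where   lf.group# = l.group#
        and     l.status  = 'CURRENT'
        and     rownum    = 1;

        -- dump it symbolically into this session's trace file
        execute immediate 'alter system dump logfile ''' || v_member || '''';
end;
/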

Having updated 250,000 rows by tablescan I think we might expect to see a tablescan of 250,000 rows; we might also expect to see (in the worst case) something like 250,000 redo entries although we might expect to see far fewer since Oracle can probably do array updates and create one redo change vector and one undo record per block changed rather than per row changed.

So here are some of the more interesting numbers that we actually see (though the specific results are not 100% reproducible):

Name                                                                     Value
----                                                                     -----
session logical reads                                                  270,494
db block gets                                                          263,181
consistent gets                                                          7,313
db block changes                                                       534,032
free buffer requested                                                    7,274
switch current to new buffer                                             3,534
redo entries                                                           265,128
redo size                                                           83,803,996
rollback changes - undo records applied                                  1,002
table scan rows gotten                                                 522,470
table scan blocks gotten                                                 6,881 
HSC Heap Segment Block Changes                                         264,135
Heap Segment Array Updates                                              14,135

A pair of figures that stand out as most surprising are the tablescan figures – I've got a table of 3,345 blocks and 250,000 rows, but my tablescans total 6,881 blocks and 522,470 rows – I seem to have scanned the table (slightly more than) twice! At the same time I've requested 7,274 free buffers (some of them probably for undo blocks) and done a "switch current to new buffer" 3,534 times – so I seem to have read every table block then cloned it in memory.

Side note: the "switch current to new buffer" is a common feature of a tablescan update and not particularly relevant to this article. I assume the intent is to "leave behind" a read-consistent copy for any other queries that might have started before the big update.

Another really strange statistic is the one that says "rollback changes – undo records applied" – I've used 1,002 undo records in a rollback operation of some sort. I haven't shown you the stats for "user rollbacks" or "transaction rollbacks" because, despite the undo records being applied, there were no rollbacks reported.

What about changes? We see 534,000 "db block changes" – and maybe it's okay for that statistic to be roughly twice the number of rows we expect to change because for every row we change we have to generate some undo, so we may have to modify an undo block to create an undo record – but there are still 34,000 "db block changes" too many. The number of changes is fairly consistent with the number of "redo entries", of course, allowing for a factor of 2 because redo entries often consist of a matching pair of changes, one for the "forward change" and one for the "undo block change". Again, though, there's a bit of an excess: the 34,000 extra "db block changes" seem to have resulted in an excess 15,000 "redo entries". And we still have to ask: "Whatever happened to array processing?"

In fact, we do see some array processing – we did 14,135 “Heap Segment Array Updates” which, at an average of about 18 rows per array would account for our 250,000 row table – but we also did 264,135 “HSC Heap Segment Block Changes”, and here’s a (“That’s funny …”) coincidence: 264,135 – 14,135 = 250,000: the number of rows in the table. Maybe that means something, maybe not. Another funny coincidence: 264,135 (HSC Heap Segment Block Changes) + 1,002 (rollback changes – undo records applied) = 265,137 which is remarkably close to the number of “redo entries”.

If, by now, you’re not beginning to suspect that something a little odd has happened you haven’t been reading enough Agatha Christie novels, or watching enough Sherlock Holmes (or House) videos.

I don't always … dump trace files but when I do I like to go over the top. My update generated 83MB of redo so I made sure my log files were 100MB each, and when I dumped the current log file I got a trace file that was a little over 466MB and 11M lines. Clearly I had no intention of reading it all. A critical skill in reading ANY type of trace file is knowing in advance what you ought to be looking for and working out a painless way of finding it. (Practising by slogging your way through a very small example is a good starting point.)

In this case I started by using grep to extract all the lines containing “OP:”, stripping them down to just the “OP:” followed by the redo op code, and then sorting and summarising the use of op codes. Here (with some annotations) is what I got:

grep OP or32_ora_14652.trc | sed 's/.*OP/OP/' | sed 's/ .*$//' | sort | uniq -c | sort -n

      1 OP:11.3
      2 OP:10.4
      2 OP:11.5
      3 OP:23.1
      3 OP:4.1
      4 OP:14.4
      4 OP:22.5
      5 OP:13.22
      8 OP:11.2
      8 OP:22.2
     13 OP:10.2
     25 OP:5.4
    119 OP:5.11         Mark undo as applied during rollback
    883 OP:5.6          Applying undo - NB 883 + 119 = 1002
   3776 OP:5.2          Get undo segment header
  15139 OP:11.19        Array update of table
 250000 OP:11.4         Lock table row piece
 264190 OP:5.1          Update undo block

The interesting point here is that I seem to have locked every single row in the table separately, but I've also done array updates on the table, and I've done some rolling back – so I need a more detailed check to see when things happened. Let's begin by examining a few thousand lines near the start of the trace:

head -10000 or32_ora_14652.trc | grep -n OP | sed 's/CON_ID.*OP/ OP/' | sed 's/ ENC.*$//'

68:CHANGE #1  OP:11.19
214:CHANGE #2  OP:5.2
217:CHANGE #3  OP:11.19
363:CHANGE #4  OP:11.19
509:CHANGE #5  OP:5.2
512:CHANGE #6  OP:11.19
658:CHANGE #7  OP:11.19
692:CHANGE #8  OP:11.19
838:CHANGE #9  OP:11.19
984:CHANGE #10  OP:11.19
1130:CHANGE #11  OP:11.19
1276:CHANGE #12  OP:11.19
1422:CHANGE #13  OP:11.19
1568:CHANGE #14  OP:11.19
1714:CHANGE #15  OP:5.2
1717:CHANGE #16  OP:11.19
1863:CHANGE #17  OP:11.19
2009:CHANGE #18  OP:11.19
2155:CHANGE #19  OP:11.19
2301:CHANGE #20  OP:11.19
2447:CHANGE #21  OP:11.19
2593:CHANGE #22  OP:11.19
2739:CHANGE #23  OP:11.19
2885:CHANGE #24  OP:5.2
2888:CHANGE #25  OP:11.19
3034:CHANGE #26  OP:11.19
3180:CHANGE #27  OP:5.1
3336:CHANGE #28  OP:5.1
3489:CHANGE #29  OP:5.1
3642:CHANGE #30  OP:5.1
3795:CHANGE #31  OP:5.1
3836:CHANGE #32  OP:5.1
3989:CHANGE #33  OP:5.1
4142:CHANGE #34  OP:5.1
4295:CHANGE #35  OP:5.1
4448:CHANGE #36  OP:5.1
4601:CHANGE #37  OP:5.1
4754:CHANGE #38  OP:5.1
4907:CHANGE #39  OP:5.1
5060:CHANGE #40  OP:5.1
5213:CHANGE #41  OP:5.1
5366:CHANGE #42  OP:5.1
5519:CHANGE #43  OP:5.1
5672:CHANGE #44  OP:5.1
5825:CHANGE #45  OP:5.1
5978:CHANGE #46  OP:5.1
6131:CHANGE #47  OP:5.1
6284:CHANGE #48  OP:5.1
6440:CHANGE #1  OP:5.1
6593:CHANGE #2  OP:11.19
6742:CHANGE #1  OP:5.1
6895:CHANGE #2  OP:11.19
7044:CHANGE #1  OP:5.1
7197:CHANGE #2  OP:11.19
7346:CHANGE #1  OP:5.1
7499:CHANGE #2  OP:11.19
7648:CHANGE #1  OP:5.1
7801:CHANGE #2  OP:11.19
7950:CHANGE #1  OP:5.2
7953:CHANGE #2  OP:5.1
8106:CHANGE #3  OP:11.19
8255:CHANGE #1  OP:5.1
8408:CHANGE #2  OP:11.19
8557:CHANGE #1  OP:5.1
8710:CHANGE #2  OP:11.19
8859:CHANGE #1  OP:5.1
9012:CHANGE #2  OP:11.19
9161:CHANGE #1  OP:5.1
9314:CHANGE #2  OP:11.19
9463:CHANGE #1  OP:5.1
9616:CHANGE #2  OP:11.19
9765:CHANGE #1  OP:5.1
9918:CHANGE #2  OP:11.19


Everything up to line 6284 (change# 48) is one big “private redo” record where we do several “11.19 (table array updates)” followed by a matching group of “(5.1 update undo block)”, before switching to the “public redo” mechanism used by long transactions where each redo record is a matched pair of “backward” and “forward” change vectors, i.e. one 5.1 with a matching 11.19.

So we start by doing array updates without doing any single row locking – so I want to know when the array updates stop. Here’s a small but continuous piece from the next grep I ran (which produced an output totalling 15,000 lines):

grep -n "OP:11.19" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 
...
446693:-OP:11.19
446736:-OP:11.19
446891:-OP:11.19
447045:-OP:11.19
447200:-OP:11.19
7461225:-OP:11.19    *** Big jump
7461527:-OP:11.19
7461829:-OP:11.19
7462131:-OP:11.19
7462321:-OP:11.19
7462511:-OP:11.19
...

The 11.19 opcodes were appearing roughly every 300 lines of trace, except they stopped after 2,004 occurrences then disappeared for about 7M lines before reappearing and running evenly (ca. every 300 lines) to the end of the trace file. Clearly we need to know what happened somewhere around lines 447200 and 7461225. But before looking too closely, let's find out when the "11.4 lock row piece" and "5.6 Apply undo" start to appear:

grep -n "OP:11.4" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//' 

447374:-OP:11.4
447405:-OP:11.4
447433:-OP:11.4
447461:-OP:11.4
447489:-OP:11.4
447517:-OP:11.4
...
... lots of evenly spread 11.4, every 28 lines
...
7460948:-OP:11.4
7460976:-OP:11.4
7461004:-OP:11.4
7461032:-OP:11.4
7461060:-OP:11.4

Note where the 11.4 opcodes begin and end – they fit exactly into the gap between the two sets of 11.19 opcodes. And where do the 5.6 op codes come into play?

grep -n "OP:5.6" or32_ora_14652.trc | sed 's/:.*OP/:-OP/' | sed 's/ .*$//'

295919:-OP:5.6
296046:-OP:5.6
296201:-OP:5.6
296356:-OP:5.6
...
... a few hundred appearances of 5.6, commonly spaced 155 lines apart
...
446529:-OP:5.6
446684:-OP:5.6
446727:-OP:5.6
446882:-OP:5.6
447191:-OP:5.6

That was the entire set of "5.6 Apply undo" records – a short burst of activity which finishes just before the big gap in the 11.19 opcodes. This leads to the question: "does anything different happen around the time that the 5.6 op codes start to appear?" Time to look at a very small extract from a very big grep output:

grep -n "OP" or32_ora_14652.trc | sed 's/ CON_ID.*OP/:-OP/' | sed 's/ ENC.*$//' 
..
294895:CHANGE #1:-OP:5.1
295048:CHANGE #2:-OP:11.19

295197:CHANGE #1:-OP:5.1
295322:CHANGE #2:-OP:11.19

295443:CHANGE #1:-OP:5.1
295596:CHANGE #2:-OP:11.19

295745:CHANGE #1:-OP:22.2

295753:CHANGE #1:-OP:22.5

295760:CHANGE #1:-OP:14.4
295766:CHANGE #2:-OP:22.2

295773:CHANGE #1:-OP:11.19
295919:CHANGE #2:-OP:5.6

295928:CHANGE #1:-OP:11.19
296046:CHANGE #2:-OP:5.6

296055:CHANGE #1:-OP:11.19
296201:CHANGE #2:-OP:5.6

...

I've inserted blank lines between redo records in this tiny extract from around the point where the 5.6 opcodes appear. The first few records show Oracle doing the update, then there's a little bit of space management (op codes in the 22 and 14 layers) and then Oracle starts rolling back. (Note: whether updating or rolling back Oracle is applying "11.19 table array updates", but the "5.6 undo applied" appears as change vector #2 in the redo record as we roll back, while the "5.1 create undo" appears as change vector #1 as we make forward changes.)

If we count the number of 11.19 opcodes up to the point where the 11.4 opcodes appear we find that we've used 1,002 to update the table, then another 1,002 to roll back the update.

Summary

Without going into very fine detail examining individual redo change vectors, what we’ve seen is as follows:

  • On executing the very large update the session activity statistics show the number of rows accessed by tablescan was more than twice the expected number. This obvious oddity made me look more closely and notice the surprising "rollback changes – undo records applied", which just shouldn't have been there at all, and this led me to look more closely at the number of redo entries, which averaged more than one per row when it should have been a much smaller number thanks to Oracle's internal array update mechanism.
  • Extracting just the redo op codes from a dump of the redo log generated during this update we’ve extracted summary information that the work we’ve done includes: (a) locking every single row in the table separately, (b) performing a load of table array updates, (c) applying a significant number of undo records to rollback changes.
  • Digging in for just a little more detail we see (in order of activity):
    • A thousand array updates  (roughly 20,000 rows by tablescan – I did actually check a few redo records in detail, 20 rows was a common array size)
    • The thousand updates being rolled back (luckily a long time before we got near the end of the first tablescan)
    • 250,000 rows locked individually (250,000 rows by tablescan – the 2nd tablescan) (NB each LKR – lock row – undo record is 68 bytes of undo)
    • 13,000 array updates   (250,000 rows by tablescan – the third tablescan) – at roughly 100 bytes per row of “overhead” in the undo

I don’t know why this happens – and from memory I think it started happening in 10g. I have an idea it relates to changes in the global SCN as the update takes place and something (possibly related to space management updates) making the update think it has to go into “write consistent” mode as its “start SCN” gets too far away from the current SCN.

The bottom line on this, though, is that there may be cases where a large, tablescan-driven, update could actually do the tablescan three times, and generate far more undo and redo than it should. In theory I could have completed the update in about 13,000 pairs of (11.19, 5.1) change vectors. In practice this simple test produced far more work.

And it gets worse …

Update – prompted by a tweet from Tanel Poder

In one of my tests I had added the /*+ monitor */ hint to see if the activity would be reported correctly. It didn't look quite right – but what I hadn't noticed was that the monitor reported the statement as executing 3 times (and, of course, the tablescan executing 3 times):

Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  TEST_USER (14:13842)
 SQL ID              :  9a166bpd7455j
 SQL Execution ID    :  16777216
 Execution Started   :  09/10/2019 09:41:54
 First Refresh Time  :  09/10/2019 09:41:54
 Last Refresh Time   :  09/10/2019 09:42:00
 Duration            :  6s
 Module/Action       :  MyModule/MyAction
 Service             :  orcl
 Program             :  sqlplus@vbgeneric (TNS V1-V3)

Global Stats
===================================================================
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
===================================================================
|    5.84 |    5.77 |     0.01 |     0.06 |     1M |   42 |  26MB |
===================================================================

SQL Plan Monitoring Details (Plan Hash Value=2927627013)
=========================================================================================================================================
| Id |      Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  | Activity | Activity Detail |
|    |                      |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes |   (%)    |   (# samples)   |
=========================================================================================================================================
|  0 | UPDATE STATEMENT     |      |         |      |         4 |     +3 |     3 |        0 |      |       |          |                 |
|  1 |   UPDATE             | T1   |         |      |         6 |     +1 |     3 |        0 |      |       |   100.00 | Cpu (6)         |
|  2 |    TABLE ACCESS FULL | T1   |    250K |  432 |         4 |     +3 |     3 |     698K |   42 |  26MB |          |                 |
=========================================================================================================================================

As you see from Tanel’s tweet, he has a little script you can use to scan through the library cache to see if any (big) updates have suffered from this problem.

 

Addendum

I couldn’t quite manage to leave this problem alone, so I’ve run the test and dumped the redo log three more times and there’s an interesting coincidence. Every time I ran the test the switch from updating to rolling back had these 4 (annotated) redo opcodes separating them:


158766:CHANGE #1:-OP:22.2        ktfbhredo - File Space Header Redo
158774:CHANGE #1:-OP:22.5        ktfbbredo - File BitMap Block Redo:
158781:CHANGE #1:-OP:14.4        kteop redo - redo operation on extent map
158787:CHANGE #2:-OP:22.2        ktfbhredo - File Space Header Redo:

The group represents some allocation of space from a file to an object – so the question is: what object gets an extent map update? Here's the full redo change vector for OP:14.4:


REDO RECORD - Thread:1 RBA: 0x001cdd.000028a4.00d8 LEN: 0x00b0 VLD: 0x01 CON_UID: 0
SCN: 0x0b86.0db28b4c SUBSCN: 88 09/10/2019 15:06:13
CHANGE #1 CON_ID:0 TYP:0 CLS:29 AFN:2 DBA:0x00800688 OBJ:4294967295 SCN:0x0b86.0db28b4c SEQ:10 OP:14.4 ENC:0 RBL:0 FLG:0x0000
kteop redo - redo operation on extent map
   ADD: dba:0x802380 len:128 at offset:6
  ADDRET: offset:6 ctime:0
   SETSTAT: exts:9 blks:911 lastmap:0x0 mapcnt:0
   UPDXNT: extent:6 add:TRUE

The "ADD" is adding extent id 6, of 128 blocks, at file 2 block 9088. That happens to be the undo segment which my update statement is using. Further 14.4 change vectors that appear through the trace file take the size of the undo segment up to 35 extents, but in the (small number of) tests I did it was always the first 14.4 that seemed to trigger the rollback and restart.
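
(If you want to do the same check on your own trace file, dbms_utility will translate the dba for you, and dba_extents will confirm which segment owns that block – a quick sketch using the values from the dump above:)

select
        dbms_utility.data_block_address_file(to_number('00802380','xxxxxxxx'))   file_no,
        dbms_utility.data_block_address_block(to_number('00802380','xxxxxxxx'))  block_no
from    dual
/

select  owner, segment_name, segment_type
from    dba_extents
where   file_id = 2
and     9088 between block_id and block_id + blocks - 1
/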

Preliminary Hypothesis

When the transaction does sufficient work that it forces its undo segment to extend, some feature of the resulting recursive transaction causes the transaction to roll back and restart.

This does fit with the observation that the number of rows updated before the rollback/restart occurs varies apparently at random. It also fits the observation that an update that doesn't rollback and restart in 12.1 does rollback and restart in 12.2 because it gets a bit unlucky with the "single row locking" used for the "no-change update" optimisation.

Extending in-lists

A well known “limitation” of Oracle is that you can only have 1000 elements within an IN-LIST style expression in your SQL predicates. I use the quotes because I’ve always found that if you are heading into the territory where you need more than 1000 items in an IN-LIST, then it is often indicative of something else being wrong. For example, if the code is backing a user interface, then where in the design phase did someone not go “Whoa…we are going to have 1000 select-able elements on the screen?”

In any event, you can find many tricks and techniques out there on the intertubes about how to work around this issue, such as:

My preference is the temporary table approach, because once you use a temporary table, suddenly all of the power of SQL is at your fingertips. I had an AskTOM question recently which gave me an opportunity to display this. The requester had an IN-LIST set of values, but the values could contain the standard wild-cards. Using a temporary table approach, this becomes trivial with a small adjustment to the join condition.

Here’s my first solution using a familiar XMLTABLE technique.


SQL> create table t ( x varchar2(50));

Table created.

SQL> insert into t values ('stringasd');

1 row created.

SQL> insert into t values ('blah');

1 row created.

SQL> insert into t values ('more data');

1 row created.

SQL> insert into t values ('blah blah');

1 row created.

SQL> insert into t values ('some stuff with qwe in it');

1 row created.

SQL> insert into t values ('more with a double match on qwe');

1 row created.

SQL>
SQL> with
  2  conditions as
  3    ( select 'string%,more%,%qwe%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions,
  8         xmltable(('"'||replace(wildcard_list, ',', '","')||'"'))
  9  )
 10  select *
 11  from   rowstyle;

INLIST_ITEM
--------------------
string%
more%
%qwe%

3 rows selected.

SQL>
SQL>
SQL> with
  2  conditions as
  3    ( select 'string%,more%,%qwe%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions,
  8         xmltable(('"'||replace(wildcard_list, ',', '","')||'"'))
  9  )
 10  select *
 11  from   rowstyle r,
 12         t
 13  where t.x like r.inlist_item;

INLIST_ITEM          X
-------------------- ----------------------------------------
string%              stringasd
more%                more data
%qwe%                some stuff with qwe in it
more%                more with a double match on qwe
%qwe%                more with a double match on qwe

5 rows selected.

SQL>
SQL>

XMLTABLE has its limitations in terms of handling special characters as you can see from below:



SQL> set define off
SQL> with
  2  conditions as
  3    ( select 'string%,more%,%abc&xyz%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions,
  8         xmltable(('"'||replace(wildcard_list, ',', '","')||'"'))
  9  )
 10  select *
 11  from   rowstyle r,
 12         t
 13  where t.x like r.inlist_item;
with
*
ERROR at line 1:
ORA-19112: error raised during evaluation:
XVM-01003: [XPST0003] Syntax error at '"%abc'
1   "string%","more%","%abc&xyz%"
-                    ^

but this does not rule out the temporary table technique – we just need to use a different mechanism to convert our list into rows. Here's another common technique, using a regular expression and the hierarchical CONNECT BY syntax.



SQL> with
  2  conditions as
  3    ( select 'string%,more%,%abc&xyz%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(w) inlist_item
  7  from   conditions c,
  8         lateral(
  9           select trim(regexp_substr(c.wildcard_list, '[^,]+',1,level)) w
 10           from   dual
 11           connect by level <= regexp_count(c.wildcard_list,'[^,]+')
 12           )
 13  )
 14  select *
 15  from   rowstyle r
 16  /

INLIST_ITEM
--------------------
string%
more%
%abc&xyz%

3 rows selected.

With the special characters issue fixed, it is then a simple join with the LIKE to obtain the desired result.



SQL> with
  2  conditions as
  3    ( select 'string%,more%,%abc&xyz%' wildcard_list from dual ),
  4  rowstyle as
  5  (
  6  select trim(column_value) inlist_item
  7  from   conditions c,
  8         lateral(
  9           select regexp_substr(c.wildcard_list, '[^,]+',1,level) column_value
 10           from   dual
 11           connect by level <= regexp_count(c.wildcard_list,'[^,]+')
 12           )
 13  )
 14  select *
 15  from   rowstyle r,
 16         t
 17  where t.x like r.inlist_item;

INLIST_ITEM          X
-------------------- ----------------------------------------
string%              stringasd
more%                more data
more%                more with a double match on qwe

3 rows selected.

So before you look at the "tricks" that allow more than 1000 items in an IN-LIST, do not be too quick to dismiss the value of converting the values to rows, thereby opening up all the power of SQL to perform your processing.
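
And for completeness, the temporary table flavour mentioned at the top is nothing fancy – a minimal sketch, assuming a global temporary table called WILDCARD_PATTERNS that the application populates with its patterns before running the query against the same table T as above:

create global temporary table wildcard_patterns
( pattern varchar2(100) )
on commit preserve rows;

insert into wildcard_patterns values ('string%');
insert into wildcard_patterns values ('more%');
insert into wildcard_patterns values ('%qwe%');

select  w.pattern, t.x
from    wildcard_patterns w,
        t
where   t.x like w.pattern;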