
Oakies Blog Aggregator

CQRS, Event Sourcing and the Oracle Database

By Franck Pachot

This blog post relates my thoughts when reading about Command Query Responsibility Separation and Event Sourcing, in the context of the Oracle Database (but it can probably apply to any database). We see those terms in the new software architecture diagrams, but they are actually quite old:

Command-Query separation

Command-Query separation was defined by Bertrand Meyer 15 years ago, not for the database but for the Eiffel language. See page 22-44 of Eiffel: a language for software engineering.

This basically states that a function that returns a result does not change anything (“asking a question should not change the answer”). In PL/SQL you can do whatever you want with FUNCTION and PROCEDURE: a PROCEDURE can return values with OUT parameters and a FUNCTION can modify data. However, ‘you can’ does not mean that you should. It is easy to define coding rules where a FUNCTION does not change anything and where only a PROCEDURE implements what Bertrand Meyer calls a ‘command’.
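A minimal sketch of such a coding rule (the ACCOUNTS table and both routines are invented for this illustration, they are not from the post):

-- Query: a FUNCTION that only reads ("asking a question should not change the answer")
create or replace function get_balance(p_account_id in number) return number is
  v_balance number;
begin
  select balance into v_balance from accounts where account_id = p_account_id;
  return v_balance;
end;
/

-- Command: a PROCEDURE that changes state and returns nothing
create or replace procedure credit_account(p_account_id in number, p_amount in number) is
begin
  update accounts set balance = balance + p_amount where account_id = p_account_id;
end;
/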

In SQL this is clearly enforced. A SELECT is for queries and cannot change any data. Even if you call a function that tries to change some data you will encounter:

ORA-14551: cannot perform a DML operation inside a query
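A quick way to see this error for yourself (a hypothetical sketch, reusing the invented ACCOUNTS table from the example above):

create or replace function bump_balance return number is
begin
  -- any DML is enough to raise the error when the function is called from a query
  update accounts set balance = balance where rownum = 1;
  return 1;
end;
/

select bump_balance from dual;
-- ORA-14551: cannot perform a DML operation inside a query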

Object Oriented data modeling

This concept came from Eiffel, an object-oriented programming language. The Object Oriented approach was designed for transient data structures: stored in memory (the object identification is the address in memory) and usually not shared (the same data is represented by a different object in different systems). But because the modularity and encapsulation of the OO approach are great, it has been extended to data, the persistent data that we store in the database. Before that, the functions and the data model were analyzed separately, which produced monolithic applications that were difficult to maintain and evolve. The Object Oriented approach helps to analyze the subsystems separately.

Create Retrieve Update Delete

However, because the objects were the business objects and not the use-cases or services, software architects came up with the simplified idea that all interaction with the database is CRUD: you Create (aka INSERT), Retrieve (aka SELECT), Update (aka UPDATE) or Delete (aka DELETE). You define an Object (mapped to a table) and those 4 CRUD methods mapped to 4 SQL statements, and you can do everything. And because some ORM frameworks make it possible to ignore how the data is stored and shared in the database, a new generation of developers started working with data sets as if they were objects in memory.

And this is completely wrong because

  • one object maps to only one row, and the CRUD approach exposes no method for bulk updates – those where the database can be the most efficient (see the sketch just after this list)
  • the CRUD methods read and update all table columns – ignoring all efficient access paths to a subset of columns
  • mapping a transient object identifier (address in memory) to a shared persistent identifier (primary key) is not easy
  • and finally, all those CRUD operations belong to an object when they should belong to use-cases and transactions
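For the first point, here is a sketch of what a single set-based statement can do (ORDERS and its columns are made up for this example); a row-by-row CRUD loop would need one round trip per object for the same result:

update orders
set    status     = 'CANCELLED'
where  status     = 'NEW'
and    created_at < sysdate - 30;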

The last point is most obvious between the queries (the R in CRUD) and the insert (the C in CRUD). In an information system, you usually use (query, retrieve) the data in a completely different way than it was entered (inserted). Let’s take an example from an Order Entry system. When the customer starts an order, it is probably the responsibility of the Orders object. But then, it will be processed for the delivery, then queried by the Customer object, and finally by the Products for sales analysis. Grouping all that in the same object is easier for the Object-Oriented Analysis, for modeling the domain object. But keeping this grouping for the software implementation is a big source of problems. And finally, the same object will be used by completely different use-cases, with different availability, consistency and performance requirements. That’s too much responsibility for one object.

Command-Query-Responsibility-Separation (CQRS)

Because of this (the inefficiency of row-by-row CRUD queries and the over-responsibility of objects), applications started to cache locally the data used by queries, both to avoid the row-by-row roundtrips and to store it in a model better suited to the query than to the way it was inserted. Like dozens of data marts that try to keep in sync with the changes done in the database.

Hibernate, for example, can cache the whole data set in a second level cache, using object stores like GigaSpaces or Oracle Coherence. But this doesn’t solve the main problem. The object model, which is not relational but hierarchical, cannot be used efficiently by all use-cases. And I’ve seen very few Hibernate developers accepting multiple Entity mappings for the different use-cases. Basically, the relational model can cope with all kinds of queries (especially since Oracle In-Memory Column Store, as you can also do analytics on it). But when querying it with an object model, you need several models.

Then came the idea to split the data model between ‘command’ – the modifications, and ‘query’. This is the CQRS from Martin Fowler: https://martinfowler.com/bliki/CQRS.html

This is, again, not new. The relational model exposes data through views and you can, and should, have different views on your data. The data is stored in one normalized model, modified through views or procedures (a Transactional API) and queried through views. Separation of responsibility has always been there. And I totally agree with all those diagrams showing multiple layers, separation, and even micro-services. There’s only one problem with them.
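A minimal sketch of that idea, with invented names: one normalized table, a ‘command’ procedure (the Transactional API) and a ‘query’ view shaped for one specific use-case:

create table orders (
  order_id    number       primary key,
  customer_id number       not null,
  status      varchar2(10) not null,
  created_at  date         default sysdate
);

-- command side: the only way the application changes ORDERS
create or replace procedure place_order(p_order_id in number, p_customer_id in number) is
begin
  insert into orders(order_id, customer_id, status) values (p_order_id, p_customer_id, 'NEW');
end;
/

-- query side: a view dedicated to one use-case
create or replace view pending_orders_v as
select order_id, customer_id, created_at
from   orders
where  status = 'NEW';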

Logical vs. physical separation

I don’t really know when, but at some point the architects looking at those diagrams forgot the step where the logical model should be implemented in a physical model, and what were logical layers became physical tiers without any reason. Many architects use ‘layer’, ‘level’ and ‘tier’ without even knowing whether they are at the logical or the physical level. It is good to have logical layers, but processing data across multiple physical tiers will exhaust resources in all tiers doing nothing else than waiting on round-trip latency and expensive context switches.

The CRUD methods had to be coded in an Object Oriented language, and then the idea was Java. So the rows have to be shipped between two different processes: one running a JVM, and one running the SQL in the RDBMS. Different processes mean a context switch for each call. This is latency and overhead, and it is not scalable. Oracle can run the JVM inside the database, but this still involves context switches and datatype conversion. Then, to try to scale, more application servers were needed and this data processing moved to the application server.

And the bad effect is not only on performance and scalability. Because of this confusion, implementing logical layers on different servers, technologies and languages, we lost the link between the data storage and the data processing. Which is exactly the opposite of the Object Oriented approach. With the data logic (you can call it business logic as well, as all data processing is there to implement business rules) in the database, you can manage dependencies: the database always keeps track of which function or procedure uses which table, and how.
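For example, with the data logic in the database you can ask the dictionary which PL/SQL units depend on a given table (ORDERS here is the invented table from the sketch above):

select name, type, dependency_type
from   user_dependencies
where  referenced_name = 'ORDERS'
and    referenced_type = 'TABLE'
order  by name;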

Event Sourcing

https://martinfowler.com/eaaDev/EventSourcing.html

Oracle can even guarantee consistent reads without changing anything, thanks to MVCC. Of course, SELECT FOR UPDATE does write to the database, not to change data but to record lock information; you can prevent even that by granting only the READ privilege.
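A sketch of that restriction (REPORTING_USER is an invented account): the READ privilege, available from 12c on, allows queries but not SELECT ... FOR UPDATE, unlike the SELECT privilege which allows both:

grant read on orders to reporting_user;
-- with only READ, this now fails for reporting_user:
-- select * from orders for update;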

This means that, for example, in PL/SQL we use a PROCEDURE for what Bertrand Meyer calls a ‘command’ (changing data), and we use a FUNCTION to query data.

 

This article CQRS, Event Sourcing and the Oracle Database appeared first on Blog dbi services.

Visual Studio- All the Warm and Fuzzy

So I haven’t opened Visual Studio in….oh….let’s just say it’s been a few years…:)

I had a project that I needed to run and was surprised when the Solution Explorer was missing from SSMS 2017.  It’s only fair to say, there was also fair warning from Microsoft.

(Screenshot: sol_ex.png)

Due to this, I opened up Visual Studio to use its Solution Explorer and integration for SSIS and other features required for a large project.  I was both happy with the sheer number of features and left with some constructive feedback to make it more user friendly.

I love that I can navigate servers, log into SQL Server databases and manage and verify what’s occurred in my releases.  The properties pane comes in handy to offer valuable information when I’m building out connection strings or looking for data that may have not compiled correctly in a build.

Although the rough instructions were for Solution Explorer in SSMS, I was able, rusty as I was, to figure out how to do everything, projects, SSIS and database SQL, in Visual Studio.

The interface is familiar to a Windows user:  right click for options, left click to execute the option.  The interface has links on the left for shortcuts to SSMS Object Explorer, which allows me to log into my database environments, along with browsing servers that I may also be deploying application code to.

Projects make it easy to build out a full, multi-tier deployment and debug it all from one application, too.  Needless to say, I’m happy to report that even with some missing instructions, I was able to do what needed to be done and do it with some grace.

(Screenshot: vss_dw_deploy.png)

So what do I see that can be improved?

  1. When you copy and paste to update a path, don’t automatically remove the remainder of the file name, etc., that’s been left on purpose.  This can lead to extra human intervention, which then leads to more chance of human error.
  2. The hints when hovering over a button can become a nuisance instead of a help.  Have the hints auto-hide after 5 seconds.  There’s no reason to leave them up when we’re trying to guide our cursor to a small button.
  3. Make debug display all steps, the errors and then shut down automatically when complete.
  4. Make it easier to keep panes open for the SQL Server Object Explorer, Toolbox, etc. vs. auto-hiding.  The information on the pane may be needed for reference as one works on another configuration panel.
  5. The Control Flow on SSIS package execution shouldn’t be blown up so large that you can’t decipher what a package is doing.  Keep it legible.



Copyright © DBAKevlar [Visual Studio- All the Warm and Fuzzy], All Right Reserved. 2018.

Oracle 18c preinstall RPM on RedHat RHEL

By Franck Pachot

The Linux prerequisites for Oracle Database are all documented, but using the preinstall rpm makes everything easier. Before 18c, this was easy on Oracle Enterprise Linux (OEL) but not so easy on RedHat (RHEL), where the .rpm had many dependencies on OEL and UEK.
Now that 18c is there to download, there’s also an 18c preinstall rpm, and the good news is that it can be run on RHEL without modification.

This came to my attention on Twitter:

And of course this is fully documented:
https://docs.oracle.com/en/database/oracle/oracle-database/18/cwlin/about-the-oracle-preinstallation-rpm.html#GUID-C15A642B-534D-4E4A-BDE8-6DC7772AA9C8

In order to test it, I quickly created a CentOS instance on the Oracle Cloud:
(Screenshot: CaptureCentosPreinstall)

I’ve downloaded the RPM from the OEL7 repository:

[root@instance-20180803-1152 opc]# curl -o oracle-database-preinstall-18c-1.0-1.el7.x86_64.rpm https://yum.oracle.com/repo/OracleLinux/OL7/latest/x86_64/getPackage/oracle-database-preinstall-18c-1.0-1.el7.x86_64.rpm
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 18244 100 18244 0 0 63849 0 --:--:-- --:--:-- --:--:-- 63790

then ran the installation:

[root@instance-20180803-1152 opc]# yum -y localinstall oracle-database-preinstall-18c-1.0-1.el7.x86_64.rpm

 
It installs automatically all dependencies:
Installed:
oracle-database-preinstall-18c.x86_64 0:1.0-1.el7
 
Dependency Installed:
compat-libcap1.x86_64 0:1.10-7.el7 compat-libstdc++-33.x86_64 0:3.2.3-72.el7 glibc-devel.x86_64 0:2.17-222.el7 glibc-headers.x86_64 0:2.17-222.el7
gssproxy.x86_64 0:0.7.0-17.el7 kernel-headers.x86_64 0:3.10.0-862.9.1.el7 keyutils.x86_64 0:1.5.8-3.el7 ksh.x86_64 0:20120801-137.el7
libICE.x86_64 0:1.0.9-9.el7 libSM.x86_64 0:1.2.2-2.el7 libXext.x86_64 0:1.3.3-3.el7 libXi.x86_64 0:1.7.9-1.el7
libXinerama.x86_64 0:1.1.3-2.1.el7 libXmu.x86_64 0:1.1.2-2.el7 libXrandr.x86_64 0:1.5.1-2.el7 libXrender.x86_64 0:0.9.10-1.el7
libXt.x86_64 0:1.1.5-3.el7 libXtst.x86_64 0:1.2.3-1.el7 libXv.x86_64 0:1.0.11-1.el7 libXxf86dga.x86_64 0:1.1.4-2.1.el7
libXxf86misc.x86_64 0:1.0.3-7.1.el7 libXxf86vm.x86_64 0:1.1.4-1.el7 libaio-devel.x86_64 0:0.3.109-13.el7 libbasicobjects.x86_64 0:0.1.1-29.el7
libcollection.x86_64 0:0.7.0-29.el7 libdmx.x86_64 0:1.1.3-3.el7 libevent.x86_64 0:2.0.21-4.el7 libini_config.x86_64 0:1.3.1-29.el7
libnfsidmap.x86_64 0:0.25-19.el7 libpath_utils.x86_64 0:0.2.1-29.el7 libref_array.x86_64 0:0.1.5-29.el7 libstdc++-devel.x86_64 0:4.8.5-28.el7_5.1
libverto-libevent.x86_64 0:0.2.5-4.el7 nfs-utils.x86_64 1:1.3.0-0.54.el7 psmisc.x86_64 0:22.20-15.el7 xorg-x11-utils.x86_64 0:7.5-22.el7
xorg-x11-xauth.x86_64 1:1.0.9-1.el7

Note that the limits are stored in limits.d which has priority over limits.conf:

[root@instance-20180803-1152 opc]# cat /etc/security/limits.d/oracle-database-preinstall-18c.conf
 
# oracle-database-preinstall-18c setting for nofile soft limit is 1024
oracle soft nofile 1024
 
# oracle-database-preinstall-18c setting for nofile hard limit is 65536
oracle hard nofile 65536
 
# oracle-database-preinstall-18c setting for nproc soft limit is 16384
# refer orabug15971421 for more info.
oracle soft nproc 16384
 
# oracle-database-preinstall-18c setting for nproc hard limit is 16384
oracle hard nproc 16384
 
# oracle-database-preinstall-18c setting for stack soft limit is 10240KB
oracle soft stack 10240
 
# oracle-database-preinstall-18c setting for stack hard limit is 32768KB
oracle hard stack 32768
 
# oracle-database-preinstall-18c setting for memlock hard limit is maximum of 128GB on x86_64 or 3GB on x86 OR 90 % of RAM
oracle hard memlock 134217728
 
# oracle-database-preinstall-18c setting for memlock soft limit is maximum of 128GB on x86_64 or 3GB on x86 OR 90% of RAM
oracle soft memlock 134217728

Note that memlock is set to 128GB here but can be higher on machines with huge RAM (up to 90% of RAM).

And for information, here is what is set in /etc/sysctl.conf:

fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 1073741824
kernel.shmmax = 4398046511104
kernel.panic_on_oops = 1
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.conf.all.rp_filter = 2
net.ipv4.conf.default.rp_filter = 2
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500

Besides that, the preinstall rpm disables NUMA and transparent huge pages (as boot options in GRUB). It also creates the oracle user (uid 54321, belonging to the groups oinstall, dba, oper, backupdba, dgdba, kmdba and racdba).

 

This article Oracle 18c preinstall RPM on RedHat RHEL appeared first on Blog dbi services.

RAC One node databases are relocated by opatchauto in 12.2 part 2

In a previous post I shared how I found out that RAC One Node databases are relocated on-line during patching and I promised a few more tests and sharing of implications. If you aren’t familiar with RAC One Node I recommend having a look at the official documentation: The Real Application Clusters Administration and Deployment Guide features it prominently in the introduction. One of the things I like to keep in mind when working with RAC One Node is this, quoting from section 1.3 in said admin and deployment guide:

Using the Single Client Access Name (SCAN) to connect to the database, clients can locate the service independently of the node on which it is running. Relocating an Oracle RAC One Node instance is therefore mostly transparent to the client, depending on the client connection. Oracle recommends to use either Application Continuity and Oracle Fast Application Notification or Transparent Application Failover to minimize the impact of a relocation on the client.

A little later, you can read this important additional piece of information in the “notes” section:

To prepare for all failure possibilities, you must add at least one Dynamic Database Service (Oracle Clusterware-managed database service) to an Oracle RAC One Node database.

If you created your RAC One database with the database creation assistant (dbca), you are already complying with that rule. In my case, my lab database is named RON with the mandatory service RON_SVC. I opted for an instance prefix of “DCB”.

What I’d like to try out for this blog post is what happens to an active workload on a RAC One database during patching.

I have successfully patched multi-node RAC systems, but that required the application to support this procedure. One of my favourite talks is named “Advanced RAC programming techniques” where I demonstrate the resilience of an application based on RAC to node failures by using Universal Connection Pool (UCP), JDBC and Fast Connection Failover (FCF). UCP is a Fast Application Notification (FAN) aware connection pool allowing my application to react to cluster events such as node up/node down. The idea is to mask instance failure from users.

If all of this sounds super-cryptic, I would like to suggest chapters 5 and 6 of the RAC administration and deployment guide for you. There you can read more about Workload Management with Dynamic Database Services and Application Continuity in RAC.

RAC One Node is different from multi-node RAC as it is only ever active on one node in normal operations. Online relocation, as described in the previous post, works by temporarily adding a second (destination) instance to the cluster and moving all sessions across, terminating any that remain after a time-out (default 30 minutes). Once the time-out is reached (or all sessions have made it across to the destination) the source instance is shut down and you are back to 1 active instance.

The online relocation does not care too much about whether the application can cope with the process. If your application is written for RAC, most likely it will migrate quickly from the source to the destination instance. If it isn’t, well, the hard timeout might kick in and evict some of your users. In a worse case, your users need to re-connect to the database. Worse still, you might have to restart the middle-tier system(s).

Sadly I haven’t seen too many applications capable of handling RAC events properly. One application that does is Swingbench, so I’ll stick with it. I configured it according to Dominic Giles’s blog post.

This post assumes that you are properly licensed to use all these features.

The environment

My setup hasn’t changed from previous post so I spare you the boredom of repeating it here. Jump over to the other post for details.

Preparations

For this blogpost I need to ensure that my RAC One node database resides on the node I am about to patch. I have again followed the patch readme very carefully, I have made sure that I have (proven, working, tested) backups of the entire stack and all the rest of it…

My database is indeed actively running on the node I am about to patch:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: INACTIVE

Before I can benefit from Application Continuity, I need to make a few changes to my application service, RON_SVC. There are quite a few sources to choose from, I went with the JDBC Developer’s Guide. Here’s the modification:

[oracle@rac122sec2 ~]$ srvctl modify service -db DCB -service RON_SVC -commit_outcome true \
> -failoverretry 30 -failoverdelay 10 -failovertype transaction \
> -replay_init_time 1800 -retention 86400 -notification true
[oracle@rac122sec2 ~]

Following the instructions on Dominic Giles’s blog, I also need to grant SOE the right to execute DBMS_APP_CONT.
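That grant is a one-liner, run as a suitably privileged user (SOE being the Swingbench schema used in this post):

grant execute on DBMS_APP_CONT to SOE;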

Finally, I need to make changes to my Swingbench configuration file. The relevant part is shown here:


    "Order Entry (PLSQL) V2"
    Version 2 of the SOE Benchmark running in the database using PL/SQL
    
        soe
        
        (DESCRIPTION=(TRANSPORT_CONNECT_TIMEOUT=5)
        (RETRY_COUNT=3)(FAILOVER=ON)
        (ADDRESS=(PROTOCOL=TCP)(HOST=rac122sec-scan)(PORT=1521))
        (CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=RON_SVC)))
        Oracle jdbc Driver
        
            5
            5
            10
            50
            45
            10
            120
        
        
            50
            true
            20
            nodes=rac122sec1:6200,rac122sec2:6200
            20
            true
        
    
    
        5
        0
        0
        50
        500
        120
        -1
        0:0
        1
        1000
        false
        false
        0:0
        0:0
        0
        
...

The connection string is actually on a single line, I have formatted it for readability in the above example. The main change from the standard configuration file is the use of connection pooling and setting the required properties for Application Continuity.

Let’s patch!

Once all the preparations are completed, it’s time to see how RAC One Node deals with an active workload undergoing an online relocation during patching. First of all I need to start the workload. I’d normally use charbench for this, but this time around I opted for the GUI. It shows performance graphs over a 3 minute rolling window.

A few minutes after starting the benchmark I commenced patching. Soon thereafter, opatchauto told me that it was relocating the database.

OPatchauto session is initiated at Fri Jul 27 14:52:23 2018

...

Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1

According to the session log, this happened at 14:54. And by the way, always keep the session log, it’s invaluable!

2018-07-27 14:54:07,914 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
Executing command as oracle: 
 /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1 
 /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/srvctl relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:54:07,914 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  COMMAND Looks like this: 
    /bin/sh -c 'ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/dbhome_1 
    /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/srvctl relocate database -d DCB -n rac122sec1 -w 30 -v'
2018-07-27 14:55:31,736 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  status: 0
2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Successfully executed the above command.

2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Output from the command:
2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor -
Configuration updated to two instances
Instance DCB_2 started
Services relocated
Waiting for up to 30 minutes for instance DCB_1 to stop ...
Instance DCB_1 stopped
Configuration updated to one instance

2018-07-27 14:55:31,737 INFO  [163] com.oracle.glcm.patch.auto.db.product.executor.PatchingStepExecutor - 
  Command executed successfully.

You can see the same information by querying Clusterware, although there aren’t any timestamps associated with it:

[root@rac122sec1 ~]# srvctl status database -db DCB -detail -verbose
Instance DCB_1 is running on node rac122sec2 with online services RON_SVC. Instance status: Open.
Instance DCB_1 is connected to ASM instance +ASM2
Online relocation: ACTIVE
Source instance: DCB_1 on rac122sec2
Destination instance: DCB_2 on rac122sec1
[root@rac122sec1 ~]# 

Although the online relocation timeout was set to 30 minutes, the use of modern coding techniques and connection pooling allowed for a much faster online relocation. As you can see in the log excerpt, the entire relocation was completed at 2018-07-27 14:55:31,737. Clusterware now tells me that my database runs on node 1:

[root@rac122sec1 ~]# date;  srvctl status database -db DCB -detail -verbose
Fri 27 Jul 14:55:38 2018
Instance DCB_2 is running on node rac122sec1 with online services RON_SVC. Instance status: Open.
Instance DCB_2 is connected to ASM instance +ASM1
Online relocation: INACTIVE

While this is wicked, what are the implications for the application? Have a look at this print screen, taken a minute after the online relocation completed.

(Screenshot: rac-one-online-rel…)

As you can see there was nothing unusual recorded (events tab to the left), and I couldn’t see a drop in the number of sessions connected. I noticed a slight blip in performance, but it recovered very soon thereafter.

Summary

During automatic patching of the Oracle stack opatchauto will perform an online relocation of a RAC One Node database if it is found running on the node currently undergoing patching. If your application is developed with RAC in mind – such as using a FAN-aware connection pool like UCP, and either supports Fast Connection Failover or Application Continuity, there is a good chance that patching the system does not affect your service.

While my test was successful, it is by no means representative of a production workload – my RAC One database has an SGA of 4 GB and there were 5 concurrent users – hardly what I see in the real world. However, that shouldn’t deter you: if you are using RAC One Node I think it’s definitely worth trying to implement modern technology like this in the application.

Building community via the speaker community

I recently participated in the Oracle Developer Community Yatra tour throughout India. It is a hectic event with 7 cities covered in a mere 9 days, so you can imagine how frantic the pace was. A typical day would be:

  • 7am – breakfast
  • 8am – check out of hotel and leave for the venue
  • 9am – speak all day, host Q&A
  • 6pm – leave straight from venue to the airport
  • 8pm – dinner at airport, and fly to next city
  • 1am – get to next hotel, grab some sleep before doing it all again in 6 hours time

Yet as a speaker at this event, I never felt that the logistics were out of control.  This is mainly due to the incredible work of the people in the AIOUG (All India Oracle User Group), coordinated by Sai Penumuru. The smoothness of the organization prompted me to blog about how user groups could follow the AIOUG lead in terms of running successful events.

Every time I do an event in India, not only do I come away with a stronger network and connection with the attendees, I also gain new and strong friendships with the speakers and this is in no small part due to organizational skills of Sai and the volunteers.

So from that reference point – namely, the smart way in which Sai and the user group foster a great feeling of community amongst the speakers, I thought I’d share the mechanisms via which user groups can organize events that make speakers feel welcome and keen to return.

Common accommodation

When I travel to India, Sai provides a recommended hotel for all speakers. Obviously, no speaker is compelled to stay there, but we all know that the recommendation Sai makes has taken into account:

  • facilities the hotel provides,
  • hotel price to find a balance for speakers either having company funding or funding themselves,
  • transport time to/from the conference venue.

So most of us will always use his recommendation, and it makes the decision making process easy.  But most of all, it is a catalyst for speakers to meet in a casual environment outside the conference hours, and build relationships.

Common transport

The AIOUG organizes transport to and from the venue, and from the airport as well. And for when this is not possible, they will provide a recommended transport means (such as Uber etc) so that speakers never have the risk of picking an unreliable or unsafe transport option. For multi-city events such as Yatra, the AIOUG also recommends common itinerary for flights, once again making the entire planning process much easier for speakers.

A communication mechanism

Before a conference, AIOUG sets up a WhatsApp group containing all of the speakers and the key representatives from the user group. In this way, all communication is sent in a consistent fashion.  We don’t have to jump between email, Twitter, etc. to see if any correspondence has been missed. And this group also helps build the relationship between speakers and user group. Humorous stories and pictures can be shared, but most importantly, if there is an issue or problem, everyone is aware of it immediately, making resolution fast and effective.

And perhaps most importantly, it helps accentuate the human element – messages are sent as people land or take off, and when people arrive at the hotel.  Organizers regularly send messages making sure no-one is having any difficulties.  All these things make the speakers feel more comfortable before, during and after the event.

A typical day

Perhaps the most valuable piece of information that is conveyed by the user group is ‘local knowledge’. For example, each evening a whatsapp message would be sent out detailing

  • hotel pickup time,
  • expected transit time,
  • who to contact/what to do when arriving at venue,
  • what identification requirements may be needed on site
    etc

So even though it may be a first visit to a city for the speakers, there is a degree of familiarity with what is expected to happen, and hence knowledge of whether things are departing from the norm.

Onsite assistance

The most stressful 5 minutes for any speaker is the time they are setting up for their talk. Making sure projectors work, internet connectivity, schedule changes etc – are all things that can sabotage a good talk for a speaker. The AIOUG always has someone visit the room in that critical 5 minutes, so a speaker does not have to go hunting for technical assistance.

 

In summary, as you can see, none of these things is particularly arduous to do, but the benefits are huge for speakers.  We feel comfortable and welcome, which means a much better chance of successful talks, and makes us much more likely to want to return.

So if you are reading this, and are part of a user group committee, please consider some of the tips above for your local events. If your speakers have a good experience, they’ll be much more keen to offer continued support for your events.

Extended Histograms – 2

Following on from the previous posting which raised the idea of faking a frequency histogram for a column group (extended stats), this is just a brief demonstration of how you can do this. It’s really only a minor variation of something I’ve published before, but it shows how you can use a query to generate a set of values for the histogram and it pulls in a detail about how Oracle generates and stores column group values.

We’ll start with the same table as we had before – two columns which hold only the combinations (‘Y’, ‘N’) or (‘N’, ‘Y’) in a very skewed way, with a requirement to ensure that the optimizer provides an estimate of 1 if a user queries for (‘N’,’N’) … and I’m going to go the extra mile and create a histogram that does the same when the query is for the final possible combination of (‘Y’,’Y’).

Here’s the starting code that generates the data, and creates histograms on all the columns (I’ve run this against 12.1.0.2 and 12.2.0.1 so far):


rem
rem     Script:         histogram_hack_2a.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0.4
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to deal with wordpress format issue.
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to deal with wordpress format issue.
;

variable v1 varchar2(128)

begin
        :v1 := dbms_stats.create_extended_stats(null,'t1','(c2,c3)');
        dbms_output.put_line(:v1);
end;
/

execute dbms_stats.gather_table_stats(null, 't1', method_opt=>'for all columns size 10');

In a variation from the previous version of the code I’ve used the “create_extended_stats()” function so that I can return the resulting virtual column name (also known as an “extension” name) into a variable that I can use later in an anonymous PL/SQL block.

Let’s now compare the values stored in the histogram for that column with the values generated by a function call that I first referenced a couple of years ago:


select
        endpoint_value
from 
        user_tab_histograms
where
        table_name = 'T1'
and     column_name = :v1
;

select 
        distinct c2, c3, 
        mod(sys_op_combined_hash(c2,c3),9999999999) endpoint_value
from t1
;

ENDPOINT_VALUE
--------------
    4794513072
    6030031083

2 rows selected.


C C ENDPOINT_VALUE
- - --------------
N Y     4794513072
Y N     6030031083

2 rows selected.

So we have a method of generating the values that Oracle should store in the histogram; now we need to generate 4 values and supply them to a call to dbms_stats.set_column_stats() in the right order with the frequencies we want to see:


declare
        l_distcnt number;
        l_density number;
        l_nullcnt number;
        l_avgclen number;

        l_srec  dbms_stats.statrec;
        n_array dbms_stats.numarray;

begin
        dbms_stats.get_column_stats (
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

        l_srec.novals := dbms_stats.numarray();
        l_srec.bkvals := dbms_stats.numarray();

        for r in (
                select
                        mod(sys_op_combined_hash(c2,c3),9999999999) hash_value, bucket_size
                from    (
                        select 'Y' c2, 'Y' c3, 1 bucket_size from dual
                        union all
                        select 'N' c2, 'N' c3, 1 from dual
                        union all
                        select 'Y' c2, 'N' c3, 71482 from dual
                        union all
                        select 'N' c2, 'Y' c3, 1994 from dual
                        )
                order by hash_value
        ) loop
                l_srec.novals.extend;
                l_srec.novals(l_srec.novals.count) := r.hash_value;

                l_srec.bkvals.extend;
                l_srec.bkvals(l_srec.bkvals.count) := r.bucket_size;
        end loop;

        n_array := l_srec.novals;

        l_distcnt  := 4;
        l_srec.epc := 4;

--
--      For 11g rpcnts must not be mentioned
--      For 12c it must be set to null or you
--      will (probably) raise error:
--              ORA-06533: Subscript beyond count
--

        l_srec.rpcnts := null;

        dbms_stats.prepare_column_values(l_srec, n_array);

        dbms_stats.set_column_stats(
                ownname =>null,
                tabname =>'t1',
                colname =>:v1,
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                avgclen =>l_avgclen,
                srec    =>l_srec
        );

end;
/

The outline of the code is simply: get_column_stats, set up a couple of arrays and simple variables, prepare_column_values, set_column_stats. The special detail that I’ve included here is that I’ve used a “union all” query to generate an ordered list of hash values (with the desired frequencies), then grown the arrays one element at a time to copy them in place. (That’s not the only option at this point, and it’s probably not the most efficient option – but it’s good enough). In the past I’ve used this type of approach but used an analytic query against the table data to produce the equivalent of 12c Top-frequency histogram in much older versions of Oracle.

A couple of important points – I’ve set the “end point count” (l_srec.epc) to match the size of the arrays, and I’ve also changed the number of distinct values to match. For 12c to tell the code that this is a frequency histogram (and not a hybrid) I’ve had to null out the “repeat counts” array (l_srec.rpcnts). If you run this on 11g the reference to rpcnts is illegal so has to be commented out.

After running this procedure, here’s what I get in user_tab_histograms for the column:


select
        endpoint_value                          column_value,
        endpoint_number                         endpoint_number,
        endpoint_number - nvl(prev_endpoint,0)  frequency
from    (
        select
                endpoint_number,
                lag(endpoint_number,1) over(
                        order by endpoint_number
                )                               prev_endpoint,
                endpoint_value
        from
                user_tab_histograms
        where
                table_name  = 'T1'
        and     column_name = :v1
        )
order by endpoint_number
;

COLUMN_VALUE ENDPOINT_NUMBER  FREQUENCY
------------ --------------- ----------
   167789251               1          1
  4794513072            1995       1994
  6030031083           73477      71482
  8288761534           73478          1

4 rows selected.


It’s left as an exercise to the reader to check that the estimated cardinality for the predicate “c2 = ‘N’ and c3 = ‘N'” is 1 with this histogram in place.
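For the impatient, a quick sketch of such a check using autotrace (the plans should now report Rows = 1 for the full scan of T1 in both cases):

set autotrace traceonly explain
select * from t1 where c2 = 'N' and c3 = 'N';
select * from t1 where c2 = 'Y' and c3 = 'Y';
set autotrace off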

Extended Histograms

Today’s little puzzle comes courtesy of the Oracle-L mailing list. A table has two columns (c2 and c3), which contain only the values ‘Y’ and ‘N’, with the following distribution:


select   c2, c3, count(*)
from     t1
group by c2, c3
;

C C   COUNT(*)
- - ----------
N Y       1994
Y N      71482

2 rows selected.

The puzzle is this – how do you get the optimizer to predict a cardinality of zero (or, using its best approximation, 1) if you execute a query where the predicate is:

where   c2 = 'N' and c3 = 'N'

Here are 4 tests you might try:

  • Create simple stats (no histograms) on c2 and c3.
  • Create frequency histograms on c2 and c3
  • Create a column group (extended stats) on (c2,c3) but no histograms
  • Create a column group (extended stats) on (c2,c3) with a histogram on (c2, c3)

If you do these tests you’ll find the estimated cardinalities are (from 12.1.0.2):

  • 18,369 – derived as 73,476 / 4  … total rows over total possible combinations
  • 1,940   – derived as 73,476 * (1,994/73,476) * (71,482/73,476) … total rows * fraction where c2 = ‘N’ * fraction where c3 = ‘N’
  • 36,738 – derived as 73,476 / 2 … total rows / number of distinct combinations of (c2, c3)
  • 997      – derived as 1,994 / 2 … half the frequency of the least frequently occurring value in the histogram

The last algorithm appeared in 10.2.0.4; prior to that a “value not in frequency histogram” would have been given an estimated cardinality of 1 (which is what the person on Oracle-L wanted to see).

In fact the optimizer’s behaviour can be reverted to the 10.2.0.3 mechanism by setting fix-control 5483301 to zero (or off), either with an “alter session” call or inside the /*+ opt_param() */ hint. There is, however, another option – if you get the column stats, then immediately set them (dbms_stats.get_column_stats(), dbms_stats.set_column_stats()) the optimizer marks the stats as “user defined” and (for reasons I don’t know – perhaps it’s an oversight) reverts to the 10.2.0.3 behaviour. Here’s some code to demonstrate the point; as the script header says, I’ve tested it on versions up to 18.1


rem
rem     Script:         histogram_hack_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jul 2018
rem
rem     Last tested 
rem             18.1.0.0        via LiveSQL (with some edits)
rem             12.2.0.1
rem             12.1.0.2
rem

create table t1
as
select 'Y' c2, 'N' c3 from all_objects where rownum <= 71482 -- > comment to avoid format issue
union all
select 'N' c2, 'Y' c3 from all_objects where rownum <= 1994 -- > comment to avoid format issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 10 for columns (c2,c3) size 10');

column column_name format a128 new_value m_colname

select  column_name
from    user_tab_cols
where   table_name = 'T1'
and     column_name not in ('C2','C3')
;

set autotrace traceonly explain
select /* pre-hack */ * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

declare
        l_distcnt number default null;
        l_density number default null;
        l_nullcnt number default null;
        l_srec    dbms_stats.statrec;
        l_avgclen number default null;

begin

        dbms_stats.get_column_stats (
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

        dbms_stats.set_column_stats(
                ownname =>user,
                tabname =>'t1',
                colname =>'&m_colname',
                distcnt =>l_distcnt,
                density =>l_density,
                nullcnt =>l_nullcnt,
                srec    =>l_srec,
                avgclen =>l_avgclen
        );

end;
/

set autotrace traceonly explain
select /* post-hack */  * from t1 where c2 = 'N' and c3 = 'N';
set autotrace off

I’ve created a simple table for the data and collected stats including histograms on the two columns and on the column group. I’ve taken a simple strategy to find the name of the column group (I could have used the function dbms_stats.create_extended_stats() to set an SQL variable to the name of the column group, of course), and then run a little bit of PL/SQL that literally does nothing more than copy the column group’s stats into memory then write them back to the data dictionary.

Here are the “before” and “after” execution plans that we get from autotrace:

BEFORE
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   997 |  3988 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   997 |  3988 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')


AFTER
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     4 |    23  (27)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |     1 |     4 |    23  (27)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("C2"='N' AND "C3"='N')

As required – the estimate for the (‘N’,’N’) rows drops down to (the optimizer’s best approximation to ) zero.

Footnote:

An alternative strategy (and, I’d say, a better strategic approach) would have been to create a “fake” frequency histogram that included the value (‘N’,’N’) giving it a frequency of 1 – a method I’ve suggested in the past  but with the little problem that you need to be able to work out the value to use in the array passed to dbms_stats.set_column_stats() to represent the value for the (‘N’,’N’) combination – and I’ve written about that topic in the past as well.

You might wonder why the optimizer is programmed to use “half the least popular” for predicates referencing values not in the histogram. Prior to 12c it’s easy to make an argument for the algorithm. Frequency histograms used to be sampled with a very small sample size, so if you were unlucky a “slightly less popular” value could be missed completely in the sample; if you were requesting a value that didn’t appear in the histogram then (presumably) you knew it should exist in the data, so guessing a cardinality somewhat less than the least popular must have seemed like a good idea.

In 12c, of course, you ought to be taking advantage of the “approximate NDV” implementation for using a 100% sample to generate frequency (and Top-N / Top-Frequency histograms). If you’ve got a 12c frequency histogram then the absence of a value in the histogram means the data really wasn’t there so a cardinality estimate of 1 makes more sense. (Of course, you might have allowed Oracle to gather the histogram at the wrong time – but that’s a different issue). If you’ve got a Top-N histogram then the optimizer will behave as if a “missing” value is one of those nominally allowed for in the “low frequency” bucket and use neither the 1 nor the “half the least popular”.

So, for 12c and columns with frequency histograms it seems perfectly reasonable to set the fix control to zero – after getting approval from Oracle support, of course.
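For reference, a sketch of flipping that fix-control at session level (check with Oracle Support first; the opt_param() hint mentioned earlier is the statement-level alternative):

-- revert to the 10.2.0.3 behaviour: values missing from a frequency histogram get cardinality 1
alter session set "_fix_control" = '5483301:0';

-- back to the default behaviour
alter session set "_fix_control" = '5483301:1';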

 

FAQ: Webinars for “Oracle Indexing Internals and Best Practices”

I’ve been somewhat inundated with questions regarding the “Oracle Indexing Internals and Best Practices” webinar series I’ll be running in October and November since I announced both webinar series last week. So I’ve compiled the following list of frequently asked questions which I’m hoping will address most of those asked. If you have any additional […]

All about headroom and mandatory patching before June 2019

This post was triggered upon rereading a blogpost by Mike Dietrich called “databases need patched minimum april 2019”. Mike’s blogpost makes it clear this is about databases that are connected using database links, and that:
– Newer databases do not need additional patching for this issue (11.2.0.4, 12.1.0.2, 12.2 and newer).
– Recent PSU patches contain a fix for certain older versions (11.1.0.7, 11.2.0.3 and 12.1.0.1).
– This means versions 11.2.0.2 and earlier 11.2 versions, 11.1.0.6 and earlier and anything at version 10 or earlier can not be fixed and thus are affected.

But what is the actual issue?

The first link in the article: Recommended patches and actions for Oracle databases versions 12.1.0.1, 11.2.0.3 and earlier – before June 2019 (Doc ID 2361478.1) provides essentially the same information as Mike’s blogpost; however, it additionally mentions that the interoperability of database clients with database servers is not affected.

Mike’s article mentions the following:
The alert refers to an SCN issue which came up a while ago. The system change number (SCN) is a logical, internal timestamp used by the Oracle Database. SCNs order events that occur within the database, which is necessary to satisfy the ACID properties of a transaction. The database uses SCNs to query and track changes.

So I guess it has something to do with SCNs. Most of the links are about SCNs. The MOS article that is most descriptive is: System Change Number (SCN), Headroom, Security and Patch Information (Doc ID 1376995.1).

This article describes a lot of details:
– SCNs are necessary for the database to keep changes organised. I got an article that explains that SCNs are not unique to a transaction, but are “just” granular enough to keep changes organised.
– SCNs are an ever increasing number. SCNs are never decreased!
– SCNs have a hard limit, which is version specific and based on the number of bits used for the number:
– – The general limit is 2^48.
– – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 2^63.
– SCNs have a per second increasing soft limit, called ‘RSL’ or ‘reasonable SCN limit’, which is version specific:
– – The general soft limit is 16384 (16k) SCNs per second, calculated as the number of seconds since January 1st, 1988 times 16384.
– – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 98304 (96k) SCNs per second, calculated as the number of seconds since January 1st, 2008 times 98304.
– The RSL can not be exceeded, if a database session tries to go beyond the soft limit, an ORA 600 [2252] is raised and the transaction is rolled back. This will not corrupt data (but obviously the data in the transaction is not applied).
– The difference between the current SCN and the RSL SCN is known as ‘SCN headroom’.
– There have been bugs that could increase SCNs at a higher rate, decreasing the SCN headroom or even reaching the soft limit, but these have all been solved in the January 2012 CPU/PSU/patch bundles.
– When databases communicate with each other via a database link, the SCNs of both databases are synchronised by picking the highest of the two.

But it doesn’t really explain why patches must be applied before June 2019. However, another article is more concrete on that: Recommended patching and actions for Oracle database versions 12.1.0.1, 11.2.0.3 and earlier – before June 2019 (Doc ID 2335265.1). The first interesting thing mentioned is:

3. What is the change introduced by the patches listed above?
These patches increase the database’s current maximum SCN (system change number) limit.
At any point in time, the Oracle Database calculates a “not to exceed” limit for the number of SCNs a database can have used, based on the number of seconds elapsed since 1988.

So, this means the patched databases have a change (an increase) in the RSL.

These recommended patches enable the databases to allow for a higher current maximum SCN limit. The rate at which this limit is calculated can be referred to as the “SCN rate” and these patches help allow higher SCN rates to enable databases to support many times higher transaction rates than earlier releases.

And this means the RSL is increased from 16k per second since January 1988 to 96k per second since January 2008.

Please note that the patches only increase the max limit but the current SCN is not impacted. So, if all your databases don’t have any major change in transaction rate, the current SCN would still remain below the current maximum SCN limit and database links between newer (or patched) and unpatched databases would continue to work. The patches provide the safety measure to ensure that you don’t have any issue with dblinks independent of any possible future change in your transaction rate.

With the patches applied, this change in current maximum SCN limit will happen automatically starting 23rd June 2019.

That is important information! So with the patch applied to some databases and not applied to some other databases and patched and non-patched databases have database links between them, everything should remain working. This is true for any database at this moment, because the change of the limit will happen on the 23rd of June 2019, which at the time of writing is in the future.

Once the change of limit has happened on the 23rd of June 2019, database links between older, non-patched versions of the database and newer or patched versions could be affected: if, after the 23rd of June 2019, the SCN rate increases in a newer or patched database and runs beyond the headroom available in databases with the 16k rate, the SCN synchronisation mentioned earlier will fail, because the SCN is beyond the headroom of the 16k rate database, which means it cannot synchronise with the newer database.

So the problem we are talking about here is two databases talking with each other over a database link while having different RSLs, which could lead to a situation where one database is at an SCN that is too high for the other, older or non-patched database. Communication will then fail, and will keep failing until the older or non-patched database is able to catch up with the SCN of the newer database over time, if that is possible at all.

A thing that is not clear to me at this point: the documentation seems to suggest that Oracle version 12.2 with compatibility set to 12.2 or higher already allows the higher number of SCNs per second (the point above: from version 12.2 on, with compatibility set to 12.2 or higher, the limit is 98304 (96k) SCNs per second calculated from the number of seconds since January 1st, 2008). If that is true, the issue that is warned about could potentially already happen!

Luckily, there are ways to investigate this:

The reported “newer” versions and the versions that are patched for the rate change have an SGA variable “kcmscnc_” that lists the current SCN rate of the database. There are 3 values that I have seen for “kcmscnc_”:
1: This is the rate of 16k per second since Januari 1st 1988.
2: ?
3: This is the rate of 96k per second since Januari 1st 2008.

SQL> select ksmfsnam, ksmfsadr, ksmfssiz from x$ksmfsv where ksmfsnam like 'kcmscnc_';
KSMFSNAM							 KSMFSADR	    KSMFSSIZ
---------------------------------------------------------------- ---------------- ----------
kcmscnc_							 000000006001579C	   4
SQL> oradebug peek 0x6001579C 4
[06001579C, 0600157A0) = 00000001

So, this database is capable of switching (because kcmscnc_ exists), and is currently running with the 16k per second threshold.

In fact, I tested this on 11.2.0.4, 12.1.0.2 and 18.3.0.0, and all these versions currently (before June 2019) report “1” or “scheme 1”. So the above mentioned rate of 96k for 12.2 and above does not currently apply to the soft limit or SCN headroom calculation for any database, including 12.2 and 18.3.

There is a less “hardcore” way to obtain this information, via the DBMS_SCN package. This package, just like the “kcmscnc_” variable, only exists if the database is of a version or patch level high enough to switch:

declare
  v_rsl number;
  v_headroom_in_scn number;
  v_headroom_in_sec number;
  v_cur_scn_compat number;
  v_max_scn_compat number;
begin
  dbms_scn.getcurrentscnparams(v_rsl, v_headroom_in_scn, v_headroom_in_sec, v_cur_scn_compat, v_max_scn_compat);
  dbms_output.put_line('reasonable scn limit (soft limit): '||to_char(v_rsl,'999,999,999,999,999,999'));
  dbms_output.put_line('headroom in scn                  : '||to_char(v_headroom_in_scn,'999,999,999,999,999,999'));
  dbms_output.put_line('headroom in sec                  : '||v_headroom_in_sec);
  dbms_output.put_line('current scn compatibility scheme : '||v_cur_scn_compat);
  dbms_output.put_line('max scn compatibility scheme     : '||v_max_scn_compat);
end;
/

SQL> /
reasonable scn limit (soft limit):	 16,104,861,483,008
headroom in scn 		 :	 16,104,860,802,471
headroom in sec 		 : 982962695
current scn compatibility scheme : 1
max scn compatibility scheme	 : 3

PL/SQL procedure successfully completed.

This is executed in a version 18.3.0.0 database. So the conclusion here is that currently all versions up to 18.3.0.0 are still compatible, because they all use the same SCN limit per second, which is referred to as ‘scheme 1’. However, on June 23, 2019, newer versions of the database will switch to a new scheme, referred to as ‘scheme 3’, which allows more headroom.

First of all, I hope a lot of databases out there have enough headroom in the first place and use SCNs at a modest rate per second, which means they do not advance towards the RSL. In such a case, even when you have older versions that cannot switch to scheme 3 alongside newer versions that will, I see no reason to worry at all.

The second case is when your database is currently close to running out of headroom in scheme 1 and its SCN rate is also close to the limit. Then you truly should worry: once that database switches to scheme 3 it might actually exceed the 16k per second rate, and when it does so long enough to run over the scheme 1 RSL, communication over a database link between that scheme 3 database and a scheme 1 database will be disrupted until the scheme 3 database’s SCN is low enough again to fit under the scheme 1 RSL. If the high SCN rate persists in the scheme 3 database, communication between the scheme 1 and the scheme 3 database will remain impossible.

So, at this point it should be clear that the most important statistic for determining issues between scheme 1 databases and (after June 2019) scheme 3 databases is the current headroom in your databases. For all databases involved in a database link between a scheme 1 and a scheme 3 database, you should investigate their SCN value and SCN rate. If some of these databases have already run into the soft-limit ORA-600, ora-600 [2252], they are potential candidates for running over the soft limit once they switch to scheme 3.
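One way to check for earlier occurrences of this ORA-600 without grepping the alert log by hand is a query like the sketch below; it assumes your version exposes the ADR-based alert log view V$DIAG_ALERT_EXT (if it does not, searching the alert log file directly works just as well):

-- look for soft-limit ORA-600 [2252] messages in the alert log (assumes V$DIAG_ALERT_EXT is available)
select originating_timestamp, message_text
from   v$diag_alert_ext
where  message_text like '%ORA-00600%[2252]%'
order  by originating_timestamp;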

The most important thing to check, then, is how close the current SCN is to the scheme 1 soft limit. This can be done using the following SQL (this SQL does not require a newer version of the database and has been tested with version 11.2.0.2 and higher):

col "RSL scheme 1" format 9,999,999,999,999,999
col "current value" format 9,999,999,999,999,999
select dbms_flashback.get_system_change_number "current value",
       ((((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
       ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
       (((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
       (to_number(to_char(sysdate,'HH24'))*60*60) +
       (to_number(to_char(sysdate,'MI'))*60) +
       (to_number(to_char(sysdate,'SS')))) * (16*1024)) "RSL scheme 1",
       round(dbms_flashback.get_system_change_number/((((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
       ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
       (((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
       (to_number(to_char(sysdate,'HH24'))*60*60) +
       (to_number(to_char(sysdate,'MI'))*60) +
       (to_number(to_char(sysdate,'SS')))) * (16*1024))*100,5) "% to RSL scheme 1"
from dual;

If a database is close to the RSL (roughly speaking beyond 90%-95%), the next thing to do is to measure whether the database keeps consuming SCNs and keeps staying close to the RSL. If that is true, an additional increase in SCN usage could, in the current scheme 1 situation, lead to an ora-600 [2252]; but once that database has switched to scheme 3 after June 2019, nothing keeps it from advancing to an SCN that exceeds the scheme 1 RSL, which will then cause issues if that database has a database link to a scheme 1 database.
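A crude way to get a feel for the current SCN rate is to simply sample the SCN twice; the sketch below assumes you are allowed to sleep in a session (dbms_lock.sleep needs execute on DBMS_LOCK; 18c also offers dbms_session.sleep) and that a 60-second sample is representative enough for a first impression:

-- minimal sketch: sample the SCN twice and compute an approximate SCNs-per-second rate
declare
  v_start number;
  v_end   number;
  v_secs  number := 60;
begin
  v_start := dbms_flashback.get_system_change_number;
  dbms_lock.sleep(v_secs);
  v_end   := dbms_flashback.get_system_change_number;
  dbms_output.put_line('approximate SCN rate per second: '||round((v_end - v_start)/v_secs));
end;
/

Comparing that rate with the 16384 SCNs per second of scheme 1 gives a first indication of how much room there is before the soft limit starts to matter.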

Is there anything you can do if you suspect or know a database will go over the scheme 1 RSL? Purely for this issue, the obvious solution would be to make sure you are on a version that will switch to scheme 3 in June 2019, so at least after that date it will not run into ora-600 [2252].

However, if such a scheme 3 database needs to connect to an older scheme 1 database, you have two choices:
1. Potentially run over the scheme 1 limit and disrupt the database link communication.
2. Stop the newer database from switching to scheme 3; this could potentially disrupt changes in that database itself, but it guarantees the database link will always work.

To look into the switch to scheme 3, which Oracle calls ‘auto rollover’, the following SQL can be used:

declare
  v_autorollover_date date;
  v_target_compat number;
  v_is_enabled boolean;
begin
  dbms_scn.getscnautorolloverparams(v_autorollover_date, v_target_compat, v_is_enabled);
  dbms_output.put_line('auto rollover date      : '||to_char(v_autorollover_date,'YYYY-MM-DD'));
  dbms_output.put_line('target scheme	        : '||v_target_compat);
  dbms_output.put_line('rollover enabled (1=yes): '||sys.diutil.bool_to_int(v_is_enabled));
end;
/
SQL> /
auto rollover date	: 2019-06-23
target scheme		: 3
rollover enabled (1=yes): 1

PL/SQL procedure successfully completed.

If you want to prevent a database from rolling over to scheme 3, the procedure dbms_scn.disableautorollover can be used:

exec dbms_scn.disableautorollover;

Obviously, the procedure enableautorollover does the opposite. Please make sure to contact Oracle support before doing this on a production database; this is an undocumented procedure at this time.

Also be aware that if you create a new database after June 23, 2019 with a new or patched version that can switch to scheme 3, it will probably be running scheme 3 by default. If you want to be absolutely sure it will not exceed the scheme 1 limit, you can revert it to scheme 1 manually using the ‘alter database set scn compatibility N’ command in mount mode:

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1048574496 bytes
Fixed Size		    8665632 bytes
Variable Size		  281018368 bytes
Database Buffers	  616562688 bytes
Redo Buffers		  142327808 bytes
Database mounted.
SQL> alter database set scn compatibility 1;

Database altered.

SQL> alter database open;

Database altered.

For this too I would strongly advise contacting Oracle support first. The purpose of this blogpost is to define the problem, show all the technical details related to it, and show all the tools that are involved. Nothing in this article tells you what you should do; it just shows everything that surrounds the switch to scheme 3 in June 2019.

Another view that might be beneficial is x$kcmscn. This view seems to have been created to help determine whether a scheme 3 database can still connect to a scheme 1 database:

col cur_max_scn format 999,999,999,999,999
col pre_11_2_0_2_cur_max_scn format 999,999,999,999,999
select * from x$kcmscn;
ADDR		       INDX    INST_ID	   CON_ID CUR_MAX_RATE	  CUR_SCN
---------------- ---------- ---------- ---------- ------------ ----------
	 CUR_MAX_SCN DIFF_IN_DAYS DIFF_IN_HOURS PRE_11_2_0_2_CUR_MAX_SCN
-------------------- ------------ ------------- ------------------------
PRE_11_2_0_2_DIFF_IN_DAYS
-------------------------
00007F773DEDAE10	  0	     1		0	     0	   800918
  16,108,830,064,640	    11379	 273112       16,108,830,064,640
		186444783

Because currently (before June 2019) every database is in scheme 1 by default, cur_max_scn and pre_11_2_0_2_cur_max_scn are identical. I even believe the column naming is wrong: the first version that can switch, if it is patched to a high enough PSU level, is 11.2.0.3, while the column name suggests that scheme 1 databases are databases of a version lower than 11.2.0.2, not including 11.2.0.2.

Conclusion.
I think there has been a lot of fuss about something that in most cases is not an issue. This article is supposed to give you the knowledge and the tools to determine what the situation looks like for your databases.

This might be an issue if you happen to have one or more databases that are high in SCN numbering, continue to consume a lot of SCNs, will be converted to scheme 3 on June 23, 2019, are suspected to consume SCNs at an even higher rate for whatever reason, AND have a database link to a database that remains in scheme 1. That’s a lot of ifs.

On the other hand, you only need one database that is high in SCN numbering and keeps consuming SCNs close to the soft limit: it will propagate its SCN to any other database it is linked to, or the required properties of the problem may be spread out over multiple linked databases.

Again, I do not advise anything in this article; the purpose here is to provide all the details surrounding the issue so you can make the best decision for yourself.

RAC One node databases are relocated by opatchauto in 12.2 part 1

This is an interesting observation I wanted to share. There does not seem to be much information out there for RAC One Node (RON) users, and I hope this helps someone thinking about patching their system.

RAC-rolling patching is well documented in patch readme files, blog posts and official white papers. Most RAC DBAs have a solid handle on the procedure. Patching RAC One Node is a different affair.

What happens when patching a RAC One Node system? As the name suggests a RAC One Node database is a cluster database restricted to one active instance in normal operations. It is possible to relocate the database from one node to another online. Oracle does this by temporarily adding a second instance to the cluster database with the intention of letting it take over from the source instance. At the end of the online relocation, the source instance is shut down, and only the destination instance remains up and running.

An online relocation is quite often a manual operation. However, I noticed that such an online relocation can happen during patching with opatchauto as well, at least in 12.2.

This post is intended to show you the process as it is, in the next part I’d like to show some implications of that approach.

The environment

In this example my lab environment consists of a 2 node RAC system currently patched to 12.2.0.1.180417. I wanted to apply the July 2018 RU to the system next to get some experience with the patch.

I have one RDBMS home in addition to the mandatory Grid home, same release level for both, no one-off patches (it’s a lab after all). The virtual machines run Oracle Linux 7.4 with kernel UEK4. To keep things simple there’s a single RAC One database, named RON. I assigned it DCB (“data centre B”) as unique name because I don’t like setting db_unique_name to reflect roles such as “PROD” and “STDBY”. It gets confusing when “STDBY” runs in primary role :)

Here’s the current status of my components:

[oracle@rac122sec2 ~]$ srvctl status database -db DCB
Instance DCB_1 is running on node rac122sec1
Online relocation: INACTIVE

[oracle@rac122sec2 ~]$ srvctl status service -db DCB
Service RON_SVC is running on instance(s) DCB_1
[oracle@rac122sec2 ~]$ 

For the curious, here’s the configuration metadata:

[oracle@rac122sec2 ~]$ srvctl config service -db DCB
Service name: RON_SVC
Server pool: 
Cardinality: 1
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type: 
Failover method: 
TAF failover retries: 
TAF failover delay: 
Failover restore: NONE
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: NONE
Edition: 
Pluggable database name: 
Maximum lag time: ANY
SQL Translation Profile: 
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
Drain timeout: 
Stop option: 
Session State Consistency: DYNAMIC
GSM Flags: 0
Service is enabled
Preferred instances: DCB_1
Available instances: 
CSS critical: no

[oracle@rac122sec2 ~]$ srvctl config database -db DCB
Database unique name: DCB
Database name: RON
Oracle home: /u01/app/oracle/product/12.2.0.1/dbhome_1
Oracle user: oracle
Spfile: +DATA/DCB/spfileRON.ora
Password file: +DATA/DCB/orapwRON
Domain: 
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: 
Disk Groups: DATA,RECO
Mount point paths: 
Services: RON_SVC
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: DCB
Candidate servers: rac122sec1,rac122sec2
OSDBA group: dba
OSOPER group: oper
Database instances: DCB_1
CSS critical: no
CPU count: 0
Memory target: 0
Maximum memory: 0
Default network number for database services: 
Database is administrator managed

The most important takeaway is that my RON instance DCB_1 is running on node rac122sec1.

Now let’s patch

After having followed the instructions in the patch readme closely, and after double/triple/quadruple checking that I have (working, tried and tested!) backups of the entire stack, I am ready to patch. This time around I am following the instructions for the automatic application of the Grid Infrastructure RU, i.e. using opatchauto. Here is some relevant output from the patching session:

...
OPatchauto session is initiated at Thu Jul 26 14:12:12 2018

System initialization log file is /u01/app/12.2.0.1/grid/cfgtoollogs/opatchautodb/systemconfig2018-07-26_02-12-14PM.log.

Session log file is /u01/app/12.2.0.1/grid/cfgtoollogs/opatchauto/opatchauto2018-07-26_02-13-15PM.log
The id for this session is Q4JA

Executing OPatch prereq operations to verify patch applicability on home /u01/app/12.2.0.1/grid

Executing OPatch prereq operations to verify patch applicability on home /u01/app/oracle/product/12.2.0.1/dbhome_1
Patch applicability verified successfully on home /u01/app/oracle/product/12.2.0.1/dbhome_1

Patch applicability verified successfully on home /u01/app/12.2.0.1/grid


Verifying SQL patch applicability on home /u01/app/oracle/product/12.2.0.1/dbhome_1
SQL patch applicability verified successfully on home /u01/app/oracle/product/12.2.0.1/dbhome_1


Preparing to bring down database service on home /u01/app/oracle/product/12.2.0.1/dbhome_1

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
No step execution required.........
 

Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
/u01/app/oracle/product/12.2.0.1/dbhome_1 is not a RACOne database. No step execution required........
 

Bringing down CRS service on home /u01/app/12.2.0.1/grid
...

Wait a minute, what’s that? Have a look at the line beginning with “Relocating RACOne home before patching…”. Relocating the database wasn’t necessary in this case (remember that the database was active on rac122sec1, the other node), but opatchauto can definitely relocate your RAC One database.

When it does, you will see something like this in the output generated by opatchauto:

...
Preparing to bring down database service on home /u01/app/oracle/product/12.2.0.1/dbhome_1

WARNING: The service RON_SVC configured on dcb will not be switched as it is not configured to run on any other node(s).
Successfully prepared home /u01/app/oracle/product/12.2.0.1/dbhome_1 to bring down database service


Relocating RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
Relocated RACOne home before patching on home /u01/app/oracle/product/12.2.0.1/dbhome_1
...

The last 2 lines are those of interest. opatchauto detected that a RAC One database was running on the active node, and relocated it. Under the covers it uses a “srvctl relocate database …” command, as shown in the session log file.
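For reference, such a relocation command looks roughly like the sketch below (using the database and node names from this post; the exact options opatchauto issues can be found in the session log file):

# a manual online relocation of the RAC One Node database DCB to node rac122sec2 (illustration only)
srvctl relocate database -db DCB -node rac122sec2 -timeout 30 -verbose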

Interestingly, and contrary to what I expected, opatchauto moves the RAC One database back to where it came from as a post-patch step. Towards the end of the patching session I saw this:

...
Starting CRS service on home /u01/app/12.2.0.1/grid
Postpatch operation log file location: /u01/app/oracle/crsdata/rac122sec2/crsconfig/crspatch_rac122sec2_2018-07-26_03-01-06PM.log
CRS service started successfully on home /u01/app/12.2.0.1/grid


Relocating back RACOne to home /u01/app/oracle/product/12.2.0.1/dbhome_1
Relocated back RACOne home successfully to home /u01/app/oracle/product/12.2.0.1/dbhome_1


Preparing home /u01/app/oracle/product/12.2.0.1/dbhome_1 after database service restarted
No step execution required.........
...

The relevant bit is in the middle (“relocating …”). After relocating the database to rac122sec1, opatchauto moved it back to rac122sec2.
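A quick way to confirm where the instance ended up after opatchauto has finished is to re-run the status commands from the beginning of this post (shown here without output):

# confirm instance placement and service state after patching
srvctl status database -db DCB
srvctl status service -db DCB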

Summary

Unlike rolling patching on a multi-node RAC, where the instances running from the RDBMS home being patched are shut down in turn and applications rely on connection pools and Fast Application Notification to maintain service availability, a RAC One Node database might be relocated to a different node in the cluster. There are implications to that process for application developers, some of which I hope to share in the next post.