Search

Top 60 Oracle Blogs

Recent comments

February 2011

Unlocking Temporary Table Instances from Deleted Process Requests

In a previous blog entry, I discussed how to detect whether an Application Engine process had run out of non-shared Temporary Table instances by querying the message log table for the entries written when this happens.

This can happen because processes that have locked all available non-shared table instances are currently processing. However, if a restartable Application Engine program terminates with an error, the tables remain locked to that process instance so that it can be restarted. If that process instance is neither cancelled nor restarted so that it subsequently runs to success then the tables will remain locked, and in time the Process Scheduler archive process may purge the process request record.

Philosophy – 14

Paraphrasing Yogi Berra:

    “It ain’t committed until it’s committed.”

If you’re wondering why it’s worth remembering this odd comment – it addresses the (commonly asked) question:

    “does the redo log contain uncommitted data as well as committed data?”

The answer is: yes.

When a session is creating redo change vectors it doesn’t know whether it is going to commit or rollback. But a session has to be able to store an arbitrarily large list of change vectors somewhere, and that list has to appear in the redo log (ideally “instantly”) if the session commits – so Oracle avoids delays on commit by putting the change vectors into the redo log as they are created***.

If you view the question from the opposite extreme, the recovery mechanism has to be able to deal with uncommitted data anyway because there are, after all, several scenarios where data that definitely was committed cannot be recovered; for example, recovery until end of log file 9998 because log file 9999 was destroyed and simply doesn’t exist – how can the code handle transactions that were not committed until part way through file 9999 if it only knows how to handle committed transactions ?)

*** Not strictly true from 10g onwards where Oracle introduced a delaying effect aimed at reducing competition for the redo allocation and redo copy latches for “small” transactions.

[The Philosophy Series]

Oracle11g Creation On Demand Indexes (Invisible Touch)

Prior to Oracle11g Release 2, the default and minimum size of a segment is one extent. So in the below example, where we create a table and five associated indexes:     Each of the segments has been allocated an extent, including each of the indexes.   However, since Oracle11g Release 2, this default behaviour has changed. [...]

IS NULL Conditions and B-tree Indexes

At page 383 of my book I wrote the following sentence (BTW, the same information is also provided by Table 9-3 at page 381):

With B-tree indexes, IS NULL conditions can be applied only through composite B-tree indexes when several SQL conditions are applied and at least one of them is not based on IS NULL or an inequality.

The text continues by showing the following examples (notice that in both cases the IS NULL predicate is applied through an access predicate):

Strange Timestamp Behavior

February 17, 2011 I have not experimented much with the TIMESTAMP datatype in Oracle – the DATE datatype is usually sufficient for my needs.  I thought that I would try to build a couple of test scripts to experiment with TIMESTAMPs, but I hit a bit of a snag in my testing.  The first script [...]

Database Machine 3D tours

Somehow I missed a post in the Oracle Database Insider blog with the links to 3D tours of the latest Database Machines – X2-2 and X2-8. I must say they’ve impressed me. Not by the amount of technical details these simple tours have, but rather the way they present data. It’s easy. It’s understandable. It’s absolutely not Oracle style. If you are following Oracle’s press releases you know what I mean. They suck. I spend time reading press releases to get WTF marketing people want to say. Go here for example and tell me what is the configuration of new servers in s/c/t.

Enabling constraint in parallel

Recently I did some tuning of data generation scripts, which purpose is to build large amount of representative data for application testing. Direct-path inserts are in use and as a prerequisite all constraints and indexes on target tables are disabled before the load and are enabled after it. Since I wanted to utilize available resources on the machine for that task, almost each step uses parallel execution. Well, kind of almost, because enabling constraints didn’t run in parallel, although I’ve politely asked Oracle to do so. I’ll explain here why it didn’t work.

RAC One Node and Database Protection

An email from fellow Oak Table Member James Morle about RAC One Node and failover got me thinking about the capabilities of the product.

I have written about RON (Rac One Node) in earlier posts, but haven’t really explored what happens with session failover during a database relocation.

Overview

So to clarify what happens in these two scenarios I have developed a simple test. Taking a RON database + a service I modified both to suit my test needs. Connected to the service I performed a database relocation to see what happens. Next I killed the instance (I wasn’t able to reboot the node) t o simulate what happens when the node crashes.

Setup

The setup used an existing database, “RON”. It also had a service defined already, but that needed tweaking. The database was defined as follows:

$ srvctl config database -d RON
Database unique name: RON
Database name: RON
Oracle home: /u01/app/oracle/product/11.2.0.2
Oracle user: oracle
Spfile: +DATA/RON/spfileRON.ora
Domain: example.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools: RON
Database instances:
Disk Groups: DATA
Mount point paths:
Services: RON_APP.example.com
Type: RACOneNode
Online relocation timeout: 30
Instance name prefix: RON
Candidate servers: node1,node2
Database is administrator managed

The service was initially defined as follows:

srvctl config service -d RON
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: NONE
Failover method: NONE
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

Some of these attributes require special attention, we have 3 categories to deal with: preferred instances for TAF, the TAF setup itself and the runtime load balancing (RLB) advisory.

Transparent Application Failover with “real” RAC only works if there are two preferred instances. As you can see the service has only one preferred instance. Not sure if that can be changed though…Let’s try:

srvctl modify service -s RON_APP -d RON -i RON_1,RON_2
PRKO-2007 : Invalid instance name: RON_2

I wasn’t surprised-RON is not a RAC database so it has only 1 active instance. When registering the RAC One Node database you don’t add instances as you would with a RAC database, instead you set the database type to RACONENODE (srvctl add database -d name -c RACONEONE … )

I recommend setting TAF properties on the service level-that way you don’t miss crucial parameters in your tnsnames.ora file. This is the preferred way of doing it at least since 11.2. Changing the service is straight forward:

srvctl modify service -d RON -s RON_APP.example.com \
> -P BASIC -e SESSION -m BASIC

This piece of code instructs the service to use a BASIC TAF policy, a failover type of SESSION and the BASIC failover method. These parameters were normally be configured in the CONNECT_DATA section of your TNSNames.ora file.

With these changes made, the service configuration has changed to the below:

$ srvctl config service -d RON -s RON_APP.example.com
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

I also wanted to change the defaults to a more suitable RLB configuration. Instead of a CLB goal of “LONG” I wanted to set it up for “SHORT”. And I needed emphasis on SERVICE_TIME as well (my intention was to run swingbench). The change and resulting service configuration are shown below:

$ srvctl modify service -d RON -s RON_APP.example.com \
> -j short -B service_time

$ srvctl config service -d RON -s RON_APP.example.com
Service name: RON_APP.example.com
Service is enabled
Server pool: RON
Cardinality: 1
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Failover type: SESSION
Failover method: BASIC
TAF failover retries: 0
TAF failover delay: 0
Connection Load Balancing Goal: SHORT
Runtime Load Balancing Goal: SERVICE_TIME
TAF policy specification: BASIC
Edition:
Preferred instances: RON_1
Available instances:
[rac]oracle@node1.example.com $

My SCAN name was scan1.example.com, and I ensured that the local_listener was pointing to my none default port of 1821 and the remote_lister was using the EZConnect synatax (“scan1.example.com:1825). It is very important to set the local_listener parameter if you are not using the default port of 1521!

On my client system I defined a local TNS alias “RON” to connect  to the RON database. Note that it doesn’t use any TAF parameters.

C:\oracle\product\11.2.0\client_1\network\admin>tnsping ron

TNS Ping Utility for 32-bit Windows: Version 11.2.0.2.0 - Production on 15-FEB-2011 17:07:44

Copyright (c) 1997, 2010, Oracle.  All rights reserved.

Used parameter files:
C:\oracle\product\11.2.0\client_1\network\admin\sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = scan1.example.com)(PORT = 1825)
)) (CONNECT_DATA = (SERVICE_NAME = LRON_APP.uk.db.com)))
OK (10 msec)

Database Relocation

Oracle’s promise is that you don’t lose your session during a database relocation. Let’s see if that is actually true. Using my setup I connected to the RON database (sorry for the broken formatting!):

C:\oracle\product\11.2.0\client_1\network\admin>sqlplus martin/test@ron

SQL*Plus: Release 11.2.0.2.0 Production on Tue Feb 15 16:50:08 2011

Copyright (c) 1982, 2010, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

SQL> select username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
------------------------------ ---------- ------------- ---
MARTIN                         BASIC      SESSION       NO

OK, so TAF is working. At this time I started the relocate command:

[RON_1]oracle@node1.example.com $ srvctl relocate database -d RON -n node2 -w 1 -v
Configuration updated to two instances
Instance RON_2 started

I then repeatedly required my SQL statement above while the instance was relocating. The output is shown below.

SQL> select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 1 MARTIN                         NONE       NONE          NO
 1 MARTIN                         BASIC      SESSION       NO
 2 MARTIN                         NONE       NONE          NO

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
 1 node1.example.com:RON_1
 2 node2.example.com:RON_2

SQL> select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN';

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 1 MARTIN                         NONE       NONE          NO
 1 MARTIN                         BASIC      SESSION       NO
 2 MARTIN                         NONE       NONE          NO

As you can see the configuration change to 2 instances is reflected in the output of my query to v$active_instances. You also see the number of sessions increasing, pay attention to the inst_id colum: a new session is created on the second instance.

In my other session I saw the relocation completing:

Services relocated
Waiting for 1 minutes for instance RON_1 to stop.....
Instance RON_1 stopped
Configuration updated to one instance

I required one more time to see what happened-would my session survive?

SQL> /
select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN'
*
ERROR at line 1:
ORA-25408: can not safely replay call

SQL> /

 INST_ID USERNAME                       FAILOVER_M FAILOVER_TYPE FAI
---------- ------------------------------ ---------- ------------- ---
 2 MARTIN                         BASIC      SESSION       YES

SQL>
SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- ------------------------------------------------------------
 2 node2.example.com:RON_2

Well it did survice. Note the ORA-25408 error. That’s expected, since I’m using the SQL*Plus client I don’t have the opportunity to trap the error and replay my OCI call. You should capture this SQLException in Java or your preferred development environment. I have provided an example in chapter 11 of Pro Oracle Database 11g RAC on Linux.

Node failure

I couldn’t see how a session would survive in the case of a node failure… I said this in my email to James:

> I cannot see how TAF or FCF work in case of a node failure. In my
> tests I did for the book TAF only worked if there was a service with
> at least 2 preferred instances. And FCF requires a FAN aware
> connection pool which is rare to find. RAC one however only has only 1
> active node (unless you relocate it).

But better test before jumping to conclusions!

I could only kill an instance rather than the server which would have been a better test. I assumed Clusterware would try to restart the failed instance on the same node a few times and then relocate the resource if the stat was not successful.

I knew the database was now running on node 2 so I killed the SMON process. That sure results in an instance crash.

oracle@node2.example.com $ ps -ef | grep RON | grep smon
oracle   14208     1  0 16:52 ?        00:00:00 ora_smon_RON_2
$ kill -9 14208

Just as you would expect, the session didn’s survive this (how could it? There is no active second instance!)

SQL> /
select inst_id,username,failover_method,failover_type,failed_over from gv$session where username='MARTIN'
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 15649
Session ID: 183 Serial number: 3

And just as expected, Clusterware restarted the failed instance the second it detected the failure. The node’s alert log showed this:

System State dumped to trace file /u01/app/oracle/product/admin/RON/admin/diag/rdbms/RON/RON_2/trace/RON_2_diag_14174.trc
ORA-1092 : opitsk aborting process
2011-02-15 16:58:29.255000 +00:00
ORA-1092 : opitsk aborting process
License high water mark = 6
2011-02-15 16:58:32.569000 +00:00
Instance terminated by PMON, pid = 14164
USER (ospid: 16101): terminating the instance
Instance terminated by USER, pid = 16101
2011-02-15 16:58:35.286000 +00:00
Starting ORACLE instance (normal)
2011-02-15 16:58:36.477000 +00:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
2011-02-15 16:58:42.687000 +00:00
Private Interface ‘bond1:1′ configured from GPnP for use as a private interconnect.
[name='bond1:1',...
..., use=public/1]

Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/product/11.2.0.2/dbs/arch
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is enabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 – 64bit Production

Summary

That concludes my testing. RON works better than a “classic” active/passive cluster and allows sessions to stay connected when migrating the database to a different host. And it makes it easier to convert the database to a full RAC database (I have an example but it needs a bit of tidying up before posting). On the other hand, virtualisation technology has allowed us to do the same for quite some time now. Xen and OracleVM can relocate domUs, and vmotion is the commercial alternative. Whatever suits your needs.

An Interesting Read...

I found this article on the 'security' firm HBGary to be very interesting.

They were brought down by 'Anonymous' recently, in a completely embarrassing fashion (for a security firm, for sure).
What I found most interesting was how it all started and how easy it was....
It all started with a simple SQL Injection attack on their custom Content Management System (CMS). By finding a small exploit in that application, they gained access to the usernames and hashed passwords. Since the hashed passwords were neither salted, nor exceptionally well hashed - simple rainbow tables were used to figure out the (simple) passwords of a few power users. Once they had the passwords to the CMS - they then discovered some power users (at a security firm) liked to reuse their passwords over and over on many systems. That fact in turn gave them access to their email - where they discovered yet more passwords (sent in emails) - including the root password to a powerful internal machine. Given that information - and improper ssh configuration - they were able to "socially engineer" (over email) remote access to this machine - to which they now had root... And that was the end of that.
An internet security firm - brought down - by not following the most *basic* of security principals.
And all because of - SQL Injection... If you don't use bind variables - you are susceptible to it. If you accept input from an end user and concatenate it into your SQL, you are subject to SQL Injection. If you use bind variables - if you do not dynamically construct your SQL at runtime - you are not subject to it. It is that simple.
I say this a lot:
"it is much harder to write code that doesn't use binds than it is to write code that uses binds".
To which I get a lot of confused stares - for all of the developers "know" that if they use binds - they have to write MORE code - not less. And MORE = harder -right?
Wrong, 100% wrong. If you do not use binds, you have to write more code than if you use them. The code you have to write is the code to ensure with 100% degree of certainty that your inputs from the end user are valid, are safe, will not subject you to SQL Injection. And that is non-trivial. The real kicker is - after you write that code - you better submit it for review to at least five people that do not like you (that last bit is important). They have to be SUPER critical of the code and subject it to rigorous review.
As a quick test - see if you can
  1. determine how the following bit of code can be attacked
  2. what might be the outcome of this attack - what might be compromised on your server if this code were attacked, what could they do with it?
  3. how to best protect against that attack
  4. how else - short of the "best" - would you protect against the attack

create or replace procedure inj( p_date in date )
as
l_rec all_users%rowtype;
c sys_refcursor;
l_query long;
begin
l_query := '
select *
from all_users
where created = ''' ||p_date ||'''';

dbms_output.put_line( l_query );
open c for l_query;

for i in 1 .. 5
loop
fetch c into l_rec;
exit when c%notfound;
dbms_output.put_line( l_rec.username || '.....' );
end loop;
close c;
end;

I'll post my answers to 1-4 tomorrow or the next day.

Think about it - how many of the developers you work with would even know that bit of code was easily attacked? Not many in my experience (I wouldn't have seen it right off until just a few years ago by the way - no magic here...)

Error message of the day: OUI-25023 and the FQDN

It’s been a long day with many problems around a Grid Control installation, including (but not limited to) corruption of the repository database, bugs in OUI when it comes to deinstalling the Oracle Management Server, lots of files left over by the weblogic “uninstall.sh” script and many more. Some of the error messages were quite misleading, and OUI-25023 just was one too many. What happened?

Earlier today I was trying to install the 64bit 11.1.0.1 agent on an 8 node cluster. After an initial headache (see below) it worked ok. However, I couldn’t resist mentioning OUI-25023. Here’s the complete story.

I downloaded the 11.1 agent for linux x86-64 as per the GC 11.1 documentation and deployed it to my fresh-installed management server. The OMS is on Solaris SPARC, and Grid Control doesn’t supply agents for a different platform. However, the security experts have locked the oracle account down on the cluster which ruled out the “agent push” scenario. I then opted for the installation via a response file, as described in the documentation.

The idea is that you set a number of variables in a response file “additional_agent.rsp” to inform the installer about your desired configuration. If memory serves me right then there was no way to run OUI in GUI mode, it had to be a silent installation.

Amongst the variables you need to specify the cluster nodes. I dutifully filled that information in, using the {“node1.example.com”, “node2.example.com”, …, “node8.example.com”} – i.e. the fully qualified domain name as per the documentation.

Starting the installer with the -silent and -responseFile /path/to/additional_agent.rsp options failed:

$ ./linux_x64/agent/runInstaller -silent -responseFile /u01/app/oracle/stage/agent_11.1.0.1/linux....
Starting Oracle Universal Installer...

Checking Temp space: must be greater than 150 MB.   Actual 1662 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 16415 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2011-02-15_03-15-10PM. Please wait ...
*** Check for updates ***
*** Select Installation Type ***
*** Check Prerequisites ***
*** Specify Oracle Management Service Location ***
*** Customize Ports ***
*** Review ***

ERROR: OUI-25023: The local node is not selected for installing this product. Include the local node in the cluster list or perform the installation on the nodes on which the install is to be performed.Also note that the cluster nodes should be specified in non FQDN format.

$

I was pretty sure that my node_list included my first node… Pinging the node and DNS access also worked (I wrote about that in a previous post). So a little research on Metalink revealed this gem:

OUI-25023 When Trying To Install A Patchset On RAC [ID 394868.1]

So contradicting the error message (…FQDN format…) I needed to specify the node_list as in the inventory: without the “example.com” bit. Once that was changed in the response file, the installation completed ok.