Copying the password file for RAC databases

This post is inspired by a recent thread on the oracle-l mailing list. The thread “11g RAC orapw file issue- RAC nodes not updated” brought up the fact that the password file is local to each instance. All users granted the SYSOPER or SYSDBA privilege are recorded in the password file, so changing the SYS password on one instance doesn’t mean the change is reflected on the other RAC instances. Furthermore, a stale password file can break your Data Guard configuration as well, since the SYS account is used to log in to the standby database.
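
To see exactly which accounts are recorded in the local password file, you can query V$PWFILE_USERS on each node; comparing the output across nodes makes a mismatch easy to spot. A minimal check (the SYSASM column exists from 11g onwards):

-- accounts recorded in this instance's password file
select username, sysdba, sysoper, sysasm
  from v$pwfile_users
 order by username;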

On a related note, a change of the SYS password for the ASM instance in the GRID_HOME will propagate to all cluster nodes automatically, a fact I first saw mentioned on the weblog of the Dutch Prutser, Harald van Breederode.
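
If you want to verify the propagation yourself, something along the following lines should do. This is only a sketch, the password is a placeholder, and I would try it on a sandbox cluster first:

-- connect to the local ASM instance as SYSASM, e.g. via
--   sqlplus / as sysasm
-- then change the password; in 11.2 the change should be
-- reflected in the password files on the other nodes
alter user sys identified by "NewSecret1";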

Now, to get over the annoyance of having to manually copy the new password file to all cluster nodes, I have written a small shell script which I use for all my Linux clusters. It takes the ORACLE_SID of the local instance as input, works out the corresponding ORACLE_HOME and copies the password file to all instances in the cluster, as listed in the output of olsnodes. The script can deal with separation of duties, i.e. systems where the GRID_HOME is owned by a different user than the RDBMS ORACLE_HOME. The script is by no means perfect and could be extended to deal with a more general setup. My assumption is a 1:1 mapping between instance number and cluster node, for example instance PROD1 is hosted on the first cluster node, prodnode1.

The script is shown below; it has been written and tested on Linux:

#!/bin/bash

# A small and simple script to copy a password file
# to all nodes of a cluster
# This works for me, it doesn't necessarily work for you,
# and the script is provided "as is" - I will not take
# responsibility for its operation and it comes with no
# warranty of any sort
#
# Martin Bach 2011
#
# You are free to use the script as you feel fit, but please
# retain the reference to the author.
#
# Usage: requires the local ORACLE_SID as a parameter.
# requires the ORACLE_SID or DBNAME to be in oratab

ORACLE_SID=$1
[[ $ORACLE_SID == "" ]] && {
 echo usage `basename $0` ORACLE_SID
 exit 1
}

#### TUNEABLES

# change to /var/opt/oracle/oratab for Solaris
ORATAB=/etc/oratab
GRID_HOME=/u01/crs/11.2.0.2

#### this section doesn't normally have to be changed

# strip the trailing instance number from the SID, e.g. PROD1 -> PROD
DBNAME=${ORACLE_SID%*[0-9]}
# look up the ORACLE_HOME for the database in oratab; anchor the
# pattern to the start of the line to avoid matching comments
ORACLE_HOME=`grep "^$DBNAME" $ORATAB | awk -F":" '{print $2}' | head -1`
[[ $ORACLE_HOME == "" ]] && {
 echo cannot find ORACLE_HOME for database $DBNAME in $ORATAB
 exit 2
}

cd $ORACLE_HOME/dbs
cp -v orapw$ORACLE_SID /tmp
INST=1

echo starting copy of password file
for NODE in `$GRID_HOME/bin/olsnodes`; do
 echo copying orapw$ORACLE_SID to $NODE as orapw${DBNAME}${INST}
 scp orapw$ORACLE_SID $NODE:${ORACLE_HOME}/dbs/orapw${DBNAME}${INST}
 INST=$(( $INST + 1))
done

It’s fairly straightforward: we take the ORACLE_SID and use it to look up the ORACLE_HOME for the database. The GRID_HOME has to be hard-coded to keep the script compatible with pre-11.2 databases, where you could have a CRS_HOME different from the ASM_HOME. For Oracle releases before 11.2, set the GRID_HOME variable to your Clusterware home.

The DBNAME is the $ORACLE_SID stripped of its trailing instance number (PROD1 becomes PROD), which I need to work out the SIDs of the instances on the other cluster nodes. Before the password file is copied from the local node to all cluster nodes, a copy is taken to /tmp, just in case.

The main logic is in the loop over the output of olsnodes: the local password file is copied to each cluster node in turn, renamed to match that node’s instance name.

Feel free to use it at your own risk, and modify/distribute as needed. This works well for me, especially on an eight-node cluster.

Moats

The (M)other (O)f (A)ll (T)uning (S)cripts. A tuning and diagnostics utility for SQL*Plus co-developed with Tanel Poder. Includes active session sampling and a TOP-style utility for displaying instance activity at regular refresh intervals. Supports versions from 10g Release 2 onwards. September 2010 (updated March 2011)

I can relate...

It is not often that a simple picture and caption will make me laugh out loud – but this did. I feel like that some days… Most days now, I guess :)

Update, things of interest and a couple of blogs to check out

tech.E2SN secret hacking session on Tuesday 22nd March:

Just in case you missed it – there’s still a chance to sign up for tomorrow’s ORA-4031 and shared pool hacking session. I initially planned to limit attendance to 100 per event (as the limited GotoWebinar package is cheaper that way), but over 100 people had signed up for the US event on the day of the announcement, even before it was 8am in California, so I figured I should invest a bit more and allow more people to attend. So far over 500 people have signed up (in total for both events). If you haven’t done so, you can sign up here:

Advanced Oracle Troubleshooting online seminar Deep Dives 1-5 on 11-15 April:

The next AOT deep dives (1-5) will start in three weeks, on 11-15 April, and deep dives 6-10 will follow on 9-13 May.

Check the details here:

Blogs to check out:

Andrey Nikolaev has done some serious low-level research on Oracle latches and KGX mutexes, and he presented his work at this year’s Hotsos Symposium (I missed his session as I was stuck at JFK instead of attending the conference that day):

Porus Havewala is quite a Grid Control and OEM enthusiast. If you are into OEM & GC, check out his blog:

Future events:

  • I will be speaking at the UKOUG Exadata Special Event on 18th April
  • I will announce some more Virtual Conferences pretty soon! Very interesting topics and good speakers – including (but not limited to) some serious Exadata technical content!


oracle.com inaccessible

Well, not really, but it is for me at the moment. A while back I could not log on to the oracle.com website with my personal account details. I kept getting an invalid username/password error. I clicked the link to reset my password and got a new password. This also did not work, so I tried my work [...]

Princep’s Fury…

Princep’s Fury is the fifth book in the Codex Alera series by Jim Butcher. The book starts a few months on from where Captain’s Fury ended.

This book seems a little calmer in comparison, mostly because I was still recovering from the onslaught of the previous book, I guess. Each book in the six-part series only tells a fraction of the whole story, but the previous four books were written in such a way that they also felt reasonably self-contained. Princep’s Fury, in comparison, feels like the author was planning for the last book more than concentrating on this one. That sounds kinda damning, but even though it drifts a little at times, it was still cool.

Cheers

Tim…

Nested Loops Join – the Smaller Table is the Driving Table, the Larger Table is the Driving Table

March 21, 2011 I occasionally see discussions about Oracle Database behavior that make me wonder… is it true? Can I generate a test case that validates the statement and, just as important, a test case that refutes it? An interesting question was posted on the OTN forums regarding apparently conflicting advice [...]

Will Oracle Exadata Database Machine Eventually Support Offload Processing for Everything?

In my recent post entitled Exadata Database Machine: The Data Sheets Are Inaccurate! Part – I, I drew attention to the fact that there is increasing Exadata-related blog content produced by folks that know what they are talking about. I think that is a good thing since it would be a disaster if I were the only one providing Exadata-related blog content.

The other day I saw Tanel Poder blogging about objects that are suitable targets for Smart Scan. Tanel has added bitmap indexes to his list. Allow me to quickly interject that the list of what can and cannot be scanned with Smart Scan is not proprietary information. There are DBA views in every running Oracle Database 11g Release 2 instance that can be queried to obtain this information. Tanel’s blog entry breaks no taboo.
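
As a small illustration of the kind of dictionary information available (this covers the function side of offload support rather than the object side), V$SQLFN_METADATA in 11.2 exposes which SQL functions can be evaluated in the cells:

-- SQL functions that can be offloaded to the storage cells
select name, offloadable
  from v$sqlfn_metadata
 where offloadable = 'YES'
 order by name;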

So, while Tanel is correct, I think it is also good to simply point out that the seven core Exadata fundamentals do in fact cover this topic. I’ll quote the relevant fundamentals (and follow them with a quick way of seeing them in action on a live system):

Full Scan or Index Fast Full Scan.

  • The required access method chosen by the query optimizer in order to trigger a Smart Scan.

Direct Path Reads.

  • Required buffering model for a Smart Scan. The flow of data from a Smart Scan cannot be buffered in the SGA buffer pool. Direct path reads can be performed for both serial and parallel queries. Direct path reads are buffered in process PGA (heap).
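
To see these two fundamentals at work, you can check the offload-related columns that 11.2 added to V$SQL, which show whether a statement was eligible for Smart Scan. A sketch, assuming an Exadata system and a hypothetical SALES table:

-- run a statement that full scans, then check whether it was
-- eligible for cell offload and how much data the cells returned
select count(*) from sales;

select sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_offload_returned_bytes
  from v$sql
 where sql_text = 'select count(*) from sales';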

So, another way Tanel could have gone about it would have been to ask, rhetorically, why wouldn’t Exadata perform a Smart Scan on a bitmap index if the plan chooses access method full? The answer would be simple—no reason. It is an index after all and can be scanned with fast full scan.  So why am I blogging about this?

Can I Add Index Organized Tables To That List?
In a recent email exchange, Tanel asked me why Smart Scan cannot attack an index organized table (IOT). Before I go into the outcome of that email exchange I’d like to return to a fundamental aspect of Exadata that eludes a lot of folks. It’s about the manner in which data is stored in the Exadata Storage Servers and how that relates to offload processing such as Smart Scan.

Data stored in cells is striped by Automatic Storage Management (ASM) across the cells with coarse-grained striping (granularity established by the ASM allocation unit size). With Exadata, the allocation unit size by default—and best practice—is 4MB. Therefore, tables and indexes are scattered in 4MB chunks across all the cells’ disks.
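
You can check this on a running system; the allocation unit size of each disk group is visible in V$ASM_DISKGROUP:

-- allocation unit size per disk group, in bytes
select name, allocation_unit_size
  from v$asm_diskgroup;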

Smart Scan performs multiple, asynchronous 1MB reads for allocation units (thus four 1MB asynchronous reads for adjacent 1MB storage regions). As the I/O operations complete, Smart Scan performs predicate operations (filtration) upon each storage region (1MB). If the data contained in a 1MB region references another portion of the database (as is the case with a traditional chained row for instance), Smart Scan cannot completely process that storage region. The blocks that reference indirect data are sent to the database grid in standard block form (the same form as when reading an ASM disk on conventional storage). The database server then chases the indirection because only it has the code to map the block-level indirection to an ASM AU in some cell, somewhere. Cells cannot ask other cells for data because cells don’t know anything about each other.

Thus far, in this blog post, I’ve taken the recurring question of whether Smart Scan works on a certain type of object (in this case the IOT) and broadened the discussion to focus on a fundamental aspect of Exadata. So what does this broadened scope have to do with Smart Scan on IOTs? Well, when I read that email from Tanel I used logic based on the fundamentals and shot off an answer. Before that hasty reply to Tanel I recalled that an IOT has the concept of an overflow tablespace. The concept of an overflow tablespace—in my mind—has “indirection” written all over it. Later I became more curious about IOTs, so I scanned through the code (server side) and couldn’t find any hard barriers against Smart Scan on IOT. I was stumped (trust me, that aspect of the code is not all that straightforward), so I asked the developers who own that specific part of the server. I found out my logic was faulty. I was wrong. It turns out that Smart Scan for IOT is simply not implemented. I’m not insinuating that means “implemented yet” either. That isn’t the point of this blog entry. Neither is admitting I was wrong in my original answer to Tanel.

Does The List Of Smart Scan-Compatible Objects Keep Growing And Growing?
Neither confessing how I shot off a wrong answer to Tanel, nor the specifics of IOT Smart Scan support, are the central points of this blog entry, so just what is my agenda? Primarily, I wanted to remind folks about the fundamental aspect of Exadata regarding indirection and Smart Scan (e.g., chained rows) and secondarily, I wanted to point out that the list of objects fit for Smart Scan is limited for reasons other than feasibility. Time to market is important. I know that. If an object like an IOT is not commonly used in the data warehousing use case, it is unnecessary work to implement support for Smart Scan. But therein lies the third, hidden agenda item for this post, which is to question our continual pondering over the list of objects that support Smart Scan.

Offload processing is a good thing. I wonder, is the goal to offload more and more? Some is good, certainly more must be better in a scale-out solution. Could offload support grow to the point where Exadata nears a state of “total offload processing?” Would that be a bad thing? Well, “total offload processing” is, in fact, impossible since cells do not contain discrete segments of data but instead the scattering of data I wrote about above. However, more can be offloaded. The question is just how far does that go and what does it mean in architectural terms? Humor me for another moment in this “total offload processing” train of thought.

If, over time, “everything”—or even nearly “everything”—is offloaded to the Exadata Storage Servers there may be two problems. First, offloading more and more to the cells means the query-processing responsibility in the database grid is systematically reduced. What does that do to the architecture? Second, if the goal is to pursue offloading more and more, the eventual outcome gets dangerously close to “total offload processing.” But, is that really dangerous?

So let me ask: In this hypothetical state of “total offload processing” to Exadata Storage Servers (that do not share data by the way), isn’t the result a shared-nothing MPP?  Some time back I asked myself that very question and the answer I came up with put in motion a series of events leading to a significant change in my professional career. I’ll blog about that as soon as I can.

Filed under: oracle

Upgrade Whoa

(The title’s a pun, by the way – an English form of humour that is not considered good unless it’s really bad.)

Very few people try to email me or call me with private problems – which is the way it should be, and I am grateful to my audience for realizing that this blog isn’t trying to compete with AskTom – but I do get the occasional communication and sometimes it’s an interesting oddity that’s worth a little time.

Today’s blog item is one such oddity – it was a surprise, it looked like a nasty change in behaviour, and it came complete with a description of the environment and a neatly formatted, complete demonstration. For a discussion of the problem in Spanish you can visit the blog of John Ospino Rivas, who sent me the original email and has written his own blog post on the problem.

We start with a simple table, and then query it with a ‘select for update’ from two different sessions:


drop table tab1 purge;

create table tab1(
	id	number,
	info	varchar2(10),
	constraint tab1_pk primary key (id)
		using index (create index idx_tab1_pk on tab1(id))
);

insert into tab1 values(1,'a');
insert into tab1 values(2,'a');
insert into tab1 values(3,'a');
commit;

execute dbms_stats.gather_table_stats(user,'tab1',cascade=>true)

column id new_value m_id

set autotrace on explain

select  id
from    tab1
where   id = (
            select  min(id)
            from    tab1
        )
for update
;

set autotrace off

prompt	=============================================================
prompt  Now repeat the query in another session and watch it lock
prompt	And use a third session to check v$lock
prompt  Then delete here, commit and see what the second session does
prompt	=============================================================

accept X prompt 'Press return to delete and commit'

set verify on
delete from tab1 where id = &m_id;
commit;

The fact that the primary key index is created as a non-unique index isn’t a factor that affects this demonstration.
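
(If you want to confirm that, the dictionary shows the supporting index is non-unique once the table is created:)

select index_name, uniqueness
  from user_indexes
 where table_name = 'TAB1';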

Given the query and the data in the table, you won’t be surprised by the result of the query from the first session (for convenience I’ve captured the selected value using the ‘column new_value’ option). Here’s the result of the query and its execution plan:


        ID
----------
         1

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     1 |     3 |     1   (0)| 00:00:01 |
|   1 |  FOR UPDATE                  |             |       |       |            |          |
|*  2 |   INDEX RANGE SCAN           | IDX_TAB1_PK |     1 |     3 |     0   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE            |             |     1 |     3 |            |          |
|   4 |     INDEX FULL SCAN (MIN/MAX)| IDX_TAB1_PK |     3 |     9 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"= (SELECT MIN("ID") FROM "TAB1" "TAB1"))

At this point the program issues instructions to repeat the query from a second session, then waits for you to press Return. When you run the same query from another session it’s going to see the data in read-consistent mode and try to select and lock the row where ID = 1, so the second session is going to hang waiting for the first session to commit or rollback.

Here’s the key question: what’s the second session going to return when you allow the first session to continue, delete the row it has selected, and commit? Here’s the answer if you’re running 10.2.0.3 or 11.1.0.6 (which is what I happen to have easily available):

SQL> select  id
  2  from    tab1
  3  where   id = (
  4              select  min(id)
  5              from    tab1
  6          )
  7  for update
  8  ;

        ID
----------
         2

1 row selected.

Now, this seems perfectly reasonable to me – especially since I’ve read Tom Kyte’s notes on “write consistency” and seen the “rollback and restart” mechanism that kicks in when updates have to deal with data that’s changed since the start of the update. Session 2 had a (select for) update, and when it finally got to a point where it could lock the data, it found that the read-consistent version of the data didn’t match the current version of the data, so it restarted the statement at a new SCN. At the new SCN the minimum value was 2.

Now here’s what happened when I ran the test under 11.2.0.2:

SQL> select  id
  2  from    tab1
  3  where   id = (
  4              select  min(id)
  5              from    tab1
  6          )
  7  for update
  8  ;

no rows selected

The upgrade produces a different answer!

At first sight (or guess) it looks as if the query has run in two parts – the first part producing the min(id) of 1 using a read-consistent query block, with the second part then using the resulting “known value” to execute the outer select (shades of “precompute_subquery”) and restarting only the second part when it discovers that the row it has been waiting for has gone away.

It doesn’t really matter whether you think the old behaviour or the new behaviour is correct – the problem is that the behaviour has changed in a way that could silently produce unexpected results. Be careful if any of your code uses select for update with subqueries.

As a defensive measure you might want to change the code to use the serializable isolation level – that way the upgraded code will crash with Oracle error ORA-08177 instead of silently giving different answers:

SQL> alter session set isolation_level = serializable;

Session altered.

SQL> get afiedt.buf
  1  select  /*+ gather_plan_statistics */
  2          id
  3  from    tab1
  4  where   id = (
  5              select  min(id)
  6              from    tab1
  7          )
  8* for update
  9  /
from    tab1
        *
ERROR at line 3:
ORA-08177: can't serialize access for this transaction

It might be a way of avoiding this specific problem, of course, but serializable isn’t a frequently used feature (the first pages of hits on Google are mostly about SQL Server), so who knows what other anomalies this change in isolation level might introduce.