February 2011

Quis custodiet ipsos custodes: Nagios monitoring for Grid Control

I have a strange problem with my Grid Control 11.10.1.2 Management Server in a Solaris 10 zone. When restarted, the OMS will serve requests fine for about 2 to 4 hours and then “hang”. Checking the Admin Server console I can see that there are stuck threads. The same information is also recorded in the logs.

NB: the really confusing part about Grid Control 11.1 is the use of WebLogic. You thought you knew where the Grid Control logs were? Forget about what you knew about 10.2 and enter a different dimension :)

So to be able to react more quickly to a hang of the OMS (or EMGC_OMS1 to be more precise) I set up nagios to periodically poll the login page.

I’m using a VM with OEL 5.5 64bit to deploy nagios to; the requirements are very moderate. The install process is well documented in the quickstart guide, and I’m using the Fedora version as a basis. OEL 5.5 doesn’t have nagios 3 RPMs available, so I decided to use the source downloaded from nagios.org. The tarballs you need are nagios-3.2.3.tar.gz and nagios-plugins-1.4.15.tar.gz at the time of this writing. If you haven’t got a development environment, build it:

  • # yum install httpd
  • # yum install php
  • # yum install gcc glibc glibc-common
  • # yum install gd gd-devel
  • # yum install openssl-devel

From then on it’s as simple as copy-pasting from the quickstart guide. The only problem I had was with the check_http plugin, caused by the lack of openssl-devel: I initially built the plugins without the “--with-openssl=/usr/include/openssl” flag. After executing the configure command again the build still didn’t work for check_http (undefined symbol foo), but that could be fixed with a “make clean; make”. Be aware that my wordpress theme tends to combine two dashes into one when rendering, which is frustrating in the case of the configure command; all the long options to configure and check_http in this post start with two dashes.

For the remainder of this article I assume you built nagios with these arguments to configure:

./configure --with-command-group=nagcmd

The plugins have been built with these options:

./configure --with-openssl=/usr/include/openssl --with-nagios-user=nagios --with-nagios-group=nagios

This will install nagios to /usr/local/nagios, which is fine by me; you’d obviously choose a different prefix when configuring for a production nagios server. Start the nagios server as per the quickstart guide using “service nagios start”.

With nagios up and running you can connect to the dashboard: http://nagios.example.com/nagios

You authenticate yourself using the nagiosadmin account and the password you supplied earlier to the htpasswd command.

Great! Your nagios environment is up and running. Next you need to add the OMS to the configuration. I opted to change the example configuration, which takes three steps:

  • Modify the contacts
  • Create a check_oms command
  • Add the OMS to the list of monitored targets

Again, I should note that this setup is for monitoring 2 OMS hosts only, nothing else. I’m saying this because the way I add the targets is not the most elegant one. If you intend to add more targets to the nagios setup you should opt for a better approach which is commented out in the nagios.cfg file.

Modifying contact information

I would like to be informed in case something goes wrong. Nagios offers a wealth of notification methods; I’m limiting myself to email.

The file you’d like to modify with your favourite text editor is /usr/local/nagios/etc/objects/contacts.cfg

The most basic (but sufficient) way is to edit the nagiosadmin contact. Simply change the email address to your email address and save the file. NB: you may have to configure your local MTA and add a mail relay. Ask your friendly sys admin how to do so.

Create the check_oms command

Before we can define it as a target in nagios, we need to tell nagios how to monitor the OMS. Nagios comes with a basic set of plugins, amongst which the check_http seems the most suitable. It needs to be compiled with the openssl-devel package (see above) since the OMS logon requires the https protocol.

Open /usr/local/nagios/etc/objects/commands.cfg with your favourite text editor and add a command such as this one:

define command{
        command_name    check_oms
        command_line    $USER1$/check_http -H $HOSTALIAS$ -f critical -w 5 -c 10 --ssl -p 7799 --url /em/console/logon/logon
}

Translated back to English: when check_oms is used as a so-called service check in nagios, the check_http plugin is called against the host named by the host alias variable (we’ll define that in a minute). Furthermore, if we receive an HTTP 302 code (moved temporarily) I want the check to return a critical error instead of an OK. If the response time is > 5 seconds I want the service to emit a “warning” reply, and if it takes longer than 10 seconds then that’s critical. The remaining options specify that I need to use SSL against port 7799 (the default Grid Control port; change it if yours is different) and that the URL is /em/console/logon/logon. Don’t simply specify /em as the URL, as that silently redirects you to /em/console/logon/logon with an HTTP 302 message, which doesn’t help in this case.

You can run the command interactively on the nagios host. The check is in /usr/local/nagios/libexec; the “-v” option displays the HTTP traffic:

./check_http -H oms.example.com -f critical -w 5 -c 10 --ssl -p 7799 --url /em/console/logon/logon -v

[root@nagios libexec]# ./check_http -H oms.example.com -f critical -w 5 -c 10 --ssl -p 7799 --url /em/console/logon/logon -v
GET /em/console/logon/logon HTTP/1.1
User-Agent: check_http/v1.4.15 (nagios-plugins 1.4.15)
Connection: close
Host: oms.example.com:7799
https://oms.example.com:7799/em/console/logon/logon is 8671 characters
STATUS: HTTP/1.1 200 OK
**** HEADER ****
Date: Mon, 28 Feb 2011 10:27:14 GMT
Server: Oracle-Application-Server-11g
Set-Cookie: JSESSIONID=tJ0yNr4CgGf4gyTPJR4kKTzL2WBg1SFLQvh0ytrpC3Kgv9xqkDsF!-2069537441; path=/em; HttpOnly
X-ORACLE-DMS-ECID: 00074S^kp_dF8Dd_Tdd9ic0000B4000DkI
X-Powered-By: Servlet/2.5 JSP/2.1
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8
Content-Language: en
**** CONTENT ****
fe8
[content skipped...]
HTTP OK: HTTP/1.1 200 OK - 8671 bytes in 0.067 second response time |time=0.067383s;5.000000;10.000000;0.000000 size=8671B;;;0
[root@nagios libexec]#

Right: HTTP 200 and sub-second response time. I’m happy.
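If you want to post-process the plugin output yourself, the response time can be read from the performance data after the "|" in the status line. The following is only a minimal sketch: the function names response_time and classify are made up for this example, and the comparison simply mirrors the "-w 5 -c 10" thresholds described above rather than reproducing the plugin's internal logic.

```shell
#!/bin/sh
# Extract the response time (in seconds) from a check_http status line.
# The perfdata after the "|" has the form time=<seconds>s;<warn>;<crit>;<min>.
response_time() {
    printf '%s\n' "$1" | sed -n 's/.*|time=\([0-9.]*\)s;.*/\1/p'
}

# Map a response time onto the state names used above, given a warning
# and a critical threshold in seconds (hypothetical helper, not part of nagios).
classify() {
    awk -v t="$1" -v w="$2" -v c="$3" 'BEGIN {
        if (t + 0 > c + 0)      print "CRITICAL"
        else if (t + 0 > w + 0) print "WARNING"
        else                    print "OK"
    }'
}
```

Feeding the status line from the run above into response_time gives 0.067383, which classify rates as OK against thresholds of 5 and 10 seconds.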

Create a new monitored target

This is the final step to be completed. I started off by copying the localhost.cfg file to oms.cfg and edited it. Below is a sample file with all comments and other verbose information removed:

define host{
use               linux-server
host_name         OMS_pilot
alias             oms.example.com
address           192.168.99.13
}

define service{
use                     generic-service
host_name               OMS_pilot
service_description     HTTP
is_volatile             0
check_period            24x7
max_check_attempts      10
normal_check_interval   1
retry_check_interval    1
contact_groups          admins
notification_options    w,u,c,r
notification_interval   960
notification_period     workhours
check_command           check_oms
}

I’m also using the check_ping command but that’s the default and not shown here. How does nagios know what server to execute the check against? That’s back in the command definition. Remember the -H $HOSTALIAS$ directive? Upon execution of the check, the value of the host’s alias configuration variable will be passed to the check_oms command. You should therefore ensure that the nagios host can resolve that host name, and I’d recommend using the FQDN as well.
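Since the alias is what ends up after -H, it is worth checking name resolution on the nagios host before the first scheduled check runs. A small sketch, assuming getent is available on the nagios host; can_resolve and check_aliases are names invented for this example:

```shell
#!/bin/sh
# Succeed if the name resolves through the system resolver (hosts file or DNS).
can_resolve() {
    getent hosts "$1" >/dev/null 2>&1
}

# Warn about every alias that does not resolve. The arguments are whatever
# you used as "alias" in your host definitions, e.g. oms.example.com.
check_aliases() {
    rc=0
    for name in "$@"; do
        if ! can_resolve "$name"; then
            echo "cannot resolve $name" >&2
            rc=1
        fi
    done
    return $rc
}
```

Calling check_aliases oms.example.com on the nagios host returns non-zero and prints a message if the alias cannot be resolved.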

The service check will execute the check_oms command against the host every minute, 24x7. If the service goes into a warning, unknown or critical state, or recovers from one (notification_options w,u,c,r), it will notify the contact group admins (which you edited in step 1) and send email during work hours (09:00 - 17:00 by default, defined in timeperiods.cfg).
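One thing to keep in mind with max_check_attempts set to 10: nagios only sends notifications once a problem has become a "hard" state, that is after the check has failed max_check_attempts times in a row. As a rough back-of-the-envelope sketch (assuming the default interval_length of 60 seconds, so that one interval unit is one minute, and ignoring scheduling latency; the function name is made up):

```shell
#!/bin/sh
# Approximate minutes between the first failed check and the hard state:
# the first failure counts as attempt 1, each retry adds one retry interval.
minutes_to_hard_state() {
    max_check_attempts="$1"
    retry_interval="$2"
    echo $(( (max_check_attempts - 1) * retry_interval ))
}
```

With the values from oms.cfg, minutes_to_hard_state 10 1 prints 9, so expect the first email roughly nine minutes after the OMS stops responding. Lower max_check_attempts if that is too slow for you.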

The final bit where everything is tied together is the nagios.cfg file: add the definition for your host as in this example:

cfg_file=/usr/local/nagios/etc/objects/oms.cfg

Alternatively, if you would like to logically group your objects, you could create /usr/local/nagios/etc/servers, put all your server configuration files in there and enable the matching cfg_dir line in nagios.cfg. Regardless of which option you choose, the next step is to reload the nagios service to reflect the current configuration.

(Ignore the warning: that’s an HTTP 403 issue on another host …)
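Before reloading it is a good idea to let nagios verify the configuration, so that a typo in oms.cfg does not take the monitoring down. A minimal sketch, assuming the /usr/local/nagios prefix used in this post and the init script installed by the quickstart guide; the variable names and the function reload_if_valid are my own:

```shell
#!/bin/sh
# Paths follow the /usr/local/nagios prefix used in this post; override if yours differ.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
RELOAD_CMD="${RELOAD_CMD:-service nagios reload}"

reload_if_valid() {
    # "nagios -v <main config>" runs the pre-flight check and
    # returns non-zero if the configuration contains errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        $RELOAD_CMD
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}
```

Run reload_if_valid as root after every change to the object files.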

Happy monitoring!

Oracle Database Time Model Viewer in Excel 1

February 28, 2011 Previously, I had written a couple of blog articles that showed how to build a reasonably usable Oracle Database Time Model Viewer using nothing more than a text file (containing a VBS script) and a dynamically generated web page that is displayed on a Windows [...]

Internal Views

When Oracle transforms a query it will sometimes report an internal view name in the execution plan – and I have seen people asking for help occasionally because they’ve tried to find the view name in dba_views and it hasn’t existed. So, just for reference, here’s a short list of the view names that might appear during query transformation but won’t exist in your system.

With each name I’ve put a brief comment of why they might appear:

index$_join$_#  Index hash join
vw_nso_#        Unnesting "IN" subquery
vw_nsq_#        Haven't seen one recently, but I think it's a variation on "IN" subqueries
vw_sq_#         Unnesting - possibly specific to correlated subqueries
vw_gbc_#        group by placement
vw_wif_#        Analytic functions (especially those generated by "remove aggregate subquery")
vm_nwvw_#       Note "VM" rather than "VW" - possibly related to complex view merging with distinct
vw_foj_#        Full Outer Join
vw_jf_????      Join Factorization
--
-- From Timur Akhmadeev (comment #2)
--
VW_CDG_%08X
VW_COL_%d          # Naming convention (sometimes) for columns used in generated views
VW_DAT_%08X        # distinct aggregate transform?
VW_DIG_%08X
VW_DIS_%08X        # distinct (similar to group by placement)?
VW_DTP_%08X
VW_FTG_%08X
VW_LT_%08X
VW_RIG_%08X
VW_RT_%08X
VW_%s_%d
VW_SJE_%08X
VW_%s_%s
VW_ST_%08X
VW_STJ_%08X
VW_TEX_%08X
--
-- From Tanel Poder (comment #3)
--
VW_TE_#         Table expansion transformation
VW_DAG_#        Distinct aggregation transform

Note: I’ve used ‘#’ to represent the possible appearance of a number; the items supplied by Timur Akhmadeev use the C “printf” conventions.

Originally I called for suggestions of the ones I’d failed to report – but the list is now much closer to completion. If you’d like to contribute any further names, or fill in some of the missing causes, I’ll add them to the list.

For reference notes for a couple of these cases:

Oracle11g: Zero Sized Unusable Indexes Part II (Nathan Adler)

In my previous post, I discussed how Oracle from 11g R2 onwards will automatically drop the segment and associated storage from unusable index objects. Mohamed Houri asked in the comments section the excellent question of just how useful this feature will be in real life cases when typically indexes are not left in an unusable state for a [...]

Parallel DML - Conventional (non-direct-path) Inserts As Select

In a recent discussion I mentioned that I seemed to remember that the DML part of conventional (non-direct-path) insert as select statements will always be executed serially, even with parallel DML enabled and parallel DML execution requested. It's important to understand in this context that this is not the same as the parallel query execution of the SELECT part, which is possible independently of the parallel DML part.

After that discussion I realized that it was quite some time ago that I tested this scenario, probably it was back then with some 10.2 version.

So I quickly put together a small test case that I ran on 11g versions, and the results were quite surprising, which motivated me to take a closer look.

New SQL and PL/SQL landing pages on my website…

I’ve added a couple of new landing pages on my website:

I think it makes it a bit easier to find stuff if you are not searching for specific keywords.

Cheers

Tim…

SQL Precedence UNION and MINUS, Converting to Full Outer Join Syntax

February 26, 2011 In a recent OTN thread in the SQL forum I found a SQL statement (see the thread for the original SQL statement) that the original poster (OP) wanted to transform into full outer join syntax, but was unsure how to start the transformation.  To get started with helping the OP, you first [...]

Organizing Chris Date's seminar in Ljubljana May 23-24 2011

I have invited Chris J. Date, an independent author, lecturer, researcher, and consultant of international renown, specializing in relational database technology, to give a seminar titled "How to Write Correct SQL and Know It: A Relational Approach to SQL". The 2-day seminar will take place in Ljubljana, Slovenia, on May 23-24 2011.

Chris is best known for his book "An Introduction to Database Systems" (eighth edition, Addison-Wesley, 2004), which has sold some 725,000 copies and is used by several hundred colleges and universities worldwide.

Oracle11g: Zero Sized Unusable Indexes (Zeroes)

Following on from my previous discussion on “Create On Demand” segments, Oracle 11g R2 has also introduced storage saving initiatives in relation to unusable indexes. Starting with a simple Oracle 10g example, we create a table and associated index: If we now make the index unusable: We notice that [...]

AWR Reports

A little thought for those who use OEM to generate AWR reports – if you generate the standard HTML form and then try posting it into (say) the OTN database forum when you want some help interpreting it, it won’t look very readable (even if you do remember to put the right tags before and after to reproduce it in fixed font).

So it’s a good idea to know how to produce the text format – which is what I do most of the time (especially since I often end up with a simple telnet or putty session into a client server). Take a look at $ORACLE_HOME/rdbms/admin for all the scripts starting with “awr” – there’s quite a lot of them, and the number keeps growing. Apart from finding a script that will give you the standard AWR report in a good old-fashioned text format, you may find that newer versions of Oracle include a few useful variations on the theme.

Here’s a list from the 11.2.0.2 home ($ORACLE_HOME/rdbms/admin) with (mostly) the one-line description from the start of file. I’ve put the most useful ones in the first list:

awrrpt.sql      -- basic AWR report
awrsqrpt.sql    -- Standard SQL statement Report

awrddrpt.sql    -- Period diff on current instance

awrrpti.sql     -- Workload Repository Report Instance (RAC)
awrgrpt.sql     -- AWR Global Report (RAC)
awrgdrpt.sql    -- AWR Global Diff Report (RAC)

awrinfo.sql     -- Script to output general AWR information

For most people the awrrpt.sql and awrsqrpt.sql are likely to be sufficient, but the “difference between two periods” can be very useful – especially if you do things like regularly forcing an extra snapshot at the start and end of the overnight batch so that you can (when necessary) find the most significant differences in behaviour between the batch runs on two different nights.

If you get into the ‘RAC difference report’ you’ll need a very wide page – and very good eyesight !
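Forcing the extra snapshots around a batch run can be scripted. The following is only a sketch: it uses the create_snapshot procedure of dbms_workload_repository, while the function names, the SQLPLUS_CONNECT variable and its "/ as sysdba" default are assumptions you would adapt to your own site.

```shell
#!/bin/sh
# Print a small SQL*Plus script that forces one AWR snapshot.
# The label is only echoed via PROMPT so the output shows which run it belongs to.
snapshot_sql() {
    label="$1"
    cat <<EOF
prompt forcing AWR snapshot: ${label}
execute dbms_workload_repository.create_snapshot
exit
EOF
}

# Hypothetical wrapper: snapshot, run the batch command, snapshot again,
# and hand back the exit status of the batch command itself.
run_batch_with_snapshots() {
    snapshot_sql "batch start" | sqlplus -s "${SQLPLUS_CONNECT:-/ as sysdba}"
    "$@"
    rc=$?
    snapshot_sql "batch end" | sqlplus -s "${SQLPLUS_CONNECT:-/ as sysdba}"
    return $rc
}
```

Something like run_batch_with_snapshots /path/to/nightly_batch.sh (a made-up path) then gives you a clean pair of snapshots per night to feed into awrddrpt.sql.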

There are also a lot of “infrastructure and support” bits – some of the “input” files give you some nice ideas about how you can write your own code to do little jobs like: “run the most recent AWR report automatically”:

awrblmig.sql    -- AWR Baseline Migrate
awrload.sql     -- AWR LOAD: load awr from dump file
awrextr.sql     -- AWR Extract

awrddinp.sql    -- Get inputs for diff report
awrddrpi.sql    -- Workload Repository Compare Periods Report

awrgdinp.sql    -- Get inputs for global diff reports
awrgdrpi.sql    -- Workload Repository Global Compare Periods Report

awrginp.sql     -- AWR Global Input
awrgrpti.sql    -- Workload Repository RAC (Global) Report

awrinpnm.sql    -- AWR INput NaMe
awrinput.sql    -- Get inputs for AWR report 

awrsqrpi.sql    -- Workload Repository SQL Report Instance

I usually carry copies of the scripts with me when I’m troubleshooting in case I need them at client sites – sometimes I’m not allowed the privileges I really need to do rapid troubleshooting, but if I can persuade the DBA to give me execute privileges on package dbms_workload_repository and select privileges on a couple of the necessary tables and views then I can run the reports from an otherwise “minimal” account.
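As an illustration of what such a “minimal” account can do, here is a sketch that generates the SQL for a text-format report through the awr_report_text table function in dbms_workload_repository. It assumes the account can also query v$database and v$instance; the function name awr_text_report_sql and the snapshot ids in the usage example are invented, and you would pick real ids from your own snapshot list.

```shell
#!/bin/sh
# Print a SQL*Plus script that produces a text AWR report for the
# current database and instance between two snapshot ids.
awr_text_report_sql() {
    begin_snap="$1"
    end_snap="$2"
    # Accept plain numbers only, so nothing odd ends up in the SQL.
    case "${begin_snap}${end_snap}" in
        ''|*[!0-9]*) echo "usage: awr_text_report_sql <begin_snap> <end_snap>" >&2
                     return 1 ;;
    esac
    # Quoted heredoc: the "$" in v$database must not be expanded by the shell.
    cat <<'EOF'
set pagesize 0 linesize 200 trimspool on feedback off verify off heading off
column dbid new_value dbid noprint
column inst_num new_value inst_num noprint
select dbid from v$database;
select instance_number inst_num from v$instance;
EOF
    printf 'select output from table(dbms_workload_repository.awr_report_text(&dbid, &inst_num, %s, %s));\n' \
        "$begin_snap" "$end_snap"
    printf 'exit\n'
}
```

Usage would be along the lines of awr_text_report_sql 100 101 | sqlplus -s user/password > awr_100_101.txt, with the connect string being whatever the DBA has given you.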

There are also a couple of deceptively named files that you might miss in 11.2:

spawrrac.sql    -- Server Performance AWR RAC report
spawrio.sql     -- AWR IO Intensity Report
spadvrpt.sql    -- Streams Performance ADVisor RePorT

Although the initial letters in the names suggest that these files might fall in with statspack, they actually report from the AWR tables – however the first one (spawrrac.sql) was only a temporary measure, and prints out the warning message:

This script will be deprecated.  The official release of the Global AWR report is awrgrpt.sql

So if you’re using the spawrrac.sql – stop it.