Search

Top 60 Oracle Blogs

Recent comments

June 2011

Backups are forever

You should save this link in your bookmarks, for the next time you accidentally remove your database’s online redologs or overwrite some ASM disks:

 

Cache buffers chains latch contention troubleshooting using latchprofx.sql example

Laurent Demaret has written a good article about how he systematically troubleshooted cache buffers chains latch contention, starting from wait interface and drilling down into details with my latchprofx tool:

A common cause for cache buffers chains latch contention is that some blocks are visited and re-visited way too much by a query execution. This usually happens due to nested loops joins or FILTER loops retrieving many rows from their outer (driving) row sources and then visiting the inner row-source again for each row from driving row source. Once you manage to fix your execution plan (perhaps by getting a hash join instead of the loop), then the blocks will not be re-visited so much and the latches will be hammered much less too.

The moral of the story is that if you have latch contention in a modern Oracle database, you don’t need to start tweaking undocumented latching parameters, but reduce the latch usage instead. And Laurent has done a good job with systematically identifying the SQL that needs to be fixed.

Good stuff!

If you don’t know what LatchProfX is, read this:

 

OUG Scotland

The annual OUG Scotland conference takes place next Wednesday (8th June) at a familiar location to me - the Sun Microsystems site in Linlithgow where I spent a happy 6 months on contract many years back. I've always looked forward to this event but sadly will be beavering away at a client site in London. I recommend you make it if you can and, best of all, it's free

More information here.

Merge – argh!

I wonder if I ought to create an “Argh” category on my blog for articles about some of the nastier, and sometimes unbelievable, bugs in the Oracle code. Here’s one (that’s just about forgivable in terms of failing to test) published by Sokrates a couple of months ago.

If you’re using the merge command without including insert/delete options then Oracle will ignore check constraints. (What’s the most important requirement of a database – many people would say that it’s the correctness of the data.)

Why KScope?

Early this year, my friend Mike Riley from ODTUG asked me to write a little essay in response to the question, “Why Kscope?” that he could post on the ODTUG blog. He agreed that cross-posting would help the group reach more people, so I’ve reproduced my response to that question here. I’ll hope to see you at Kscope11 in Long Beach June 26–30. If you develop applications for Oracle systems, you need to be there.

MR: Why KScope?

Pirates and X-Men…

I started to write a quick review of the latest X-Men film and I realized I hadn’t written anything about that pirate film, so here goes a double whammy…

  • Pirates of the Caribbean: On Stranger Tides – This is just more of the same as far as I’m concerned. Some of the cast are back. Some new people have been brought in, but my overriding feeling is that I’ve seen it all before. It just doesn’t feel like they’ve brought anything new to the table here. That’s not to say it’s a bad film, I just feel like I could have stayed at home and watched a DVD of any of the previous films instead. This film is not going to sell you on the franchise, but it’s also not going to make you vow never to see another PotC film either. Good harmless fun, if a little uninspired.
  • X-Men: First Class – I liked the first three X-Men films. The Wolverine film was quite cool too. I was really unsure what I would think of this prequel. Sometimes stripping things back leaves them exposed. Sometimes it reveals something new. In my opinion X-Men: First Class really works well. I guess I had not really though about how little I knew (or cared) about Professor X and Magneto as characters. This film fleshes them out a bit more than the previous outings and it worked well for me. There are some really lame characters, like Banshee and Darwin, but I guess this also adds to plot as this is a very immature X-Men team. Probably my favorite character from this film is Azazel, who Bamfs like Nightcrawler, but is not so annoying. What I would give to be able to bamf and avoid plane flights. The cast have been signed up for 2 sequels to this prequel (provided this film makes money). I’m looking forward to seeing them.

Cheers

Tim…




Friday Philosophy – The Secret to Being a Good IT Manager

If you go into a book shop there will probably be a section on business and, if there is, there will almost certainly be a load of books on how to be a manager. Shelves and shelves of them. There is also a large and vibrant market in selling courses on management and aspects of management. I’ve been on a couple of such course and, if you can manage to be open minded whilst keeping a cynical edge, I think they can be useful.

However, I think I most of them are missing the key points and that if you can but hold on to the following extensive list of guiding principles you will be a good IT manager. Maybe even an excellent one :-) :

  1. Your top priority, at all times, is to see to the best interests of your people.
  2. Whatever you develop, be it code, databases, network, a team of support staff – User Acceptance is paramount.
  3. You must find ways to deal with other teams and your own management hierarchy in such a way as to be allowed to do (1) and (2).
  4. That’s it.
  5. OK, if pushed, I’d say Never Lie. Maybe that’s just personal though, it’s because I don’t have the memory, audacity or swiftness of mind to pull it off. By not lying I don’t have to try and construct what I said to who and why.

I’m sure people could cite some other hard rules like “you must be within budget” or “you need to get buy-in to your vision” but I don’t agree. Budgets can be negotiated and the difference between those deemed visionaries and those deemed fantasists seems to be to me down to success and luck. Luck is luck and for success I refer you to points 1 through 5.

OK, maybe a final rule is:

  • Never ask for or aim for something that is not realistic.

So, I am now able to develop my team and my application and not expect to be able to spend half the company profit on the fastest box out there, as it is not realistic.

There are a shed load of other things that I think are important to helping you be a good manager, you know, techniques and methods for improving things, but nothing else that is key.

And it’s such a simple, small list even I can aim for it.

The shame of it is that I don’t think it’s enough to be developed into a book or a course so I can’t sell the idea. That and I’ve gone and given it away in this blog. Also, though I feel I can give points 1,2 and 5 a good shot, point 3 is way beyond me…possibly because of point 5… So I am not a great manager.

I’m going to hide behind this stout wall now, with my hard hat on, and wait to be told how naive I am…

Update – A couple of weeks later, Kellyn on her DBA Kevlar blog put similar sentiments to looking after your guys, more from the employee’s perspective and far better covered

Why given so many of us feel this way and want things to be this way…are they not?

Formspider

While I was in Turkey a few weeks ago presenting the keynote at the Turkish Oracle User Group conference, I had a chat with one of the sponsors of the event who has a product that “does Web 2.0 for pl/sql specialists” – in other words, doesn’t require you to learn all the other transient technologies that get stuck around the edges of Oracle to produce application.

I said I’d take a look at it – but development really isn’t my area, which means I’m not really in a position to judge what they’ve done; so I thought I’d give it a brief mention here, in case anyone was looking for a way to move to Web 2.0 without throwing away their current Oracle investment. (I don’t usually consider doing adverts – but the idea was interesting, and I see that it’s got support from Paul Dorsey and that’s a reasonable recommendation in its own right.)

Index Rebuilds – A new Myth arrives

Just over 9 years ago now, before usenet died , I posted a list of common myths regarding the Oracle RDBMS. You can find that list and the ensuing discussion here. Number 2 in my list was

Free Space in an index is never reused.

At the time it was commonly thought that indexes needed regular rebuilding to avoid fragmentation. The best work I am aware of that describes why this is not the case is Richard Foote’s Index Internals: Rebuilding the Truth paper. Over time more and more people have come to quote this work and it is often cited in discussions on the forums at OTN and elsewhere.  Recently I have noticed a tendency for people to produce answers like the one in this answer. Namely

In Enterprise Manager, the section named “Segment Advisor” can to help you

This it seems to me is a fundamental misunderstanding of what the segment advisor is showing you, and sadly is my second myth rising anew from the dead. The official documentation on the segment advisor describes it as

Over time, updates and deletes on objects within a tablespace can create pockets of empty space that individually are not large enough to be reused for new data. This type of empty space is referred to as fragmented free space.

Objects with fragmented free space can result in much wasted space, and can impact database performance. The preferred way to defragment and reclaim this space is to perform an online segment shrink. This process consolidates fragmented free space below the high water mark and compacts the segment. After compaction, the high water mark is moved, resulting in new free space above the high water mark. That space above the high water mark is then deallocated. The segment remains available for queries and DML during most of the operation, and no extra disk space need be allocated.

You use the Segment Advisor to identify segments that would benefit from online segment shrink. Only segments in locally managed tablespaces with automatic segment space management (ASSM) are eligible. Other restrictions on segment type exist. For more information, see “Shrinking Database Segments Online”.

If a table with reclaimable space is not eligible for online segment shrink, or if you want to make changes to logical or physical attributes of the table while reclaiming space, then you can use online table redefinition as an alternative to segment shrink. Online redefinition is also referred to as reorganization. Unlike online segment shrink, it requires extra disk space to be allocated. See “Redefining Tables Online” for more information.

In other words the segment advisor is all about reclaiming space. But a B*-Tree index will always tend to have free space within it – and will nearly always be able to reuse that space. Fragmentation as described above, that is unusable free space, does not apply to B*-Tree indexes in general in Oracle (the point of my previous post was to show one situation where it can be an issue) . To run the segment advisor on an index then is fundamentally misguided. As the next section describes the advisor is really intended for objects that actually store data (most commonly heap tables).

The Segment Advisor generates the following types of advice:

  • If the Segment Advisor determines that an object has a significant amount of free space, it recommends online segment shrink. If the object is a table that is not eligible for shrinking, as in the case of a table in a tablespace without automatic segment space management, the Segment Advisor recommends online table redefinition.
  • If the Segment Advisor determines that a table could benefit from compression with the OLTP compression method, it makes a recommendation to that effect. (Automatic Segment Advisor only. See “Automatic Segment Advisor”.)
  • If the Segment Advisor encounters a table with row chaining above a certain threshold, it records that fact that the table has an excess of chained rows.

Please don’t lets reinvent the old myths – there’s plenty of room for new ones out there.

Why is my SYSAUX Tablespace so Big? Statistics_level=ALL

One of my most popular postings is about why your SYSTEM tablespace could be rather large. Recently I’ve had issues with a SYSAUX tablespace being considerably larger than I expected, so I thought I would do a sister posting on the reason.

The client I was working with at the time was about to go live with a new application and database. For various reasons I was a little anxious about how the Java application (the User Interface) would actually call the stored PL/SQL code I had helped develop. Initial workloads would be low and so I asked that the STATISTICS_LEVEL be set to ALL, so that bind variables (amongst other things) would be gathered. This is on version 10.2.0.4, btw, enterprise edition and 4-node RAC.

We went live, issues were encountered and resolved, the usual way these things work. Then, a few weeks in and when everything was still very “exciting” from a problem resolution perspective, I got an odd email from the DBA team. Would they like us to add another datafile to the SYSAUX tablespace. Huh? I checked. I’d been watching the size of our application’s tablespaces but not the others {well, I was not supposed to be a DBA and I was watching an awful lot of other things}. Our SYSAUX tablespace was around 160GB in size, having pretty much filled it’s 5th datafile. Why? I checked to see what was taking up the space in the tablespace:

 select * from
 (select owner,segment_name||'~'||partition_name segment_name,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSAUX'
 ORDER BY BLOCKS desc)
 where rownum < 40

OWNER              SEGMENT_NAME                                             SIZE_M
------------------ -------------------------------------------------- ------------
SYS                WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_3911            27,648
SYS                WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_3911         26,491
SYS                WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_3537            23,798
SYS                WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_3537         22,122
SYS                WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_4296            17,378
SYS                WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_4296         16,818
SYS                WRH$_ACTIVE_SESSION_HISTORY~WRH$_ACTIVE_14459270_3          136
                   911
SYS                WRH$_SQLSTAT~WRH$_SQLSTA_14459270_3911                       96
SYS                WRH$_SQLSTAT~WRH$_SQLSTA_14459270_3537                       72
SYS                WRH$_SQLSTAT~WRH$_SQLSTA_14459270_4296                       47
SYS                WRH$_LATCH_MISSES_SUMMARY_PK~WRH$_LATCH__14459270_           45
                   3537
SYS                I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST~                              41
SYS                WRH$_SYSMETRIC_SUMMARY~                                      40
SYS                WRH$_LATCH_MISSES_SUMMARY_PK~WRH$_LATCH__14459270_           37

As you can see, almost all the space is being taken up by WRH$_LATCH_CHILDREN and WRH$_LATCH_CHILDREN_PK partitions. They are massive compared to other objects. A quick goggle did not come up with much other than many hits just listing what is in SYSAUX and the odd person also seeing SYSAUX being filled up with these objects and suggested ways to clear down space, nothing about the cause.

I had a chat with the DBAs and we quickly decided that this was going to be something to do with AWR given the name of objects – “WRH$_” objects are the things underlying AWR. The DBA suggested my settings of 15 minute intervals and 35 day retention was too aggressive. I knew this was not the case, I’ve had more aggressive snapshot intervals and longer retention periods on far busier systems than this. I did not have access to Metalink at that point so I asked the DBAs to raise a ticket, which they duly did.

Oracle support cogitated for a couple of days and came back with the advice to reduce the retention period. Hmmmm. Via the DBA I asked Oracle support to explain why those objects were so large when I had not seen this issue on several other systems. Was it a bug? I had by now corroborated with a friend from a previous site with 5 minute snapshot intervals and two months retention period and their SYSAUX tablespace was about 10GB all in. I did not want to go changing things if we did not know it would fix the issue as we really wanted to stop the growth of SYSAUX as soon as possible, not just try a hunch.

As you probably realise from the title of this blog, the issue was not the snapshot interval or retention period but the STATISTICS_LEVEL=ALL. The one and only hit you get in metalink if you search on WRH$_LATCH_CHILDREN is note 874518.1. From V10.1.0.2 to V11.1.0.7 setting this parameter to ALL is known to create a lot of data about Latch children and not clear it down when the AWR data is purged (Bug 8289729). The advice was to change STATISTICS_LEVEL and make the snapshot interval larger. I’d suggest you just need to alter the STATISTICS_LEVEL, unless you really, really need that extra information gathered. It seemed to take Oracle Support an extra day or two to find that note for us. {I’ve since checked out Metalink directly to confirm all this}.

So with a known issue we felt confident that altering the initialisation parameter would solve the issue. It took a while for us to change the STATISTICS_LEVEL on the production system – Change Control for that site is rather robust. This allowed us to see some other impacts of this issue.

The mmon process which looks after AWR data was becoming a top session in our OEM performance screens. In particular, a statement with SQL id 2prbzh4qfms7u that inserted into the WRH$_LATCH_CHILDREN table was taking several seconds to run each time and was running quite often {I include the SQL ID as it may be the same on many oracle V10 systems as it is internal code}:


The internal SQL inserting into wrh$_latch_children was becoming demanding

This was doing a lot of IO, by far the majority of the IO on our system at the time – it was a new system and we had been able to tune out a lot of the physical IO.


The physical IO requirements and 15-20 second elapsed time made this out most demanding statement on the system

We also now started to have issues with mmon running out of undo space when it ran at the same time as our daily load. This was particularly unfortunate as it coincided in a period of “intense management interest” in the daily load…

What was happening to the size of the SYSAUX tablespace?

Enter the tablespace (or leave null)> sys

TS_NAME              ORD      SUM_BLKS        SUM_K  MAX_CHNK_K NUM_CHNK
-------------------- ----- ----------- ------------ ----------- --------
SYSAUX               alloc  58,187,904  465,503,232  33,553,408       14
                     free       10,728       85,824      21,504       20
SYSTEM               alloc     128,000    1,024,000   1,024,000        1
                     free       68,360      546,880     546,752        3

4 rows selected.

select * from
(select owner,segment_name||'~'||partition_name segment_name,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSAUX'
 ORDER BY BLOCKS desc)
where rownum < 40

OWNER    SEGMENT_NAME                                                     SIZE_M
-------- ------------------------------------------------------------ ----------
SYS      WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_6201                     30262
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_5817                     29948
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_5435                     28597
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_4675                     28198
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_3911                     27648
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_5817                  27144
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_6585                     26965
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_6201                  26832
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_4675                  26741
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_3911                  26491
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_4296                     26307
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_5435                  26248
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_4296                  25430
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_6585                  25064
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_5058                     24611
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_5058                  23161
         WRH$_LATCH_CHILDREN~WRH$_LATCH__14459270_6966                      9209
         WRH$_LATCH_CHILDREN_PK~WRH$_LATCH__14459270_6966                   8462
         WRH$_SYSMETRIC_SUMMARY~                                             152
         WRH$_ACTIVE_SESSION_HISTORY~WRH$_ACTIVE_14459270_3911               136
         WRH$_SQLSTAT~WRH$_SQLSTA_14459270_3911                               96

@sysaux_conts

OWNER                          OBJ_PART_NAME                                SIZE_M
------------------------------ ---------------------------------------- ----------
SYS                            WRH$_LATCH_CHILDREN-WRH                  231745.063
SYS                            WRH$_LATCH_CHILDREN_PK-WRH               215573.063
SYS                            WRH$_SQLSTAT-WRH                           711.0625
SYS                            WRH$_LATCH_MISSES_SUMMARY_PK-WRH           439.0625
SYS                            WRH$_ACTIVE_SESSION_HISTORY-WRH            437.0625
SYS                            WRH$_LATCH_PARENT-WRH                      292.0625
SYS                            WRH$_LATCH-WRH                             276.0625
SYS                            WRH$_LATCH_MISSES_SUMMARY-WRH              273.0625
SYS                            WRH$_SEG_STAT-WRH                          268.0625
SYS                            WRH$_LATCH_PARENT_PK-WRH                   239.0625
SYS                            WRH$_SYSSTAT_PK-WRH                        237.0625

Yes, that is close to half a terabyte of SYSAUX and it is all used, more partitions have appeared and the total size of the largest segments in SYSAUX show how WRH$_LATCH_CHILDREN and WRH$_LATCH_CHILDREN_PK make up the vast majority of the space used.

Shortly after, we finally got permission to change the live system. The impact was immediate, mmon dropped from being the most demanding session, that SQL code dropped down the rankings and the issues with running out of undo ceased.

I was anxious to see if the old data got purged, as the Metalink note had suggested the data would not be purged. Thankfully, that was not the case. The space was slowly released as normal purging of data outside the retention period took place and after just over a month, the SYSAUX tablespace contained a lot less information and was mostly free space:

OWNER      OBJ_PART_NAME                                SIZE_M
---------- ---------------------------------------- ----------
SYS        WRH$_LATCH_MISSES_SUMMARY_PK-WRH           512.0625
SYS        WRH$_LATCH_MISSES_SUMMARY-WRH              350.0625
SYS        WRH$_LATCH-WRH                             304.0625
SYS        WRH$_SQLSTAT-WRH                           280.0625
SYS        WRH$_LATCH_PK-WRH                          259.0625
SYS        WRH$_SYSSTAT_PK-WRH                        247.0625
SYS        WRH$_SERVICE_STAT_PK-WRH                   228.0625
SYS        WRH$_PARAMETER_PK-WRH                      201.0625
SYS        WRH$_PARAMETER-WRH                         169.0625
SYS        WRH$_SYSSTAT-WRH                           169.0625
SYS        WRH$_SEG_STAT-WRH                          161.0625
SYS        WRH$_SYSTEM_EVENT_PK-WRH                   156.0625
SYS        WRH$_SYSMETRIC_SUMMARY-                         152
SYS        WRH$_SYSTEM_EVENT-WRH                      133.0625
SYS        WRH$_SERVICE_STAT-WRH                      123.0625
SYS        WRH$_ACTIVE_SESSION_HISTORY-WRH            115.0625

TS_NAME              ORD      SUM_BLKS        SUM_K  MAX_CHNK_K NUM_CHNK
-------------------- ----- ----------- ------------ ----------- --------
SYSAUX               alloc  58,251,904  466,015,232  33,553,408       15
                     free   57,479,400  459,835,200   4,063,232    1,208
SYSTEM               alloc     128,000    1,024,000   1,024,000        1
                     free       68,048      544,384     544,320        2

Now, how do we get that space back? I left that with the DBA team to resolve.

Oh, one last thing. I mentioned the above to a couple of the Oaktable lot in the pub a few weeks back. Their instant response was to say “You set STATISTICS_LEVEL to ALL on a live system?!? You are mad!”
:-)

{Update, I’ve just spotted this posting by Colbran which is related. Last time I googled this I just got a stub with no information}