Here’s a note I’ve been meaning to research and write up for more than 18 months – ever since Dion Cho pinged a note I’d written about the effects of partitioning because of a comment it made about the “2% small table threshold”.
It has long been an item of common knowledge that Oracle has a “small table threshold” that allows for special treatment of data segments that are smaller than two percent of the size of the buffer cache, viz:
If a table is smaller than the threshold then it is read to the midpoint of the cache (just like any other block read) but – whether by accident or design – the touch count (x$bh.tch) is not set, and the table will fall off the LRU end of the buffer cache fairly promptly as other objects are read into the cache. Such a tablescan would be recorded under the statistic “table scans (short tables)”.
Then, in July 2009, Dion Cho decided to check this description before repeating it, and set about testing it on Oracle 10gR2 – producing some surprising results and adding another item to my to-do list. Since then I have wanted to check his conclusions, check whether the original description had ever been true and when (or if) it had changed.
As a simple starting point, of course, it was easy to check the description of the relevant (hidden) parameter to see when it changed:
8.1.7.4    _small_table_threshold    threshold level of table size for forget-bit enabled during scan
9.2.0.4    _small_table_threshold    threshold level of table size for direct reads
11.2.0.1   _small_table_threshold    lower threshold level of table size for direct reads
This suggests that the behaviour might have changed some time in 9i (9.2.0.4 happened to be the earliest 9i listing of x$ksppi I had on file) – so I clearly had at least three major versions to check.
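For reference, the parameter listing above came from the usual query against x$ksppi. Something like the following (run as SYS) should reproduce it on each version – I’m assuming only that these fixed-table column names are stable, which has been true in the versions I’ve seen:

select
    p.ksppinm   name,
    v.ksppstvl  value,
    p.ksppdesc  description
from
    x$ksppi  p,
    x$ksppsv v
where
    p.indx = v.indx
and p.ksppinm = '_small_table_threshold'
;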
The behaviour of the cache isn’t an easy thing to test, though, because there are a number of special cases to consider – in particular the results could be affected by the positioning of the “mid-point” marker (x$kcbwds.cold_hd) that separates the “cold” buffers from the “hot” buffers. By default the hot portion of the default buffer is 50% of the total cache (set by hidden parameter _db_percent_hot_default) but on instance startup or after a “flush buffer cache” there are no used buffers so the behaviour can show some anomalies.
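If you want to see where that marker sits you can (again as SYS) look at the buffer cache working sets directly. This is only a sketch – the x$kcbwds column names below are as I recall them from 10gR2 and may differ across versions:

select
    set_id,
    cnum_repl,       -- number of buffers on the replacement (LRU) list
    cold_hd          -- address of the mid-point ("cold head") marker
from
    x$kcbwds
where
    cnum_repl > 0    -- skip working sets that are not in use
;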
So here’s the basic strategy: create a set of tables sized at various fractions of the buffer cache, fill the cache by running an index range scan of the largest table, then tablescan each of the smaller tables three times (with a pause of a few seconds between scans) while watching the cache contents from a second session.

Here’s some sample code:
create table t_15400
pctfree 99
pctused 1
as
with generator as (
    select  --+ materialize
        rownum  id
    from    dual
    connect by rownum <= 10000
)
select
    rownum               id,
    lpad(rownum,10,'0')  small_vc,
    rpad('x',100)        padding
from
    generator  v1,
    generator  v2
where
    rownum <= 15400
;

create index t_15400_id on t_15400(id);

begin
    dbms_stats.gather_table_stats(
        ownname          => user,
        tabname          => 'T_15400',
        estimate_percent => 100,
        method_opt       => 'for all columns size 1'
    );
end;
/

select
    object_name, object_id, data_object_id
from
    user_objects
where
    object_name in (
        'T_300', 'T_770', 'T_1540',
        'T_3750', 'T_7700', 'T_15400',
        'T_15400_ID'
    )
order by
    object_id
;

select
    /*+ index(t) */
    max(small_vc)
from
    t_15400 t
where
    id > 0
;
The extract shows the creation of just the last and largest of the tables I created and collected statistics for – and it was the only one with an index. I chose the number of blocks (I’ve rigged the data so that there is one row per block) because I had set up a db_cache_size of 128MB on my 10.2.0.3 instance and this had given me 15,460 buffers.
As you can see from the query against user_objects, my test case included tables with 7,700 rows (50%), 3,750 rows (25%), 1,540 rows (10%), 770 rows (5%) and 300 rows (2%). (The numbers in brackets are the approximate sizes of the tables – all slightly undersized – relative to the number of buffers in the default cache; 2% of 15,460 buffers, for example, is roughly 309, so the 300 block table sits just below the traditional threshold.)
Here’s the query that I then ran against x$bh (connected as sys from another session) to see what was in the cache (the range of values needs to be adjusted to cover the range of object_id reported from user_objects):
select
    obj, tch, count(*)
from
    x$bh
where
    obj between 77710 and 77720
group by
    obj, tch
order by
    count(*)
;
After executing the index range scan of t_15400 (the final query above) three times to fill the cache:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75855          0          1
     75854          0          1
     75853          0          1
     75851          0          1
     75850          0          1
     75849          0          1
     75852          0          1
     75855          2          9      -- Index blocks, touch count incremented
     75855          1         18      -- Index blocks, touch count incremented
     75854          1      11521      -- Table blocks, touch count incremented
Then after three tablescans of the 7,700 block table, at 4 second intervals (the gap matters because a buffer’s touch count can be incremented at most once every three seconds):
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1      -- segment header of 7700 table, touch count incremented each time
     75855          0          1
     75854          0          1
     75852          0          1
     75849          0          1
     75850          0          1
     75851          0          1
     75855          2          9
     75855          1         10
     75853          0       3991      -- lots of blocks from 7700 table, no touch count increment
     75854          1       7538
Then repeating the tablescan of the 3,750 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          0          1
     75852          3          1      -- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75855          2          9
     75855          1         10
     75853          0        240
     75852          0       3750      -- table completely cached - touch count not incremented
     75854          1       7538
Then repeating the tablescan of the 1,540 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75851          3          1      -- segment header block, touch count incremented each time
     75849          0          1
     75850          0          1
     75852          3          1
     75855          2          9
     75855          1         10
     75853          0        149
     75851          2       1540      -- Table fully cached, touch count incremented but only to 2
     75852          0       2430
     75854          1       7538
Then executing the tablescan of the 770 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75850          3          1      -- segment header block, touch count incremented each time
     75849          0          1
     75851          3          1
     75852          3          1
     75854          0          1
     75855          2          9
     75855          1         10
     75851          0         69
     75853          0        149
     75850          2        770      -- Table fully cached, touch count incremented but only to 2
     75851          2       1471
     75852          0       1642
     75854          1       7538
Finally executing the tablescan of the 300 block table three times:
       OBJ        TCH   COUNT(*)
---------- ---------- ----------
     75853          3          1
     75855          0          1
     75854          0          1
     75850          3          1
     75852          3          1
     75851          3          1
     75855          2          9
     75855          1         10
     75851          0         69
     75850          0        131
     75853          0        149
     75849          3        301      -- Table, and segment header, cached and touch count incremented 3 times
     75850          2        639
     75852          0       1342
     75851          2       1471
     75854          1       7538
This set of results on its own isn’t conclusive, of course, but the indications for 10.2.0.3 are:

- a table of up to 2% of the cache (the 300 block table) is cached completely, and its touch count is incremented on every scan;
- a table of between 2% and 10% of the cache (the 770 and 1,540 block tables) is cached completely, but its touch count is incremented only as far as 2;
- a table of more than 10% of the cache (the 3,750 and 7,700 block tables) gets no touch count increment on its data blocks (only on its segment header), and its blocks are recycled as other objects are read in.
I can’t state with any certainty where the used and recycled buffers might be, but since blocks from the 3750 tablescan removed the blocks from the 7700 tablescan, it’s possible that “large” tablescans do somehow go “to the bottom quarter” of the LRU.
There is also some benefit in checking the statistics “table scans (short tables)” and “table scans (long tables)” as the tests run. For the 2% (300 block) table I recorded three short tablescans; for the tables in the 2% to 10% range (770 and 1,540 blocks) I recorded one long and two short tablescans (which is consistent with the touch count increment of 2 – the first scan was expected to be long, but the 2nd and 3rd were deemed to be short based on some internal algorithm about the tables being fully cached); finally, for the tables above 10% I always got three long tablescans.
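To track those statistics from the session running the scans, the usual v$mystat/v$statname join does the job – the like pattern is just a convenient way to pick up all the related counters:

select
    sn.name, ms.value
from
    v$mystat   ms,
    v$statname sn
where
    ms.statistic# = sn.statistic#
and sn.name like 'table scans%'
;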
But as it says in the original note on small partitions – there are plenty of questions still to answer:
I’ve quoted the 2% as a fraction of the db_cache_size – but we have automatic SGA management in 10g, automatic memory management in 11g, and up to eight different cache sizing parameters in every version from 9i onwards. What figure is used as the basis for the 2%? Is that 2% of the blocks or 2% of the bytes? And if you have multiple block sizes, does each cache perhaps allow 2% of its own size?
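One possible starting point would be to compare the value reported for _small_table_threshold with 2% of the buffer count in each pool. The following is just a sketch – the 2%-of-buffers interpretation is my assumption, not a documented formula:

select
    name,
    block_size,
    buffers,
    round(buffers * 0.02)  two_percent_of_buffers
from
    v$buffer_pool
;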
And then, in 11g we have to worry about automatic direct path serial tablescans – and it would be easy to think that “_small_table_threshold” had been describing that feature since (at least) 9.2.0.4 if its description hadn’t changed slightly for 11.2!
So much to do, so little time – but at least you know that there’s something that needs careful investigation if you’re planning to do lots of tablescans.
Footnote: Having written some tests, it’s easy to change versions. Running on 8.1.7.4 and 9.2.0.8, with similar sized caches, I could see that the “traditional” description of the “small_table_threshold” was true – a short tablescan was anything less than 2% of the buffer cache, long tablescans were (in effect) done using just a window of “db_file_multiblock_read_count” buffers, and in both cases the touch count was never set (except for the segment header block).
Once in a while I work on systems that are new to me and no one is around to explain the configuration. In these cases I’ve often wanted a way to figure out quickly which running database is using which ORACLE_HOME, so it was fun to see Tanel Poder’s recent post on how to do this on Linux and Solaris. Here is a quick summary covering Linux and Solaris, plus HPUX and AIX as well:
Solaris and Linux
$ pgrep -lf _pmon_
12586 ora_pmon_ee10204

ORACLE_SID is ee10204

$ pwdx 12586
12586: /u01/oracle/10.2.0.4/ee1/dbs

ORACLE_HOME is /u01/oracle/10.2.0.4/ee1
HPUX
$ ps -ef | grep pmon
ora1024 25611 1 0 Mar 21 ? 0:24 ora_pmon_itanic10

ORACLE_SID is itanic10

$ pfiles 25611 | grep bin
25611: /opt/ora1024/app/db_1/bin/oracle

ORACLE_HOME is /opt/ora1024/app/db_1
AIX
$ ps -ef | grep pmon
ora1024 262314 1 0 Mar 23 - 0:12 ora_pmon_mercury

ORACLE_SID is mercury

$ ls -l /proc/262314/cwd
lr-x------ 2 ora1024 dba 0 Mar 23 19:31 cwd -> /data/opt/app/product/10.2.0.4/db_1/dbs/

ORACLE_HOME is /data/opt/app/product/10.2.0.4/db_1
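On Linux you can also combine the two steps into a single loop over all the pmon processes. A minimal sketch (run it as the oracle software owner or root, since /proc/<pid>/cwd is only readable by the process owner; the cwd points at the dbs directory, so its parent is the ORACLE_HOME):

$ for pid in $(pgrep -f _pmon_); do
>   echo "$(ps -p $pid -o args=) -> $(dirname $(readlink /proc/$pid/cwd))"
> done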
A lot of people have asked me whether there’s some sort of index or “table of contents” of my TPT scripts (there are over 500 scripts in the tpt_public.zip file – http://tech.e2sn.com/oracle-scripts-and-tools ).

I have planned to create such an index for years, but never got to it. I probably never will :) So a good way to extract the descriptions of some scripts is to just grep for Purpose: in the directory where you extracted the scripts.
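Something like this should do it – a sketch that assumes (as is true for most of the scripts) that each file carries a “Purpose:” line in its header comments:

$ grep -i "purpose:" *.sql | less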
Registration for the seminar is now open and there is a substantial early bird discount. Besides this, members of the Slovenian, Croatian and Serbian Oracle User Groups get an additional discount. Please see the registration details on www.dbprof.com.

I have tried really hard to organize this unique event at affordable prices for everybody, especially in times when the financial crisis is still squeezing education budgets.
If you follow me on Twitter or Facebook, you will know I fell down the stairs last night. My office is upstairs and I often put mail on the stairs to remind me to take it up. I also have a habit of walking around the house with the lights off in the evening. These two factors combined beautifully yesterday and resulted in me launching myself down the stairs with a couple of empty mugs in my hands. The initial assessment was that my left knee and right elbow were sore, and that I had some small cuts on my hands and arms from the shattered mugs. As the adrenaline wore off I noticed more cuts, and various bits of my body started to ache.
Everything is OK this morning. My left knee still feels a bit dodgy, but all the cuts were so fine they are barely noticeable now. Apart from that, it’s just an all-over body ache similar to minor whiplash. Fun, fun, fun…
Cheers
Tim…
First Lord’s Fury is the sixth (and final) book in the Codex Alera series by Jim Butcher. The book follows on from where Princeps’ Fury ended.
This book returns to the juggernaut pace of Captain’s Fury. I mentioned previously that some parts of the series were quite predictable, but that didn’t detract from the enjoyment and anticipation. That is also true of this final book. You are pretty sure where you would like things to end up. The question is, how are you going to get there?
Part of me is very sad it’s over and part of me is relieved, as I don’t think I could cope if it carried on. It’s an awesome series, but quite draining because it is so intense. If this were a 38+ book series like Discworld, I think I would have to retire on medical grounds…
It’s about 4 months until the next Dresden Files book is released, so I think I will have a rest and maybe look at the Earth’s Children series, suggested by @Boneist.
Cheers
Tim…