
Oakies Blog Aggregator

Step-By-Step SLOB Installation and Quick Test Guide for Amazon RDS for Oracle.

Before I offer the Step-By-Step guide, I feel compelled to answer the question that some exceedingly small percentage of readers must surely have in mind–why test with SLOB? If you are new to SLOB (obtainable here) and wonder why anyone would test platform suitability for Oracle with SLOB, please consider the following picture and read this blog post.

[Image: asktom1.png]

SLOB Is How You Test Platforms for Oracle Database.

Simply put, SLOB is the right tool for testing platform suitability for Oracle Database. That means, for example, testing Oracle Database I/O from an Amazon RDS for Oracle instance.

Introduction

This is a simple 9-step, step-by-step guide for installing, loading and testing a basic SLOB deployment on a local server with connectivity to an instance of Amazon RDS for Oracle in Amazon Web Services. To show how simple SLOB deployment and usage is, even in a DBaaS scenario, I chose to base this guide on a t2.micro instance. The t2.micro instance type is eligible for free-tier usage.

Step One

The first step in the process is to create your t2.micro Amazon Web Services EC2 instance. Figures 1 and 2 show the settings I used for this example.


Figure 1: Create a Simple t2.micro EC2 Instance

 


Figure 2: Configure a Simple EC2 Instance

Step Two

Obtain the SLOB distribution file from the SLOB Resources Page. Please note the use of the md5sum(1) command (see Figure 3) to verify the contents are correct before performing the tar archive extraction. You’ll notice the SLOB Resources Page cites the checksums as a way to ensure there is no corruption during downloading.

After the tar archive is extracted, simply change directories into the wait_kit directory and execute make(1) as seen in Figure 3.
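For readers following along at a shell prompt, the commands behind Figure 3 amount to something like the following sketch. The archive name is illustrative; use the file name and checksum published on the SLOB Resources Page, and note that the directory layout can vary between SLOB releases.

$ md5sum slob_2.4.0.tar.gz        # compare the output against the checksum on the SLOB Resources Page
$ tar xzf slob_2.4.0.tar.gz
$ cd SLOB/wait_kit
$ make                            # builds the small wait-kit binaries used as the SLOB trigger mechanism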


Figure 3: Install SLOB. Create Trigger Mechanism.

Step Three

In order to connect to your Amazon RDS for Oracle instance, you’ll have to configure SQL*Net on your host. In this example I have installed Oracle Database 11g Express Edition and am using the client tools from that ORACLE_HOME.

Figure 4 shows how to construct a SQL*Net service that will offer connectivity to your Amazon RDS for Oracle instance. The HOST assignment is the Amazon RDS for Oracle endpoint, which can be seen in the instance details portion of the RDS page in the AWS Console. After configuring SQL*Net it is good to test the connectivity with the tnsping command–also seen in Figure 4.
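For illustration, the tnsnames.ora entry takes roughly the following shape. The endpoint host and the database SID are placeholders; copy the real endpoint from the RDS instance details page.

$ cat >> $ORACLE_HOME/network/admin/tnsnames.ora <<'EOF'
slob =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = mydb.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com)(PORT = 1521))
    (CONNECT_DATA = (SID = ORCL))
  )
EOF
$ tnsping slob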


Figure 4: Configure Local Host SQL*Net tnsnames.ora File.

 

Step Four

As per the SLOB documentation you must create a tablespace in which to store SLOB objects. Figure 5 shows how the necessary Oracle Managed Files parameter is already set in the Amazon RDS for Oracle instance. This is because Amazon RDS for Oracle is implemented with Oracle Managed Files.

Since the Oracle Managed Files parameter is set, the only thing you need to do is execute the ~SLOB/misc/ts.sql SQL script. This script will create a bigfile tablespace and set it up with autoextend in preparation for data loading.
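A minimal sketch of that step, assuming the RDS master user is called admin and using the slob service configured in Figure 4:

$ cd ~/SLOB
$ sqlplus admin/admin_password@slob @misc/ts.sql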


Figure 5: Create a Tablespace to House SLOB Table and Index Segments.

Step Five

This step is not absolutely necessary but is helpful. Figure 6 shows the size of the database buffers in the SGA (in bytes). The idea behind generating SLOB IOPS is to have an active data set larger than the SGA. This is all covered in detail in the SLOB documentation.

Once you know the size of your SGA database buffer pool you will be able to calculate the minimum number of SLOB schemas that need to be loaded. Please note, SLOB supports two schema models as described in the SLOB documentation. The model I’ve chosen for this example is the Multiple Schema Model. By default, the slob.conf file has a very small SCALE setting (80MB). As such, each SLOB schema is loaded with a table that is 80MB in size. There is a small amount of index overhead as well.

Since the default slob.conf file is configured to load only 80MB per schema, you need to calculate how many schemas are needed to saturate the SGA buffer pool. The necessary math for a default slob.conf (SCALE=80MB) with an SGA buffer pool of roughly 160MB is shown in Figure 6. The math shows that the SGA will be saturated with only 2 SLOB schemas loaded. Any number of SLOB schemas beyond this will cause significant physical random I/O. For this example I loaded 8 schemas as shown later in this blog post.
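The same arithmetic at the shell, assuming the roughly 160MB (167,772,160 bytes) of database buffers shown in Figure 6 and the default 80MB SCALE:

$ echo $(( 167772160 / ( 80 * 1024 * 1024 ) ))
2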


Figure 6: Simple Math to Determine Minimum SLOB SCALE for IOPS Testing.

Step Six

Figure 7 shows in yellow highlight all of the necessary edits to the default slob.conf file. As the picture shows, I chose to generate AWR reports by setting the DATABASE_STATISTICS_TYPE parameter as per the SLOB documentation.

In order to direct the SLOB test program to connect to your Amazon RDS for Oracle instance you’ll need to set the other four parameters highlighted in yellow in Figure 7. This is also covered in the SLOB documentation. As seen in Figure 4, I configured a SQL*Net service called slob. To that end, Figure 7 shows the necessary parameter assignments.

The bottom two parameters are Amazon RDS for Oracle connectivity settings. The DBA_PRIV_USER parameter in slob.conf maps to the master user of your Amazon RDS for Oracle instance. The SYSDBA_PASSWD parameter needs to be set to your Amazon RDS for Oracle master user password.
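Sketched as a slob.conf fragment, the edits look something like this. The values are illustrative, and only DATABASE_STATISTICS_TYPE, DBA_PRIV_USER and SYSDBA_PASSWD are named explicitly above; check the exact spelling of the connectivity parameters against the slob.conf shipped with your SLOB release.

DATABASE_STATISTICS_TYPE=awr        # statspack is the default
ADMIN_SQLNET_SERVICE=slob           # assumed parameter name: service used for DBA-privileged connections
SQLNET_SERVICE_BASE=slob            # assumed parameter name: service used by the SLOB sessions
DBA_PRIV_USER=admin                 # the RDS master user
SYSDBA_PASSWD=admin_password        # the RDS master user's password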


Figure 7: Edit the slob.conf File.

Step Seven

At this point in the procedure it is time to load the SLOB tables. Figure 8 shows example output of the SLOB setup.sh data loader creating and loading 8 SLOB schemas.
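The loader invocation behind that output takes the tablespace name and the number of schemas as its arguments; a sketch, assuming the tablespace created in Step Four is named IOPS:

$ cd ~/SLOB
$ ./setup.sh IOPS 8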


Figure 8: Execute the setup.sh Script to Load the SLOB Objects.

After the setup.sh script exits, it is good practice to view the output file named in the setup.sh command output. Figure 9 shows that SLOB expected to load 10,240 blocks of table data in each SLOB schema and that this was, indeed, the amount loaded (highlighted in yellow).


Figure 9: Examine Data Loading Log File for Possible Errors.

Step Eight

Once the data is loaded, it is time to run a SLOB physical I/O test. Figures 10 and 11 show the command output from the runit.sh command.
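The test itself is started with the number of schemas to drive as the argument; a sketch for the 8 schemas loaded above:

$ ./runit.sh 8        # one session per schema in the Multiple Schema Model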


Figure 10: Execute the runit.sh Script to Commence a Test.

 


Figure 11: Final runit.sh Output. A Successful Test.

Step Nine

During the SLOB physical I/O testing I monitored the throughput for the instance in the RDS page of the AWS console. Figure 12 shows that SLOB was driving the instance to perform a bit over 3,000 physical reads per second. However, it is best to view Oracle performance statistics via either AWR or STATSPACK. As an aside, the default database statistics type in SLOB is STATSPACK.


Figure 12: Screen Shot of AWS Console. RDS Performance Metrics.

Finally, Figure 13 shows the Load Profile section of the AWR report created during the SLOB test executed in Figure 10. The internal Oracle statistics agree with the RDS monitoring (Figure 12) because Amazon RDS for Oracle is implemented with the Oracle Database initialization parameter filesystemio_options set to “setall”. As such, Oracle Database uses Direct and Asynchronous I/O. Direct I/O on a file system ensures the precise amount of data Oracle Database is requesting is returned in each I/O system call.


Figure 13: Examine the AWR file Created During the Test.
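If you want to confirm that setting on your own RDS instance, a quick check (credentials and service name as configured earlier in this post) looks like this:

$ sqlplus -s admin/admin_password@slob <<'EOF'
show parameter filesystemio_options
EOF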

Summary

Testing I/O on Amazon RDS for Oracle cannot be any simpler, nor any more accurate, than with SLOB. I hope this post has convinced you of the former; your own testing will reveal the latter.

 


Linux memory usage

One of the most important configuration settings for running an Oracle database is making appropriate use of memory. Sizing the memory regions too small leads to increased IO; sizing the memory regions too big leads to inefficient use of memory and an increase in memory latency, most notably because of swapping.

On Linux there is a fair amount of memory information available, but it is not obvious how to interpret that information, which frequently leads to inefficient use of memory, especially in today’s world of consolidation.

The information about Linux server memory usage is available in /proc/meminfo, and looks like this:

$ cat /proc/meminfo
MemTotal:        3781616 kB
MemFree:          441436 kB
MemAvailable:    1056584 kB
Buffers:             948 kB
Cached:           625888 kB
SwapCached:            0 kB
Active:           500096 kB
Inactive:         447384 kB
Active(anon):     320860 kB
Inactive(anon):     8964 kB
Active(file):     179236 kB
Inactive(file):   438420 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1048572 kB
SwapFree:        1048572 kB
Dirty:                 4 kB
Writeback:             0 kB
AnonPages:        320644 kB
Mapped:           127900 kB
Shmem:              9180 kB
Slab:              45244 kB
SReclaimable:      26616 kB
SUnreclaim:        18628 kB
KernelStack:        3312 kB
PageTables:         6720 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1786356 kB
Committed_AS:     767908 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       13448 kB
VmallocChunk:   34359721984 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
CmaTotal:          16384 kB
CmaFree:               4 kB
HugePages_Total:    1126
HugePages_Free:     1126
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:       65472 kB
DirectMap2M:     4128768 kB

No matter how experienced you are, it’s not easy to get a good overview just by looking at this information. The point is that these figures are not independent memory areas which you can simply add up to arrive at the total memory used by Linux. Not all of the figures are even in kB (kilobytes): the HugePages values are counts of pages.

In fact, there is no absolute truth that I can find that gives a definitive overview. Here is a description of what I think are the relevant memory statistics in /proc/meminfo:

MemFree: memory not being used, which should be low after the system has been running for a while. Linux strives to use as much memory as possible for something useful, most notably as cache; if this number remains high, memory is being used ineffectively.
KernelStack: memory being used by the linux kernel.
Slab: memory being used by the kernel for caching data structures.
SwapCached: memory being used as a cache for memory pages being swapped in and out.
Buffers: memory being used as an IO buffer for disk blocks, not page caching, and should be relatively low.
PageTables: memory used for virtual to physical memory address translation.
Shmem: shared memory, allocated as small pages.
Cached: memory used for caching pages.
Mapped: memory allocated for mapping a file into an address space.
AnonPages: memory allocated for mapping memory that is not backed by a file (“anonymous”).
Hugepagesize: the size of a huge page. Valid choices with current Intel Xeon CPUs are 2MB or 1GB. The Oracle database can only use 2MB huge pages on Linux.
HugePages_Total: total number of pages explicitly allocated as huge pages memory.
HugePages_Rsvd: number of huge pages that have been reserved but not yet actually allocated (and are therefore still reported as free).
HugePages_Free: total number of pages available as huge pages memory, includes HugePages_Rsvd.

Based on information in several blog posts and experimenting with the figures, I came up with this formula to get an overview of used memory. This is not an all-conclusive formula, but my tests so far get me to within 5% of what Linux reports as MemTotal.

Warning: the statement “Another thing is that in most cases when the system has swapped out, ‘Cached’ (minus ‘Shmem’ and ‘Mapped’) gets negative, which I currently can’t explain.” is not true anymore!
By splitting Cached memory into Shmem and Mapped+Cached, there is no negative value anymore! I can’t find a way to make a distinction between true ‘Cached’ memory, meaning pages cached without any process attached, purely for the sake of reusing them so they do not need to be physically read again, and true Mapped pages, meaning pages that are mapped into a process address space. I know there is a value ‘Mapped’, but I can’t work out reliably how to make the distinction between cached and truly mapped. Maybe there even isn’t one.

This is that formula:

MemFree
+ KernelStack
+ Buffers
+ PageTables
+ AnonPages
+ Slab
+ SwapCached
+ Cached – Shmem
+ Shmem
+ HugePages used (HugePages_Total-HugePages_Free)*Hugepagesize
+ HugePages rsvd (HugePages_Rsvd*Hugepagesize)
+ HugePages free (HugePages_Free-HugePages_Rsvd)*Hugepagesize
————————————————————-
= Approximate total memory usage
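As a minimal sketch, the grand total from the formula can be computed directly from /proc/meminfo with awk; note that (Cached - Shmem) + Shmem collapses to Cached, and the three HugePages terms collapse to HugePages_Total * Hugepagesize, when only the total is wanted:

$ awk '/^(MemFree|KernelStack|Buffers|PageTables|AnonPages|Slab|SwapCached|Cached|HugePages_Total|Hugepagesize):/ { m[$1] = $2 }
  END {
    total  = m["MemFree:"] + m["KernelStack:"] + m["Buffers:"] + m["PageTables:"]
    total += m["AnonPages:"] + m["Slab:"] + m["SwapCached:"] + m["Cached:"]
    total += m["HugePages_Total:"] * m["Hugepagesize:"]   # huge pages are counted in pages, page size is in kB
    printf "approximate total memory usage: %d kB\n", total
  }' /proc/meminfo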

In order to easily use this, I wrote a shell script to apply this formula to your Linux system, available on gitlab: https://gitlab.com/FritsHoogland/memstat.git. You can use the script to get a (quite wide) overview every 3 seconds by running ./memstat.sh, or you can get an overview of the current situation by running ./memstat.sh --oneshot.

This is what a --oneshot of my test system (which is quite a small VM) looks like:

$ ./memstat.sh --oneshot
Free                 773932
Shmem                  2264
Mapped+Cached        359712
Anon                 209364
Pagetables            28544
KernelStack            4256
Buffers                   0
Slab                  63716
SwpCache              21836
HP Used             2023424
HP Rsvd               75776
HP Free              206848
Unknown               11944 (  0%)
Total memory        3781616
---------------------------
Total swp           1048572
Used swp             200852 ( 19%)

There is a lot to say about linux memory management. One important thing to realise is that when a system is running low on memory, it will not show as ‘Free’ declining towards zero. Linux will keep a certain amount of memory for direct use, dictated by ‘vm.min_free_kbytes’ as the absolute minimum.

In general, the ‘Cached’ pages (not Shmem pages at first) will be made available under memory pressure, since the Linux page cache really is only caching for potential performance benefit, there is no process directly attached to ‘Cached’ pages. Please mind my experimentations show there is no reliable way I could make a distinction between true ‘Mapped’ pages, meaning pages which are in use as memory mapped files, and true ‘Cached’ pages, meaning disk pages (blocks) sized 4KB which are kept in memory for the sake of reusing them, not directly related to a process.

Once the number of page cache pages gets low, and there still is a need for available pages, pages from the other categories start to get moved to swap. This excludes huge pages, even if they are not in use! The way pages are considered is based on an ageing mechanism. This works quite well for light memory pressure over a short period of time.

In fact, this works so well that the default eagerness of the kernel to swap (vm.swappiness, 60 by default, although I have seen 30 as a default value too; 0=not eager to swap, 100=maximal swap eagerness) seems appropriate on most systems, even ones with strict performance requirements. When swappiness is set (too) low, the kernel will try to avoid swapping as long as possible, meaning that once there is no way around it, it probably needs to swap multiple pages at once, leading to noticeable delays, while paging out single pages further in advance has a hardly noticeable overhead.
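For reference, the current setting can be inspected like this:

$ sysctl vm.swappiness
vm.swappiness = 60
$ cat /proc/sys/vm/swappiness
60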

However, please mind there is no way around consistent memory pressure! If the memory in active use exceeds the physically available memory, physical memory has to be shared, at the cost of active memory pages being swapped to disk, for which processes have to wait.

To show the impact of memory pressure, and how hard it is to understand that from looking at the memory pages, let me show you an example. I ran ‘memstat.sh’ in one session, and the command ‘memhog’ (part of the numactl rpm package) in another. My virtual machine has 4G of memory, and has an Oracle database running which has the SGA allocated in huge pages.

First I started memstat.sh, then ran ‘memhog 1g’, which allocates 1 gigabyte of memory and then releases it. This is the memstat output:

$ ./memstat.sh
          Free          Shmem  Mapped+Cached           Anon     Pagetables    KernelStack        Buffers           Slab       SwpCache        HP Used        HP Rsvd        HP Free        Unknown   %
         42764         435128         495656         387872          40160           4608             96          38600             24              0              0        2306048          30660   0
         42616         435128         495656         388076          40256           4640             96          38572             24              0              0        2306048          30504   0
         42988         435128         495656         388088          40256           4640             96          38576             24              0              0        2306048          30116   0
         42428         435128         495700         388108          40264           4640             96          38600             24              0              0        2306048          30580   0
        894424         320960          99456          12704          40256           4640              0          35496          42352              0              0        2306048          25280   0
        775188         321468         141804          79160          40260           4640              0          35456          70840              0              0        2306048           6752   0
        698636         324248         201476          95044          40264           4640              0          35400          64744              0              0        2306048          11116   0
        686452         324264         202388         107056          40260           4640              0          35392          66076              0              0        2306048           9040   0
        682452         324408         204496         108504          40264           4640              0          35388          65636              0              0        2306048           9780   0

You can see memstat taking some measurements, then memhog being run, which quickly allocates 1G and releases it. This happens between rows 6 and 7. First of all the free memory: once the process has allocated all the memory, it stops running, which means the memory is freed. Any private memory allocation mapped into the (now exited) process address space which was backed by a physical page is returned to the operating system as free, because it has effectively become available. So, what might seem counter-intuitive: stopping a process that allocated a lot of non-shared (!) memory results in a lot of free memory becoming available.

As I indicated, ‘Cached’ memory is the first to be released to provide memory pages for direct use. Mapped+Cached contains this together with Mapped memory. The number of pages used by Mapped and Cached is drastically reduced by the swapping. ‘Anon’ pages are significantly reduced too, which means they were swapped to the swap device, and ‘Shmem’ is reduced too, also swapped to the swap device, but far less so than ‘Mapped+Cached’ and ‘Anon’. ‘KernelStack’ and ‘Pagetables’ hardly decreased and ‘Slab’ decreased somewhat. ‘SwpCache’ actually grew, which makes sense because it is related to the swapping that took place.

The main thing I wanted to point out is that between the time of no memory pressure (lines 2-6) and past memory pressure (8-16), there is no single memory statistic directly showing whether a system is doing okay or has suffered. The only things that directly indicate memory pressure are active swapping in and swapping out, which can be seen with sar -W (pswpin/s and pswpout/s) or the vmstat si/so columns; these are not shown here.
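For reference, those counters can be watched like this (the 3-second interval is arbitrary):

$ sar -W 3        # pswpin/s and pswpout/s: pages swapped in and out per second
$ vmstat 3        # watch the si and so columns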

Even past the memory pressure, where Linux memory management had swapped out a lot of pages to facilitate the 1G allocation (which was freed and returned as free memory immediately after being allocated), the majority of the pages on my system that were swapped out are still swapped out:

$ ./memstat.sh --oneshot
...
Total memory       3781616
--------------------------
Total swp          1048572
Used swp            407228 ( 38%)

This underlines an important Linux memory management principle: only do something if there is an immediate, direct need. My system now has no memory pressure anymore, but still 38% of my swap is allocated. Only if these pages are needed are they paged back in. This underlines the fact that swapping cannot and should not be measured by looking at the amount of swap in use; a significant amount of swap in use only indicates that memory pressure has occurred in the past. The only way to detect that swapping is taking place is to look at the actual, current number of pages being swapped in and out.

If you see (very) low numbers of pages being swapped out, without pages being swapped in at the same time, it is the swappiness setting moving pages that have not been used for some time out to the swap device. This is not a problem. If you see pages being swapped in without pages being swapped out at the same time, it means pages that were swapped out earlier, either because of past memory pressure or proactive paging due to swappiness, are being read back in, which is not a problem either. Again, only if pages are actively being swapped in and out at the same time, or if the rate is very high, is there a memory problem. The swapping is actually helping you not to fail because of memory not being available at all.

Redo OP Codes:

This posting was prompted by a tweet from Kamil Stawiarski in response to a question about how he’d discovered the meaning of Redo Op Codes 5.1 and 11.6 – and credited me and Julian Dyke with “the hardest part”.

Over the years I’ve accumulated (from Julian Dyke, or odd MoS notes, etc.) and let dribble out the occasional interpretation of a few op codes – typically in response to a question on the OTN database forum or the Oracle-L listserver, and sometimes as a throwaway comment in a blog post, but I’ve never published the full set of codes that I’ve acquired (or guessed) to date.

It’s been some time since I’ve looked closely at a redo stream, and there are many features of Oracle that I’ve never had to look at at the level of the redo so there are plenty of gaps in the list – and maybe a few people will use the comments to help fill the gaps.

It’s possible that I may be able to add more op codes over the next few days – I know that somewhere I have some op codes relating to space management, and a batch relating to LOB handling, but it looks like I forgot to add them to the master list – so here’s what I can offer so far:


1	Transaction Control

2	Transaction read

3	Transaction update

4	Block cleanout
		4.1	Block cleanout record
		4.2	Physical cleanout
		4.3	Single array change
		4.4	Multiple array changes
		4.5	Format block
		4.6	ktbcc redo -  Commit Time Block Cleanout Change (?RAC, ?recursive, ?SYS objects)

5	Transaction undo management
		5.1	Update undo block
		5.2	Get undo header
		5.3	Rollout a transaction begin
		5.4	On a rollback or commit
		5.5	Create rollback segment
		5.6	On a rollback of an insert
		5.7	In the ktubl for 'dbms_transaction.local_transaction_id'
			(begin transaction) - also arrives for incoming distributed
			tx, no data change but TT slot acquired. Also for recursive
			transaction (e.g. truncate). txn start scn:  0xffff.ffffffff
		5.8	Mark transaction as dead
		5.9	Rollback extension of rollback seg
		5.10	Rollback segment header change for extension of rollback seg
		5.11	Mark undo as applied during rollback
		5.19	Transaction audit record - first
		5.20	Transaction audit record - subsequent
		5.23	ktudbr redo: disable block level recovery (reports XID)
		5.24	ktfbhundo - File Space Header Undo

6	Control file

10	Index
		10.1	SQL load index block
		10.2	Insert Leaf Row
		10.3	Purge Leaf Row
		10.4	Delete Leaf Row
		10.5	Restore Leaf during rollback
		10.6	(kdxlok) Lock block (pre-split?)
		10.7	(kdxulo) unlock block during undo
		10.8	(kdxlne) initialize leaf block being split
		10.9	(kdxair) apply XAT do to ITL 1	-- related to leaf block split 
		10.10	Set leaf block next pointer
		10.11	(kdxlpr) (UNDO) set kdxleprv (previous pointer)
		10.12 	Initialize root block after split
		10.13	index redo (kdxlem): (REDO) make leaf block empty,
		10.14	Restore block before image
		10.15	(kdxbin) Insert branch block row	
		10.16	Purge branch row
		10.17	Initialize new branch block
		10.18	Update key data in row -- index redo (kdxlup): update keydata
		10.19	Clear split flag
		10.20	Set split flag
		10.21	Undo branch operation
		10.22	Undo leaf operation
		10.23	restore block to tree
		10.24	Shrink ITL
		10.25	format root block
		10.26	format root block (undo)
		10.27	format root block (redo)
		10.28	Migrating block (undo)
		10.29	Migrating block (redo)
		10.30	Update nonkey value
		10.31	index root block redo (kdxdlr):  create/load index
		10.34 	make branch block empty
		10.35	index redo (kdxlcnu): update nonkey
		10.37	undo index change (kdxIndexlogicalNonkeyUpdate) -- bitmap index
		10.38	index change (kdxIndexlogicalNonkeyUpdate) -- bitmap index
		10.39	index redo (kdxbur) :  branch block update range
		10.40	index redo (kdxbdu) :  branch block DBA update,

11	Table
		11.1  undo row operation 
		11.2  insert row  piece
		11.3  delete row piece 
		11.4  lock row piece
		11.5  update row piece
		11.6  overwrite row piece
		11.7  manipulate first column
		11.8  change forwarding address - migration
		11.9  change cluster key index
		11.10 Set Cluster key pointers
		11.11 Insert multiple rows
		11.12 Delete multiple rows
		11.13 toggle block header flags
		11.17 Update multiple rows
		11.19 Array update ?
		11.20 SHK (mark as shrunk?)
		11.24 HCC update rowid map ?

12	Cluster

13	Segment management
		13.1	ktsfm redo: -- allocate space ??
		13.5	KTSFRBFMT (block format) redo
		13.6	(block link modify) (? index )  (options: lock clear, lock set)
		13.7	KTSFRGRP (fgb/shdr modify freelist) redo: (options unlink block, move HWM)
		13.13	ktsbbu undo - undo operation on bitmap block
		13.14	ktsbbu undo - undo operation on bitmap block
		13.17	ktsphfredo - Format Pagetable Segment Header
		13.18	ktspffredo - Format Level1 Bitmap Block
		13.19	ktspsfredo - Format Level2 Bitmap Block
		13.21	ktspbfredo - Format Pagetable Datablock
		13.22	State change on level 1 bitmap block
		13.23	Undo on level 1 bitmap block
		13.24	Bitmap block (BMB) state change (level 2 ?)
		13.25	Undo on level 2 bitmap block 
		13.26	?? Level 3 bitmap block state change ??
		13.27	?? Level 3 bitmap block undo ??
		13.28	Update LHWM and HHWM on segment header
		13.29	Undo on segment header
		13.31	Segment shrink redo for L1 bitmap block
		13.32	Segment shrink redo for segment header block

14	Extent management
		14.1	ktecush redo: clear extent control lock
		14.2	ktelk redo - lock extent (map)
		14.3	Extent de-allocate
		14.4	kteop redo - redo operation on extent map
		14.5	kteopu undo - undo operation on extent map
		14.8	kteoputrn - undo operation for flush for truncate

15	Tablespace

16	Row cache

17	Recovery management
		17.1	End backup mode marker
		17.3	Crash Recovery at scn:  0x0000.02429111
		17.28	STANDBY METADATA CACHE INVALIDATION
	
18	Block image (hot backups)
		18.1	Block image
		18.3	Reuse redo entry 
				   (Range reuse: tsn=1 base=8388753 nblks=8)
				or (Object reuse: tsn=2 objd=76515)

19	Direct loader
		19.1	Direct load block record
		19.2	Nologging invalidate block range
			Direct Loader invalidate block range redo entry

20	Compatibility segment

21	LOB segment 
		21.1	kdlop (Long Feild) redo:  [sic]
				(insert basicfile clob)

22	Locally managed tablespace
		22.2	ktfbhredo - File Space Header Redo:
		22.3	ktfbhundo - File Space Header Undo:
		22.5	ktfbbredo - File BitMap Block Redo:
		22.16	File Property Map Block (FPM)

23	Block writes
		23.1	Block written record
		23.2	Block read record (BRR) -- reference in Doc ID: 12423475.8

24	DDL statements
		24.1	DDL
		24.2	Direct load block end mark
		24.4	?? Media recovery marker
		24.10	??
		24.11	??

(E & O.E) – you’ll notice that some of the descriptions include question marks – those are guesses – and some are little more than the raw text extracted from a redo change vector with no interpretation of what they might mean.
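For anyone who wants to explore op codes on a test system of their own, the usual starting point is to dump a redo log file to a trace file and look at the change vectors, which are tagged OP:<layer>.<code>; a sketch (the file name is illustrative):

$ sqlplus / as sysdba <<'EOF'
alter system dump logfile '/u01/oradata/ORCL/redo01.log';
EOF
$ # the resulting trace file in the diagnostic trace directory contains entries such as "CHANGE #1 ... OP:11.5"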

Update

It didn’t take long for someone to email me a much longer list that has been published elsewhere on the Internet. The results don’t have the hierarchical style display I have above, so I may copy the extra entries into the list above when I get a little time.

When UPDATE becomes an INSERT

During research for VOODOO, we came across a lot of interesting stuff inside REDO.
One of my favourites is an UPDATE becoming an INSERT.


Fast Now, Fast Later

The following is the text of an article I published in the UKOUG magazine several years ago (2010), but I came across it recently while writing up some notes for a presentation and thought it would be worth repeating here.

Fast Now, Fast Later

The title of this piece came from a presentation by Cary Millsap and captures an important point about trouble-shooting as a very memorable aphorism: your solution to a problem may look good for you right now, but is it a solution that will still be appropriate when the database has grown in volume and has more users?

I was actually prompted to write this article by a question on the OTN database forum that demonstrated the need for the basic combination of problem solving and forward planning. Someone had a problem with a fairly sudden change in performance of his system from November to December, and he had some samples from trace files and Statspack of a particular query that demonstrated the problem.

The query was very simple:

select  *
from    tph
where   pol_num = :b0
order by
        pm_dt, snum

When the query was operating fast enough, the trace file from a sample run showed the following (edited) tkprof output, with the optimizer taking advantage of the primary key of (pol_num, pm_dt, snum) on table TPH to avoid a sort for the order by clause. (Note that the heading on the plan is “Row Source Operation” – which means it’s the execution plan that really was used.)

call    count    cpu  elapsed  disk  query  current  rows
---------------------------------------------------------
Parse       1   0.01     0.13     0    106        0     0
Execute     1   0.03     0.03     0      0        0     0
Fetch       4   0.01     0.22    46     49        0    43
---------------------------------------------------------
total       6   0.06     0.39    46    155        0    43

Rows Row Source Operation
---- --------------------
  43 TABLE ACCESS BY INDEX ROWID TPH (cr=49 pr=46 pw=0 time=226115 us)
  43   INDEX RANGE SCAN TPH_PK (cr=6 pr=3 pw=0 time=20079 us)(object id 152978)

Elapsed times include waiting on following events:
Event waited on               Times  Max. Wait Total Waited
                             Waited 
-----------------------------------------------------------
db file sequential read          46       0.01         0.21

When the query was running less efficiently the change in the trace didn’t immediately suggest any fundamental problems:


call    count    cpu  elapsed  disk  query  current  rows
---------------------------------------------------------
Parse       1   0.00     0.00     0     51        0     0
Execute     1   0.01     0.01     0      0        0     0
Fetch       4   0.00     0.59    47     51        0    45
---------------------------------------------------------
total       6   0.01     0.61    47    102        0    45

Rows Row Source Operation
---- --------------------
  45 TABLE ACCESS BY INDEX ROWID TPH (cr=51 pr=47 pw=0 time=593441 us)
  45   INDEX RANGE SCAN TPH_PK (cr=6 pr=2 pw=0 time=33470 us)(object id 152978)

Elapsed times include waiting on following events:
Event waited on               Times  Max. Wait Total Waited
                             Waited 
-----------------------------------------------------------
db file sequential read          47       0.03         0.58

The plan is the same, the number of rows returned is roughly the same, and the number of disc reads and buffer gets has hardly changed. Clearly the overall change in performance comes from the slower average disk read times (a total of 0.21 seconds with a maximum of one hundredth of a second, compared to a total of 0.58 seconds with a maximum of 3 hundredths), but why has the disk I/O time changed?

The figures give us a couple of preliminary ideas. An average read time of 4.5 milliseconds ( 0.21 seconds / 46 reads) is pretty good for a “small” random read of a reasonably loaded disc subject to a degree of concurrent access [ed: bearing in mind this was 2010], so the waits for “db file sequential read” in the first tkprof output are probably getting some help from a cache somewhere – possibly a SAN cache at the end of a fibre link or maybe from a local file system buffer (we might get a better idea if we could see the complete list of individual read times).

In the second case an average of 12.3 milliseconds ( 0.58 seconds / 45 reads) looks much more like a reasonable amount of genuine disc I/O is taking place – and the maximum of 30 milliseconds suggests that the disc(s) in question are subject to an undesirable level of concurrent access: our session is spending some of its time in a disk queue. Again, it would be nice to see the wait times for all the reads, but at this point it’s not really necessary.

There are a couple more clues about what’s going on – one is the text of the query itself (and I’ll be coming back to that later) and the other is in the detail of the disk I/Os. If you check the “Row Source Operation” details you’ll see that in the first case the sample query selected 43 rows from the table and requested 43 (46 – 3) physical reads (pr) of the table to do so. In the second case it was 45 rows and 45 (47 – 2) physical reads. Is this simply a case of the same query needing a little more data and having to do a little more work as time passes?

So now we come to the Statspack data. Based on my observations (or guesses) about the nature of the query and the work going on, I asked if we could see some summary information for a couple of comparative intervals, and also to see if this particular query appeared in the “SQL ordered by reads” section of the Statspack reports. Here are the results, first for a snapshot taken in October:


Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                      3,816,939      58,549     15   79.4
CPU time                                                     7,789          10.6
db file parallel write                         371,865       2,005      5    2.7
log file parallel write                         75,554       1,552     21    2.1
log file sync                                   17,198       1,228     71    1.7

                                                     CPU      Elapsd     Old
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
        775,166       43,228           17.9   24.3   212.58  12449.98  1505833981
Module: javaw.exe
SELECT * FROM TPH WHERE POL_NUM = :B1 ORDER BY PM_DT ,SNUM FOR UPDATE NOWAIT

You might notice that the critical query is actually a ‘select for update’ rather than the simple select that we had originally been told about; this doesn’t affect the execution plan, but is going to have some significance as far as undo and redo are concerned.

Now look at the corresponding figures for an interval in December:

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
db file sequential read                      7,000,428      92,970     13   89.8
CPU time                                                     6,780           6.5
db file parallel write                         549,286       1,450      3    1.4
db file scattered read                          84,127         720      9     .7
log file parallel write                         41,197         439     11     .4


                                                     CPU      Elapsd     Old
 Physical Reads  Executions  Reads per Exec %Total Time (s)  Time (s) Hash Value
--------------- ------------ -------------- ------ -------- --------- ----------
      2,444,437       43,363           56.4   25.2   221.31  23376.07 1505833981
Module: javaw.exe
SELECT * FROM TPH WHERE POL_NUM = :B1 ORDER BY PM_DT ,SNUM FOR UPDATE NOWAIT

You’ll see in both cases that a huge fraction of the total database time is spent in single block reads (the “db file sequential read” time), but in December the number of reads has gone up by about 3.2 million. You can also see that about 1.7 million of the “extra” reads could be attributed to the critical query even though the number of executions of that query has hardly changed. The average number of reads per execution has gone up from 18 to 56. (I did ask if I could see the section of Statspack titled “SQL ordered by Executions” as this includes the average number of rows per execution and it would have been nice to know whether this average had gone up just a little bit, or whether it had gone up in line with the average reads per execution. Unfortunately the request was ignored, so I am going to proceed as if the change in the average result set was small.)

This, perhaps, tells us exactly what the problem is (and even if it doesn’t, the figures are symptomatic of one of the common examples of non-scalable queries).

Look at the query again – are we reporting all the rows for a “policy number”, ordered by “payment date”? If so, the number of payments recorded is bound to increase with time, and inevitably there will be lots of payments for policies belonging to other people between each pair of payments I make on my policy – and that would put each of my payments in a different table block (if I use a normal heap table).

Initially the payments table may be sufficiently small that a significant fraction of it stays in Oracle’s data cache or even in the file-system or SAN cache; but as time passes and the table grows the probability of me finding most of my blocks cached decreases – moreover, as time passes, I want increasing numbers of blocks which means that as I read my blocks I’m more likely to knock your blocks out of the cache. Given the constantly increasing numbers of competing reads, it is also no surprise that eventually the average single block read time also increases.

In scenarios like this it is inevitable that performance will degrade over time; in fact it is reasonably likely that the performance profile will degrade slowly to start with and then show an increasingly dramatic plunge. The only question really is how much damage limitation you can do.

One strategy, of course, is to increase the memory available for the critical object(s). This may mean assigning the table to a generously sized KEEP cache. (The cache need not be the same size as the table to improve things, but the bigger the better – for this query, at least). But such a strategy is only postponing the inevitable – you really need to find an approach which is less susceptible to the passage of time.

In this case, there are a few options to consider. First – note that we are selecting all the rows for a policy: do we really need to, or could we select the rows within a given date range, thus setting an upper limit on the average volume of data we need to acquire for any one policy. If we do that, we may want to think about strategies for summarizing and deleting older data, or using partitioning to isolate older data in separate segments.

If we can’t deal with the problem by changing the code (and, in this case, the apparent business requirement), can we avoid the need to visit so many data blocks for a single policy? There are two obvious options to consider here – we could create the table as an “index cluster” clustered on the policy number; in this way we pay a penalty each time we insert a new row for a policy, because we have to find the correct block in the cluster, but when we run a query against that policy we will only need to read one or two blocks (probably) to get all the data. Alternatively we could consider setting the table up as an index-organized table (IOT) – again we do more work inserting data into the correct leaf block in the index, but again we reap the benefit as we query the data, because all the rows we want are in the same two or three leaf blocks (and stored in the order we want them).
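As a sketch of the IOT option (column names follow the query above, the remaining payment columns and all storage details are omitted, and the connection is illustrative):

$ sqlplus username/password <<'EOF'
create table tph_iot (
   pol_num   number   not null,
   pm_dt     date     not null,
   snum      number   not null,      -- plus the remaining payment columns
   constraint tph_iot_pk primary key (pol_num, pm_dt, snum)
)
organization index;
EOF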

Of course we are still subject to the same basic problem of the result set increasing in size as time passes, but at least we have managed to reduce (dramatically) the number of blocks we have to visit and the rate of growth of the number of blocks per query, thereby improving the scalability of the queries significantly.

Introducing new structures to an existing system is difficult, of course, and we may have to work out variations on this theme (like creating an index that includes all the table columns if we can’t switch to an IOT!). The key point is this, though: if we can look at our data and the critical queries and recognize that the volume of data we have to process (even if we don’t return it, as in this example) is always going to increase over time, then we need to consider ways of minimizing that volume, or improving the packing of the data, so that the work we do doesn’t change (much) over time. Don’t just think ‘fast now’, think ‘will it still be fast later’.

 

Quick tip–database link passwords

If you are relying on database links in your application, think carefully about how you want to manage the accounts that you connect with, in particular, when it comes to password expiry.

With a standard connect request to the database, if your password is going to expire soon, you will get some feedback on this:



SQL> conn demo/demo@np12
ERROR:
ORA-28002: the password will expire within 6 days


Connected.


But when using those same credentials via a database link, you will not get any warning, so when that password expires…you might be dead in the water.



SQL> create database link demolink connect to demo identified by demo using 'np12';

Database link created.

SQL> select * from tab@demolink;

TNAME
------------------------------------------------------------------------------------------------
TABTYPE  CLUSTERID
------- ----------
EMPLOYEES
TABLE
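Since the link itself gives no warning, one pragmatic workaround is to monitor expiry on the account at the remote end yourself, for example (connection details are illustrative):

$ sqlplus -s dba_user/dba_password@np12 <<'EOF'
select username, account_status, expiry_date from dba_users where username = 'DEMO';
EOF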

In Memoriam – 2

This is the second of two items that my mother typed out more than 25 years ago. I had very mixed emotions when reading it but ultimately I felt that it was a reminder that, despite all the nasty incidents and stupid behaviour hyped up by the press and news outlets, people and organisations are generally kinder, gentler and more understanding than they were 60 years ago.

This story is about the birth of my brother who was born with a genetic flaw now known as Trisomy 21 though formerly known as Down’s syndrome or (colloquially, and no longer acceptably) mongolism. It is the latter term that my mother uses as it was the common term at the time of birth and at the time she typed her story.

A child is born.

The history written by Dorothy Kathleen Lewis (1925 – 2017) about the birth of her first son

My pregnancy was normal. The first indication I had that something was wrong was in the delivery room when the baby was born; there was “oh” and silence then whispers. I asked what was wrong but was told nothing. The baby was put in a cot and the doctor came into the room and then he was taken out. I had not seen the baby – just knew that it was a boy. Again when I asked I was told that this was routine. Eventually the baby was brought back and given to me. When I saw him I thought he looked very odd and was so floppy. When I held him upright I could see he was a mongol, but prayed that I was wrong and this would go away. I asked to be told what the matter was with the baby and was told to tell my husband to ask if I was worried – which made me more suspicious.

Visiting was restricted and I did not see my husband until the evening. Fathers were just shown the babies at the nursery door and were not allowed to hold them. My husband was delighted that we had a little boy and I didn’t have the heart to tell him what I feared.

David had difficulty feeding and was put on a bottle at three days, the teat of the bottle made with a big enough hole for the milk to drip into his mouth because he was not sucking. When we went home, still not having been told of his conditions, he was being fed 8 times a day taking just 1.5 ounces per feed. Each feed took an hour to get into him, then at night it was back to bed for 2 hours and a repeat performance.

I took David to the child welfare clinic and again the actions of the people there spoke volumes, the health visitor hurried into the doctor and I was shown in – jumping the queue. (The clinic was held in our church hall which was next to the vicarage and I was very embarrassed that I should be singled out, although it was obvious why.) I asked the doctor what she thought and she said he did look mongoloid, but perhaps I should see the paediatrician where he was born.

At 6 weeks [ed: see photo] I went for my post natal and there was great concern in the waiting room as to how the baby was getting on. None of the other mothers who were there were being asked. I said he still looks like a mongol. My husband was still not aware of David’s condition or my suspicions, I wanted to protect him from the hurt I was feeling, but now I know it was not the kindest thing to do.

I then took David back to the Middlesex hospital and saw the paediatrician, who took him away from me, and whilst I sat at one end of a very large room he had David on a table at the other end with a group of students. I could not hear what they were saying, but when David was brought back to me I was told: “You have a complete vegetable; he will never walk or talk – just lie in his pram and stare up at the sky. The best thing you can do is to put him in a home and forget you ever had a baby.” I was devastated; I couldn’t run away from it any more. He had an enlarged liver and spleen and his spine was curving outwards. When I held him in my arms it was a little like a floppy parcel and there was no buoyancy at all.

When I got home I couldn’t hold back the tears that had been stifled all those weeks and I had to tell my husband. It was dreadful, I think it would have been better had we been able to grieve together in the beginning.

From then on everything David did was a milestone and he brought us a lot of joy. Just before he was five I had to take him to County Hall in London for assessment. That was a nightmare because by this time I had two other children – little boys – it was necessary to take the older of the two with me, a very busy child. We went into a large room and an elderly fussy lady had a lot of questions for David to answer. He was shown pictures and asked what they were. He hid his face from her and was saying the words to me, many of which he already knew, but because he was not answering her they were crossed out. So his assessment was a very low one.

I don’t think it would have made a lot of difference whether he had answered her he so surely wasn’t school material. He had been going to a junior training centre from the age of 3½ because I was expecting Jonathan. A social worker who came to see me at that time asked what sex I would like my third child to be and I said I didn’t mind so long as I had a normal healthy child, and she said that was a funny answer to give – I didn’t think it funny.

People’s reactions were very different 37 years ago [ed: now 64 years]. Once it was made known that David was as he was people who had known me from childhood would cross the street [to avoid speaking to me], they didn’t know what to say. But we didn’t hide him away and when we went on holiday we just said three children and we sometimes got a reaction when we arrived, but David was always well-behaved and everybody loved him. He learned a great deal from his brothers and I thank God he was our first child.

[Dorothy Kathleen Lewis: Banbury 1990]

[Photograph: David at six weeks]

Footnote

While copying up this story I was prompted to look at a few statistics from the UK’s Office of National Statistics for 1953 (and 50 years later); in particular the stats about child mortality and measles caught my eye.

Infant mortality for England and Wales

Year   Births    Still-births   Died within 1 week   Died within 4 weeks   Died within 1 year
1953   684,372   15,681         10,127               12,088                18,324
2003   621,469   3,612          1,749                2,264                 3,306

Don’t forget when you read the mortality figures that the 2003 numbers will include births that could be anything up to 8 weeks premature. I think anything more than about 2 weeks premature would probably have ended up in the still-births column in 1953.

Measles (England and Wales).

Year   Cases reported   Deaths
1953   545,050          242
2003   2,048            0

Addendum

I’ve received a private email pointing out that some of the cases reported as still-births in 1953 would now be identified as murder, where babies born with obvious viability issues would be smothered at birth – sometimes without the mother even knowing.

 

Words I Don’t Use, Part 1: “Methodology”

Today, I will begin a brief sequence of blog posts that I’ll call “Words I Don’t Use.” I hope you’ll have some fun with it, and maybe find a thought or two worth considering.

The first word I’ll discuss is methodology. Yes. I made a shirt about it.

Approximately 100% of the time in the [mostly non-scientific] Oracle world that I live in, when people say “methodology,” they’re using it in the form that American Heritage describes as a pretentious substitute for “method.” But methodology is not the same as method. Methods are processes. Sequences of steps. Methodology is the scientific study of methods.

I like this article called “Method versus Methodology” by Peter Klein, which cites the same American Heritage Dictionary passage that I quoted on page 358 of Optimizing Oracle Performance.

Chinese support added to parts of the plsql random data generator

For quite a while I have been wanting to add Chinese support to some of the dbms_random functions, and this last weekend I finally got some time to work a little bit on it. So now the library supports Chinese output in functions in the core_random package, the text_random package and the location_random package.