Oakies Blog Aggregator

dbca silent mode – Windows

Just a quick tip that often catches me out. If you are like me, you have long since tired of clicking Next, Next, Next, … through the GUI when you want to quickly create a database. Many people work around this by storing a set of database creation scripts. However, you can do even better. The Database Configuration Assistant (dbca) can also be used at the command line and in silent mode.

On Windows, this is the error I commonly get when using dbca at the command line:


C:\oracle\product\19\bin>dbca -silent -createDatabase  -templateName General_Purpose.dbc  -gdbname db19x ...
[FATAL] [DBT-50000] Unable to check for available memory.

There is nothing wrong with the tool, and nothing wrong with Windows. In this case, it is a PEBKAC moment – all you need to do is ensure (as per the docs!) that you always create your databases from a command prompt that has been opened with “Run as Administrator”. Once you’ve done that, all should be fine.


C:\oracle\product\19\bin>dbca -silent -createDatabase  -templateName General_Purpose.dbc  -gdbname db19x ...
Prepare for db operation
8% complete
31% complete
Creating and starting Oracle instance
...

If you haven’t used dbca at the command line, you should check it out. Databases are now trivial to create with a one-line command.
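As a minimal sketch of such a one-liner (the database name, passwords, character set and memory size below are placeholders, and the exact set of parameters varies by version), run from an elevated prompt:

C:\oracle\product\19\bin>dbca -silent -createDatabase ^
  -templateName General_Purpose.dbc ^
  -gdbname db19x -sid db19x ^
  -sysPassword MySecretPwd1 -systemPassword MySecretPwd1 ^
  -characterSet AL32UTF8 ^
  -totalMemory 4096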

Video : Oracle REST Data Services (ORDS) : RESTful Web Services Handling Media Files

In today’s video we take a look at RESTful web services handling media files built using Oracle REST Data Services (ORDS).

This is based on this article.

There is more information about related stuff here.

The star of today’s video is Katrina Shallahamer, who is one of my favourite people to meet at an event. She’s such a positive person to be around, and makes me wish I was a nicer person. It’s clear she’s the real brains of the OraPub operation, but don’t tell Craig I said so!

2019 – what grabbed your attention

Here are the blog posts that you hit on most this year, with the most viewed entry on top. Unsurprisingly it is related to my bugbear with the OpenWorld catalog. I mean, every conference organizer must know that the one thing the attendees will always want is to get access to all of the content. Questions on UTL_FILE often come up on AskTOM, so it is unsurprising to see UTL_FILE pop up on the list. And finally, if you want evidence of how fast things change, the ORA-14758 post on interval partitioning is no longer an issue in recent versions of the database, because we automatically correct the definitions when you try to drop the last partition.

Thanks for supporting the blog, and as always, there will be more content next year!

Have a great festive season!

Tips'n'tricks: understanding "too many authentication failures" in SSH

VirtualBox VMs powered by Vagrant require authentication via SSH keys so you don’t have to provide a password each time vagrant up is doing its magic. Provisioning tools you run as part of the vagrant up command also rely on SSH key-based authentication to work properly. This is documented in the official Vagrant documentation set.

I don’t want to use well-known SSH keys with my own Vagrant boxes as a matter of principle. Whenever I create a new custom box I resort to a dedicated SSH key I’m using just for this purpose. This avoids the trouble with Vagrant’s “insecure key pair”; all I need to do is add config.ssh.private_key_path = "/path/to/key" to the Vagrantfile.
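For context, a minimal sketch of what that looks like in a Vagrantfile (the box name is a placeholder; the key path matches the dedicated key used later in this post):

Vagrant.configure("2") do |config|
  # placeholder box name; the dedicated key replaces Vagrant's insecure key pair
  config.vm.box = "my-custom-box"
  config.ssh.private_key_path = "~/.ssh/vagrant"
end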

The documentation further reads that I have to use a NAT device as the first network card in the VM. For some of my VMs I define an additional NIC using a host-only, private network for communication between, say, the middle tier and the database layer. I don’t want to mess around with port forwarding to enable communication between my VMs, and Vagrant makes it super easy to define another NIC.

This sounds interesting, but what does that have to do with this post?

Please bear with me, I’m building up a story ;) It will all make sense in a minute…

Connecting to the VM’s second interface

With all that in place it’s easy to SSH into my Vagrant box. Assume I have a Vagrant VM with an IP address of 192.168.56.202 to which I want to connect via SSH. Remember when I said I have a dedicated SSH key for my Vagrant boxes? The SSH key is stored in ~/.ssh/vagrant. The SSH command to connect to the environment is simple:

$ ssh -i ~/.ssh/vagrant vagrant@192.168.56.202

… and this connects me without having to provide a password.

Saving time for the lazy

Providing the path to the SSH key to use gets a little tedious after a while. There are a couple of solutions to this; there might be more, but I only know about these two:

  • Create a configuration in ~/.ssh/config. Except that doesn’t work particularly well with keys for which you defined a passphrase, as you now have to enter the passphrase each time
  • Add the key to the SSH agent (see the sketch after this list)
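As a quick sketch of the second option (the key path is the dedicated Vagrant key from earlier; ssh-add prompts for its passphrase once):

# add the dedicated key to the running agent
$ ssh-add ~/.ssh/vagrant

# confirm it has been loaded
$ ssh-add -l | grep vagrant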

On Linux and MacOS I prefer the second method, especially since I’m relying on passphrases quite heavily. Recently I encountered a problem with this approach, though. When trying to connect to the VM, I received the following error message:

$ ssh vagrant@192.168.56.202
Received disconnect from 192.168.56.202 port 22:2: Too many authentication failures
Disconnected from 192.168.56.202 port 22

What’s that all about? I am sure I have the necessary key added to the agent:

$ ssh-add -l | grep -c vagrant
1

Well, it turns out that if you have too many non-matching keys, you can run into this pre-authentication problem like I did. The first step in troubleshooting SSH connections (at least for me) is to enable the verbose option:

$ ssh -v vagrant@192.168.56.202

[ ... more detail ... ]

debug1: Will attempt key: key1 ... redacted ... agent
debug1: Will attempt key: key10 ... redacted ... agent
debug1: Will attempt key: key2 ... redacted ... agent
debug1: Will attempt key: key3 ... redacted ... agent
debug1: Will attempt key: key4 ... redacted ... agent
debug1: Will attempt key: key5 ... redacted ... agent
debug1: Will attempt key: key6 ... redacted ... agent
debug1: Will attempt key: key7 ... redacted ... agent
debug1: Will attempt key: key8 ... redacted ... agent
debug1: Will attempt key: key9 ... redacted ... agent

[ ... ]

debug1: Next authentication method: publickey
debug1: Offering public key: key1 ... redacted ... agent
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password

[...]

debug1: Offering public key: key5 ... redacted ... agent
debug1: Authentications that can continue: publickey,gssapi-keyex,gssapi-with-mic,password
Received disconnect from 192.168.56.202 port 22:2: Too many authentication failures
Disconnected from 192.168.56.202 port 22

It is my understanding that SSH is querying the agent for SSH keys, and it receives them. After trying key1 through key5 and not finding a match, it decides to stop and returns said error message.

There are quite a few keys currently added to my running agent:

$ ssh-add -l | wc -l
11

The Solution

The solution is quite straightforward: I still want to store keys with the agent, but I have to indicate which of the stored keys to use to log in to my VM. This is probably best done in ~/.ssh/config:

$ cat ~/.ssh/config 

Host 192.168.56.202
    IdentityFile ~/.ssh/vagrant

In summary, I’m now using a combination of the two approaches I outlined above to great effect: I can log in without having to worry about which keys are stored by my agent, or the order in which they are stored.
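A related ssh_config option, not part of the fix above but worth knowing about, is IdentitiesOnly: it tells ssh to offer only the identities configured for that host rather than everything the agent holds, so a well-stocked agent can never trigger the error. A sketch building on the entry above:

Host 192.168.56.202
    IdentityFile ~/.ssh/vagrant
    # only offer the key named above, not every key held by the agent
    IdentitiesOnly yes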

What are you really measuring?

File this under the “Lies, damn lies and statistics” section.

As I walked into the office the other day I looked at a monitor used to display various bits of news and information. One of the things it displayed was the number of service desk calls over the last period. I’m sure my reaction was meant to be, “Wow, the service desk are smashing it by answering all those questions!”, but my actual reaction was, “Wow, our services suck!”

The issue here is that you can look at those statistics in a number of ways. If our systems were better and more intuitive we would get fewer calls, so a statistic showing a large number of answered calls reads as a bad thing to me. We should be aiming to reduce the total number of calls by building better systems. The fact people need to interact with the service desk should be seen as a negative IMHO. That’s not to downplay the work our service desk are doing, but we should be aiming to make it as unnecessary as possible.

So coming back to the title of the post, when people post an assortment of KPIs, it’s worth taking a step back and asking what they mean, or could mean. What is really being measured by these KPIs? Maybe that number you display with pride is actually a sad indictment of your systems. Maybe not. I know what I think.

Oracle Database 19c Automatic Indexing: Index Compression (Ghosteen)

    In my previous post on Automatic Indexing, I discussed how the default index column order (in absence of other factors) is column id, the order in which the columns are defined in the table. In this post, I’ll explore if this changes if index compression is also implemented. By default, Automatic Indexing does […]

Importing geo-partitioned data… the easy way

 

setting the stage

I started at Cockroach Labs back in June 2019 to help others learn how to architect and develop applications using a geo-distributed database. There has been a resurgence in distributed database technology, but the focus on geo-distribution is quite unique to CockroachDB. While the underlying technology is unique, developers and DBAs who come with a wealth of experience need to know how to best use this innovative technology. Given this situation, I thought it would be good to start a blog series to explore various topics facing anyone beginning to architect database solutions with CockroachDB.

To start using a database, the first step is to IMPORT table data so you can begin to see how the database performs and responds.  And thus the IMPORT series has started!

The bulk of my personal experience with databases not surprisingly comes from work done with Oracle.  To that point, I was working with a customer to show them how to import Oracle data.  I extracted data from the Oracle SwingBench benchmark and created CSV files using the examples in the CockroachDB documentation.  This dataset can be easily replicated using the wonderful toolkit created by Dominic Giles.

import methodology

There are several ways to import data with CockroachDB. With the IMPORT INTO command, you can import CSV data into an existing table. The IMPORT TABLE command imports an entire table from CSV files in one step, creating the primary key and secondary indexes in parallel. For this example, the entire table will be imported via CSV files.
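As a minimal sketch of the IMPORT INTO variant (hypothetical table and file names, not part of the SwingBench example below):

-- the target table must already exist for IMPORT INTO
CREATE TABLE accounts (id INT PRIMARY KEY, balance NUMERIC(9,2));

-- load CSV rows into the existing table
IMPORT INTO accounts (id, balance)
    CSV DATA ('nodelocal:///importdir/accounts01.csv')
    WITH delimiter = ',';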

As you might expect with a distributed database, there needs to be an efficient way for each of the nodes to access the CSV files. CockroachDB supports the ability to access files stored in cloud storage buckets such as Amazon S3, Azure, and Google Cloud. Additionally, CockroachDB allows data to be imported from various other URL data services: http, nfs, and s3-compatible services. This blog explores the use of HTTP and NFS/local import methods with the following commands:

IMPORT TABLE CUSTOMERS (
CUSTOMER_ID INT,
CUST_FIRST_NAME VARCHAR(40),
CUST_LAST_NAME VARCHAR(40),
NLS_LANGUAGE VARCHAR(3),
NLS_TERRITORY VARCHAR(30),
CREDIT_LIMIT NUMERIC(9,2),
CUST_EMAIL VARCHAR(100),
ACCOUNT_MGR_ID INT,
CUSTOMER_SINCE DATE,
CUSTOMER_CLASS VARCHAR(40),
SUGGESTIONS VARCHAR(40),
DOB DATE,
MAILSHOT VARCHAR(1),
PARTNER_MAILSHOT VARCHAR(1),
PREFERRED_ADDRESS VARCHAR(32),
PREFERRED_CARD INT,
PRIMARY KEY (CUSTOMER_ID)
) CSV DATA ('http://192.168.0.1:3000/cust01.csv.gz', 'http://192.168.0.1:3000/cust02.csv.gz',
'http://192.168.0.1:3000/cust03.csv.gz', 'http://192.168.0.1:3000/cust04.csv.gz',
'http://192.168.0.1:3000/cust05.csv.gz', 'http://192.168.0.1:3000/cust06.csv.gz',
'http://192.168.0.1:3000/cust07.csv.gz', 'http://192.168.0.1:3000/cust08.csv.gz',
'http://192.168.0.1:3000/cust09.csv.gz', 'http://192.168.0.1:3000/cust10.csv.gz',
'http://192.168.0.1:3000/cust11.csv.gz', 'http://192.168.0.1:3000/cust12.csv.gz',
'http://192.168.0.1:3000/cust13.csv.gz', 'http://192.168.0.1:3000/cust14.csv.gz',
'http://192.168.0.1:3000/cust15.csv.gz', 'http://192.168.0.1:3000/cust16.csv.gz',
'http://192.168.0.1:3000/cust17.csv.gz', 'http://192.168.0.1:3000/cust18.csv.gz'
) WITH delimiter = ',';
 

and using NFS/nodelocal:

IMPORT TABLE CUSTOMERS (
CUSTOMER_ID INT,
CUST_FIRST_NAME VARCHAR(40),
CUST_LAST_NAME VARCHAR(40),
NLS_LANGUAGE VARCHAR(3),
NLS_TERRITORY VARCHAR(30),
CREDIT_LIMIT NUMERIC(9,2),
CUST_EMAIL VARCHAR(100),
ACCOUNT_MGR_ID INT,
CUSTOMER_SINCE DATE,
CUSTOMER_CLASS VARCHAR(40),
SUGGESTIONS VARCHAR(40),
DOB DATE,
MAILSHOT VARCHAR(1),
PARTNER_MAILSHOT VARCHAR(1),
PREFERRED_ADDRESS VARCHAR(32),
PREFERRED_CARD INT,
PRIMARY KEY (CUSTOMER_ID)
) CSV DATA ('nodelocal:///importdir/cust01.csv.gz', 'nodelocal:///importdir/cust02.csv.gz',
'nodelocal:///importdir/cust03.csv.gz', 'nodelocal:///importdir/cust04.csv.gz',
'nodelocal:///importdir/cust05.csv.gz', 'nodelocal:///importdir/cust06.csv.gz',
'nodelocal:///importdir/cust07.csv.gz', 'nodelocal:///importdir/cust08.csv.gz',
'nodelocal:///importdir/cust09.csv.gz', 'nodelocal:///importdir/cust10.csv.gz',
'nodelocal:///importdir/cust11.csv.gz', 'nodelocal:///importdir/cust12.csv.gz',
'nodelocal:///importdir/cust13.csv.gz', 'nodelocal:///importdir/cust14.csv.gz',
'nodelocal:///importdir/cust15.csv.gz', 'nodelocal:///importdir/cust16.csv.gz',
'nodelocal:///importdir/cust17.csv.gz', 'nodelocal:///importdir/cust18.csv.gz'
) WITH delimiter = ',';

To support NFS mounts, CockroachDB uses the --external-io-dir option when starting each node in the cluster. This points to the NFS mount directory, allowing each node to use the nodelocal URL to define the location of the import files.
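For illustration, a sketch of starting a node with this flag (host names, ports, store path and the insecure mode are placeholders, not the original setup):

# nodelocal:// URLs resolve relative to the directory given in --external-io-dir
cockroach start --insecure \
  --store=/data/cockroach \
  --listen-addr=node1:26257 \
  --join=node1:26257,node2:26257,node3:26257 \
  --external-io-dir=/mnt/nfs/import \
  --background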

geo-distributed vs local clusters

The clusters configured for this blog use the Google Cloud n1-standard-4 machine type. Using these machines, two clusters were set up:

    1. Local Cluster with 3 nodes in a single region
    2. Geo-partitioned Cluster with 3 nodes across 'us-west1-b', 'us-central1-b', 'us-east1-b'

Local clusters give superb availability and scalability within a single region. We have many customers that have moved from Postgres to CockroachDB for a more resilient and scalable solution. Often, they have local clusters for test and some production while they explore how to best use geo-partitioning.

measuring the throughput

CockroachDB can ingest raw or compressed CSV files, and there are good reasons to load either type. Typically, you would expect compressed files to add some CPU overhead for decompression while reducing the amount of data transferred over the network.

To house the CSV data, an HTTP server was configured on a separate machine. Additionally, to simulate an optimal NFS environment where all data was locally cached, a copy of all the files was placed in the /tmp/importdir directory on each of the nodes. Finally, the cluster was started with the flag --external-io-dir=/tmp. This allowed data to be loaded with the nodelocal URL method.

[Chart: import throughput comparison – local vs. geo-partitioned cluster]

Data locality is important to achieving the best results. The speed of light and network lag are very much in play. That said, the geo-distributed cluster only takes a small ~9% hit in throughput versus the local cluster.

If you don't have a real NFS setup, you can still achieve excellent results by loading the data onto one of the nodes and using a URL of the form nodelocal://1/importdir/cust01.csv.gz to load from that specific node.

[Chart: import throughput using nodelocal from a single node]

These tests were re-run while hosting the files on only one of the cluster nodes. CockroachDB was able to achieve more than 95% of the throughput of a true optimal NFS configuration using nodelocal.

Summary

CockroachDB provides multiple cloud and URL methods to import data. Optimized loading of geo-partitioned data allows developers to focus on application optimization without worrying about how the data will be loaded.

I plan to expand on loading and data ingestion with future posts, so please let me know if you have any suggestions or feedback. 

kglLock()+1406<-kglget()+293<-qostobkglcrt1()+498<-qostobkglcrt()+248<-qostobkglcrt2()+412<-qospsis(…

kglLock()+1406<-kglget()+293<-qostobkglcrt1()+498<-qostobkglcrt()+248<-qostobkglcrt2()+412<-qospsis()+2511 <-qospPostProcessIStats()+2765<-qerltFetch()+1544<-qerstFetch()+449<-insdlexe()+364<-insExecStmtExecIniEngine()+1810<-insexe()+2283<-atbugi_update_global_indexes()+1656<-atbFMdrop()+3088<-atbdrv()+7719

Sorry for this title, but that’s exactly the subject: this short stack gives me enough information to understand the issue, reproduce it, open a SR, talk with friends, find a workaround,…

A Friday afternoon story

Here is how this started, on a database just migrated from 11g:

Every hour a job runs and many sessions are blocked on a library cache lock. As we are on RAC 19c, the Hang Manager detects this blocking situation and kills the culprit after a while. This was the first time I have seen it in action in real life. Killing is never good, but that’s better than letting all sessions stay blocked by one background job. However, it does not resolve the root cause… which, in this case, comes back every hour.

I was on a day off, but I can’t resist looking at those things. I looked at the trace file. Not only does the Hang Manager kill the blocking session, it also dumps a lot of diagnostic information. Finding the needle in a haystack is not so difficult once you have identified where the haystack is: the dump trace contains the call stack which identifies where the issue occurs — which C function in the Oracle software. And it also gives some clues about the context and how we got there.

I shared this call stack just in case some friends had already encountered this issue.

Now let’s see how I came, within a few minutes, to the idea that it was related to online statistics gathering and global index maintenance. The best source of information about Oracle C functions is Frits Hoogland’s www.orafun.info, and you can even copy/paste the stack trace as-is into http://orafun.info/stack/

If the full function is not found there, you can guess from the names, or search My Oracle Support for bugs related to this function… and this is what I came up with:

  • the hang situation was detected in kglLock(): kernel generic library cache management library cache lock. That matches what we already knew, as the wait was on “library cache pin”
  • qospsis() is query optimizer statistics related to setting index statistics. That’s interesting. I can also see a call to the statement I’ve seen in the dump trace: dbms_stats.postprocess_indstats(). This is a new feature in 19c where online statistics gathering happens for indexes as well as tables. But, as far as I know, this should occur only on direct-path insert.
  • qospPostProcessIStats() confirms this: setting the index statistics is part of post processing index statistics
  • qerltFetch() is query execute rowsource load table fetch and qerstFetch() is query execute rowsource statistics row source fetch. That looks like an insert from select.
  • insdlexe() is insert direct load execution. This links to the online statistics gathering: direct-path inserts counting inserted rows to set index statistics at the end.
  • atbugi_update_global_indexes() makes the link between indexes and the statement causing this because atb is the prefix for ALTER TABLE functions.
  • atbFMdrop() on Google finds a bug about “ORA-600 Hit When Drop of Table Partition” — I’ve no idea why, but “FM” looks like it is related to partitions and we are dropping one. Yes, I found an ALTER TABLE DROP PARTITION in the waiting session cursor for the job that kicks in every hour.

Ok, that’s far from understanding everything, but at least we have a consistent view of what happened: we drop a partition. This has to update global indexes. By some magic, this is done in direct-path. In 19c, direct-path load also updates the index statistics. And probably a bug there leads to a deadlock situation. Note that I’ve no global indexes on this table, but the code path is there.

1. Workaround to fix the production issue

Thanks to the function name where the problem occurs, I have my workaround

Installing Oracle 19c on Linux

I needed to create a new 19c install yesterday for a test of some customer software and whilst I love Oracle products I have to say that installing the software and database has never been issue free and simple over....[Read More]

Posted by Pete On 06/12/19 At 04:27 PM

Temp space

A question about hunting down the source of the error “ORA-01652 unable to extend temp segment by NNN in tablespace XXX” shows up on the Oracle-L mailing list or the Oracle developer community forum from time to time. In most cases the tablespace referenced is the temporary tablespace, which means the session reporting the error was probably trying to allocate some space for sorting, or doing a hash join, or instantiating a GTT (global temporary table) or a CTE (common table expression / “with” subquery). The difficulty in cases like this is that the session reporting the error might be the victim of some other session’s greed – so looking at what the session was doing won’t necessarily point you to the real problem.

Of course you then run into a further problem tracking down the source of the problem. By the time you hear the complaint (even if it’s only seconds after the error appeared) the session that had been hogging the temporary tablespace may have finished what it was doing, leaving a huge amount of free space in the temporary tablespace and suggesting (to the less experienced and cynical user) that there’s something fundamentally wrong with the way Oracle has been accounting for space usage.

If you find yourself in this situation remember that (if you’re licensed to take advantage of it) the active session history may be able to help.  One of the columns in v$active_session_history is called temp_space_allocated with the slightly misleading description: “Amount of TEMP memory (in bytes) consumed by this session at the time this sample was taken”. A simple query against v$active_session_history may be enough to identify the session and SQL  statement that had been holding the temporary space when the error was raised, for example:


column pga_allocated        format 999,999,999,999
column temp_space_allocated format 999,999,999,999

break on session_id skip 1 on session_serial#

select
        session_id, session_serial#, 
        sample_id, 
        sql_id, 
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) != 0
order by
        session_id, sample_id
/

All I’ve done for this example is query v$active_session_history for the last 5 minutes reporting a minimum of information from any rows that show temp space usage. As a minor variation on the theme you can obviously change the time range, and you might want to limit the output to rows reporting more than 1MB (say) of temp space usage.

You’ll notice that I’ve also reported the pga_allocated (Description: Amount of PGA memory (in bytes) consumed by this session at the time this sample was taken) in this query; this is just a little convenience – a query that’s taking a lot of temp space will probably start by acquiring a lot of memory so it’s nice to be able to see the two figures together.
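As a sketch of the variation mentioned above, the same query restricted to samples showing more than 1MB of temp space (the threshold is an arbitrary choice):

select
        session_id, session_serial#,
        sample_id,
        sql_id,
        pga_allocated,
        temp_space_allocated
from
        v$active_session_history
where
        sample_time between sysdate - 5/1440 and sysdate
and     nvl(temp_space_allocated,0) > 1048576          -- more than 1MB
order by
        session_id, sample_id
/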

There are plenty of limitations and flaws in the usefulness of this report and I’ll say something about that after showing an example of usage. Let’s start with a script to build some data before running a space-consuming query:


rem
rem     Script:         allocate_tempspace.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Nov 2019
rem
rem     Last tested 
rem             19.3.0.0
rem             12.2.0.1
rem

create table t1 as 
select * from all_objects
;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

insert into t1 select * from t1;
commit;

execute dbms_stats.gather_table_stats(null,'t1')

execute dbms_lock.sleep(20)

set pagesize  60
set linesize 255
set trimspool on
set serveroutput off
alter session set statistics_level = all;

with ttemp as (
        select /*+ materialize */ * from t1
)
select 
        /*+ no_partial_join(@sel$2 t1b) no_place_group_by(@sel$2) */ 
        t1a.object_type,
        max(t1a.object_name)
from
        ttemp t1a, ttemp t1b
where
        t1a.object_id = t1b.object_id
group by
        t1a.object_type
order by
        t1a.object_type
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

My working table t1 consists of 16 copies of the view all_objects – so close to 1 million rows in my case – and the query is hinted to avoid any of the clever transformations that the optimizer could use to reduce the workload so it’s going to do a massive hash join and aggregation to report a summary of a couple of dozen rows. Here’s the execution plan (in this case from 12.2.0.1, though the plan is the same for 19.3 with some variations in the numbers).


SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     29 |00:00:10.03 |   47413 |  21345 |  12127 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D665B_E2772D3 |      1 |        |      0 |00:00:01.51 |   28915 |      0 |   9217 |  2068K|  2068K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    989K|    989K|00:00:00.24 |   19551 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     29 |     29 |00:00:08.51 |   18493 |  21345 |   2910 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     15M|     15M|00:00:03.93 |   18493 |  21345 |   2910 |    48M|  6400K|   65M (1)|   25600 |
|   6 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.36 |    9233 |   9218 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.35 |    9233 |   9218 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    989K|    989K|00:00:00.40 |    9257 |   9217 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D665B_E2772D3 |      1 |    989K|    989K|00:00:00.39 |    9257 |   9217 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")

Critically this plan shows us two uses of the temp space but only reports one of them as Used-Tmp. The “hash join” at operation 5 tells us that it reached 65MB of (tunable PGA) memory before going “1-pass”, eventually dumping 25,600 KB to disc. This space usage is corroborated by the 2,910 writes (which, at an 8KB block size, would be 23,280 KB). The missing Used-Tmp, however, is the space taken up by the materialized CTE. We can see that operation 2 is a “load as select” that writes 9,217 blocks to disc (subsequently read back twice – the tablescans shown in operations 7 and 9). That’s roughly 74,000 KB of temp space that doesn’t get reported as Used-Tmp.

If we take a look at the plan from 19.3 we see different numbers, but the same “error of omission”:

SQL_ID  1cwabt12zq6zb, child number 0
-------------------------------------

Plan hash value: 1682228242

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name                       | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   1 |  TEMP TABLE TRANSFORMATION               |                            |      1 |        |     25 |00:00:08.15 |   34905 |  13843 |   8248 |       |       |          |         |
|   2 |   LOAD AS SELECT (CURSOR DURATION MEMORY)| SYS_TEMP_0FD9D6624_E259E68 |      1 |        |      0 |00:00:01.26 |   23706 |      0 |   5593 |  2070K|  2070K|          |         |
|   3 |    TABLE ACCESS FULL                     | T1                         |      1 |    907K|    907K|00:00:00.21 |   18024 |      0 |      0 |       |       |          |         |
|   4 |   SORT GROUP BY                          |                            |      1 |     25 |     25 |00:00:06.89 |   11193 |  13843 |   2655 |  6144 |  6144 | 6144  (0)|         |
|*  5 |    HASH JOIN                             |                            |      1 |     14M|     14M|00:00:03.55 |   11193 |  13843 |   2655 |    44M|  6400K|   64M (1)|      23M|
|   6 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.26 |    5598 |   5594 |      0 |       |       |          |         |
|   7 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.25 |    5598 |   5594 |      0 |       |       |          |         |
|   8 |     VIEW                                 |                            |      1 |    907K|    907K|00:00:00.34 |    5595 |   5594 |      0 |       |       |          |         |
|   9 |      TABLE ACCESS FULL                   | SYS_TEMP_0FD9D6624_E259E68 |      1 |    907K|    907K|00:00:00.33 |    5595 |   5594 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1A"."OBJECT_ID"="T1B"."OBJECT_ID")


With slightly fewer rows in t1 (907K vs. 989K) we write 5,593 blocks for the materialized CTE  (instead of 9,217) and spill 2,655 blocks during the hash join (instead of 2,910). But again it’s only the hash join spill that is reported under Used-Tmp. Note, by the way, that the Used-Tmp in 12.2 was reported in KB when it’s reported in MB in 19.3.0.0.

Side note: comparing the number of rows created and blocks written for the CTE, it looks as if 19.3 is using the data blocks much more efficiently than 12.2. There’s no obvious reason for this (though a first guess would be that the older mechanism writes a GTT with pctfree 10 while the new one avoids leaving any free space or transactional details) so, as ever, I now have another draft blog note reminding me to investigate (eventually) what differences there are in CTE storage on the upgrade. It’s something that might make a difference in a few special cases.

With the figures from the execution plans in mind we can now look at the results of the query against v$active_session_history. Conveniently the queries took a few seconds to complete, so we’re going to see several rows for each execution.

First the results from 12.2.0.1

SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
        14           22234   15306218 1cwabt12zq6zb       95,962,112            1,048,576
                             15306219 1cwabt12zq6zb       97,731,584           37,748,736
                             15306220 1cwabt12zq6zb      148,194,304           77,594,624
                             15306221 1cwabt12zq6zb      168,117,248           85,983,232
                             15306222 1cwabt12zq6zb      168,117,248           90,177,536
                             15306223 1cwabt12zq6zb      168,117,248           95,420,416
                             15306224 1cwabt12zq6zb      168,117,248           98,566,144
                             15306225 1cwabt12zq6zb      168,117,248          102,760,448
                             15306226 1cwabt12zq6zb      116,933,632          103,809,024
                             15306227 1cwabt12zq6zb      116,933,632          103,809,024
                             15306228 b66ycurnwpgud        8,602,624            1,048,576

I pointed out that we had 25,600 KB reported as Used-Tmp and roughly 74,000 KB unreported – a total of nearly 100,000 KB that is reasonably close to the 103,809,024 bytes reported by ASH. Moreover the timing of the plan (loading the CTE in the first 2 seconds) seems to agree with the growth to 77,594,624 bytes of temp_space_allocated by the time we get to sample_id 15306220 in ASH. Then we have several seconds of slow growth as the hash join takes place and feeds its results up to the sort group by. At the end of the query we happen to have been lucky enough to catch one last sample just before the session had released all its temp space and ceased to be active. (Note, however, that the sql_id at that sample point was not the sql_id of our big query – and that’s a clue about one of the limitations of using ASH to find the greedy SQL.)

We see the same pattern of behaviour in 19.3.0.0:


SESSION_ID SESSION_SERIAL#  SAMPLE_ID SQL_ID           PGA_ALLOCATED TEMP_SPACE_ALLOCATED
---------- --------------- ---------- ------------- ---------------- --------------------
       136           42767    2217500 1cwabt12zq6zb      143,982,592           46,137,344
                              2217501 1cwabt12zq6zb      193,527,808           54,525,952
                              2217502 1cwabt12zq6zb      193,527,808           57,671,680
                              2217503 1cwabt12zq6zb      193,527,808           61,865,984
                              2217504 1cwabt12zq6zb      197,722,112           67,108,864
                              2217505 1cwabt12zq6zb      150,601,728           70,254,592
                              2217506 1cwabt12zq6zb      150,601,728           70,254,592

We start with an almost instantaneous jump to 46MB of temp_space_allocated in the first second of the query – that’s the 5,593 blocks of the CTE being materialized, then the slow growth of temp space as the hash join runs, spills to disc, and passes its data up to the sort group by. Again we can see that the peak usage was the CTE (46MB) plus the reported spill of 23MB (plus rounding errors and odd bits).

Preliminary Observations

Queries against ASH (v$active_session_history) can show us sessions that were holding space in the temporary tablespace at the moment a sample of active sessions was taken. This may allow us to identify greedy sessions that were causing other sessions to fail with ORA-01652 (unable to allocate temp segment).

We have seen that there is at least one case where we get better information about temp space allocation from ASH than we do from the variants on v$sql_plan that include the SQL Workarea information (v$sql_workarea, v$sql_workarea_active) because the space acquired during materialization of CTEs is not reported as a “tunable SQL workarea” but does appear in the ASH temp_space_allocated.

At first sight it looks as if we may be able to use the query against ASH to identify the statement (by sql_id) that was being run by the greedy session when it consumed all the space. As we shall see in a further article, there are various reasons why this may be over-optimistic; however, in many cases there’s a fair chance that when you see the same sql_id appearing in a number of consecutive rows of the report then that statement may be the thing responsible for the recent growth in temp space usage – and you can query v$sql to find the text and call dbms_xplan.display_cursor() to get as much execution plan information as possible.
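As a sketch of that follow-up step (using the sql_id from the earlier example as a placeholder):

-- find the text of the suspect statement
select sql_text from v$sql where sql_id = '1cwabt12zq6zb' and rownum = 1;

-- and pull as much plan and workarea information as is still available
select * from table(dbms_xplan.display_cursor('1cwabt12zq6zb', null, 'allstats last'));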

Further questions

  • When does a session release the temp_space_allocated? Will the space be held (locked) as long as the cursor is open, or can it be released when it is no longer needed? Will it be held, but releasable, even after the cursor has (from the client program’s perspective) been closed?
  • Could we be fooled by a report that said a session was holding a lot of space when it didn’t need it and would have released it if the demand had appeared?
  • Under what conditions might the temp_space_allocated in an ASH sample have nothing to do with the sql_id reported in the same sample?
  • Are there any other reasons why ASH might report temp_space_allocated when an execution plan doesn’t?
  • Is temp_space_allocated only about the temporary tablespace, or could it include information about other (“permanent”) tablespaces?

Stay tuned for further analysis of the limitations of using v$active_session_history.temp_space_allocated to help identify the source of a space management ORA-01652 issue.