Oakies Blog Aggregator

Announcement: “Oracle Performance Diagnostics and Tuning” Webinar – 20-23 August 2019.

I have just scheduled a new “Oracle Performance Diagnostics and Tuning” webinar to run between 20-23 August 2019, specifically for my friends in New Zealand, but it is open to anyone. It will run between 7am-11am AEST each day, which is perfect for those in NZ, but also for other parts of the world such as Western […]

Statistics on Load

One of the talks I gave recently on the Groundbreaker Yatra tour was on the changes that arrive with the optimizer when you move to 12c and above. We talked about changes to histograms, all of the adaptive features and so on, but also one of the simplest yet coolest optimizer improvements when it comes to loading data. When you perform a direct-path insert into an empty table, or a create-table-as-select (CTAS) operation, there is no need to perform an additional DBMS_STATS call to gather statistics at the conclusion of the load. The load process itself handles that for you. Here’s a simple example of that in action:


SQL> create table t (id primary key, tag)
  2  as select rownum id, to_char(rownum) tag
  3  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------
     50000

1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------
        103

1 row selected.

Of course, many people are now aware of this functionality, so why does it warrant a blog post? Mainly to make readers aware of a boundary case. If the table you are creating or loading is an index-organized table (IOT), then you do not get the statistics gathered automatically.


SQL> drop table t purge;

Table dropped.

SQL>
SQL> create table t (id primary key, tag)
  2  organization index
  3  as select rownum id, to_char(rownum) tag
  4  from dual connect by level <= 50000;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T';

  NUM_ROWS
----------


1 row selected.

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T';

LEAF_BLOCKS
-----------


1 row selected.

Initially I suspected that the reason for this is that the statistics being picked up on the fly are collected on the source data, so they can only be transposed onto the target if the target table structure is the same as the source. But if that were (strictly) the case, then other examples would exhibit a similar restriction, such as going from compressed data to uncompressed or vice versa. Yet you can see that the on-load CTAS statistics are fine in this circumstance:


SQL> create table t1 (id primary key, tag)
  2  compress
  3  as select * from t;

Table created.

SQL>
SQL> select num_rows from user_tables
  2  where table_name = 'T1';

  NUM_ROWS
----------
     99999

SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T1';

LEAF_BLOCKS
-----------
        208

Similarly, even if I am copying from an identical index-organized table, the statistics will not be collected.


SQL> create table t2 (id primary key, tag)
  2  organization index
  3  as select * from t;

Table created.

SQL> select num_rows from user_tables
  2  where table_name = 'T2';

  NUM_ROWS
----------


SQL>
SQL> select leaf_blocks
  2  from user_indexes
  3  where table_name = 'T2';

LEAF_BLOCKS
-----------

Whatever the root cause is, just be aware that if you are direct-path loading IOTs, make sure you take the time to also collect statistics on the table before you start querying it in earnest.
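For completeness, a minimal sketch of that follow-up step (assuming the IOT from the example above is called T) could be a plain DBMS_STATS call:

-- hypothetical follow-up after a direct-path load into the IOT T:
-- gather the table statistics explicitly (and cascade to the index),
-- since they are not collected on load in this case
begin
  dbms_stats.gather_table_stats(
    ownname => user,
    tabname => 'T',
    cascade => true);
end;
/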

Oracle 19c Automatic Indexing: Configuration (All I Need)

In this post, I’ll discuss how to configure the new Automatic Indexing capability introduced in Oracle Database 19c. The intent of Oracle here is to make the configuration of Automatic Indexing as simple as possible, with as few levers for DBAs to potentially stuff up as possible. The ultimate goal would be to have a […]

PostgreSQL “bind variable peeking”

custom vs. generic plans

With my Oracle database background, I know how important it is to balance the pros and cons of using bind variables (parameterized queries) vs. literals.

  • Bind variables have the advantage of sharing the query execution plan across several executions, so that there’s no need to compile the SQL statement (parse and optimize) each time
  • Bind variables have the drawback of sharing the query execution plan, with the risk that a plan optimized for one value is bad for another set of values

This post is about PostgreSQL prepared statements that contain parameters, showing when the execution plan is optimized for those parameter values. For people with Oracle experience, I start by quickly explaining the Oracle database behavior in order to emphasize the difference. Basically, there are two possibilities: the plan can be compiled without any knowledge of the values for which it will be executed, or it can be optimized for a chosen fixed value (called a peeked bind in Oracle, a sniffed parameter in SQL Server, or a custom plan in PostgreSQL).

Oracle

Until Oracle 9iR2 the execution plan was optimized before any binding or execution. From 9iR2 onwards, except when “_optim_peek_user_binds” is set to false, the optimization is deferred to the first execution and the plan is optimized for the values of that execution. It is then expected to have better estimations, especially with histogram statistics. But future executions of the same cursor will re-use the same plan, and that can be bad if the data distribution is different for those values. In 11g, Oracle tried to mitigate this with Adaptive Cursor Sharing, where in some cases the optimizer can decide to optimize the statement again for the new execution values. But that is about cursor sharing, and the goal of this post is to compare with PostgreSQL, which has no cursor sharing. The only way to share a cursor for multiple executions is by preparing it — and the limitation is that it can be shared in the same session only: there is no shared pool in PostgreSQL.
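As a rough illustration of the Oracle side (not part of the original demo; the table and bind names are hypothetical), you can see which value a plan was optimized for with DBMS_XPLAN.DISPLAY_CURSOR and its PEEKED_BINDS format option:

-- illustrative only: DEMO and N are hypothetical names
-- run with 'set serveroutput off' so display_cursor reports the query below
variable n number
exec :n := 1

select count(*) from DEMO where n = :n;

-- show the plan of the last executed statement, including the bind value
-- that was peeked at hard parse time
select * from table(dbms_xplan.display_cursor(format => '+PEEKED_BINDS'));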

PostgreSQL

With PostgreSQL, there is nothing like Oracle cursor sharing to avoid the expensive compilation at each execution. Avoiding that requires preparing the statements. But even then, there is no immediate re-use of the execution plan:

  • The first 5 executions will compile the query plan for each execution — aiming at an accurate plan as if the values were passed as literals.
  • From the 6th execution, a generic plan can be reused by all future executions of the same prepared statement (in the same session). But no ‘peeked bind’ is chosen: the optimizer builds a plan without any knowledge of the parameter values.

Here is the documentation about it:

PostgreSQL: Documentation: 11: PREPARE

I’ve run a little demo to show this in db<>fiddle:

Postgres 12 Beta | db fiddle

This runs with the following table created and analyzed:

create table DEMO as select 1 n from generate_series(1,11)
union all select 2 from generate_series(1,22)
union all select 3 from generate_series(1,33)
union all select 4 from generate_series(1,44)
union all select 5 from generate_series(1,55)
union all select 6 from generate_series(1,66)
union all select 7 from generate_series(1,77)
union all select 8 from generate_series(1,88)
;
select count(*),count(distinct n) from DEMO;
 count | count
-------+-------
396 | 8
analyze DEMO;
ANALYZE

I’ve created rows with a skewed distribution: 11 rows with value ‘1’, 22 rows with value ‘2’,… up to 88 rows with value ‘8’. And I analyzed the table so that histograms are gathered and the query planner is aware of the skew:

\x
Expanded display is on.
select * from pg_stats where tablename = 'demo' and attname = 'n';
-[ RECORD 1 ]----------+------------------------------------------
schemaname | public
tablename | demo
attname | n
inherited | f
null_frac | 0
avg_width | 4
n_distinct | 8
most_common_vals | {8,7,6,5,4,3,2,1}
most_common_freqs | {0.2222,0.1944,0.1667,0.1389,0.1111,0.08333,0.05556,0.02778}
histogram_bounds |
correlation | 1
most_common_elems |
most_common_elem_freqs |
elem_count_histogram |
\x
Expanded display is off.

No surprises. I have 396 rows, so the 88 rows with value ‘8’ have a frequency of 88/396=0.222, and so on.

I prepare my statement:

prepare myselect (int) as select count(*) from DEMO where n=$1;
PREPARE

And I run this statement several times with different values, the goal being to look at the query planner’s estimation of the number of rows for each execution:

postgres=# explain execute myselect(1);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=6.98..6.99 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=11 width=0)
Filter: (n = 1)
postgres=# explain execute myselect(2);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.00..7.01 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=22 width=0)
Filter: (n = 2)
postgres=# explain execute myselect(3);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.03..7.04 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=33 width=0)
Filter: (n = 3)
postgres=# explain execute myselect(4);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.06..7.07 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=44 width=0)
Filter: (n = 4)
postgres=# explain execute myselect(5);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.09..7.10 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=55 width=0)
Filter: (n = 5)
postgres=# explain execute myselect(6);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.08..7.08 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=50 width=0)
Filter: (n = $1)
postgres=# explain execute myselect(7);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.08..7.08 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=50 width=0)
Filter: (n = $1)
postgres=# explain execute myselect(8);
QUERY PLAN
-----------------------------------------------------------
Aggregate (cost=7.08..7.08 rows=1 width=8)
-> Seq Scan on demo (cost=0.00..6.95 rows=50 width=0)
Filter: (n = $1)

As documented, the first 5 executions used a custom plan, getting the estimations from the parameter value. The estimation is exact (11, 22, 33, 44, 55) because I have frequency histograms for those values. But starting with the 6th execution, a generic plan was used to avoid parsing at each execution. This plan does not look at the parameter values (we see “$1” in the filter predicate) and does not re-use any of the previous plans. It is estimated with an average selectivity, as if there were no histograms. Here the estimation (rows=50) comes from the average number of rows per value:

select count(n),count(distinct n),
count(n)::numeric/count(distinct n) "avg rows per value" from DEMO;
 count | count | avg rows per value
-------+-------+---------------------
396 | 8 | 49.5000000000000000

Cost decision for generic plan vs. custom plan

The documentation mentions that the cost of the generic plan is compared to the average of the custom plans.

https://doxygen.postgresql.org/plancache_8c.html

This is to avoid using a generic plan that is more expensive, in which case it is better to spend the additional parse time on a custom plan. Beyond the documentation, the “arbitrary” 5 executions and the average custom cost are mentioned in plancache.c.

In a previous tweet explaining this, I put an example (3rd screenshot) of this cost decision keeping custom plans:

Here is another example with a partitioned table. Partition pruning can occur at parse time or execution time. But when the partition is not known at parse time, the query planner estimates the cost and cardinality without partition pruning. Here is the full example:

Postgres 12 Beta | db fiddle

set plan_cache_mode=auto

The query planner continues with the custom plan even after the 5th execution because the cost of the generic plan, estimated on the whole table, is higher than that of the custom plan, even when including the parse time.

Forcing generic plan

In order to show the cost of the generic plan, I’ve run it on PostgreSQL 12 where we can control the custom vs. generic choice with PLAN_CACHE_MODE. The default is AUTO (the behavior described above) but we can also set it to FORCE_CUSTOM_PLAN or FORCE_GENERIC_PLAN.

Here, I have forced the generic plan:

Postgres 12 Beta | db fiddle

set plan_cache_mode=force_generic_plan

The filter is parameterized, but only the DEMO7 partition has been read. The “Subplans Removed” line is the execution-time partition pruning. However, the cost and cardinality estimates do not account for this: (rows=396) is the total number of rows. And for this reason the AUTO plan cache mode did not choose the generic plan, because its cost is much higher than the average cost of the custom plans encountered before. I would have preferred that the query planner estimate an average cost for partition pruning rather than ignore it in the generic plan cost estimation, because it knows, from the table definition, that only a single partition needs to be read, even if it doesn’t know which one yet.

Forcing custom plan

Back to the non-partitioned table from the first example in this post, where the generic plan was chosen after the 5th execution, I force the custom plan:

Postgres 12 Beta | db fiddle

show plan_cache_mode
set plan_cache_mode=force_custom_plan

This forces the custom plan even after the 5th execution.

Plan cache modes and Oracle equivalents

In PostgreSQL 12, which is still in beta but will soon be released for production, we now have three choices:

  • Use a custom plan, like when using literals, but with the advantages of parameters (to avoid SQL injection, for example). Perfect for ad-hoc queries. There is no easy equivalent in Oracle (I would love a hint to say: do not share this and find the best plan for these values).
  • Use a generic plan, for high-throughput performance (avoiding a parse for each execution) and stability (the plan will always be the same). This is the equivalent of the Oracle “_optim_peek_user_binds”=false setting that is often used in old OLTP applications, or of running without histograms.
  • The “auto” method (the only one before version 12), where a prepared statement starts with custom plans but may switch to a generic one after several executions. Oracle has no equivalent for this: a shared cursor always uses the plan determined by the values of the first execution.

Sharing execution plans across sessions, or not?

While I am here, I’ll add a few thoughts about cursor sharing between sessions (although this post was about prepared statements within the same session).

  • Most commercial databases I know (Oracle, DB2, SQL Server) have a shared statement cache so that one session does not re-parse a SQL statement that was recently parsed by another session. Like C or Java programs that are compiled once, and then the binary code is executed.
  • And most open source databases I know (PostgreSQL, MySQL) do not share execution plans across sessions. Like Perl or Python programs that are compiled each time they are loaded and executed.

Both approaches have pros and cons:

  • More memory space and latch management issues when shared
  • More CPU time spent in parsing when all cursors are private

I think that there’s no bad or good approach, but it is related to their nature:

  • Commercial databases have many optimizer transformations which increase the parsing time, in order to get acceptable response times even for very badly designed software where the SQL cannot be changed (because it is a commercial application). Vendors have to please the customer who says “I cannot change the application code”. Hard parses must be rare to keep this scalable. And people care about CPU consumption, as it is the most common metric for licensing.
  • Open source databases do not go through all those transformations. The answer to bad application design is to re-write the bad queries. That’s something an open source community can say, especially when the application itself is homemade or open source. There’s no commercial constraint to work around this in the database, and parse time can be kept low. And on current hardware, there’s no problem with using more CPU when it avoids having to serialize access to shared memory structures.

With my Oracle background, I tend to think that non-shared execution plans can be a limit for PostgreSQL’s evolution. We will see how the parse time increases when dealing with complex joins, partitioning and parallel queries. But not sharing also solves many problems: my Oracle memory is full of fragmented shared pools, volatile data and bind-peeking plan instabilities…

I’ll be happy to get feedback on that on Twitter.

Oracle & Postgres JDBC Fetch Size

TL;DR — By default PostgreSQL ResultSet is not a server cursor but a client buffer. The whole result is fetched before the first next() call.

It is usually a good idea to start with default settings rather than trying to ‘tune’ every possible parameter. Defaults are often suited to the most common cases. However, there’s a big exception to that when using JDBC drivers: the default fetch size is probably not the right setting. Here is an example with the same Java code to show the fetching behavior in Oracle and PostgreSQL.

Java example

Here is my Java code, which is exactly the same for Oracle and PostgreSQL:

import java.io.*;
import java.sql.*;
import java.util.*;
import java.time.*;
import oracle.jdbc.*;

public class JDBC {

    public static void println(String text) {
        System.out.println(Instant.now().toString() + ":" + text);
    }

    public static void main(String[] args)
            throws SQLException, InterruptedException {
        try (Connection c = (Connection) DriverManager.getConnection(
                args[2], args[0], args[1])   // url, user, password
        ) {
            c.setAutoCommit(false);
            try (PreparedStatement s = c.prepareStatement(
                    "select n,mytime(n) t from demo"
            )) {
                //s.setFetchSize(5);
                Thread.sleep(1000);
                println(" PRS " + s);
                try (ResultSet rs = s.executeQuery()) {
                    Thread.sleep(1000);
                    println(" EXE " + rs);
                    while (rs.next()) {
                        Thread.sleep(1000);
                        println(" FCH " + rs.getLong("N") + " " + rs.getTimestamp("T"));
                    }
                    System.out.println("fetch size: " + rs.getFetchSize());
                }
            }
        }
    }
}

This is very simple. The user, password and JDBC url are passed as arguments. The DriverManager finds the right driver for it (ojdbc8.jar and postgresql-jdbc.jar are in my CLASSPATH).

I prepare a simple select which I execute and fetch the rows. The query calls a “mytime()” stored function that returns the execution time timestamp, and waits one second for each call. The goal is to simulate a long query and show when the rows are actually read and when they are fetched.

Oracle JDBC Fetch Size

In order to run it on Oracle I create the following table and function:

alter session set current_schema=demo;
create or replace function mytime(n number) return timestamp as
begin
dbms_session.sleep(1);
return current_timestamp;
end;
/
show errors
create table demo (n int);
insert into demo
select to_number(column_value) N from xmltable('1000 to 1020');
commit;

The function is non-deterministic, and is therefore executed for each row.

$ORACLE_HOME/jdk/bin/java JDBC "demo" "demo" "jdbc:oracle:thin:@(DESCRIPTION=(CONNECT_DATA=(SERVICE_NAME=PDB1))(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))"

Here is the result. Each fetch line shows the display timestamp first, and then the timestamp from the MYTIME() function, which is the read time.

The Java program waits 1 second after each call: prepared at 18:33:10, executed at 18:33:21 — that’s a 10 second execution time. Then I fetched one row every second from 18:33:22 to 18:33:53:

With Oracle, the default JDBC fetch size is 10 rows, and the EXECUTE call also does the first fetch. Then here is the timing:

  • at 18:33:10 the prepareStatement is a very fast PARSE call as the actual optimization is deferred to the first execution.
  • at 18:33:22 the EXECUTE and first FETCH of the cursor occurs. Here it takes 10 seconds to get the first 10 rows. Those rows were read between 18:33:12 and 18:33:21, then fetched, then displayed between 18:33:22 and 18:33:31.
  • Then, the second FETCH call has read 10 more rows from 18:33:32 to 18:33:40 and displayed them from 18:33:42 to 18:33:51
  • Then one row remains: read, fetched and displayed.

PostgreSQL default JDBC Fetch Size

Now I’ll run the same on a PostgreSQL database. In PostgreSQL, current_timestamp stays constant for the duration of the statement, so I use clock_timestamp() to get the actual read time.
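A quick way to see the difference (a small side sketch, not part of the demo): current_timestamp returns the same value for every row of the statement, while clock_timestamp() keeps moving as the rows are produced:

-- current_timestamp is fixed for the statement/transaction,
-- clock_timestamp() advances by about one second per row here
select g,
       current_timestamp as stmt_time,
       clock_timestamp() as wall_time,
       pg_sleep(1)
from generate_series(1,3) g;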

\connect demo postgres
create table demo(n int);
insert into demo
select generate_series from generate_series(1000,1020);
create function mytime(n int) returns timestamp as
$$
begin
perform pg_sleep(1);
return clock_timestamp();
end;
$$
language plpgsql;

Here is the run with the same Java code:

\! $ORACLE_HOME/jdk/bin/java JDBC "postgres" "demo" "jdbc:postgresql://localhost:5432/demo"

We don’t see any read-fetch pattern here, and the clue is the fetch size, which is set to zero by default. Rows were read from 18:36:07 to 18:36:27, and that matches the executeQuery() time. All rows were read, and buffered, before the first one was displayed. That’s the default behavior: the JDBC ResultSet is not a cursor but a buffer.

PostgreSQL JDBC set Fetch Size

In the Java code above, there’s a setFetchSize() call commented out.

Here, I remove the comment to explicitly set the fetch size to 5 (for the demo; in real life you would rather set 100 or 1000 to avoid too many roundtrips and context switches).

try (PreparedStatement s = c.prepareStatement(
        "select n,mytime(n) t from demo"
)) {
    s.setFetchSize(5);

Here is the execution:

$ORACLE_HOME/jdk/bin/java JDBC "postgres" "demo" "jdbc:postgresql://localhost:5432/demo"

There, we can see 5 rows read from 19:13:56 to 19:14:00 during the first FETCH of executeQuery(), displayed from 19:14:02 to 19:14:06, and then the same again with the next 5 rows…

In summary

The behavior of the application can be really different.

The Oracle default (a real cursor, fetching 10 rows at a time) has several advantages: the client does not have to buffer the whole result set, and the first rows can be processed before the last ones have even been read.

However, it also has some drawbacks:

  • we can get an ORA-1555 if we read slowly and the MVCC retention has expired
  • we waste time in roundtrip and context switch latencies if we forget to increase the fetch size
  • we can get an execution error after having already processed some rows

Note that if there is a blocking operation, like an ORDER BY that has to read all the rows before returning the first one, the whole result is read and buffered on the server anyway.

The PostgreSQL default seems more intuitive for today’s developers, who are more used to lists than cursors. But this may change with the trend towards streaming. The good thing is that you can choose. And anyway, in both cases, the default (all rows for PostgreSQL, 10 rows for Oracle) is probably something to change in many situations.

Note that I’ve set AutoCommit to false, and this is required for the PostgreSQL cursor behavior when setting a non-zero fetch size; otherwise it still buffers all the rows. Anyway, auto-commit is probably the worst JDBC default: never leave it set to true.

Friday Philosophy – Brexit

I don’t really do politics on this blog, it’s often just too damned divisive. But not only am I angry (and vicariously ashamed) of Brexit but I have a strong suspicion of how things are going to go from now…

I’ll lay my cards on the table first. I did not vote for Brexit. Like the vast majority of people I get on well with in the UK, I wanted to remain part of Europe. Half of my anger with Brexit is that I feel there should never have been a public vote in the first place, for three main reasons:

  1. It has been heavily speculated that the issue of us remaining part of Europe was offered as a public vote as the Conservative party wanted to shut down the growing popularity of the more right-wing, xenophobic parties such as UKIP. Thus it was a waste of time, money, and effort to prove a point that I think could have been done in other ways. There was never any expectation by the people who instigated the referendum that a large percentage of the population would vote for leaving…
  2. Whether we are better off being within the EU and what we lose/gain from it is a very complex issue. I’d say 99% of the population knew nothing like enough about it to make a sensible decision. I think I understood more about the influence of the EU on us than the majority of people in the UK. This comes from me having an interest in environmental matters, workers’ rights, health & safety, and control of big business. An awful lot of our legislation in these areas came from the EU and was good for the majority and bad for the rich and powerful. However, I don’t think I had enough knowledge to make an informed decision; it was more a gut decision. And the political fight over the vote was almost devoid of sense, reason, even honesty, and was more a campaign based on fear, uncertainty, and doubt. It was a vote by the ill-informed on the ill-defined.
  3. The final reason is that our media and politicians have used the EU as a “distant enemy” to blame or ridicule for decades. It’s almost a national hobby. We had stupid stuff like claims the EU said we had to sell straight bananas or that barmaids would have to completely cover their bosoms. Neither were true. But there has been a consistent blaming of the EU for things that UK politicians thought would be unpopular or that the tabloids felt they could sell copy on due to outrage. It’s just like how businesses blame “the computer system” for anything that goes wrong (“Computer says No! – *cough* “) whereas in fact it’s often nothing to do with the computer system. Thus the EU already had an unfair bad press due to all this political cowardice and media tripe. In many respects, we reaped the crop grown from the seeds of our own stupidity.

Anyway, we had the vote, it was really tight, and it gave “the wrong” result. And it seems that far more people have swapped from “let’s leave” to “let’s stay” than the other way around, once they got a better understanding of the impact – but we are not getting a second vote. That is despite more than 6 million people signing a petition for a second vote and the biggest public protest march we have ever had in this country.

So what do I think will happen from here on in?

Something I have expected to happen for a couple of years now, but which has not really happened, is an attempt by UK politicians to start trying to blame the EU itself for the mess the UK is in. Basically to start saying “Well, we could leave the EU and it would all be fine – but the EU are being mean to us! The EU won’t let us do X or let us have Y! Because they hate us now and they are not being fair!!!”. We are going to see an escalating number of occurrences where the Conservatives will tell us that the EU negotiators have blocked an utterly reasonable suggestion or are making demands that threaten our sovereignty, or are trying to control us. They will announce that the EU is trying to stop us being the great nation we know we are. I fear that Boris will start turning to Trump to be our best friend in the playground.

From what I have seen so far, I think the rest of the EU have basically been “You want to leave? Are you mad? OK, if you wish, off you go. But I’m sorry, if you are leaving the club you no longer get the club discount at the shops, you no longer have access to the club house, and you don’t get any say in the club rules. And yes, you do have to pay your outstanding club membership until you actually leave.” Which is all very, very reasonable and, if tables were turned, it is what we in the UK would be doing.

I predict that from here until Boris and the Tories do whatever they do in respect of our fundamentally xenophobic “we are still a mighty empire and are too good for you” walking off in a huff, more and more they are going to try and blame the innocent party, the EU. We are going to hear endless stuff about how they won’t be reasonable in negotiations and are bullying us. I don’t think the EU will do that, but really it’s what we actually deserve for our childish behaviour.

End of Rant

sqlldr, direct path loads and concurrency in 12.2 and later

In my previous post I showed you that Oracle’s SQL loader (sqlldr) utility has a built-in timeout of 30 seconds waiting for locked resources before returning SQL*Loader-951/ORA-604/ORA-54 errors and failing to load data. This can cause quite some trouble! Before showing you the enhancement in 12.2 and later, here is the gist of the previous post.

Concurrency in Oracle sqlldr 12.1 and earlier

To show you how sqlldr times out I need to simulate an exclusive lock on the table in sqlplus for example. That’s quite simple:

SQL> set time on
10:17:00 SQL> lock table t2 in exclusive mode;

Table(s) Locked.

10:17:07 SQL>

Next I started a sqlldr process in another session. Please refer to the previous post for details, or take my word that I’m using a direct path insert strategy. The only difference is the size of the input file – I had to inflate it considerably to buy some time running standard diagnostic tools:

$ date; sqlldr /@loader control=t2_2.ctl ; date
Tue 23 Jul 10:22:35 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Tue Jul 23 10:22:35 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
Tue 23 Jul 10:23:06 BST 2019

The “date” commands reveal a timeout of 30 seconds. Setting ddl_lock_timeout has no bearing on the timeout: the session is waiting on “kpodplck wait before retrying ORA-54”:

10:22:36 SQL> r
  1* select event, state from v$session where program like 'sqlldr%'

EVENT                                                            STATE
---------------------------------------------------------------- -------------------
kpodplck wait before retrying ORA-54                             WAITING

1 row selected.

This was where I left off with the previous post until I noticed there is another option!

Oracle 12.2 and later

In 12.2 and later you can instruct sqlldr to wait until the lock is released. There is a new parameter named direct_path_lock_wait:

$ sqlldr | egrep 'Version|direct_path'
Version 19.4.0.0.0
direct_path_lock_wait -- wait for access to table when currently locked  (Default FALSE)

Interestingly there are no hits for direct_path_lock_wait in My Oracle Support’s knowledgebase. There are merely a few documentation references. So what does this parameter do? While the table is still locked in exclusive mode, let’s start the sqlldr process with the new option:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Fri Jul 26 10:13:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
[ ... waiting ... ]

The process now sits there and waits … and it does so for more than 30 seconds. And instead of “kpodplck wait before retrying ORA-54” it waits on … drum roll please: enq: TM - contention!

10:20:11 SQL> select seq#, event, state, round(seconds_in_wait/60, 2) mins_waiting
10:20:21   2  from v$session where program like 'sqlldr%';

      SEQ# EVENT                    STATE               MINS_WAITING
---------- ------------------------ ------------------- ------------
       119 enq: TM - contention     WAITING                     6.53

10:20:26 SQL> 

This is great news if your sqlldr processes compete for TM enqueues and your load process takes a little longer than the previously hard coded timeout of 30 seconds. The process eventually completed successfully after the enqueue was released:

$ date; sqlldr /@loader control=t2_2.ctl direct_path_lock_wait=true; date
Fri 26 Jul 10:13:54 BST 2019

SQL*Loader: Release 19.0.0.0.0 - Production on Fri Jul 26 10:13:54 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct

Load completed - logical record count 950936.

Table T2:
  950924 Rows successfully loaded.

Check the log file:
  t2_2.log
for more information about the load.
Fri 26 Jul 10:21:32 BST 2019

Caveats

As with every pro, there are usually cons associated. The downside of waiting (for a potentially very long time) is that you might not notice load processes beginning to stack up unless proper instrumentation and monitoring are in place. Waiting too long for data to be loaded can be just as bad as not loading it at all, because the end result is identical. As with many features in the database, Oracle gives you plenty of options, and it’s up to the developers and architects to make the correct decisions on how to use them.
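If you do go down that route, a simple check along the lines of the queries used earlier in this post (a sketch only, to be adapted to your monitoring tooling) can reveal loader sessions queuing up behind the TM enqueue:

-- sessions running sqlldr and what they are currently waiting on;
-- a growing number of 'enq: TM - contention' waiters is a warning sign
select sid, event, state, seconds_in_wait
from   v$session
where  program like 'sqlldr%';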

Summary

Beginning with sqlldr 12.2 Oracle introduced the option to wait for enqueues on segments to be released instead of aborting after 30 seconds.

In the next post I’ll write about another possibility to prevent exclusive table locks in the first place when running multiple concurrent sqlldr sessions.

sqlldr, direct path loads and concurrency in 12.1 and earlier

I have recently come across an interesting issue related to concurrent data loading into the Oracle database using sqlldr’s direct path mode. Although I investigated the situation on 12.1.0.2, I found that the same holds true in 19.4 as well when using the defaults. I reconstructed the case, although it is simplified a little to bring the point home.

The environment I used to put this post together is Oracle Restart 19.4.0 on Oracle Linux 7.6.

Test overview

For this test I am running concurrent sqlldr sessions to demonstrate the case. I am conscious of the fact that I could have used external tables, but then I wouldn’t have been able to write this post :)

Assume there’s a table named t2:

SQL> create table t2 as select * from dba_objects where 1 = 0;

Table created.

I am planning on populating the table with data in CSV format. Using sqlcl it is dead easy to create an input file: simply specify the /*csv*/ hint in your query and spool the output to a file, job done. The resulting input file is named t2.dat.
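For illustration, the sqlcl part could look roughly like this (a sketch; the query simply matches the dba_objects source used for t2):

-- run in sqlcl: the /*csv*/ hint formats the output as CSV,
-- and spool writes it to the file used as sqlldr input
set feedback off
spool t2.dat
select /*csv*/ * from dba_objects;
spool off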

Next up is the control file. I am a creature of habit and tend to use a control file although I could have tried the same process using express mode.

Using the documentation I figured the following control file should do. Since it’s easy enough to forget passing direct=true on the command line, I added it to the options clause. When using sqlcl to create a CSV file it adds the column headings in line 1; they won’t load, so I’m skipping that record. Here is the full control file:

options (direct=true, skip=1)
load data 
infile 't2.dat'
into table t2
append
fields csv without embedded 
date format 'dd-mon-yy'
trailing nullcols
(
OWNER,                                                                          
OBJECT_NAME,                                                                    
SUBOBJECT_NAME,                                                                 
OBJECT_ID,                                                                      
DATA_OBJECT_ID,                                                                 
OBJECT_TYPE,                                                                    
CREATED date,                                                                        
LAST_DDL_TIME date,                                                                  
TIMESTAMP,                                                                      
STATUS,                                                                         
TEMPORARY,                                                                      
GENERATED,                                                                      
SECONDARY,                                                                      
NAMESPACE,                                                                      
EDITION_NAME,                                                                   
SHARING,                                                                        
EDITIONABLE,                                                                    
ORACLE_MAINTAINED,                                                              
APPLICATION,                                                                    
DEFAULT_COLLATION,                                                              
DUPLICATED,                                                                     
SHARDED,                                                                        
CREATED_APPID,                                                                  
CREATED_VSNID,                                                                  
MODIFIED_APPID,                                                                 
MODIFIED_VSNID
)

With the prerequisites at hand I’m ready to perform some testing.

Test #1: concurrent sessions using defaults

First of all, what happens when starting 5 concurrent sessions? Will they complete? Let’s try this in bash:

for i in $(seq 1 5) ; do
  echo starting iteration $i
  sqlldr /@loader control=t2.ctl log=session_${i}.log &
done

This ran to completion without any visible errors, but I prefer to run sanity checking anyway. The first thing to do is to see if all records have been loaded. The input file contains 73148 data records by the way.

$ grep 'Rows successfully loaded' session*log
session_1.log:  73148 Rows successfully loaded.
session_2.log:  73148 Rows successfully loaded.
session_3.log:  73148 Rows successfully loaded.
session_4.log:  73148 Rows successfully loaded.
session_5.log:  73148 Rows successfully loaded.

So this looks ok, what about the run times?

$ grep '^Run' *.log
session_1.log:Run began on Mon Jul 22 21:32:31 2019
session_1.log:Run ended on Mon Jul 22 21:32:44 2019
session_2.log:Run began on Mon Jul 22 21:32:31 2019
session_2.log:Run ended on Mon Jul 22 21:32:40 2019
session_3.log:Run began on Mon Jul 22 21:32:31 2019
session_3.log:Run ended on Mon Jul 22 21:32:46 2019
session_4.log:Run began on Mon Jul 22 21:32:31 2019
session_4.log:Run ended on Mon Jul 22 21:32:48 2019
session_5.log:Run began on Mon Jul 22 21:32:31 2019
session_5.log:Run ended on Mon Jul 22 21:32:42 2019

All in all these loads completed in a reasonably short time, though some took longer to finish than others. That made me curious and I logged into the database to see what was going on. I had never consciously seen this event before:

SQL> select sid, event, state from v$session where program like 'sqlldr%';

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        44 kpodplck wait before retrying ORA-54                             WAITING
        52 kpodplck wait before retrying ORA-54                             WAITING
        54 kpodplck wait before retrying ORA-54                             WAITING
       290 Data file init write                                             WAITING
       291 kpodplck wait before retrying ORA-54                             WAITING

SQL> 

Using the output from my above query I could see that one session was active, and all the others queued up behind it. Interestingly there is no “enq: TM – contention”, which you’d see with direct path inserts in sqlplus for example:

-- in session 1
SQL> insert /*+ append */ into t2 select * from dba_objects;

73157 rows created.

-- notice there is no commit! 

-- Session 2 waits ...
SQL> insert /*+ append */ into t2 select * from dba_objects;

-- the DBA sees it all:
SQL> select sid, event, state from v$session where username = 'MARTIN'

       SID EVENT                                                            STATE
---------- ---------------------------------------------------------------- -------------------
        51 enq: TM - contention                                             WAITING
       270 SQL*Net message from client                                      WAITING

What’s even more interesting is this: ddl_lock_timeout, which I thought might have helped, has no effect on sqlldr operations:

SQL> show parameter ddl_lock_timeout

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
ddl_lock_timeout                     integer     0
SQL> 

The parameter isn’t set at all, yet I could see “kpodplck wait before retrying ORA-54” which is strange. ORA-54 is of course the dreaded “resource busy and acquire with nowait specified” error. I managed to trace kpodp back to the direct load path functionality.

Without a database parameter I am aware of that controls the waiting, there must be some other timeout. And sure enough, there is:

$ time sqlldr /@loader control=t2.ctl

SQL*Loader: Release 19.0.0.0.0 - Production on Mon Jul 22 21:46:59 2019
Version 19.4.0.0.0

Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved.

Path used:      Direct
SQL*Loader-951: Error calling once/load initialization
ORA-00604: error occurred at recursive SQL level 1
ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired

real    0m32.038s
user    0m0.011s
sys     0m0.011s

It appears as if sqlldr times out waiting for a locked resource after roughly 30 seconds, regardless of ddl_lock_timeout.

And this is the end of the story if you aren’t on 12.2 or later. In the next post I’ll show you how you can work around this problem with more current releases.

Oracle 19c Automatic Indexing: Methodology Introduction (After Today)

For the past month or so I’ve been playing around extensively with the new Oracle 19c “Automatic Indexing” feature, so I thought it was time to start blogging about it. Considering it’s only in “Version 1” status, my initial impression is very positive in that it works extremely well at doing what it’s initially designed […]

Sometimes the Simplest Things Are the Most Important

I didn’t ride in First Class a lot during my career, but I can remember one trip, I was sitting in seat 1B. Aisle seat, very front row. Right behind the lavatory bulkhead. The lady next to me in 1A was very nice, and we traded some stories.

I can remember telling her during dinner, wow, just look at us. Sitting in an aluminum tube going 500 miles per hour, 40,000 feet off the ground. It’s 50º below zero out there. Thousands of gallons of kerosene are burning in huge cans bolted to our wings, making it all go. Yet here we sit in complete comfort, enjoying a glass of wine and a steak dinner. And just three feet away from us in that lavatory right there, a grown man is evacuating his bowels.

I said, you know, out of all the inventions that have brought us to where we are here today, the very most important one is probably that wall.