Search

Top 60 Oracle Blogs

Recent comments

How Well a Query Optimizer Handles Subqueries?

At the beginning of December, at the UKOUG Tech17 conference in Birmingham (GB), I presented a comparison of the query optimizers of MySQL 8.0.3 and PostgreSQL 10.1. One of the things I talked about is their ability to handle subqueries. I summarized my findings with the following sentence:

Simple sub-queries that are not correctly optimized were observed.

It goes without saying that such a sentence leaves a lot of questions open. After all, it is just a summary. The aim of this post is to show you which subqueries I tested, and to compare my expectations with the execution plans generated by the query optimizers. In addition, since I’m not limited in time and scope as during a 50-minute presentation, I also discuss how the Oracle Database 12.2 query optimizer handles the same queries.

To check how well a query optimizer handles subqueries, it’s in my opinion sufficient to challenge it with queries that should be obvious (at least for a human being). The type of queries where the response time ratio between a good and a bad execution plan is of several orders of magnitude. If a query optimizer isn’t able to correctly handle such queries, with more complex ones it can only be worse…

For the tests I use two very simple tables:

CREATE TABLE small (u INTEGER NOT NULL, nu INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
CREATE UNIQUE INDEX small_u ON small (u);
CREATE INDEX small_nu ON small (nu);
CREATE INDEX small_n ON small (n);
CREATE INDEX small_nn ON small (nn);
CREATE TABLE large (u INTEGER NOT NULL, nu INTEGER NOT NULL, n INTEGER NULL, nn INTEGER NOT NULL, p VARCHAR(128) NULL);
CREATE UNIQUE INDEX large_u ON large (u);
CREATE INDEX large_nu ON large (nu);
CREATE INDEX large_n ON large (n);
CREATE INDEX large_nn ON large (nn);

The table “small” contains 10 rows; its unique key contains the integer values between 1 and 10. The table “large” contains 1 million rows; its unique key contains the integer values between 1 and 1 million. Note that for both tables the columns “nu” (not unique), “n” (null), and “nn” (not null) contain the same value as the unique key column. The only exception is that the column “n” contains “null” instead of the value “7.” Basically, only the constraints applied to them are different. In addition, the column “p” contains a string of 128 characters that is only present to have tables that aren’t too small (i.e. not very representative).

I considered six types of subqueries:

  • Type A – Scalar subqueries with equality predicate
  • Type B – Scalar subqueries with inequality predicate
  • Type C – Uncorrelated subqueries with either IN or EXISTS
  • Type D – Uncorrelated subqueries with either NOT IN or NOT EXISTS
  • Type E – Correlated subqueries with either IN or EXISTS
  • Type F – Correlated subqueries with either NOT IN or NOT EXISTS

Few notes:

  • For each type I considered two sub-types; the difference between them is given by the position of the tables “small” and “large”
  • I didn’t consider subqueries outside the WHERE clause
  • “IN” could be replaced by either “=ANY” or “=SOME”
  • “NOT IN” could be replaced by “!=ALL”

Type A – Scalar subqueries with equality predicate

Subtype A1 – Table “large” in the subquery

A10: SELECT * FROM small WHERE u = (SELECT nu FROM large WHERE u = 6)
A11: SELECT * FROM small WHERE n = (SELECT n FROM large WHERE u = 6)
A12: SELECT * FROM small WHERE n = (SELECT nn FROM large WHERE u = 6)
A13: SELECT * FROM small WHERE nn = (SELECT n FROM large WHERE u = 6)
A14: SELECT * FROM small WHERE nn = (SELECT nn FROM large WHERE u = 6)

The execution plan I expect for these queries carries out the following operations:

  • Access the table “large” through an index scan that returns at most one value. The operation is executed one single time.
  • Access the table “small” through either a table scan or an index scan and return the rows matching the value returned by the previous operation. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

A10

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | PRIMARY     | small | NULL       | const | small_u       | small_u | 4       | const |    1 |   100.00 | NULL  |
|  2 | SUBQUERY    | large | NULL       | const | large_u       | large_u | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+

A11/A12

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | small | NULL       | ref   | small_n       | small_n | 5       | const |    1 |   100.00 | Using where |
|  2 | SUBQUERY    | large | NULL       | const | large_u       | large_u | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

A13/A14

+----+-------------+-------+------------+-------+---------------+----------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | small | NULL       | ref   | small_nn      | small_nn | 4       | const |    1 |   100.00 | Using where |
|  2 | SUBQUERY    | large | NULL       | const | large_u       | large_u  | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+----------+---------+-------+------+----------+-------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

A10

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |   141 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | SMALL   |     1 |   141 |     1   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN           | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   2 - access("U"= (SELECT "NU" FROM "LARGE" "LARGE" WHERE "U"=6))
   4 - access("U"=6)

A11

-----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |   141 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     1 |   141 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SMALL_N |     1 |       |     1   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

   2 - access("N"= (SELECT "N" FROM "LARGE" "LARGE" WHERE "U"=6))
   4 - access("U"=6)

A12

-----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |   141 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     1 |   141 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SMALL_N |     1 |       |     1   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

   2 - access("N"= (SELECT "NN" FROM "LARGE" "LARGE" WHERE "U"=6))
   4 - access("U"=6)

A13

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |   141 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     1 |   141 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SMALL_NN |     1 |       |     1   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | LARGE    |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | LARGE_U  |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

   2 - access("NN"= (SELECT "N" FROM "LARGE" "LARGE" WHERE "U"=6))
   4 - access("U"=6)

A14

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |   141 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     1 |   141 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | SMALL_NN |     1 |       |     1   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | LARGE    |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | LARGE_U  |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

   2 - access("NN"= (SELECT "NN" FROM "LARGE" "LARGE" WHERE "U"=6))
   4 - access("U"=6)

PostgreSQL selects the following execution plans. All of them fulfill the expectations.

A10

 Seq Scan on small  (cost=8.44..9.57 rows=1 width=148)
   Filter: (u = $0)
   InitPlan 1 (returns $0)
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = 6)

A11/A12

 Seq Scan on small  (cost=8.44..9.57 rows=1 width=148)
   Filter: (n = $0)
   InitPlan 1 (returns $0)
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = 6)

A13/A14

 Seq Scan on small  (cost=8.44..9.57 rows=1 width=148)
   Filter: (nn = $0)
   InitPlan 1 (returns $0)
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = 6)
Subtype A2 – Table “small” in the subquery

A20: SELECT * FROM large WHERE u = (SELECT nu FROM small WHERE u = 6)
A21: SELECT * FROM large WHERE n = (SELECT n FROM small WHERE u = 6)
A22: SELECT * FROM large WHERE n = (SELECT nn FROM small WHERE u = 6)
A23: SELECT * FROM large WHERE nn = (SELECT n FROM small WHERE u = 6)
A24: SELECT * FROM large WHERE nn = (SELECT nn FROM small WHERE u = 6)

The execution plan I expect for these queries carries out the following operations:

  • Access the table “small” through either a table scan or an index scan that returns at most one value. The operation is executed one single time.
  • Access the table “large” through an index scan and return the rows matching the value returned by the previous operation. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

A20

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+
|  1 | PRIMARY     | large | NULL       | const | large_u       | large_u | 4       | const |    1 |   100.00 | NULL  |
|  2 | SUBQUERY    | small | NULL       | const | small_u       | small_u | 4       | const |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+

A21/A22

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ref   | large_n       | large_n | 5       | const |    1 |   100.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | const | small_u       | small_u | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

A23/A24

+----+-------------+-------+------------+-------+---------------+----------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ref   | large_nn      | large_nn | 4       | const |    1 |   100.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | const | small_u       | small_u  | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+----------+---------+-------+------+----------+-------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

A20

-----------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |   149 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | LARGE   |     1 |   149 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN           | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   2 - access("U"= (SELECT "NU" FROM "SMALL" "SMALL" WHERE "U"=6))
   4 - access("U"=6)

A21

-----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |   149 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LARGE   |     1 |   149 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | LARGE_N |     1 |       |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

   2 - access("N"= (SELECT "N" FROM "SMALL" "SMALL" WHERE "U"=6))
   4 - access("U"=6)

A22

-----------------------------------------------------------------------------------------------
| Id  | Operation                           | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |         |     1 |   149 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LARGE   |     1 |   149 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | LARGE_N |     1 |       |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

   2 - access("N"= (SELECT "NN" FROM "SMALL" "SMALL" WHERE "U"=6))
   4 - access("U"=6)

A23

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |   149 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |     1 |   149 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | LARGE_NN |     1 |       |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | SMALL    |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | SMALL_U  |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

   2 - access("NN"= (SELECT "N" FROM "SMALL" "SMALL" WHERE "U"=6))
   4 - access("U"=6)

A24

------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |          |     1 |   149 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |     1 |   149 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | LARGE_NN |     1 |       |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID      | SMALL    |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |     INDEX UNIQUE SCAN               | SMALL_U  |     1 |       |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

   2 - access("NN"= (SELECT "NN" FROM "SMALL" "SMALL" WHERE "U"=6))
   4 - access("U"=6)

PostgreSQL selects the following execution plans. All of them fulfill the expectations.

A20

 Index Scan using large_u on large  (cost=1.55..9.57 rows=1 width=148)
   Index Cond: (u = $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = 6)

A21/A22

 Index Scan using large_n on large  (cost=1.55..9.57 rows=1 width=148)
   Index Cond: (n = $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = 6)

A23/A24

 Index Scan using large_nn on large  (cost=1.55..9.57 rows=1 width=148)
   Index Cond: (nn = $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = 6)

Type B – Scalar subqueries with inequality predicate

Subtype B1 – Table “large” in the subquery

B10: SELECT * FROM small WHERE u != (SELECT nu FROM large WHERE u = 6)
B11: SELECT * FROM small WHERE n != (SELECT n FROM large WHERE u = 6)
B12: SELECT * FROM small WHERE n != (SELECT nn FROM large WHERE u = 6)
B13: SELECT * FROM small WHERE nn != (SELECT n FROM large WHERE u = 6)
B14: SELECT * FROM small WHERE nn != (SELECT nn FROM large WHERE u = 6)

The execution plan I expect for these queries carries out the following operations:

  • Access the table “large” through an index scan that returns at most one value. The operation is executed one single time.
  • Access the table “small” through a table scan and discard the rows matching the value returned by the previous operation. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

B10

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | small | NULL       | range | small_u       | small_u | 4       | NULL  |    9 |   100.00 | Using where |
|  2 | SUBQUERY    | large | NULL       | const | large_u       | large_u | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

B11/B12

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | small | NULL       | range | small_n       | small_n | 5       | NULL  |    8 |   100.00 | Using where |
|  2 | SUBQUERY    | large | NULL       | const | large_u       | large_u | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

B13/B14

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | PRIMARY     | small | NULL       | ALL   | small_nn      | NULL    | NULL    | NULL  |   10 |    90.00 | Using where |
|  2 | SUBQUERY    | large | NULL       | const | large_u       | large_u | 4       | const |    1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

B10

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | SMALL   |     9 |  1269 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("U"<> (SELECT "NU" FROM "LARGE" "LARGE" WHERE "U"=6))
   3 - access("U"=6)

B11

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | SMALL   |     9 |  1269 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("N"<> (SELECT "N" FROM "LARGE" "LARGE" WHERE "U"=6))
   3 - access("U"=6)

B12

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | SMALL   |     9 |  1269 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("N"<> (SELECT "NN" FROM "LARGE" "LARGE" WHERE "U"=6))
   3 - access("U"=6)

B13

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | SMALL   |     9 |  1269 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("NN"<> (SELECT "N" FROM "LARGE" "LARGE" WHERE "U"=6))
   3 - access("U"=6)

B14

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |     6   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | SMALL   |     9 |  1269 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("NN"<> (SELECT "NN" FROM "LARGE" "LARGE" WHERE "U"=6))
   3 - access("U"=6)

PostgreSQL selects the following execution plans. All of them fulfill the expectations.

B10

 Seq Scan on small  (cost=8.44..9.57 rows=9 width=148)
   Filter: (u <> $0)
   InitPlan 1 (returns $0)
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = 6)

B11/B12

 Seq Scan on small  (cost=8.44..9.57 rows=8 width=148)
   Filter: (n <> $0)
   InitPlan 1 (returns $0)
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = 6)

B13/B14

 Seq Scan on small  (cost=8.44..9.57 rows=9 width=148)
   Filter: (nn <> $0)
   InitPlan 1 (returns $0)
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = 6)
Subtype B2 – Table “small” in the subquery

B20: SELECT * FROM large WHERE u != (SELECT nu FROM small WHERE u = 6)
B21: SELECT * FROM large WHERE n != (SELECT n FROM small WHERE u = 6)
B22: SELECT * FROM large WHERE n != (SELECT nn FROM small WHERE u = 6)
B23: SELECT * FROM large WHERE nn != (SELECT n FROM small WHERE u = 6)
B24: SELECT * FROM large WHERE nn != (SELECT nn FROM small WHERE u = 6)

The execution plan I expect for these queries carries out the following operations:

  • Access the table “small” through either a table scan or an index scan that returns at most one value. The operation is executed one single time.
  • Access the table “large” through a table scan and discard the rows matching the value returned by the previous operation. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

B20

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | range | large_u       | large_u | 4       | NULL  | 494590 |   100.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | const | small_u       | small_u | 4       | const |      1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+

B21/B22

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ALL   | large_n       | NULL    | NULL    | NULL  | 989170 |    50.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | const | small_u       | small_u | 4       | const |      1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+

B23/B24

+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ALL   | large_nn      | NULL    | NULL    | NULL  | 989170 |    50.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | const | small_u       | small_u | 4       | const |      1 |   100.00 | NULL        |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+--------+----------+-------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

B20

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  5791   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | LARGE   |   999K|   142M|  5790   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("U"<> (SELECT "NU" FROM "SMALL" "SMALL" WHERE "U"=6))
   3 - access("U"=6)

B21

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  5791   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | LARGE   |   999K|   142M|  5790   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("N"<> (SELECT "N" FROM "SMALL" "SMALL" WHERE "U"=6))
   3 - access("U"=6)

B22

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  5791   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | LARGE   |   999K|   142M|  5790   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("N"<> (SELECT "NN" FROM "SMALL" "SMALL" WHERE "U"=6))
   3 - access("U"=6)

B23

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  5791   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | LARGE   |   999K|   142M|  5790   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("NN"<> (SELECT "N" FROM "SMALL" "SMALL" WHERE "U"=6))
   3 - access("U"=6)

B24

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  5791   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL           | LARGE   |   999K|   142M|  5790   (1)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter("NN"<> (SELECT "NN" FROM "SMALL" "SMALL" WHERE "U"=6))
   3 - access("U"=6)

PostgreSQL selects the following execution plans. All of them fulfill the expectations.

B20

 Seq Scan on large  (cost=1.12..34724.12 rows=999999 width=148)
   Filter: (u <> $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = 6)

B21/B22

 Seq Scan on large  (cost=1.12..34724.12 rows=999999 width=148)
   Filter: (n <> $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = 6)

B23/B24

 Seq Scan on large  (cost=1.12..34724.12 rows=999999 width=148)
   Filter: (nn <> $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = 6)

Type C – Uncorrelated subqueries with either IN or EXISTS

Subtype C1 – Table “large” in the subquery

C10: SELECT * FROM small WHERE n IN (SELECT n FROM large)
C11: SELECT * FROM small WHERE n IN (SELECT nn FROM large)
C12: SELECT * FROM small WHERE nn IN (SELECT n FROM large)
C13: SELECT * FROM small WHERE nn IN (SELECT nn FROM large)
C14: SELECT * FROM small WHERE EXISTS (SELECT * FROM large)

The execution plan I expect for the queries C10-C13 carries out a semi-join between two data sets:

  • Access the table “small” through a table scan. The operation is executed one single time.
  • For every row of the table “small,” access the table “large” through an index scan.

However, for the query C14, I expect the following operations:

  • Access the table “large” through either a table scan or an index scan. The operation is executed one single time, and only needs to check whether one row exists.
  • If the table “large” contains at least one row, access the table “small” through a table scan. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

C10

+----+-------------+-------+------------+------+---------------+---------+---------+---------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref           | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_n       | NULL    | NULL    | NULL          |   10 |   100.00 | Using where                    |
|  1 | SIMPLE      | large | NULL       | ref  | large_n       | large_n | 5       | chris.small.n |    1 |   100.00 | Using index; FirstMatch(small) |
+----+-------------+-------+------------+------+---------------+---------+---------+---------------+------+----------+--------------------------------+

C11

+----+-------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key      | key_len | ref           | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_n       | NULL     | NULL    | NULL          |   10 |   100.00 | Using where                    |
|  1 | SIMPLE      | large | NULL       | ref  | large_nn      | large_nn | 4       | chris.small.n |    1 |   100.00 | Using index; FirstMatch(small) |
+----+-------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+--------------------------------+

C12

+----+-------------+-------+------------+------+---------------+---------+---------+----------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key     | key_len | ref            | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+---------+---------+----------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_nn      | NULL    | NULL    | NULL           |   10 |   100.00 | NULL                           |
|  1 | SIMPLE      | large | NULL       | ref  | large_n       | large_n | 5       | chris.small.nn |    1 |   100.00 | Using index; FirstMatch(small) |
+----+-------------+-------+------------+------+---------------+---------+---------+----------------+------+----------+--------------------------------+

C13

+----+-------------+-------+------------+------+---------------+----------+---------+----------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key      | key_len | ref            | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+----------+---------+----------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_nn      | NULL     | NULL    | NULL           |   10 |   100.00 | NULL                           |
|  1 | SIMPLE      | large | NULL       | ref  | large_nn      | large_nn | 4       | chris.small.nn |    1 |   100.00 | Using index; FirstMatch(small) |
+----+-------------+-------+------------+------+---------------+----------+---------+----------------+------+----------+--------------------------------+

C14

+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | small | NULL       | ALL   | NULL          | NULL     | NULL    | NULL |     10 |   100.00 | NULL        |
|  2 | SUBQUERY    | large | NULL       | index | NULL          | large_nu | 4       | NULL | 989170 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+

Oracle Database selects the following execution plans. Even though the execution plan of the queries C10-C11 doesn’t completely fulfill the expectations (the table “small” isn’t accessed through a table scan), the difference wouldn’t be noticeable at runtime. Therefore, I consider that all of them fulfill the expectations.

C10

------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |     9 |  1314 |    12   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                   |         |     9 |  1314 |    12   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |  1269 |     2   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN                   | LARGE_N |  1000K|  4882K|     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

   3 - filter("N" IS NOT NULL)
   4 - access("N"="N")

C11

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     9 |  1314 |    12   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                   |          |     9 |  1314 |    12   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     9 |  1269 |     2   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN                   | SMALL_N  |     9 |       |     1   (0)| 00:00:01 |
|*  4 |   INDEX RANGE SCAN                   | LARGE_NN |  1000K|  4882K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   3 - filter("N" IS NOT NULL)
   4 - access("N"="NN")

C12

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |    10 |  1460 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |         |    10 |  1460 |    23   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | LARGE_N |  1000K|  4882K|     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

   3 - access("NN"="N")

C13

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |    10 |  1460 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |          |    10 |  1460 |    23   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | LARGE_NN |  1000K|  4882K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

   3 - access("NN"="NN")

C14

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |    10 |  1410 |     5   (0)| 00:00:01 |
|*  1 |  FILTER               |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL   | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "LARGE" "LARGE"))

PostgreSQL selects the following execution plans. Only the execution plan of the query “C14” fulfills the expectations. That said, the other execution plans, in this specific case, perform well. I checked that when the data changes, also the execution plans changes. So, even though I was surprised by them, everything looks good.

C10

 Merge Semi Join  (cost=11.52..13.58 rows=9 width=148)
   Merge Cond: (small.n = large.n)
   ->  Index Scan using small_n on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_n on large  (cost=0.42..81855.69 rows=1000000 width=4)

C11

 Merge Semi Join  (cost=11.52..13.58 rows=9 width=148)
   Merge Cond: (small.n = large.nn)
   ->  Index Scan using small_n on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_nn on large  (cost=0.42..81855.69 rows=1000000 width=4)

C12

 Merge Semi Join  (cost=0.56..13.59 rows=10 width=148)
   Merge Cond: (small.nn = large.n)
   ->  Index Scan using small_nn on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_n on large  (cost=0.42..81855.69 rows=1000000 width=4)

C13

 Merge Semi Join  (cost=0.56..13.59 rows=10 width=148)
   Merge Cond: (small.nn = large.nn)
   ->  Index Scan using small_nn on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_nn on large  (cost=0.42..81855.69 rows=1000000 width=4)

C14

 Result  (cost=0.03..1.13 rows=10 width=148)
   One-Time Filter: $0
   InitPlan 1 (returns $0)
     ->  Seq Scan on large  (cost=0.00..32223.00 rows=1000000 width=0)
   ->  Seq Scan on small  (cost=0.03..1.13 rows=10 width=148)
Subtype C2 – Table “small” in the subquery

C20: SELECT * FROM large WHERE n IN (SELECT n FROM small)
C21: SELECT * FROM large WHERE n IN (SELECT nn FROM small)
C22: SELECT * FROM large WHERE nn IN (SELECT n FROM small)
C23: SELECT * FROM large WHERE nn IN (SELECT nn FROM small)
C24: SELECT * FROM large WHERE EXISTS (SELECT * FROM small)

The execution plan I expect for the queries C20-C23 carries out a semi-join between two data sets:

  • Access the table “small” through a table scan. The operation is executed one single time.
  • For every row of the table “small,” access the table “large” through an index scan.

However, for the query C24, I expect the following operations:

  • Access the table “small” through either a table scan or an index scan. The operation is executed one single time, and only needs to check whether one row exists.
  • If the table “small” contains at least one row, access the table “large” through a table scan. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

C20

+----+-------------+-------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref           | rows | filtered | Extra                               |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------------------------------+
|  1 | SIMPLE      | small | NULL       | index | small_n       | small_n | 5       | NULL          |   10 |   100.00 | Using where; Using index; LooseScan |
|  1 | SIMPLE      | large | NULL       | ref   | large_n       | large_n | 5       | chris.small.n |    1 |   100.00 | NULL                                |
+----+-------------+-------+------------+-------+---------------+---------+---------+---------------+------+----------+-------------------------------------+

C21

+----+-------------+-------+------------+-------+---------------+----------+---------+----------------+------+----------+------------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref            | rows | filtered | Extra                  |
+----+-------------+-------+------------+-------+---------------+----------+---------+----------------+------+----------+------------------------+
|  1 | SIMPLE      | small | NULL       | index | small_nn      | small_nn | 4       | NULL           |   10 |   100.00 | Using index; LooseScan |
|  1 | SIMPLE      | large | NULL       | ref   | large_n       | large_n  | 5       | chris.small.nn |    1 |   100.00 | NULL                   |
+----+-------------+-------+------------+-------+---------------+----------+---------+----------------+------+----------+------------------------+

C22

+----+-------------+-------+------------+-------+---------------+----------+---------+---------------+------+----------+-------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref           | rows | filtered | Extra                               |
+----+-------------+-------+------------+-------+---------------+----------+---------+---------------+------+----------+-------------------------------------+
|  1 | SIMPLE      | small | NULL       | index | small_n       | small_n  | 5       | NULL          |   10 |   100.00 | Using where; Using index; LooseScan |
|  1 | SIMPLE      | large | NULL       | ref   | large_nn      | large_nn | 4       | chris.small.n |    1 |   100.00 | NULL                                |
+----+-------------+-------+------------+-------+---------------+----------+---------+---------------+------+----------+-------------------------------------+

C23

+----+-------------+-------+------------+-------+---------------+----------+---------+----------------+------+----------+------------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref            | rows | filtered | Extra                  |
+----+-------------+-------+------------+-------+---------------+----------+---------+----------------+------+----------+------------------------+
|  1 | SIMPLE      | small | NULL       | index | small_nn      | small_nn | 4       | NULL           |   10 |   100.00 | Using index; LooseScan |
|  1 | SIMPLE      | large | NULL       | ref   | large_nn      | large_nn | 4       | chris.small.nn |    1 |   100.00 | NULL                   |
+----+-------------+-------+------------+-------+---------------+----------+---------+----------------+------+----------+------------------------+

C24

+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ALL   | NULL          | NULL     | NULL    | NULL | 989170 |   100.00 | NULL        |
|  2 | SUBQUERY    | small | NULL       | index | NULL          | small_nu | 4       | NULL |     10 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

C20

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1368 |    18   (6)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |     9 |  1368 |    18   (6)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |     9 |  1368 |    18   (6)| 00:00:01 |
|   3 |    SORT UNIQUE               |         |     9 |    27 |     1   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN          | SMALL_N |     9 |    27 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | LARGE_N |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |   149 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   4 - filter("N" IS NOT NULL)
   5 - access("N"="N")

C21

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    10 |  1520 |    18   (6)| 00:00:01 |
|   1 |  NESTED LOOPS                |          |    10 |  1520 |    18   (6)| 00:00:01 |
|   2 |   NESTED LOOPS               |          |    10 |  1520 |    18   (6)| 00:00:01 |
|   3 |    SORT UNIQUE               |          |    10 |    30 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN          | SMALL_NN |    10 |    30 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | LARGE_N  |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| LARGE    |     1 |   149 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   5 - access("N"="NN")

C22

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |     9 |  1368 |    13   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |          |     9 |  1368 |    13   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |          |     9 |  1368 |    13   (8)| 00:00:01 |
|   3 |    SORT UNIQUE               |          |     9 |    27 |     1   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN          | SMALL_N  |     9 |    27 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| LARGE    |     1 |   149 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   4 - filter("N" IS NOT NULL)
   5 - access("NN"="N")

C23

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    10 |  1520 |    13   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |          |    10 |  1520 |    13   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |          |    10 |  1520 |    13   (8)| 00:00:01 |
|   3 |    SORT UNIQUE               |          |    10 |    30 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN          | SMALL_NN |    10 |    30 |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN          | LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| LARGE    |     1 |   149 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   5 - access("NN"="NN")

C24

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  1000K|   142M|  5789   (1)| 00:00:01 |
|*  1 |  FILTER            |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL| LARGE    |  1000K|   142M|  5788   (1)| 00:00:01 |
|   3 |   INDEX FULL SCAN  | SMALL_NN |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "SMALL" "SMALL"))

PostgreSQL selects the following execution plans. All of them fulfill the expectations.

C20

 Merge Semi Join  (cost=1.74..2.59 rows=9 width=148)
   Merge Cond: (large.n = small.n)
   ->  Index Scan using large_n on large  (cost=0.42..81855.69 rows=1000000 width=148)
   ->  Sort  (cost=1.27..1.29 rows=10 width=4)
         Sort Key: small.n
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

C21

 Merge Semi Join  (cost=1.69..2.60 rows=10 width=148)
   Merge Cond: (large.n = small.nn)
   ->  Index Scan using large_n on large  (cost=0.42..81855.69 rows=1000000 width=148)
   ->  Sort  (cost=1.27..1.29 rows=10 width=4)
         Sort Key: small.nn
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

C22

 Merge Semi Join  (cost=1.74..2.59 rows=9 width=148)
   Merge Cond: (large.nn = small.n)
   ->  Index Scan using large_nn on large  (cost=0.42..81855.69 rows=1000000 width=148)
   ->  Sort  (cost=1.27..1.29 rows=10 width=4)
         Sort Key: small.n
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

C23

 Merge Semi Join  (cost=1.69..2.60 rows=10 width=148)
   Merge Cond: (large.nn = small.nn)
   ->  Index Scan using large_nn on large  (cost=0.42..81855.69 rows=1000000 width=148)
   ->  Sort  (cost=1.27..1.29 rows=10 width=4)
         Sort Key: small.nn
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

C24

 Result  (cost=0.11..32223.11 rows=1000000 width=148)
   One-Time Filter: $0
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=0)
   ->  Seq Scan on large  (cost=0.11..32223.11 rows=1000000 width=148)

Type D – Uncorrelated subqueries with either NOT IN or NOT EXISTS

Subtype D1 – Table “large” in the subquery

D10: SELECT * FROM small WHERE n NOT IN (SELECT n FROM large)
D11: SELECT * FROM small WHERE n NOT IN (SELECT nn FROM large)
D12: SELECT * FROM small WHERE nn NOT IN (SELECT n FROM large)
D13: SELECT * FROM small WHERE nn NOT IN (SELECT nn FROM large)
D14: SELECT * FROM small WHERE NOT EXISTS (SELECT * FROM large)

The execution plan I expect for the queries D10-D13 carries out an anti-join between two data sets:

  • Access the table “small” through a table scan. The operation is executed one single time.
  • For every row of the table “small,” access the index associated to referenced column of the table “large” to check whether rows that fulfill the WHERE clause exist.

However, for the query D14, I expect the following operations:

  • Access the table “large” through either a table scan or an index scan. The operation is executed one single time, and only needs to check whether one row exists.
  • If the table “large” contains no row, access the table “small” through a table scan. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations.

D10

+----+--------------------+-------+------------+----------------+---------------+---------+---------+------+------+----------+-------------------------------------------------+
| id | select_type        | table | partitions | type           | possible_keys | key     | key_len | ref  | rows | filtered | Extra                                           |
+----+--------------------+-------+------------+----------------+---------------+---------+---------+------+------+----------+-------------------------------------------------+
|  1 | PRIMARY            | small | NULL       | ALL            | NULL          | NULL    | NULL    | NULL |   10 |   100.00 | Using where                                     |
|  2 | DEPENDENT SUBQUERY | large | NULL       | index_subquery | large_n       | large_n | 5       | func |    2 |   100.00 | Using where; Using index; Full scan on NULL key |
+----+--------------------+-------+------------+----------------+---------------+---------+---------+------+------+----------+-------------------------------------------------+

D11

+----+--------------------+-------+------------+----------------+---------------+----------+---------+------+------+----------+-------------------------------------------------+
| id | select_type        | table | partitions | type           | possible_keys | key      | key_len | ref  | rows | filtered | Extra                                           |
+----+--------------------+-------+------------+----------------+---------------+----------+---------+------+------+----------+-------------------------------------------------+
|  1 | PRIMARY            | small | NULL       | ALL            | NULL          | NULL     | NULL    | NULL |   10 |   100.00 | Using where                                     |
|  2 | DEPENDENT SUBQUERY | large | NULL       | index_subquery | large_nn      | large_nn | 4       | func |    1 |   100.00 | Using where; Using index; Full scan on NULL key |
+----+--------------------+-------+------------+----------------+---------------+----------+---------+------+------+----------+-------------------------------------------------+

D12

+----+--------------------+-------+------------+----------------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type        | table | partitions | type           | possible_keys | key     | key_len | ref  | rows | filtered | Extra                    |
+----+--------------------+-------+------------+----------------+---------------+---------+---------+------+------+----------+--------------------------+
|  1 | PRIMARY            | small | NULL       | ALL            | NULL          | NULL    | NULL    | NULL |   10 |   100.00 | Using where              |
|  2 | DEPENDENT SUBQUERY | large | NULL       | index_subquery | large_n       | large_n | 5       | func |    2 |   100.00 | Using where; Using index |
+----+--------------------+-------+------------+----------------+---------------+---------+---------+------+------+----------+--------------------------+

D13

+----+--------------------+-------+------------+----------------+---------------+----------+---------+------+------+----------+-------------+
| id | select_type        | table | partitions | type           | possible_keys | key      | key_len | ref  | rows | filtered | Extra       |
+----+--------------------+-------+------------+----------------+---------------+----------+---------+------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL            | NULL          | NULL     | NULL    | NULL |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | index_subquery | large_nn      | large_nn | 4       | func |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+----------------+---------------+----------+---------+------+------+----------+-------------+

D14

+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra            |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+------------------+
|  1 | PRIMARY     | NULL  | NULL       | NULL  | NULL          | NULL     | NULL    | NULL |   NULL |     NULL | Impossible WHERE |
|  2 | SUBQUERY    | large | NULL       | index | NULL          | large_nu | 4       | NULL | 989170 |   100.00 | Using index      |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+------------------+

Oracle Database selects the following execution plans. The execution plan of the queries D13-D14 fulfills the expectations. The others, because of the table scan on “large”, are suboptimal. Note that for the queries D12 the query optimizer is restricted by the fact that the “large_n” index doesn’t contain the null value. I consider this a physical database design issue, not a query optimizer issue. However, the execution plan of the queries D10-D11 is suboptimal because of a query optimizer limitation with nullable values.

D10

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |   146 |  5793   (1)| 00:00:01 |
|*  1 |  HASH JOIN ANTI NA |       |     1 |   146 |  5793   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL |    10 |  1410 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| LARGE |  1000K|  4882K|  5787   (1)| 00:00:01 |
----------------------------------------------------------------------------

   1 - access("N"="N")

D11

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |     1 |   146 |   616   (2)| 00:00:01 |
|*  1 |  HASH JOIN ANTI SNA   |          |     1 |   146 |   616   (2)| 00:00:01 |
|   2 |   TABLE ACCESS FULL   | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| LARGE_NN |  1000K|  4882K|   610   (1)| 00:00:01 |
----------------------------------------------------------------------------------

   1 - access("N"="NN")

D12

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |   146 |  5793   (1)| 00:00:01 |
|*  1 |  HASH JOIN ANTI NA |       |     1 |   146 |  5793   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL |    10 |  1410 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| LARGE |  1000K|  4882K|  5787   (1)| 00:00:01 |
----------------------------------------------------------------------------

   1 - access("NN"="N")

D13

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   146 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |          |     1 |   146 |    23   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | LARGE_NN |   900K|  4394K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

   3 - access("NN"="NN")

D14

----------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |    10 |  1410 |     5   (0)| 00:00:01 |
|*  1 |  FILTER               |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL   | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LARGE" "LARGE"))

PostgreSQL selects the following execution plans. Only the execution plan of the query D14 fulfills the expectations. The others, because of the table scan on “large” and its materialization, are really bad.

D10-D13

 Seq Scan on small  (cost=0.00..218151.12 rows=5 width=148)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..41130.00 rows=1000000 width=4)
           ->  Seq Scan on large  (cost=0.00..32223.00 rows=1000000 width=4)

D14

 Result  (cost=0.03..1.13 rows=10 width=148)
   One-Time Filter: (NOT $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on large  (cost=0.00..32223.00 rows=1000000 width=0)
   ->  Seq Scan on small  (cost=0.03..1.13 rows=10 width=148)
Subtype D2 – Table “small” in the subquery

D20: SELECT * FROM large WHERE n NOT IN (SELECT n FROM small)
D21: SELECT * FROM large WHERE n NOT IN (SELECT nn FROM small)
D22: SELECT * FROM large WHERE nn NOT IN (SELECT n FROM small)
D23: SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small)
D24: SELECT * FROM large WHERE NOT EXISTS (SELECT * FROM small)

The execution plan I expect for the queries D20-D23 carries out an anti-join between two data sets:

  • Access the table “small” through a table scan and put the resulting data into a memory structure. This operation is executed one single time.
  • Access the table “large” through a table scan and, for every row, check the memory structure created by the previous operation to find out whether rows that fulfills the WHERE clause exist. This operation is executed one single time.

However, for the query D14, I expect the following operations:

  • Access the table “small” through either a table scan or an index scan. The operation is executed one single time, and only needs to check whether one row exists.
  • If the table “small” contains no row, access the table “large” through a table scan. The operation is executed one single time.

MySQL selects the following execution plans. All of them fulfill the expectations (what isn’t visible in the execution plans is that for the queries D20-D23 the result set generated by the subquery is materialized).

D20/D22

+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ALL   | NULL          | NULL    | NULL    | NULL | 989170 |   100.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | index | small_n       | small_n | 5       | NULL |     10 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+--------+----------+-------------+

D21/D23

+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+
|  1 | PRIMARY     | large | NULL       | ALL   | NULL          | NULL     | NULL    | NULL | 989170 |   100.00 | Using where |
|  2 | SUBQUERY    | small | NULL       | index | small_nn      | small_nn | 4       | NULL |     10 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+--------+----------+-------------+

D24

+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+------------------+
| id | select_type | table | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra            |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+------------------+
|  1 | PRIMARY     | NULL  | NULL       | NULL  | NULL          | NULL     | NULL    | NULL | NULL |     NULL | Impossible WHERE |
|  2 | SUBQUERY    | small | NULL       | index | NULL          | small_nu | 4       | NULL |   10 |   100.00 | Using index      |
+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+------------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

D20

---------------------------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |   999K|   144M|  5795   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI NA|       |   999K|   144M|  5795   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL     | SMALL |    10 |    30 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL     | LARGE |  1000K|   142M|  5788   (1)| 00:00:01 |
---------------------------------------------------------------------------------

   1 - access("N"="N")

D21

-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |   999K|   144M|  5793   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI SNA|          |   999K|   144M|  5793   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN        | SMALL_NN |    10 |    30 |     1   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL      | LARGE    |  1000K|   142M|  5788   (1)| 00:00:01 |
-------------------------------------------------------------------------------------

   1 - access("N"="NN")

D22

---------------------------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |   999K|   144M|  5795   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI NA|       |   999K|   144M|  5795   (1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL     | SMALL |    10 |    30 |     3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL     | LARGE |  1000K|   142M|  5788   (1)| 00:00:01 |
---------------------------------------------------------------------------------

   1 - access("NN"="N")

D23

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   999K|   144M|  5793   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI|          |   999K|   144M|  5793   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN    | SMALL_NN |    10 |    30 |     1   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | LARGE    |  1000K|   142M|  5788   (1)| 00:00:01 |
---------------------------------------------------------------------------------

   1 - access("NN"="NN")

D24

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |  1000K|   142M|  5789   (1)| 00:00:01 |
|*  1 |  FILTER            |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL| LARGE    |  1000K|   142M|  5788   (1)| 00:00:01 |
|   3 |   INDEX FULL SCAN  | SMALL_NN |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "SMALL" "SMALL"))

PostgreSQL selects the following execution plans. All of them fulfill the expectations.

D20-D23

 Seq Scan on large  (cost=1.12..34724.12 rows=500000 width=148)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

D24

 Result  (cost=0.11..32223.11 rows=1000000 width=148)
   One-Time Filter: (NOT $0)
   InitPlan 1 (returns $0)
     ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=0)
   ->  Seq Scan on large  (cost=0.11..32223.11 rows=1000000 width=148)

Type E – Correlated subqueries with either IN or EXISTS

Subtype E1 – Table “large” in the subquery

E10: SELECT * FROM small WHERE n IN (SELECT n FROM large WHERE large.u = small.u)
E11: SELECT * FROM small WHERE n IN (SELECT nn FROM large WHERE large.u = small.u)
E12: SELECT * FROM small WHERE nn IN (SELECT n FROM large WHERE large.u = small.u)
E13: SELECT * FROM small WHERE nn IN (SELECT nn FROM large WHERE large.u = small.u)
E14: SELECT * FROM small WHERE n IN (SELECT n FROM large WHERE large.nu = small.u)
E15: SELECT * FROM small WHERE n IN (SELECT nn FROM large WHERE large.nu = small.u)
E16: SELECT * FROM small WHERE nn IN (SELECT n FROM large WHERE large.nu = small.u)
E17: SELECT * FROM small WHERE nn IN (SELECT nn FROM large WHERE large.nu = small.u)
E18: SELECT * FROM small WHERE EXISTS (SELECT * FROM large WHERE large.u = small.u)
E19: SELECT * FROM small WHERE EXISTS (SELECT * FROM large WHERE large.nu = small.u)

The execution plan I expect for these queries is a join between two data sets:

  • Access the table “small” through a table scan. The operation is executed one single time.
  • For every row of the table “small,” access the table “large” through an index scan and check whether at least one row fulfills the WHERE clause.

Note that only for the queries E14-E17/E19 a semi-join is necessary. For the others, because the WHERE clause in the subquery references a unique value in table “large”, a “regular” join can take place.

MySQL selects the following execution plans. Except for the query E18, the others fulfill the expectations. For the query E18 the query optimizer does not recognize that no semi-join is necessary.

E10

+----+-------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys   | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_u,small_n | NULL    | NULL    | NULL          |   10 |   100.00 | NULL        |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_n | large_u | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+

E11

+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_u,small_n  | NULL    | NULL    | NULL          |   10 |   100.00 | NULL        |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_nn | large_u | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+

E12

+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_u,small_nn | NULL    | NULL    | NULL          |   10 |   100.00 | NULL        |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_n  | large_u | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+

E13

+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_u,small_nn | NULL    | NULL    | NULL          |   10 |   100.00 | NULL        |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_nn | large_u | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+

E14

+----+-------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys    | key      | key_len | ref           | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_u,small_n  | NULL     | NULL    | NULL          |   10 |   100.00 | NULL                           |
|  1 | SIMPLE      | large | NULL       | ref  | large_nu,large_n | large_nu | 4       | chris.small.u |    1 |     5.00 | Using where; FirstMatch(small) |
+----+-------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+--------------------------------+

E15

+----+-------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys     | key      | key_len | ref           | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_u,small_n   | NULL     | NULL    | NULL          |   10 |   100.00 | NULL                           |
|  1 | SIMPLE      | large | NULL       | ref  | large_nu,large_nn | large_nu | 4       | chris.small.u |    1 |     5.00 | Using where; FirstMatch(small) |
+----+-------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+--------------------------------+

E16

+----+-------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys    | key      | key_len | ref           | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_u,small_nn | NULL     | NULL    | NULL          |   10 |   100.00 | NULL                           |
|  1 | SIMPLE      | large | NULL       | ref  | large_nu,large_n | large_nu | 4       | chris.small.u |    1 |     5.00 | Using where; FirstMatch(small) |
+----+-------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+--------------------------------+

E17

+----+-------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys     | key      | key_len | ref           | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+--------------------------------+
|  1 | SIMPLE      | small | NULL       | ALL  | small_u,small_nn  | NULL     | NULL    | NULL          |   10 |   100.00 | NULL                           |
|  1 | SIMPLE      | large | NULL       | ref  | large_nu,large_nn | large_nu | 4       | chris.small.u |    1 |     5.00 | Using where; FirstMatch(small) |
+----+-------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+--------------------------------+

E18

+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL    | NULL          | NULL    | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | eq_ref | large_u       | large_u | 4       | chris.small.u |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+-------------+

E19

+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key      | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL  | NULL          | NULL     | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | ref  | large_nu      | large_nu | 4       | chris.small.u |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+-------------+

Oracle Database selects the following execution plans. Even though the execution plan of the queries E10/E11/E14/E15 doesn’t completely fulfill the expectations (the table “small” isn’t accessed through a table scan), the difference wouldn’t be noticeable at runtime. Therefore, I consider that all of them fulfill the expectations.

E10

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |     9 |  1359 |    20   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                         |         |     9 |  1359 |    20   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                        |         |     9 |  1359 |    20   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |  1269 |     2   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN                  | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID         | LARGE   |     1 |    10 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   4 - filter("N" IS NOT NULL)
   5 - access("LARGE"."U"="SMALL"."U")
   6 - filter("N"="N")

E11

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |     9 |  1359 |    20   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                         |         |     9 |  1359 |    20   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                        |         |     9 |  1359 |    20   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |  1269 |     2   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN                  | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID         | LARGE   |     1 |    10 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   4 - filter("N" IS NOT NULL)
   5 - access("LARGE"."U"="SMALL"."U")
   6 - filter("N"="NN")

E12

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |    10 |  1510 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |    10 |  1510 |    23   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |    10 |  1510 |    23   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   4 - access("LARGE"."U"="SMALL"."U")
   5 - filter("NN"="N")

E13

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |    10 |  1510 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |    10 |  1510 |    23   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |    10 |  1510 |    23   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   4 - access("LARGE"."U"="SMALL"."U")
   5 - filter("NN"="NN")

E14

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     9 |  1359 |    29   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                   |          |     9 |  1359 |    29   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     9 |  1269 |     2   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN                   | SMALL_N  |     9 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |  1000K|  9765K|     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | LARGE_NU |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   3 - filter("N" IS NOT NULL)
   4 - filter("N"="N")
   5 - access("LARGE"."NU"="SMALL"."U")

E15

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     9 |  1359 |    21   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                   |          |     9 |  1359 |    21   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     9 |  1269 |     2   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN                   | SMALL_N  |     9 |       |     1   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |  1000K|  9765K|     3   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   3 - filter("N" IS NOT NULL)
   4 - filter("LARGE"."NU"="SMALL"."U")
   5 - access("N"="NN")

E16

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |    10 |  1510 |    33   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                   |          |    10 |  1510 |    33   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                  | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |  1000K|  9765K|     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | LARGE_NU |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   3 - filter("NN"="N")
   4 - access("LARGE"."NU"="SMALL"."U")

E17

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |    10 |  1510 |    33   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI                   |          |    10 |  1510 |    33   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                  | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |  1000K|  9765K|     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   3 - filter("LARGE"."NU"="SMALL"."U")
   4 - access("NN"="NN")

E18

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |    10 |  1460 |    13   (0)| 00:00:01 |
|   1 |  NESTED LOOPS      |         |    10 |  1460 |    13   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| LARGE_U |     1 |     5 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

   3 - access("LARGE"."U"="SMALL"."U")

E19

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |    10 |  1460 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |          |    10 |  1460 |    23   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | LARGE_NU |  1000K|  4882K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

   3 - access("LARGE"."NU"="SMALL"."U")

PostgreSQL selects the following execution plans. Except for the execution plan of the query E19, the others fulfill the expectations. Note that, in this specific case, the execution plan selected for the query E19 perform well. I checked that when the data changes, also the execution plans changes. So, even though I was surprised by it, it is good.

E10-E13

 Seq Scan on small  (cost=0.00..45.47 rows=5 width=148)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = small.u)

E14-E17

 Seq Scan on small  (cost=0.00..45.47 rows=5 width=148)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Index Scan using large_nu on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (nu = small.u)

E18

 Merge Join  (cost=1.69..2.60 rows=10 width=148)
   Merge Cond: (large.u = small.u)
   ->  Index Only Scan using large_u on large  (cost=0.42..81855.69 rows=1000000 width=4)
   ->  Sort  (cost=1.27..1.29 rows=10 width=148)
         Sort Key: small.u
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=148)

E19

 Merge Semi Join  (cost=0.56..13.59 rows=10 width=148)
   Merge Cond: (small.u = large.nu)
   ->  Index Scan using small_u on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_nu on large  (cost=0.42..81855.69 rows=1000000 width=4)
Subtype E2 – Table “small” in the subquery

E20: SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.u = large.u)
E21: SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.u = large.u)
E22: SELECT * FROM large WHERE nn IN (SELECT n FROM small WHERE small.u = large.u)
E23: SELECT * FROM large WHERE nn IN (SELECT nn FROM small WHERE small.u = large.u)
E24: SELECT * FROM large WHERE n IN (SELECT n FROM small WHERE small.nu = large.u)
E25: SELECT * FROM large WHERE n IN (SELECT nn FROM small WHERE small.nu = large.u)
E26: SELECT * FROM large WHERE nn IN (SELECT n FROM small WHERE small.nu = large.u)
E27: SELECT * FROM large WHERE nn IN (SELECT nn FROM small WHERE small.nu = large.u)
E28: SELECT * FROM large WHERE EXISTS (SELECT * FROM small WHERE small.u = large.u)
E29: SELECT * FROM large WHERE EXISTS (SELECT * FROM small WHERE small.nu = large.u)

The execution plan I expect for these queries is a join between two data sets:

  • Access the table “small” through a table scan. The operation is executed one single time.
  • For every row of the table “small,” access the table “large” through an index scan and select the rows that fulfills the WHERE clause.

Note that only for the queries E24-E27/E29 a semi-join is necessary. For the others, since the WHERE clause in the subquery references a unique value in table “small”, a “regular” join can take place.

MySQL selects the following execution plans. The ones for the queries E20-E27 fulfill the expectations. The execution plan of the others, because of the way the table “large” is accessed, is really bad.

E20

+----+-------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys   | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | index  | small_u,small_n | small_n | 5       | NULL          |   10 |   100.00 | Using index |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_n | large_u | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+

E21

+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys    | key      | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | index  | small_u,small_nn | small_nn | 4       | NULL          |   10 |   100.00 | Using index |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_n  | large_u  | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-------------+

E22

+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | index  | small_u,small_n  | small_n | 5       | NULL          |   10 |   100.00 | Using index |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_nn | large_u | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+

E23

+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys    | key      | key_len | ref           | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-------------+
|  1 | SIMPLE      | small | NULL       | index  | small_u,small_nn | small_nn | 4       | NULL          |   10 |   100.00 | Using index |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_nn | large_u  | 4       | chris.small.u |    1 |     5.00 | Using where |
+----+-------------+-------+------------+--------+------------------+----------+---------+---------------+------+----------+-------------+

E24

+----+-------------+-------+------------+--------+------------------+---------+---------+----------------+------+----------+----------------------------+
| id | select_type | table | partitions | type   | possible_keys    | key     | key_len | ref            | rows | filtered | Extra                      |
+----+-------------+-------+------------+--------+------------------+---------+---------+----------------+------+----------+----------------------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_nu,small_n | NULL    | NULL    | NULL           |   10 |   100.00 | Start temporary            |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_n  | large_u | 4       | chris.small.nu |    1 |     5.00 | Using where; End temporary |
+----+-------------+-------+------------+--------+------------------+---------+---------+----------------+------+----------+----------------------------+

E25

+----+-------------+-------+------------+--------+-------------------+---------+---------+----------------+------+----------+----------------------------+
| id | select_type | table | partitions | type   | possible_keys     | key     | key_len | ref            | rows | filtered | Extra                      |
+----+-------------+-------+------------+--------+-------------------+---------+---------+----------------+------+----------+----------------------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_nu,small_nn | NULL    | NULL    | NULL           |   10 |   100.00 | Start temporary            |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_n   | large_u | 4       | chris.small.nu |    1 |     5.00 | Using where; End temporary |
+----+-------------+-------+------------+--------+-------------------+---------+---------+----------------+------+----------+----------------------------+

E26

+----+-------------+-------+------------+--------+------------------+---------+---------+----------------+------+----------+----------------------------+
| id | select_type | table | partitions | type   | possible_keys    | key     | key_len | ref            | rows | filtered | Extra                      |
+----+-------------+-------+------------+--------+------------------+---------+---------+----------------+------+----------+----------------------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_nu,small_n | NULL    | NULL    | NULL           |   10 |   100.00 | Start temporary            |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_nn | large_u | 4       | chris.small.nu |    1 |     5.00 | Using where; End temporary |
+----+-------------+-------+------------+--------+------------------+---------+---------+----------------+------+----------+----------------------------+

E27

+----+-------------+-------+------------+--------+-------------------+---------+---------+----------------+------+----------+----------------------------+
| id | select_type | table | partitions | type   | possible_keys     | key     | key_len | ref            | rows | filtered | Extra                      |
+----+-------------+-------+------------+--------+-------------------+---------+---------+----------------+------+----------+----------------------------+
|  1 | SIMPLE      | small | NULL       | ALL    | small_nu,small_nn | NULL    | NULL    | NULL           |   10 |   100.00 | Start temporary            |
|  1 | SIMPLE      | large | NULL       | eq_ref | large_u,large_nn  | large_u | 4       | chris.small.nu |    1 |     5.00 | Using where; End temporary |
+----+-------------+-------+------------+--------+-------------------+---------+---------+----------------+------+----------+----------------------------+

E28

+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL    | NULL          | NULL    | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | eq_ref | small_u       | small_u | 4       | chris.large.u |      1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+--------+----------+-------------+

E29

+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key      | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL  | NULL          | NULL     | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | ref  | small_nu      | small_nu | 4       | chris.large.u |      1 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+--------+----------+-------------+

Oracle Database selects the following execution plans. Even though the execution plan of the queries E20-E17 doesn’t completely fulfill the expectations (the table “small” isn’t accessed through a table scan), the difference wouldn’t be noticeable at runtime. Therefore, I consider that all of them fulfill the expectations.

E20

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |     9 |  1395 |    20   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                         |         |     9 |  1395 |    20   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                        |         |     9 |  1395 |    20   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |    54 |     2   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN                  | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID         | LARGE   |     1 |   149 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   4 - filter("N" IS NOT NULL)
   5 - access("SMALL"."U"="LARGE"."U")
   6 - filter("N"="N")

E21

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    10 |  1550 |    22   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |    10 |  1550 |    22   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |                  |    10 |  1550 |    22   (0)| 00:00:01 |
|   3 |    VIEW                      | index$_join$_002 |    10 |    60 |     2   (0)| 00:00:01 |
|*  4 |     HASH JOIN                |                  |       |       |            |          |
|   5 |      INDEX FAST FULL SCAN    | SMALL_NN         |    10 |    60 |     1   (0)| 00:00:01 |
|   6 |      INDEX FAST FULL SCAN    | SMALL_U          |    10 |    60 |     1   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN         | LARGE_U          |     1 |       |     1   (0)| 00:00:01 |
|*  8 |   TABLE ACCESS BY INDEX ROWID| LARGE            |     1 |   149 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   4 - access(ROWID=ROWID)
   7 - access("SMALL"."U"="LARGE"."U")
   8 - filter("N"="NN")

E22

-------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |         |     9 |  1395 |    20   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                         |         |     9 |  1395 |    20   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                        |         |     9 |  1395 |    20   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |    54 |     2   (0)| 00:00:01 |
|*  4 |     INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN                  | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID         | LARGE   |     1 |   149 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("N" IS NOT NULL)
   5 - access("SMALL"."U"="LARGE"."U")
   6 - filter("NN"="N")

E23

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    10 |  1550 |    22   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |    10 |  1550 |    22   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |                  |    10 |  1550 |    22   (0)| 00:00:01 |
|   3 |    VIEW                      | index$_join$_002 |    10 |    60 |     2   (0)| 00:00:01 |
|*  4 |     HASH JOIN                |                  |       |       |            |          |
|   5 |      INDEX FAST FULL SCAN    | SMALL_NN         |    10 |    60 |     1   (0)| 00:00:01 |
|   6 |      INDEX FAST FULL SCAN    | SMALL_U          |    10 |    60 |     1   (0)| 00:00:01 |
|*  7 |    INDEX UNIQUE SCAN         | LARGE_U          |     1 |       |     1   (0)| 00:00:01 |
|*  8 |   TABLE ACCESS BY INDEX ROWID| LARGE            |     1 |   149 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   4 - access(ROWID=ROWID)
   7 - access("SMALL"."U"="LARGE"."U")
   8 - filter("NN"="NN")

E24

--------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |         |     9 |  1395 |    13   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                          |         |     9 |  1395 |    13   (8)| 00:00:01 |
|   2 |   NESTED LOOPS                         |         |     9 |  1395 |    13   (8)| 00:00:01 |
|   3 |    SORT UNIQUE                         |         |     9 |    54 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |    54 |     2   (0)| 00:00:01 |
|*  5 |      INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN                   | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  7 |   TABLE ACCESS BY INDEX ROWID          | LARGE   |     1 |   149 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

   5 - filter("N" IS NOT NULL)
   6 - access("SMALL"."NU"="LARGE"."U")
   7 - filter("N"="N")

E25

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    10 |  1550 |    13   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |    10 |  1550 |    13   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |                  |    10 |  1550 |    13   (8)| 00:00:01 |
|   3 |    SORT UNIQUE               |                  |    10 |    60 |     2   (0)| 00:00:01 |
|   4 |     VIEW                     | index$_join$_002 |    10 |    60 |     2   (0)| 00:00:01 |
|*  5 |      HASH JOIN               |                  |       |       |            |          |
|   6 |       INDEX FAST FULL SCAN   | SMALL_NN         |    10 |    60 |     1   (0)| 00:00:01 |
|   7 |       INDEX FAST FULL SCAN   | SMALL_NU         |    10 |    60 |     1   (0)| 00:00:01 |
|*  8 |    INDEX UNIQUE SCAN         | LARGE_U          |     1 |       |     1   (0)| 00:00:01 |
|*  9 |   TABLE ACCESS BY INDEX ROWID| LARGE            |     1 |   149 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   5 - access(ROWID=ROWID)
   8 - access("SMALL"."NU"="LARGE"."U")
   9 - filter("N"="NN")

E26

--------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |         |     9 |  1395 |    13   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                          |         |     9 |  1395 |    13   (8)| 00:00:01 |
|   2 |   NESTED LOOPS                         |         |     9 |  1395 |    13   (8)| 00:00:01 |
|   3 |    SORT UNIQUE                         |         |     9 |    54 |     2   (0)| 00:00:01 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| SMALL   |     9 |    54 |     2   (0)| 00:00:01 |
|*  5 |      INDEX FULL SCAN                   | SMALL_N |     9 |       |     1   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN                   | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|*  7 |   TABLE ACCESS BY INDEX ROWID          | LARGE   |     1 |   149 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

   5 - filter("N" IS NOT NULL)
   6 - access("SMALL"."NU"="LARGE"."U")
   7 - filter("NN"="N")

E27

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                  |    10 |  1550 |    13   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |                  |    10 |  1550 |    13   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |                  |    10 |  1550 |    13   (8)| 00:00:01 |
|   3 |    SORT UNIQUE               |                  |    10 |    60 |     2   (0)| 00:00:01 |
|   4 |     VIEW                     | index$_join$_002 |    10 |    60 |     2   (0)| 00:00:01 |
|*  5 |      HASH JOIN               |                  |       |       |            |          |
|   6 |       INDEX FAST FULL SCAN   | SMALL_NN         |    10 |    60 |     1   (0)| 00:00:01 |
|   7 |       INDEX FAST FULL SCAN   | SMALL_NU         |    10 |    60 |     1   (0)| 00:00:01 |
|*  8 |    INDEX UNIQUE SCAN         | LARGE_U          |     1 |       |     1   (0)| 00:00:01 |
|*  9 |   TABLE ACCESS BY INDEX ROWID| LARGE            |     1 |   149 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   5 - access(ROWID=ROWID)
   8 - access("SMALL"."NU"="LARGE"."U")
   9 - filter("NN"="NN")

E28

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |    10 |  1520 |    12   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |         |    10 |  1520 |    12   (0)| 00:00:01 |
|   2 |   NESTED LOOPS               |         |    10 |  1520 |    12   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | SMALL_U |    10 |    30 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |   149 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   4 - access("SMALL"."U"="LARGE"."U")

E29

-----------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |    10 |  1520 |     8  (13)| 00:00:01 |
|   1 |  NESTED LOOPS                |          |    10 |  1520 |     8  (13)| 00:00:01 |
|   2 |   NESTED LOOPS               |          |    10 |  1520 |     8  (13)| 00:00:01 |
|   3 |    SORT UNIQUE               |          |    10 |    30 |     1   (0)| 00:00:01 |
|   4 |     INDEX FULL SCAN          | SMALL_NU |    10 |    30 |     1   (0)| 00:00:01 |
|*  5 |    INDEX UNIQUE SCAN         | LARGE_U  |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID| LARGE    |     1 |   149 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

   5 - access("SMALL"."NU"="LARGE"."U")

PostgreSQL selects the following execution plans. Only the execution plan of the queries E28/E29 fulfill the expectations. The execution plan of the others, because of the way the table “large” is accessed, is really bad.

E20-E23

 Seq Scan on large  (cost=0.00..598473.00 rows=500000 width=148)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = large.u)

E24-E27

 Seq Scan on large  (cost=0.00..598473.00 rows=500000 width=148)
   Filter: (SubPlan 1)
   SubPlan 1
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (nu = large.u)

E28

 Merge Join  (cost=1.69..2.60 rows=10 width=148)
   Merge Cond: (large.u = small.u)
   ->  Index Scan using large_u on large  (cost=0.42..81855.69 rows=1000000 width=148)
   ->  Sort  (cost=1.27..1.29 rows=10 width=4)
         Sort Key: small.u
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

E29

 Merge Semi Join  (cost=1.69..2.60 rows=10 width=148)
   Merge Cond: (large.u = small.nu)
   ->  Index Scan using large_u on large  (cost=0.42..81855.69 rows=1000000 width=148)
   ->  Sort  (cost=1.27..1.29 rows=10 width=4)
         Sort Key: small.nu
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

Type F – Correlated subqueries with either NOT IN or NOT EXISTS

Subtype F1 – Table “large” in the subquery

F10: SELECT * FROM small WHERE n NOT IN (SELECT n FROM large WHERE large.u = small.u)
F11: SELECT * FROM small WHERE n NOT IN (SELECT nn FROM large WHERE large.u = small.u)
F12: SELECT * FROM small WHERE nn NOT IN (SELECT n FROM large WHERE large.u = small.u)
F13: SELECT * FROM small WHERE nn NOT IN (SELECT nn FROM large WHERE large.u = small.u)
F14: SELECT * FROM small WHERE n NOT IN (SELECT n FROM large WHERE large.nu = small.u)
F15: SELECT * FROM small WHERE n NOT IN (SELECT nn FROM large WHERE large.nu = small.u)
F16: SELECT * FROM small WHERE nn NOT IN (SELECT n FROM large WHERE large.nu = small.u)
F17: SELECT * FROM small WHERE nn NOT IN (SELECT nn FROM large WHERE large.nu = small.u)
F18: SELECT * FROM small WHERE NOT EXISTS (SELECT * FROM large WHERE large.u = small.u)
F19: SELECT * FROM small WHERE NOT EXISTS (SELECT * FROM large WHERE large.nu = small.u)

The execution plan I expect for these queries is an anti-join between two data sets:

  • Access the table “small” through a table scan. The operation is executed one single time.
  • For every row of the table “small,” access the table “large” through an index scan and check whether rows that fulfill the WHERE clause exist.

MySQL selects the following execution plans. All of them fulfill the expectations.

F10

+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys   | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL    | NULL            | NULL    | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | eq_ref | large_u,large_n | large_u | 4       | chris.small.u |    1 |   100.00 | Using where |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+

F11

+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL    | NULL             | NULL    | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | eq_ref | large_u,large_nn | large_u | 4       | chris.small.u |    1 |   100.00 | Using where |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+

F12

+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys   | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL    | NULL            | NULL    | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | eq_ref | large_u,large_n | large_u | 4       | chris.small.u |    1 |    19.00 | Using where |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+------+----------+-------------+

F13

+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL    | NULL             | NULL    | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | eq_ref | large_u,large_nn | large_u | 4       | chris.small.u |    1 |    10.00 | Using where |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+------+----------+-------------+

F14

+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys    | key      | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL  | NULL             | NULL     | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | ref  | large_nu,large_n | large_nu | 4       | chris.small.u |    1 |   100.00 | Using where |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+-------------+

F15

+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys     | key      | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL  | NULL              | NULL     | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | ref  | large_nu,large_nn | large_nu | 4       | chris.small.u |    1 |   100.00 | Using where |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+-------------+

F16

+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys    | key      | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL  | NULL             | NULL     | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | ref  | large_nu,large_n | large_nu | 4       | chris.small.u |    1 |    19.00 | Using where |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+------+----------+-------------+

F17

+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys     | key      | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL  | NULL              | NULL     | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | ref  | large_nu,large_nn | large_nu | 4       | chris.small.u |    1 |    10.00 | Using where |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+------+----------+-------------+

F18

+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL    | NULL          | NULL    | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | eq_ref | large_u       | large_u | 4       | chris.small.u |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+------+----------+-------------+

F19

+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key      | key_len | ref           | rows | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+-------------+
|  1 | PRIMARY            | small | NULL       | ALL  | NULL          | NULL     | NULL    | NULL          |   10 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | large | NULL       | ref  | large_nu      | large_nu | 4       | chris.small.u |    1 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+------+----------+-------------+

Oracle Database selects the following execution plans. All of them fulfill the expectations.

F10/F12

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |    18   (0)| 00:00:01 |
|*  1 |  FILTER                      |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LARGE" "LARGE" WHERE "LARGE"."U"=:B1
              AND LNNVL("N"<>:B2)))
   3 - filter(LNNVL("N"<>:B1))
   4 - access("LARGE"."U"=:B1)

F11

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     9 |  1269 |    18   (0)| 00:00:01 |
|*  1 |  FILTER                      |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| LARGE   |     1 |    10 |     3   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LARGE" "LARGE" WHERE "LARGE"."U"=:B1
              AND LNNVL("NN"<>:B2)))
   3 - filter(LNNVL("NN"<>:B1))
   4 - access("LARGE"."U"=:B1)

F13

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     1 |   151 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI           |         |     1 |   151 |    23   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL          | SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| LARGE   |  1000K|  9765K|     2   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | LARGE_U |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   3 - filter("NN"="NN")
   4 - access("LARGE"."U"="SMALL"."U")

F14/F16

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     9 |  1269 |    23   (0)| 00:00:01 |
|*  1 |  FILTER                              |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL                  | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |     1 |    10 |     4   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | LARGE_NU |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LARGE" "LARGE" WHERE "LARGE"."NU"=:B1 AND
              LNNVL("N"<>:B2)))
   3 - filter(LNNVL("N"<>:B1))
   4 - access("LARGE"."NU"=:B1)

F15

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     9 |  1269 |    23   (0)| 00:00:01 |
|*  1 |  FILTER                              |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL                  | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |     1 |    10 |     4   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | LARGE_NU |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "LARGE" "LARGE" WHERE "LARGE"."NU"=:B1 AND
              LNNVL("NN"<>:B2)))
   3 - filter(LNNVL("NN"<>:B1))
   4 - access("LARGE"."NU"=:B1)

F17

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |     1 |   151 |    33   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI                   |          |     1 |   151 |    33   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL                  | SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| LARGE    |  1000K|  9765K|     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | LARGE_NN |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   3 - filter("LARGE"."NU"="SMALL"."U")
   4 - access("NN"="NN")

F18

------------------------------------------------------------------------------
| Id  | Operation          | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |         |     1 |   146 |    13   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |         |     1 |   146 |    13   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL   |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX UNIQUE SCAN| LARGE_U |   900K|  4394K|     1   (0)| 00:00:01 |
------------------------------------------------------------------------------

   3 - access("LARGE"."U"="SMALL"."U")

F19

-------------------------------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   146 |    23   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |          |     1 |   146 |    23   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| SMALL    |    10 |  1410 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | LARGE_NU |   900K|  4394K|     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------

   3 - access("LARGE"."NU"="SMALL"."U")

PostgreSQL selects the following execution plans. Except for the queries F18/F19, the others fulfill the expectations. However, in this specific case, the execution plan selected for the queries E18/E19 perform well. I checked that when the data changes, also the execution plans changes. So, even though I was surprised by them, they are good.

F10-F13

 Seq Scan on small  (cost=0.00..45.47 rows=5 width=148)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Index Scan using large_u on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (u = small.u)

F14-F17

 Seq Scan on small  (cost=0.00..45.47 rows=5 width=148)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Index Scan using large_nu on large  (cost=0.42..8.44 rows=1 width=4)
           Index Cond: (nu = small.u)

F18

 Merge Anti Join  (cost=0.56..13.59 rows=1 width=148)
   Merge Cond: (small.u = large.u)
   ->  Index Scan using small_u on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_u on large  (cost=0.42..81855.69 rows=1000000 width=4)

F19

 Merge Anti Join  (cost=0.56..13.59 rows=1 width=148)
   Merge Cond: (small.u = large.nu)
   ->  Index Scan using small_u on small  (cost=0.14..12.29 rows=10 width=148)
   ->  Index Only Scan using large_nu on large  (cost=0.42..81855.69 rows=1000000 width=4)
Subtype F2 – Table “small” in the subquery

F20: SELECT * FROM large WHERE n NOT IN (SELECT n FROM small WHERE small.u = large.u)
F21: SELECT * FROM large WHERE n NOT IN (SELECT nn FROM small WHERE small.u = large.u)
F22: SELECT * FROM large WHERE nn NOT IN (SELECT n FROM small WHERE small.u = large.u)
F23: SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.u = large.u)
F24: SELECT * FROM large WHERE n NOT IN (SELECT n FROM small WHERE small.nu = large.u)
F25: SELECT * FROM large WHERE n NOT IN (SELECT nn FROM small WHERE small.nu = large.u)
F26: SELECT * FROM large WHERE nn NOT IN (SELECT n FROM small WHERE small.nu = large.u)
F27: SELECT * FROM large WHERE nn NOT IN (SELECT nn FROM small WHERE small.nu = large.u)
F28: SELECT * FROM large WHERE NOT EXISTS (SELECT * FROM small WHERE small.u = large.u)
F29: SELECT * FROM large WHERE NOT EXISTS (SELECT * FROM small WHERE small.nu = large.u)

The execution plan I expect for these queries is an anti-join between two data sets:

  • Access the table “small” through a table scan and put the resulting data into a memory structure. This operation is executed one single time.
  • Access the table “large” through a table scan and, for every row, check the memory structure created by the previous operation to find out whether rows that fulfill the WHERE clause exist. This operation is executed one single time.

MySQL selects the following execution plans. None of them, because of the access to table “small” for every row in the table “large,” fulfill the expectations. They are really bad.

F20

+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys   | key     | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL    | NULL            | NULL    | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | eq_ref | small_u,small_n | small_u | 4       | chris.large.u |      1 |   100.00 | Using where |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+--------+----------+-------------+

F21

+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL    | NULL             | NULL    | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | eq_ref | small_u,small_nn | small_u | 4       | chris.large.u |      1 |   100.00 | Using where |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+--------+----------+-------------+

F22

+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys   | key     | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL    | NULL            | NULL    | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | eq_ref | small_u,small_n | small_u | 4       | chris.large.u |      1 |    19.00 | Using where |
+----+--------------------+-------+------------+--------+-----------------+---------+---------+---------------+--------+----------+-------------+

F23

+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys    | key     | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL    | NULL             | NULL    | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | eq_ref | small_u,small_nn | small_u | 4       | chris.large.u |      1 |    10.00 | Using where |
+----+--------------------+-------+------------+--------+------------------+---------+---------+---------------+--------+----------+-------------+

F24

+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys    | key      | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL  | NULL             | NULL     | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | ref  | small_nu,small_n | small_nu | 4       | chris.large.u |      1 |   100.00 | Using where |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+--------+----------+-------------+

F25

+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys     | key      | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL  | NULL              | NULL     | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | ref  | small_nu,small_nn | small_nu | 4       | chris.large.u |      1 |   100.00 | Using where |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+--------+----------+-------------+

F26

+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys    | key      | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL  | NULL             | NULL     | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | ref  | small_nu,small_n | small_nu | 4       | chris.large.u |      1 |    19.00 | Using where |
+----+--------------------+-------+------------+------+------------------+----------+---------+---------------+--------+----------+-------------+

F27

+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys     | key      | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL  | NULL              | NULL     | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | ref  | small_nu,small_nn | small_nu | 4       | chris.large.u |      1 |    10.00 | Using where |
+----+--------------------+-------+------------+------+-------------------+----------+---------+---------------+--------+----------+-------------+

F28

+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type   | possible_keys | key     | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL    | NULL          | NULL    | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | eq_ref | small_u       | small_u | 4       | chris.large.u |      1 |   100.00 | Using index |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------+--------+----------+-------------+

F29

+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+--------+----------+-------------+
| id | select_type        | table | partitions | type | possible_keys | key      | key_len | ref           | rows   | filtered | Extra       |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+--------+----------+-------------+
|  1 | PRIMARY            | large | NULL       | ALL  | NULL          | NULL     | NULL    | NULL          | 989170 |   100.00 | Using where |
|  2 | DEPENDENT SUBQUERY | small | NULL       | ref  | small_nu      | small_nu | 4       | chris.large.u |      1 |   100.00 | Using index |
+----+--------------------+-------+------------+------+---------------+----------+---------+---------------+--------+----------+-------------+

Oracle Database selects the following execution plans. Only the execution plan of the queries F23/F27-F29 fulfill the expectations. The others, because of the missing anti-join optimization, are bad.

F20/F22

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  1000K  (1)| 00:00:40 |
|*  1 |  FILTER                      |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | LARGE   |  1000K|   142M|  5789   (1)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "SMALL" "SMALL" WHERE "SMALL"."U"=:B1
              AND LNNVL("N"<>:B2)))
   3 - filter(LNNVL("N"<>:B1))
   4 - access("SMALL"."U"=:B1)

F21

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         |   999K|   142M|  1000K  (1)| 00:00:40 |
|*  1 |  FILTER                      |         |       |       |            |          |
|   2 |   TABLE ACCESS FULL          | LARGE   |  1000K|   142M|  5789   (1)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID| SMALL   |     1 |     6 |     1   (0)| 00:00:01 |
|*  4 |    INDEX UNIQUE SCAN         | SMALL_U |     1 |       |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "SMALL" "SMALL" WHERE "SMALL"."U"=:B1
              AND LNNVL("NN"<>:B2)))
   3 - filter(LNNVL("NN"<>:B1))
   4 - access("SMALL"."U"=:B1)

F23

--------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                  |   999K|   147M|  5794   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI   |                  |   999K|   147M|  5794   (1)| 00:00:01 |
|   2 |   VIEW                  | index$_join$_002 |    10 |    60 |     2   (0)| 00:00:01 |
|*  3 |    HASH JOIN            |                  |       |       |            |          |
|   4 |     INDEX FAST FULL SCAN| SMALL_NN         |    10 |    60 |     1   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| SMALL_U          |    10 |    60 |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS FULL     | LARGE            |  1000K|   142M|  5788   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

   1 - access("SMALL"."U"="LARGE"."U" AND "NN"="NN")
   3 - access(ROWID=ROWID)

F24/F26

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |   999K|   142M|  1994K  (1)| 00:01:18 |
|*  1 |  FILTER                              |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL                  | LARGE    |  1000K|   142M|  5789   (1)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     1 |     6 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | SMALL_NU |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "SMALL" "SMALL" WHERE "SMALL"."NU"=:B1 AND
              LNNVL("N"<>:B2)))
   3 - filter(LNNVL("N"<>:B1))
   4 - access("SMALL"."NU"=:B1)

F25

-------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |   999K|   142M|  1994K  (1)| 00:01:18 |
|*  1 |  FILTER                              |          |       |       |            |          |
|   2 |   TABLE ACCESS FULL                  | LARGE    |  1000K|   142M|  5789   (1)| 00:00:01 |
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| SMALL    |     1 |     6 |     2   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN                  | SMALL_NU |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "SMALL" "SMALL" WHERE "SMALL"."NU"=:B1 AND
              LNNVL("NN"<>:B2)))
   3 - filter(LNNVL("NN"<>:B1))
   4 - access("SMALL"."NU"=:B1)

F27

--------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                  |   999K|   147M|  5794   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI   |                  |   999K|   147M|  5794   (1)| 00:00:01 |
|   2 |   VIEW                  | index$_join$_002 |    10 |    60 |     2   (0)| 00:00:01 |
|*  3 |    HASH JOIN            |                  |       |       |            |          |
|   4 |     INDEX FAST FULL SCAN| SMALL_NN         |    10 |    60 |     1   (0)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| SMALL_NU         |    10 |    60 |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS FULL     | LARGE            |  1000K|   142M|  5788   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

   1 - access("SMALL"."NU"="LARGE"."U" AND "NN"="NN")
   3 - access(ROWID=ROWID)

F28

--------------------------------------------------------------------------------
| Id  | Operation            | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |         |   999K|   144M|  5793   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI|         |   999K|   144M|  5793   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN    | SMALL_U |    10 |    30 |     1   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | LARGE   |  1000K|   142M|  5788   (1)| 00:00:01 |
--------------------------------------------------------------------------------

   1 - access("SMALL"."U"="LARGE"."U")

F29

---------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |   999K|   144M|  5793   (1)| 00:00:01 |
|*  1 |  HASH JOIN RIGHT ANTI|          |   999K|   144M|  5793   (1)| 00:00:01 |
|   2 |   INDEX FULL SCAN    | SMALL_NU |    10 |    30 |     1   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL  | LARGE    |  1000K|   142M|  5788   (1)| 00:00:01 |
---------------------------------------------------------------------------------

   1 - access("SMALL"."NU"="LARGE"."U")

PostgreSQL selects the following execution plans. Only the ones of the queries F28/F29 fulfill the expectations. The others, because of the table scan on “small” for every row in the table “large,” are really bad.

F20-F23

 Seq Scan on large  (cost=0.00..598473.00 rows=500000 width=148)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (u = large.u)

F24-F27

 Seq Scan on large  (cost=0.00..598473.00 rows=500000 width=148)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Seq Scan on small  (cost=0.00..1.12 rows=1 width=4)
           Filter: (nu = large.u)

F28/F29

 Hash Anti Join  (cost=1.23..44849.14 rows=999990 width=148)
   Hash Cond: (large.u = small.u)
   ->  Seq Scan on large  (cost=0.00..32223.00 rows=1000000 width=148)
   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
         ->  Seq Scan on small  (cost=0.00..1.10 rows=10 width=4)

Summary

The number of queries that the query optimizers handle correctly are the following:

  • Oracle Database 12.2: 72 out of 80
  • MySQL 8.0.3: 67 out of 80
  • PostgreSQL 10.0: 60 out of 80

Since not all queries are handled correctly, for best performance it is sometimes necessary to rewrite them.