Search

Top 60 Oracle Blogs

Recent comments

subqueries

How to hint – 1

Here’s a quick tutorial in hinting, promped by a question on the OTN database forum.
The OP has a hash semi-join and Oracle appears to be ignoring a hint to use a nested loop:

    > I tried forcing the optimizer to not use hash join by adding NO_USE_HASH, USE_NL to my sql but it doesn’t seem to work.
    > Can anyone please help check what I have done wrong.
    > select /*+ NO_USE_HASH(C2)  USE_NL(C2) */
    >         SC.SID, SC.MID, SC.INDATE, SC.EXDATE, SC.AUDATE
    > FROM    SSCLASS SC
    > WHERE   SC.SID = 0
    > AND     SC.CID = 0
    > AND     SC.MID = 1
    > AND     SC.INDATE <= SC.EXDATE
    > AND     EXISTS (
    >                 SELECT  SSCID FROM SSCLASS C2
    >                 WHERE   C2.SSCID = SC.SSCID
    >                 AND     C2.AUDATE >= to_date('2009-01-01','yyyy-MM-dd')
    >         )
    > ORDER BY
    >        SSCID, INDATE, EXDATE
    >
    > PLAN_TABLE_OUTPUT
    > Plan hash value: 1476588646
    >
    > ------------------------------------------------------------------------------------------------------
    > | Id  | Operation                     | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    > ------------------------------------------------------------------------------------------------------
    > |   0 | SELECT STATEMENT              |              |   204K|    10M|       | 35799   (1)| 00:07:10 |
    > |   1 |  SORT ORDER BY                |              |   204K|    10M|    25M| 35799   (1)| 00:07:10 |
    > |*  2 |   HASH JOIN SEMI              |              |   204K|    10M|    10M| 33077   (1)| 00:06:37 |
    > |*  3 |    TABLE ACCESS BY INDEX ROWID| SSCLASS      |   204K|  7983K|       |  9110   (1)| 00:01:50 |
    > |*  4 |     INDEX RANGE SCAN          | X5_SSCLASS   |   204K|       |       |   582   (1)| 00:00:07 |
    > |*  5 |    INDEX RANGE SCAN           | X6_SSCLASS   |  4955K|    66M|       | 17276   (1)| 00:03:28 |
    > ------------------------------------------------------------------------------------------------------
    

I’m not going to argue about what plans might be good or bad, and I’m going to assume the OP simply wants a nested loop semi join using a “good” index into the table aliased as C2; so I’m just going to demonstrate on this simple example how to approach that specific problem. The critical error the OP has made is that the join he’s trying to affect doesn’t exist in the query block where he’s put his hint – so he needs to find out what query will exist after the subquery has been nested and the optimizer is looking at the semi-join.

Here’s initial query, with default execution plan, I’ll point out that there is an index on the n1 column that I’m using in the existence test:

select
	*
from
	t2
where	t2.n2 = 15
and	exists (
		select
			null
		from	t1
		where	t1.n1 = t2.n1
	)
;

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |    15 |  2865 |    26   (4)| 00:00:01 |
|*  1 |  HASH JOIN SEMI       |       |    15 |  2865 |    26   (4)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL   | T2    |    15 |  2805 |    22   (0)| 00:00:01 |
|   3 |   INDEX FAST FULL SCAN| T1_I1 |  3000 | 12000 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="T2"."N1")
   2 - filter("T2"."N2"=15)

So I’ve emulated the hash semi-join into the second table that the OP wants to get rid of, and I’m not using the target index in a “precision” fashion.

I happen to know that there is a hint that I can use to make the subquery operate as a nested loop semijoin. It’s /*+ nl_sj */ and it has to go in the subquery. Unfortunately it’s a hint that’s deprecated in 10g, but never mind that for the moment. I’m also going to adopt “sensible practice” and give each of my query blocks a name. Let’s see what we get from dbms_xplan with the hint.

explain plan
set statement_id = 'sj_hinted'
for
select
	/*+
		qb_name(main)
	*/
	*
from
	t2
where	t2.n2 = 15
and	exists (
		select
			/*+
				qb_name(subq) nl_sj
			*/
			null
		from	t1
		where	t1.n1 = t2.n1
	)
;

select * from table(dbms_xplan.display(null,'sj_hinted','outline'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
Plan hash value: 635111780

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    15 |  2865 |    37   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |       |    15 |  2865 |    37   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2    |    15 |  2805 |    22   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |  3000 | 12000 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$A93AFAED" "T1"@"SUBQ")
      LEADING(@"SEL$A93AFAED" "T2"@"MAIN" "T1"@"SUBQ")
      INDEX(@"SEL$A93AFAED" "T1"@"SUBQ" ("T1"."N1"))
      FULL(@"SEL$A93AFAED" "T2"@"MAIN")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ")
      OUTLINE_LEAF(@"SEL$A93AFAED")
      ALL_ROWS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N2"=15)
   3 - access("T1"."N1"="T2"."N1")

Note how I’ve used a statement_id to label my plan, and I’ve added the extra predicate ‘outline’ to the call to dbms_xplan. The outline shows me the complete set of hints I need to reproduce the execution plan; technically it’s the information that would be stored by Oracle as an outline or an SQL Baseline.

There are a few session-level parameter settings I don’t really need included, and a couple of things which can’t qualify as “legal” SQL hints, though, and I’m going to ignore those. (Don’t you love the “ignore the hints” hint, though!)

So let’s take the minimum set of hints back into the SQL:

explain plan
set statement_id = 'full_hints'
for
select
	/*+
		qb_name(main)
		unnest(@subq)
		leading(@sel$a93afaed t2@main t1@subq)
		use_nl(@sel$a93afaed t1@subq)
		full(@sel$a93afaed t2@main)
		index(@sel$a93afaed t1@subq(t1.n1))
	*/
	*
from
	t2
where	t2.n2 = 15
and	exists (
		select
			/*+
				qb_name(subq)
			*/
			null
		from	t1
		where	t1.n1 = t2.n1
	)
;

select * from table(dbms_xplan.display(null,'full_hints','outline'));

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |    15 |  2865 |    37   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |       |    15 |  2865 |    37   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL| T2    |    15 |  2805 |    22   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | T1_I1 |  3000 | 12000 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      USE_NL(@"SEL$A93AFAED" "T1"@"SUBQ")
      LEADING(@"SEL$A93AFAED" "T2"@"MAIN" "T1"@"SUBQ")
      INDEX(@"SEL$A93AFAED" "T1"@"SUBQ" ("T1"."N1"))
      FULL(@"SEL$A93AFAED" "T2"@"MAIN")
      OUTLINE(@"SUBQ")
      OUTLINE(@"MAIN")
      UNNEST(@"SUBQ")
      OUTLINE_LEAF(@"SEL$A93AFAED")
      ALL_ROWS
      OPTIMIZER_FEATURES_ENABLE('10.2.0.3')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T2"."N2"=15)
   3 - access("T1"."N1"="T2"."N1")

Job done – we used a bit of hackery to get the plan we wanted, then used the legal hints to reproduce the plan.

It is important to name your query blocks as this helps you to identify what transformations apply when, and how to label your tables correctly in your code; and you have to remember that the “strange” query block names that appear (such as @”SEL$A93AFAED”) are dependent on the query block names you originally supplied.

The method isn’t perfect since (a) sometimes hints that are needed don’t get into the outline, and (b) sometimes the outline actually doesn’t reproduce the plan if all you use are the “legal” hints – but it may help you in most cases.

Star Transformation – 2

After publishing my note about emulating star transformations when there was something blocking the optimizer’s ability to do them automatically I ended up having an interesting email conversation with Mark Farnham about variations on the theme, concluding with a completely different (and possibly more intuitive) SQL statement to achieve the same aim.

You will recall that I started off with the following statement:

select
	ord.*
from
	products	prd,
	customers	cst,
	orders		ord
where
	prd.grp = 50
and	cst.grp = 50
and	ord.id_prd = prd.id
and	ord.id_cst = cst.id
;

Although this specific example is so simple that the optimizer could turn it into a star transformation automatically, I used it to demonstrate a “minimum work” version of the statement as follows (with the execution plan from 11.1.0.6):

select
	ord.*
from
	(
	select
		/*+
			leading(prd ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	prid
	from
		products	prd,
		orders		ord
		where
		prd.grp = 50
	and	ord.id_prd = prd.id
		)	prid,
	(
	select
		/*+
			leading(cst ord)
			use_nl(ord)
			no_merge
		*/
		ord.rowid 	crid
	from
		customers	cst,
		orders		ord
	where
		cst.grp = 50
	and	ord.id_cst = cst.id
	)	crid,
	orders	ord
where
	prid.prid = crid.crid
and	ord.rowid = prid.prid
;

--------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |    11 |  1650 |   570 |
|   1 |  NESTED LOOPS               |            |    11 |  1650 |   570 |
|*  2 |   HASH JOIN                 |            |    11 |   264 |   559 |
|   3 |    VIEW                     |            |  3361 | 40332 |   277 |
|   4 |     NESTED LOOPS            |            |  3361 | 87386 |   277 |
|*  5 |      TABLE ACCESS FULL      | CUSTOMERS  |    98 |   882 |    81 |
|*  6 |      INDEX RANGE SCAN       | ORD_CST_FK |    34 |   578 |     2 |
|   7 |    VIEW                     |            |  3390 | 40680 |   281 |
|   8 |     NESTED LOOPS            |            |  3390 | 88140 |   281 |
|*  9 |      TABLE ACCESS FULL      | PRODUCTS   |   100 |   900 |    81 |
|* 10 |      INDEX RANGE SCAN       | ORD_PRD_FK |    34 |   578 |     2 |
|  11 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("PRID"."PRID"="CRID"."CRID")
   5 - filter("CST"."GRP"=50)
   6 - access("ORD"."ID_CST"="CST"."ID")
   9 - filter("PRD"."GRP"=50)
  10 - access("ORD"."ID_PRD"="PRD"."ID")

Go back to the original SQL statement, though. The select list contains columns from just the orders table, and there’s a fairly well-known suggestion (or possibly guideline) that “tables that are not in the select list should not appear in the from clause, they should appear in subqueries in the where clause”. It’s not difficult to come up with examples where this strategy is a bad idea – but it’s often worth thinking through the consequences of trying to apply it. (Of course, you often find that after you’ve rewritten your SQL to match the strategy the optimizer transforms it back into the join that you had been avoiding.)

Since the products and customers tables don’t appear in the select list, can I find a way of rewriting my statement with subqueries ? The answer is yes. Here’s the SQL, with the execution plan I got.

select
	ord.*
from
	orders ord
where
	ord.rowid in (
		select
			/*+
				no_use_hash_aggregation
			*/
			prid.prid
		from
			(
			select	/*+ no_merge */
				ord.rowid 	prid
			from
				products	prd,
				orders		ord
				where
				prd.grp = 50
			and	ord.id_prd = prd.id
			)	prid,
			(
			select	/*+ no_merge */
				ord.rowid 	crid
			from
				customers	cst,
				orders		ord
			where
				cst.grp = 50
			and	ord.id_cst = cst.id
			)	crid
		where
			prid.prid = crid.crid
	)
;

------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |   138 |   591   (1)| 00:00:08 |
|   1 |  NESTED LOOPS               |            |     1 |   138 |   591   (1)| 00:00:08 |
|   2 |   VIEW                      | VW_NSO_1   |    11 |   132 |   589   (1)| 00:00:08 |
|   3 |    SORT UNIQUE              |            |     1 |   264 |            |          |
|*  4 |     HASH JOIN               |            |    11 |   264 |   589   (1)| 00:00:08 |
|   5 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|   6 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|*  7 |        TABLE ACCESS FULL    | PRODUCTS   |   100 |   900 |    94   (2)| 00:00:02 |
|*  8 |        INDEX RANGE SCAN     | ORD_PRD_FK |    33 |   561 |     2   (0)| 00:00:01 |
|   9 |      VIEW                   |            |  3314 | 39768 |   294   (1)| 00:00:04 |
|  10 |       NESTED LOOPS          |            |  3314 | 86164 |   294   (1)| 00:00:04 |
|* 11 |        TABLE ACCESS FULL    | CUSTOMERS  |   100 |   900 |    94   (2)| 00:00:02 |
|* 12 |        INDEX RANGE SCAN     | ORD_CST_FK |    33 |   561 |     2   (0)| 00:00:01 |
|  13 |   TABLE ACCESS BY USER ROWID| ORDERS     |     1 |   126 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("PRID"."PRID"="CRID"."CRID")
   7 - filter("PRD"."GRP"=50)
   8 - access("ORD"."ID_PRD"="PRD"."ID")
  11 - filter("CST"."GRP"=50)
  12 - access("ORD"."ID_CST"="CST"."ID")

You’ll notice that this plan is remarkably similar to the plan I got from the joins with inline views – although the new plan has one extra “sort unique” operation at line 3. The optimizer has transformed my query by unnesting the subqueries and turning them into inline views – adding in a “distinct” operation because that’s what happens when you turn an “IN subquery” into a join when there isn’t a suitable uniqueness constraint on the join column.

There are two reasons for adopting this subquery approach. There’s the (highly subjective) argument that the code is a little easier to understand in this form, especially if you then want to join the orders table onwards to other tables. There’s also an objective argument relating to the “sort unique”. Note that I included the hint “no_use_hash_aggregation” to stop Oracle from using a “hash unique” operation at this point. By forcing Oracle to sort for uniqueness I know that the rowids will appear in (at least) an “extent-based” order rather than a completely random order. It’s possible that walking the fact table in physical order will be a little more efficient than accessing it in a completely randomised order. The former might find multiple rows in a block or benefit from some form of O/S or SAN read-ahead; the latter is more likely to turn one row into one random physical read.

One final thought – I said that you might want to take this type of approach for queries where the optimizer can’t do a star transformation automatically. There’s one very important case where this is always true – when you’re running Standard Edition, which doesn’t support bitmap indexes. So if you’re running SE and want to do star transformations – you now know how.

Footnote: I haven’t spent much time testing the limits of variations of this code although I did spend 30 minutes extending the example to use three dimension tables. If you think that this strategy might solve a production problem, think about using the sample code to create and test a better model of your production requirement before rewriting the production code – and if you go to production, check very carefully that the resulting code does produce the same results.

Enhanced Subquery Optimizations in Oracle

While googling I found a fresh article from Oracle for the VLDB journal: Enhanced Subquery Optimization in Oracle. It primarily discusses subqueries – how Oracle deals with them on optimization and at run-time. Paper’s topics overview: subquery coalescing – here I’ve done some testing of the feature partially available in 11gR2 (parallel) group-by pushdown – [...]