Oakies Blog Aggregator

Power BI- Loading PBI Log Files

There’s a reason that log analytics programs like Splunk, Datadog and Sumo Logic are so popular.  Even Microsoft has a Log Analytics product, but the important message here is that log data is massive, and parsing through it to find important information can be a bit of a pain.  The second word in Log Analytics IS “analytics”.  Due to this, when faced with the number of logs from the many complex Power BI environments people are building, (multiple data sources, multiple data centers, SSRS, Power BI, etc.) my first thought was to load the logs into Power BI.

I’ve been working with trace files, but the log files should have been the first files I discussed, (my bad!)  Let’s correct that oversight right now.

1st Example- Power BI Desktop Logs

First we’ll start with the standard Power BI Desktop log, which can be found in C:\Users\\appdata\local\Microsoft\Power BI Desktop\Traces\Performance.  This file keeps track of the main processing performed by Power BI.  This log rotates out on a regular basis, so you’ll see more than one, and the logs will have the naming convention of:

PBIDesktop...log

We will load this log file into Power BI Desktop by clicking on Get Data –> Text/CSV and then choose to view all files and navigate to the folder that contains the log files.  Choose the desktop log from the list of files available and click OK.

You’ll need to format the data to produce a working table, which the M query below will demonstrate:

let
    Source = Csv.Document(File.Contents("C:\Users\kegorman.NORTHAMERICA\Documents\Traces\PBIDesktop.2020.2018-08-23T18-20-38-862814.log"),5,"",null,1252),
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}, {"Column2", type text}, {"Column3", Int64.Type}, {"Column4", type text}, {"Column5", type text}}),
        #"Replaced Value" = Table.ReplaceValue(#"Changed Type","{Start:","",Replacer.ReplaceText,{"Column5"}),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Replaced Value", "Column5", Splitter.SplitTextByDelimiter(",Action:", QuoteStyle.Csv), {"Column5.1", "Column5.2"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column5.1", type datetime}, {"Column5.2", type text}}),
        #"Replaced Value1" = Table.ReplaceValue(#"Changed Type1",",ProductVersion:2.56.5023.1043 (PBIDesktop)","",Replacer.ReplaceText,{"Column5.2"}),
        #"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1","}","",Replacer.ReplaceText,{"Column5.2"}),
        #"Split Column by Delimiter1" = Table.SplitColumn(#"Replaced Value2", "Column5.2", Splitter.SplitTextByEachDelimiter({"Duration:"}, QuoteStyle.Csv, true), {"Column5.2.1", "Column5.2.2"}),
        #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column5.2.1", type text}, {"Column5.2.2", type duration}}),
        #"Renamed Columns" = Table.RenameColumns(#"Changed Type2",{{"Column5.2.2", "Duration"}, {"Column5.2.1", "Action"}, {"Column5.1", "Start Time"}}),
        #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Column4"}),
        #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns",{{"Column3", "PID"}, {"Column2", "Type"}}),
        #"Replaced Value3" = Table.ReplaceValue(#"Renamed Columns1",":","",Replacer.ReplaceText,{"Type"}),
        #"Renamed Columns2" = Table.RenameColumns(#"Replaced Value3",{{"Column1", "Main Action"}})
    in
#"Renamed Columns2"

The table then results in a very workable data set that appears similar to this:

(Screenshot: the resulting table of parsed log data, log_tbl.png)

This offers you a dataset that you can load, reload and/or append to with log data that is available to report on.  As we’ve seen previously, you can create reports on resource usage, but this one would be used to search for “where type=’Error'” or look at the steps performed by the child TID #57, etc.

There weren’t any errors in my log, so it’s a bit difficult to demonstrate – ideally you’d only want the report to display something when there’s a problem – but you could, for example, set up a report that only shows Actions with a duration over 5 seconds.  I decided instead to just simulate the data, displaying the Action and the TID for those that fell into a certain number range…:)

(Screenshot: the simulated report showing Action and TID, log_tbl2.png)

2nd Example-  Power BI Report Server

The second example inspects the Report Server Portal log, (RSPortal**.log) which resides in C:\Program Files\Microsoft Power BI Report Server\PBIRS\LogFiles

We again load this log file via Get Data –> Text/CSV and then choose to view all files, as it won’t see the .log extension otherwise.  Choose the file and click on Edit.

The M query below shows the changes I performed to format the data into something that can easily be worked with.  Because of the staggered layout of the data lines, this formats the error and warning messages fully; the rest of the rows have only the Information Message populated, and the remaining columns will be null:

let
    Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Program Files\Microsoft Power BI Report Server\PBIRS\LogFiles\RSPortal.log"), null, null, 1252)}),
        #"Split Column by Delimiter" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", Int64.Type}, {"Column1.4", type text}}),
        #"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"Column1.1", "Information Message"}, {"Column1.2", "Status"}, {"Column1.3", "Status Code"}, {"Column1.4", "Status Message"}})
    in
#"Renamed Columns"

We can then create a simple table in Power BI:

Values:  Status Message

Filters:  status = Error

You then will receive the following output:

(Screenshot: table visual filtered to show the error message, Error_msg.png)

Of everything included in the log file, this quickly isolates the error message and displays it in the report.

The deal is, this uses Power BI a bit differently than we originally considered it – it’s not just for the business user, it can be for the technical professional as well.  I’ve worked with this kind of data my entire career, and if there’s a way I can display it to answer the questions the business needs answered by me, what better way than to use the tool they’re already using to answer their questions about the business every day?

Error Logging

Error logging is a topic that I’ve mentioned a couple of times in the past, most recently as a follow-up in a discussion of the choices for copying a large volume of data from one table to another, but originally in an addendum about a little surprise you may get when you use extended strings (max_string_size = EXTENDED).

If you use the default call to dbms_errlog.create_error_log() to create an error logging table then Oracle will create a table with a few columns of its own plus every column (name) that you have in your original table – but it will create your columns as varchar2(4000), or nvarchar2(2000), or raw(2000) – unless you’ve set max_string_size to extended.  Here’s a simple demo script with results from two different systems, one with the default setting, the other with the extended setting (note, there’s a little inconsistency in handling raw() columns).


rem
rem     Script:         log_errors_min.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Jun 2018
rem     Purpose:
rem

create table t1 (
        v1      varchar2(10),
        n1      number(2,0),
        d1      date,
        nv1     nvarchar2(10),
        r1      raw(10)
);


execute dbms_errlog.create_error_log('t1')

desc err$_t1


max_string_size = STANDARD
--------------------------
 Name			       Null?	Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$			NUMBER
 ORA_ERR_MESG$				VARCHAR2(2000)
 ORA_ERR_ROWID$ 			ROWID
 ORA_ERR_OPTYP$ 			VARCHAR2(2)
 ORA_ERR_TAG$				VARCHAR2(2000)
 V1					VARCHAR2(4000)
 N1					VARCHAR2(4000)
 D1					VARCHAR2(4000)
 NV1					NVARCHAR2(2000)
 R1					RAW(2000)


max_string_size = EXTENDED
--------------------------
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ORA_ERR_NUMBER$                        NUMBER
 ORA_ERR_MESG$                          VARCHAR2(2000)
 ORA_ERR_ROWID$                         ROWID
 ORA_ERR_OPTYP$                         VARCHAR2(2)
 ORA_ERR_TAG$                           VARCHAR2(2000)
 V1                                     VARCHAR2(32767)
 N1                                     VARCHAR2(32767)
 D1                                     VARCHAR2(32767)
 NV1                                    NVARCHAR2(16383)
 R1                                     RAW(32767)

Every single “original” column that appears in this table will be a LOB, with an inline LOB locator of 30 or more bytes. (At least, that’s the 12.1.0.2 implementation, I haven’t checked for 12.2 or 18.3).

If this is going to be a problem (e.g. you have a table defined with 500 columns but only use 120 of them) you can create a minimalist error logging table. Provided you create it with the ora_err% columns suitably defined you can add only those columns you’re really interested in (or feel threatened by), and you don’t have to declare them at extreme lengths. e.g.


create table err$_special (
        ora_err_number$         number,
        ora_err_mesg$           varchar2(2000),
        ora_err_rowid$          rowid,
        ora_err_optyp$          varchar2(2),
        ora_err_tag$            varchar2(2000),
        n1                      varchar2(128)
)
;

insert into t1 values(1,'abc','02-jan-1984',sys_op_c2c('abc'),hextoraw('0xFF')) 
log errors into err$_special
reject limit unlimited
;

execute print_table('select * from err$_special')


ORA_ERR_NUMBER$               : 1722
ORA_ERR_MESG$                 : ORA-01722: invalid number

ORA_ERR_ROWID$                :
ORA_ERR_OPTYP$                : I
ORA_ERR_TAG$                  :
N1                            : abc


If you try to create an error logging table that doesn’t include the 5 critical columns you’ll see Oracle error ORA-38900: missing mandatory column “ORA_ERR_{something}” of error log table “{your logging table name}” when you try to log errors into it.  Moreover, the 5 critical columns have to be the first 5 columns (in any order) in the table, or you’ll get Oracle error ORA-38901: column “ORA_ERR_{something}$” of table “{your logging table name}” when you try to log errors into it.
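As a quick illustration of the first of those errors – using a hypothetical table name err$_bad, and the same insert that was used against err$_special above – a “logging” table that omits the ORA_ERR% columns is only rejected at the point you try to log errors into it:


create table err$_bad (
        n1                      varchar2(128)
)
;

insert into t1 values(1,'abc','02-jan-1984',sys_op_c2c('abc'),hextoraw('0xFF'))
log errors into err$_bad
reject limit unlimited
;

-- expected to fail with ORA-38900 (missing mandatory column ... of error log table "ERR$_BAD")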

18c Database installation on Windows

If you’re a Windows enterprise, or you want to run your 18c database on your Windows laptop/desktop for research and education, then there has been some good news this week.  The software is now available to you on OTN.  Here’s a walk through of the software installation process.

Head to the standard database downloads page

http://www.oracle.com/technetwork/database/enterprise-edition/downloads/index.html

Accept the license agreement and choose the Windows version to download

image

Note – if you want to see all of the various Windows 18c components (grid, client, examples, etc), you can get that here

Once you have downloaded the software, note that when you unzip it, you are unzipping it directly into place, not into a staging area.  So unzip to a folder that you intend to be your ORACLE_HOME location.

image

Once the unzip has been completed, in the base directory where you unzipped the files, there will be a setup.exe file.  Double click on that to launch the installer

image

The familiar Java-based software installer will appear. For this blog post, I opted to solely go with software configuration – I’ll cover database creation in a separate post.

image

For my use, I’m just using a home laptop, so single instance for me. If you do want RAC, you’ll be needing to download more components anyway (eg Grid)

image

You’ll then get the standard pre-installation checks on your machine.  For reference, the machine I’m installing the software on is a 32GB RAM machine running Windows 10, and I did not get any warnings.

image

I nominated my existing ORACLE_BASE location as the target for this ORACLE_HOME as well

image

and went with the default option of using a virtual Windows account to own the software:

image

Note: The first time I did this install, I went with the Windows Built-in account because I had an old 11g database installation under the same ORACLE_BASE, and I thought at least it would be consistent.  Whilst the installation and subsequent use of 18c worked fine, and my 11g instance was fine, it totally hosed my 12c installation which was also under the same ORACLE_BASE.  The 12c installation had been done with the default virtual account, and it appears the subsequent addition of 18c using the built-in Windows SYSTEM account reset the permissions on the critical diagnostic directory paths.  From that point on, my 12c installation could not start and got “permission denied” errors when trying to access various destinations.

So my advice would be – adopt a consistent approach for any software under a common ORACLE_BASE.

Next you choose the edition you want to install.  Enterprise for me, naturally :)

image

And then you are ready to install. Just hit the Install button, sit back and relax.

image

You will see the familiar progress dialog box, and like all progress boxes from any vendor, the percentage complete will typically have no true bearing on how long the process will take :) For the record, my installation took around 8-10 minutes at this stage, most of it spent configuring the central inventory.

image

All things going well, you finally get the confirmation screen, and voila! Your 18c database software is installed!

image

Here’s an (accelerated) video showing the above steps as they were performed on my machine.

Enjoy 18c on Windows!

New Zealand: “Oracle Indexing Internals and Best Practices” Seminars November 2018.

Good news for those of you in beautiful New Zealand. Due to popular demand, I’ll be returning to run some of my acclaimed “Oracle Indexing Internals and Best Practices” seminars in November 2018. The dates and events are: Wellington: 19-20 November 2018: Registration Here or Buy Directly Here Auckland: 21-22 November 2018: Registration Here or […]

RV Life and Working Remote

I get a lot of questions about what it’s like to work remote while living in our 5th wheel.  I’ll link this post to danceswithwinnebagos.com, too, so for those asking the same question from that site, it’s a two for one…

Descending bug

Following on from Monday’s posting about reading execution plans and related information, I noticed a question on the ODC database forum asking about the difference between “in ({list of values})” and a list of “column = {constant}” predicates connected by OR. The answer to the question is that there’s essentially no difference as you would be able to see from the predicate section of an execution plan:


SELECT  c1, c2, c3, c4, c5, c6, c7, c8..  
FROM    TAB1  
WHERE   STS IN ( 'A', 'B')  
AND     cnt < '4'  
AND     dt < sysdate  
and     rownum <=1;  
  
---------------------------------------------------------------------------------------------------------  
| Id  | Operation                     | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT              |                         |     1 |  1847 | 33399   (1)| 00:03:14 |  
|*  1 |  COUNT STOPKEY                |                         |       |       |            |          |  
|   2 |   INLIST ITERATOR             |                         |       |       |            |          |  
|*  3 |    TABLE ACCESS BY INDEX ROWID| TAB1                    |   114K|   201M| 33399   (1)| 00:03:14 |  
|*  4 |     INDEX RANGE SCAN          | TAB1_STS_IDX            |   114K|       |   723   (1)| 00:00:05 |  
---------------------------------------------------------------------------------------------------------  
  
Predicate Information (identified by operation id):  
---------------------------------------------------  
   1 - filter(ROWNUM<=1)  
   3 - filter("cnt"<'4' AND "dt"<SYSDATE)

Note how the predicate section tells you that the original “sts in ( ‘A’, ‘B’ )” has been transformed into “sts = ‘A’ or sts = ‘B'”.
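In other words, the statement above could have been written with explicit ORs – according to the plan and predicate section the optimizer treats the two forms identically:


SELECT  c1, c2, c3, c4, c5, c6, c7, c8..
FROM    TAB1
WHERE   (STS = 'A' OR STS = 'B')
AND     cnt < '4'
AND     dt < sysdate
and     rownum <=1;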

A further point I made about IN-lists in Monday’s post was that as one step in the transformation Oracle would sort the list and eliminate duplicates, and it suddenly occurred to me to wonder whether Oracle would sort the list in descending order if the only relevant index were defined to start with a descending column. Naturally I had to try it so here’s a suitable script to prepare some data:

rem
rem     Script:         descending_bug_06.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2018
rem     Purpose:
rem
rem     Last tested
rem             18.3.0.0        Crashes
rem             12.2.0.1        Crashes
rem             12.1.0.2        Crashes
rem             11.2.0.4        Bad Plan
rem

create table t1
nologging
pctfree 95 pctused 5
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        case mod(rownum,1000)
                when 0 then 'A'
                when 3 then 'B'
                when 6 then 'C'
                       else 'D'
        end                             sts,
        case mod(rownum,1000)
                when 0 then '1'
                when 3 then '2'
                when 6 then '3'
                       else '4'
        end                             cnt,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > comment to avoid WordPress format issue
;

create index t1_i1a on t1(sts) nologging;
create index t1_i1d on t1(sts desc) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size skewonly'
        );
end;
/

There is one oddity in this script – if you’ve got every column in an index declared as DESC you’ve made a mistake and none of the columns should be declared as DESC. The feature is relevant only if you want a mixture of ascending and descending columns in a single index.
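For the record, the sort of index where a descending column is actually meaningful mixes directions – something like the following, which is not part of the test script and is shown purely as an illustration:


create index t1_i2 on t1(sts desc, cnt) nologging;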

An important detail of the script is that I’ve gathered stats AFTER creating the objects – it’s important to do this, even in 18.3, because (a) creating the “descending” index will result in a hidden virtual column being created to represent the descending column and I want to make sure I have stats on that column and (b) the “stats on creation” code doesn’t generate histograms and I want a (frequency) histogram on column sts and the hidden, virtual, descending version of the column.
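If you want to check this for yourself, a quick query against user_tab_cols shows the hidden virtual column and whether it picked up a histogram – the name SYS_NC00006$ is what appeared on my system and may differ on yours:


select  column_name, hidden_column, virtual_column, histogram
from    user_tab_cols
where   table_name = 'T1'
order by
        internal_column_id
;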

After generating the data and checking that I have the correct histograms for sts and sys_nc00006$ (the relevant hidden column) I can then run the following test:

set serveroutput off
alter session set statistics_level = all;

alter index t1_i1d invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter index t1_i1d   visible;
alter index t1_i1a invisible;

select  sts, count(*)
from    t1
where   sts in ('B','C')
group by
        sts
;

select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

alter session set statistics_level = typical;
set serveroutput on


The code makes one index invisible then runs a query that should use an inlist iterator; then it switches indexes making the invisible one visible and vice versa and repeats the query. I’ve enabled rowsource execution statistics and pulled the execution plans from memory to make sure I don’t get fooled by any odd glitches that might exist within “explain plan”. Here are the results from 11.2.0.4 – normal index, then descending index – with a little cosmetic cleaning:


S   COUNT(*)
- ----------
B        100
C        100

SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.01 |       5 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.01 |       5 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    200 |00:00:00.01 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1A |      2 |    178 |    200 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("STS"='B' OR "STS"='C'))


Index altered.
Index altered.


S   COUNT(*)
- ----------
C        100
B        100


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      2 |00:00:00.02 |     198 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      2 |00:00:00.02 |     198 |
|*  2 |   INDEX FULL SCAN    | T1_I1D |      1 |   1000 |    200 |00:00:00.02 |     198 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


As expected we see counts of 100 for ‘B’s and ‘C’s, and we also see that the “sort group by nosort” operation with the descending index has produced the results in reverse order. The problem though is that the optimizer has decided to use an “index full scan” on the descending index, and the estimate of the rows returned is terribly wrong (and seems to be the common “5% guess”, used once for each target value), and the number of buffer visits is huge compared to the result from the normal index – Oracle really did walk every leaf block in the index to get this result. The predicate section also looks rather silly – why hasn’t the optimizer produced predicates more like: “sys_nc00006$ = sys_op_descend(‘B’)” ?
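For anyone who hasn’t met them, sys_op_descend() and sys_op_undescend() can be called directly from SQL, so it’s easy to see the sort of (raw) value such a predicate would compare against – the exact output will vary:


select
        sys_op_descend('B')                     descend_b,
        sys_op_undescend(sys_op_descend('B'))   undescend_b
from    dual
;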

In passing you’ll notice that the estimated rows in the plan using the normal index is a little low. This is the result of Oracle using a small sample (ca. 5,500 rows) in 11g to gather histogram stats. 12c will do better for a frequency histogram with the fast algorithm it uses for a 100% (auto) sample size.

So 11g doesn’t do very well but we’ve got 12.1.0.2, 12.2.0.1, and (in the last couple of weeks) 18.3 to play with. Here’s the result from 12.1.0.2 and 12.2.0.1 for the query that should use the descending index:


select  sts, count(*)
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [qernsRowP], [1], [], [], [], [], [], [], [], [], [], []


SQL_ID  f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |        |      1 |        |      0 |00:00:00.01 |       0 |
|   1 |  SORT GROUP BY NOSORT|        |      1 |      2 |      0 |00:00:00.01 |       0 |
|   2 |   INLIST ITERATOR    |        |      1 |        |    101 |00:00:00.03 |       5 |
|*  3 |    INDEX RANGE SCAN  | T1_I1D |      2 |    200 |    101 |00:00:00.01 |       5 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR
              "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR
              SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))

The query crashed! The plan, however, did look appropriate – the optimizer picked an inlist iterator, picked an index range scan, got the correct estimate of rows (index entries), and did better with the predicate section (though having used a sensible predicate as the access predicate it then used the bizarre version as the filter predicate). Judging from the A-rows column the query seems to have crashed at roughly the point where the optimizer was switching from the range scan for the first iteration into the range scan for the second iteration.

And then there’s Oracle 18.3 – which does the same as the 12c versions :(

To check whether my silly “single column so it shouldn’t be declared descending” index was the sole cause of the problem I repeated the tests using a two-column index on (sts, cnt).

Conclusion:

Descending indexes or (to be more accurate) indexes with descending columns can still produce problems even in the very latest version of Oracle.

Footnote

Oracle MoS has the wonderful “ORA-600/ORA-7445/ORA-700 Error Look-up Tool (Doc ID 153788.1)” (which doesn’t yet allow you to choose 18.3 as a version) so I used this to do a look up for ORA-00600 errors with first parameter qernsRowP in 12.2.0.1 and got the following suggestion from doc ID 285913.1: “set event:10119 to disable no-sort fetch and then reparse the failing SQL.” The example suggested setting the event to level 12, and this solved the problem for all three failing versions – but the suggestion came with a warning: “Setting this event at system level may impact the performance of database.” The execution plan (taken, in this case, from 18.2) may explain the warning:

 

S   COUNT(*)
- ----------
B	 100
C	 100

SQL_ID	f20u42pmw1z6w, child number 0
-------------------------------------
select sts, count(*) from t1 where sts in ('B','C') group by  sts

------------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	    |	   1 |	      |      2 |00:00:00.01 |	    4 |       |       | 	 |
|   1 |  HASH GROUP BY	   |	    |	   1 |	    2 |      2 |00:00:00.01 |	    4 |  1558K|  1558K|  659K (0)|
|   2 |   INLIST ITERATOR  |	    |	   1 |	      |    200 |00:00:00.01 |	    4 |       |       | 	 |
|*  3 |    INDEX RANGE SCAN| T1_I1D |	   2 |	  200 |    200 |00:00:00.01 |	    4 |       |       | 	 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(("T1"."SYS_NC00006$"=SYS_OP_DESCEND('B') OR "T1"."SYS_NC00006$"=SYS_OP_DESCEND('C')))
       filter((SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='B' OR SYS_OP_UNDESCEND("T1"."SYS_NC00006$")='C'))


The plan has changed from using a “sort group by nosort” – which effectively means just keeping a running count as you go – to a real “hash group by” which means you have to do the hashing arithmetic for every value (though maybe there’s a deterministic trick that means Oracle won’t do the arithmetic if the next value to be hashed is the same as the previous value) and the actual memory used (659K) does seem a little extreme for counting two distinct values.
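For reference, the event can be set (and unset) at session level with the usual event syntax – but do check the MoS notes first and remember the warning above about setting it at system level:


alter session set events '10119 trace name context forever, level 12';

alter session set events '10119 trace name context off';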

 

Take care with regular expressions

In an Office Hours session a couple of months back, I covered an important change that comes to regular expressions once you upgrade to 12c Release 2. You can see the video covering the issue here:

but for the TL;DR brigade reading this post: Regular expressions are not deterministic when you take NLS settings into account and thus cannot be used in constraints and/or function-based indexes.

This is just a post to quickly revisit the topic for anyone thinking of upgrading from an earlier release to 12c Release 2. An AskTOM question came in asking what would happen to such constraints during the upgrade process.

The sad answer is … nothing. For example, if you successfully (and by strict definition, incorrectly) created a constraint with a regular expression in 11g, then after the upgrade, that constraint will still be present in your new 12c Release 2 system. It will continue to work as it did in 11g, and even if you disable/enable it, or put it through a validate command to exercise the data, it will work as it did before.
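For illustration only – the table, column and pattern here are hypothetical – the sort of constraint being discussed is something like this, which 11g happily accepts:


create table t_contact ( phone_no varchar2(20) );

alter table t_contact add constraint phone_format_chk
        check ( regexp_like(phone_no, '^[0-9 +()-]+$') );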

To be honest, I do not like this, because it can become what I call a “sleeper” problem. If, 6 months after you upgrade, you happen to drop and recreate that constraint you’ll be most distressed to find that it cannot be added, and you will have most probably long forgotten that it was caused by an event that occurred 6 months ago, namely the upgrade. And perhaps worse, you now have an index or constraint that could potentially be corrupted by innocent tinkering with session NLS settings.

So before you upgrade, definitely take a cursory glance through your constraint definitions and take remedial action if needed.
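A rough sketch of that cursory glance – SEARCH_CONDITION is a LONG, so the simplest approach is to pull it into PL/SQL and scan it there (this uses USER_CONSTRAINTS, so run it schema by schema, or adapt it to DBA_CONSTRAINTS):


set serveroutput on
begin
        for c in (
                select  table_name, constraint_name, search_condition
                from    user_constraints
                where   constraint_type = 'C'
        ) loop
                if upper(c.search_condition) like '%REGEXP%' then
                        dbms_output.put_line(c.table_name||'.'||c.constraint_name);
                end if;
        end loop;
end;
/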

Parallel Execution of PL/SQL

A recent experience with text searching showed up some limitations on parallel execution of PL/SQL functions, and how to work around them.

Text Searching can be CPU Intensive?

It started with a requirement to find names that roughly matched a search string. The roughness of the match was determined by calculating the Levenshtein distance between each customer name and the search string. Informally, the Levenshtein distance is a mathematical function that calculates the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other. Oracle has implemented it as a function, EDIT_DISTANCE(), in the delivered package UTL_MATCH.  This function isn't directly relevant to the rest of the discussion about parallelism, except that it is an example of CPU-intensive processing that doesn't alter the database, where parallel execution may be a legitimate tactic for improving performance.
The examples in this article use the SH schema in the Oracle sample schemas (available on GitHub).

The following query searches case-insensitively for customer names within a Levenshtein distance of 3 from 'Stephen'. It finds 'Steven' and 'Staphany'.

set autotrace on timi on pages 99 lines 200
with x as (
select c.cust_first_name
, utl_match.edit_distance(upper(cust_first_name),'STEPHEN') edit_distance
from customers c
)
select * from x
where edit_distance <= 3
/

CUST_FIRST_NAME EDIT_DISTANCE
-------------------- -------------
Staphany 3
Steven 2
Steven 2

However, to do so Oracle had to full scan the table and execute the function for every row in the table.

Plan hash value: 2008213504
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2775 | 19425 | 429 (4)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 2775 | 19425 | 429 (4)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("UTL_MATCH"."EDIT_DISTANCE"(UPPER("CUST_FIRST_NAME"),'STEPHEN')<=3)

Oracle implemented Levenshtein as a C function that is then called from PL/SQL, so it just consumes CPU and doesn't do anything else to the database. You can start to see why the user complained that the query with UTL_MATCH was slow.
However, the first question is how much time do we spend on the full scan and how much time do we spend executing this function?

Follow the Time with Instrumentation 

For test purposes, I am going to build my own packaged functions with session instrumentation. Then I can use Active Session History (ASH) to work out where the time went.
NB: ASH requires a licence for the Diagnostics Pack

  • One function levenshtein() calls to UTL_MATCH.EDIT_DISTANCE().
  • The other dolittle() is a control function that does nothing except instrumentation. It is used to measure the intrusion effect of the instrumentation.
#eeeeee; border: 0px solid #000000; font-family: courier new; font-size: 100%; overflow: auto; padding-left: 4px; padding-right: 4px; width: 95%;">CREATE OR REPLACE package dmk AS
function levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
function dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
END;
/

CREATE OR REPLACE package body dmk AS
function levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER IS
l_distance INTEGER;
l_module VARCHAR2(64);
l_action VARCHAR2(64);
BEGIN
dbms_application_info.read_module(l_module,l_action); /*save current module/action*/
dbms_application_info.set_action('levenshtein()'); /*set action for function*/
l_distance := utl_match.edit_distance(UPPER(p1),UPPER(p2));
dbms_application_info.set_action(l_action); /*restore previous action*/
RETURN l_distance;
END levenshtein;

function dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER IS
l_distance INTEGER;
l_module VARCHAR2(64);
l_action VARCHAR2(64);
BEGIN
dbms_application_info.read_module(l_module,l_action);
dbms_application_info.set_action('dolittle()');
l_distance := 1;
dbms_application_info.set_action(l_action);
RETURN l_distance;
END dolittle;
END dmk;
/

Now, I will run a test query that executes UTL_MATCH for each of the 55,500 rows in the CUSTOMERS table, and I run that query 10 times inside a PL/SQL block.

set timi on autotrace on lines 500 serveroutput on
DECLARE
l_counter INTEGER := 0;
BEGIN
dbms_application_info.set_module('DMK LEVENSHTEIN TEST',null);
FOR j IN 1..10 LOOP
FOR i IN (
select sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
, sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
from customers c
) LOOP
l_counter := l_counter+i.b;
END LOOP;
END LOOP;
dbms_application_info.set_module(null,null);
dbms_output.put_line('Executions: '||l_counter);
END;
/
Executions: 555000

PL/SQL procedure successfully completed.

Elapsed: 00:00:25.08

We can see that the Levenshtein function was executed 555000 times.  Now I will query the ASH for this test and group it by ACTION.

set timi on autotrace off lines 500
column module format a25
column action format a25
select module, action, sql_id, sum(1) ash_Secs
from v$active_Session_History
where module = 'DMK LEVENSHTEIN TEST'
and sample_time >= SYSDATE-1/1440
group by module, action, sql_id
/

MODULE ACTION SQL_ID ASH_SECS
------------------------- ------------------------- ------------- ----------
DMK LEVENSHTEIN TEST dolittle() 7gp0w6qdvxrd2 2
DMK LEVENSHTEIN TEST levenshtein() 7gp0w6qdvxrd2 3
DMK LEVENSHTEIN TEST 7gp0w6qdvxrd2 20

The Levenshtein function accounted for 3 seconds of the runtime, and the function that does nothing except instrumentation accounted for 2 seconds, so the overhead of UTL_MATCH itself is only about 1 second, and there are 20 seconds in the SQL. In this test, the overhead of Levenshtein is low, but it would still be worth doing the full scan in parallel.

I Can't Get No Parallelism 

But there is a problem! I specified parallelism with a hint, but I don't get a parallel plan.

select /*+PARALLEL(C 4)*/ avg(utl_match.edit_distance(UPPER(cust_first_name),'STEPHEN'))
from customers c
/

Plan hash value: 1978308596
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 115 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 7 | | |
| 2 | SORT AGGREGATE | | 1 | 7 | | |
| 3 | TABLE ACCESS FULL| CUSTOMERS | 55500 | 379K| 115 (0)| 00:00:01 |
---------------------------------------------------------------------------------

However, if I use a simple SQL function, then I do get parallelism.

select /*+PARALLEL(C 4)*/ max(cust_first_name)
from customers c
/

Plan hash value: 1221513835
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 115 (0)| 00:00:01 | | | |
| 1 | SORT AGGREGATE | | 1 | 7 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 7 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 7 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 55500 | 379K| 115 (0)| 00:00:01 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS FULL| CUSTOMERS | 55500 | 379K| 115 (0)| 00:00:01 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------------

Note
-----
- Degree of Parallelism is 4 because of table property

So, UTL_MATCH.EDIT_DISTANCE() must be preventing parallelism in some way. There is a closed Oracle bug 169587070 that details exactly this problem with UTL_MATCH.

PARALLEL_ENABLE and PRAGMA RESTRICT_REFERENCES 

This is also documented default behaviour in PL/SQL. Database VLDB and Partitioning Guide: About Parallel Execution of Functions, Functions in Parallel Queries:
"A user-written function may be executed in parallel in any of the following cases: 

  • If it has been declared with the PARALLEL_ENABLE keyword
  • If it is declared in a package or type and has a PRAGMA RESTRICT_REFERENCES clause that indicates all of WNDS, RNPS, and WNPS 
  • If it is declared with CREATE FUNCTION and the system can analyze the body of the PL/SQL code and determine that the code neither writes to the database nor reads or modifies package variables" 

Oracle only wraps the bodies of their delivered packages, not the package specifications. So, you can see for yourself that the delivered UTL_MATCH package specification does not contain these declarations.
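You can check this easily enough – the specification is not wrapped, so its source is readable (USER_, ALL_ or DBA_SOURCE all work, subject to privileges):


select  text
from    dba_source
where   owner = 'SYS'
and     name  = 'UTL_MATCH'
and     type  = 'PACKAGE'
order by
        line
;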

Workarounds 

There are various workarounds for this.

  • I could add the PARALLEL_ENABLE declarations to the functions in the package specification of the delivered UTL_MATCH package. Although it does work, I would certainly not be happy to alter any delivered Oracle package in any serious database, without approval from Oracle support.
  • Or, I could add the RESTRICT_REFERENCES pragma to the package specification. Again, although this works, it involves altering a delivered package.
  • However, I can wrap the delivered package in my own packaged functions. One option is to declare them with PARALLEL_ENABLE (my personal preference):
CREATE OR REPLACE package dmk AS
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE;
FUNCTION dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE;
END;
/
  • Or you can use RESTRICT_REFERENCES in the package specification, but you must include the TRUST pragma to over-ride the lack of a pragma definition in the called packaged function.
CREATE OR REPLACE package dmk AS
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
PRAGMA RESTRICT_REFERENCES (levenshtein,WNDS,RNDS,WNPS,RNPS,TRUST);
FUNCTION dolittle(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER;
PRAGMA RESTRICT_REFERENCES (dolittle,WNDS,RNDS,WNPS,RNPS,TRUST);
END;
/

Now, I get parallel execution of the unaltered delivered UTL_MATCH.EDIT_DISTANCE() function.

select /*+PARALLEL(C 4) FULL(C)*/ sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
, sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
from customers c
/

Plan hash value: 1221513835
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 7 | 115 (0)| 00:00:01 | | | |
| 1 | SORT AGGREGATE | | 1 | 7 | | | | | |
| 2 | PX COORDINATOR | | | | | | | | |
| 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 7 | | | Q1,00 | P->S | QC (RAND) |
| 4 | SORT AGGREGATE | | 1 | 7 | | | Q1,00 | PCWP | |
| 5 | PX BLOCK ITERATOR | | 55500 | 379K| 115 (0)| 00:00:01 | Q1,00 | PCWC | |
| 6 | TABLE ACCESS FULL| CUSTOMERS | 55500 | 379K| 115 (0)| 00:00:01 | Q1,00 | PCWP | |
-----------------------------------------------------------------------------------------------------------------

Note
-----
- Degree of Parallelism is 4 because of table property

Conclusion: Parallel PL/SQL 

I can now repeat the earlier test, but with a parallel hint, and the runtime goes down from 25 to 9 seconds, although about the same amount of database time is recorded by ASH. So parallelism can improve the performance for the end user, but it will not reduce the total CPU overhead. If anything, it is likely to increase overall CPU time.

set timi on lines 500
DECLARE
l_counter INTEGER := 0;
BEGIN
dbms_application_info.set_module('DMK LEVENSHTEIN TEST',null);
FOR j IN 1..10 LOOP
FOR i IN (
select /*+PARALLEL(C 4)*/
sum(dmk.levenshtein(cust_first_name,'STEPHEN')) a
, sum(dmk.dolittle(cust_first_name,'STEPHEN')) b
from customers c
) LOOP
l_counter := l_counter+i.b;
END LOOP;
END LOOP;
dbms_application_info.set_module(null,null);
dbms_output.put_line('Executions: '||l_counter);
END;
/
Executions: 555000

PL/SQL procedure successfully completed.

Elapsed: 00:00:09.34

set timi on autotrace off lines 500
column module format a32
column action format a32
select module, action, sql_id, sum(1) ash_Secs
from v$active_Session_History
where module = 'DMK LEVENSHTEIN TEST'
and sample_time >= SYSDATE-1/1440
group by module, action, sql_id
/

MODULE ACTION SQL_ID ASH_SECS
-------------------------------- -------------------------------- ------------- ----------
DMK LEVENSHTEIN TEST 3391gqpzu5g2k 21
DMK LEVENSHTEIN TEST levenshtein() 3391gqpzu5g2k 5
DMK LEVENSHTEIN TEST dolittle() 3391gqpzu5g2k 4

NB: I have not been able to get parallelism to work in a PL/SQL function defined in a WITH clause because you cannot specify PARALLEL_ENABLE, and a pragma can only be specified in a package specification.

WITH
FUNCTION levenshtein(p1 VARCHAR2, p2 VARCHAR2) RETURN INTEGER PARALLEL_ENABLE IS
BEGIN
RETURN utl_match.edit_distance(UPPER(p1),UPPER(p2));
END;
select /*+PARALLEL(C 4)*/
sum(levenshtein(cust_first_name,'STEPHEN')) a
from customers c
/
ORA-06553: PLS-712: illegal option for subprogram LEVENSHTEIN

Gooey GUIDs

Do a quick Google search and you’ll find plenty of blog posts about why GUIDs are superior to integers for a unique identifier, and of course, an equal number of posts about why integers are superior to GUIDs. In the Oracle world, most people have been using sequence numbers since they were pretty much the only option available to us in earlier versions. But developers coming from other platforms often prefer GUIDs simply due to their familiarity with them.

I’m pretty much ambivalent when it comes to which one to use.  In fact, a good example is the AskTOM database – which had exclusively sequence-based primary keys on inception, but as the database has evolved and developers of different backgrounds and ages have worked on it, there is now a mix of strategies. The older tables have sequence based primary keys, and many of the newer tables have GUIDs as primary keys. Don’t get me wrong – I’m not advocating that you should have a mix – for a given database schema I’d recommend picking one regime and sticking with it. But my point is, that even with the mix of approaches in the AskTOM schema, I’ve never encountered any problems or issues with either.

However, there is one use case where I would strongly recommend using some caution on the use of GUIDs, and that is in the arena of systems that load data in bulk (eg data warehouses).

GUIDs are not cheap to generate. A quick look at the history of the structure and generation of unique IDs at https://en.wikipedia.org/wiki/Universally_unique_identifier shows that they all come down to a common component – the need to generate a good random number, and “good” can be a struggle for computers because you need algorithms that ensure sufficient randomness and distribution of the generated numbers.  That takes CPU cycles, and whilst that is something you will never notice when using a GUID for the 100 customers in your customer table, it definitely will be noticeable if you are going to attach a GUID to every one of your 10 million daily sales transactions, or telephone records, or similar.

Here’s a simple example where I’ll load 100 million rows into a table.  First I’ll try the conventional (when it comes to Oracle) approach of using a sequence number to uniquely identify each row.



SQL> create sequence seq cache 50000;

Sequence created.

SQL>
SQL> create table t1
  2  ( pk int,
  3    data1 int,
  4    data2 varchar2(10),
  5    data3 int,
  6    data4 varchar2(10)
  7  )
  8  tablespace demo
  9  /

Table created.


SQL>
SQL> set timing on
SQL> insert /*+ APPEND */ into t1
  2  select seq.nextval, int_val, char_val, int_val, char_val
  3  from
  4   ( select level int_val, to_char(level) char_val from dual connect by level <= 10000 ),
  5   ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

Elapsed: 00:03:31.42
SQL>
SQL> commit;

Commit complete.

Elapsed: 00:00:00.01
SQL>

3 minutes 30 seconds for 100 million rows is pretty good performance for a laptop, although obviously the table structure here is very simple.

And now I’ll repeat the exercise with the same table structure, but using a raw column to hold the output of a call to SYS_GUID().



SQL>
SQL>
SQL> create table t2
  2  ( pk raw(20),
  3    data1 int,
  4    data2 varchar2(10),
  5    data3 int,
  6    data4 varchar2(10)
  7  )
  8  tablespace demo
  9  /

Table created.

SQL>
SQL> set timing on
SQL> insert /*+ APPEND */ into t2
  2  select sys_guid(), int_val, char_val, int_val, char_val
  3  from
  4   ( select level int_val, to_char(level) char_val from dual connect by level <= 10000 ),
  5   ( select 1 from dual connect by level <= 10000 );

100000000 rows created.

Elapsed: 00:30:56.78
SQL> commit;

Commit complete.

Elapsed: 00:00:00.03

That’s right – we’ve blown out to 30 minutes. As you can see, there can be a large cost when the row volumes (and hence number of calls to generate a GUID) get large. We can even take the INSERT out of the equation here, and simply do a raw stress test to see how many GUIDs we can call from the SQL engine using the following test harness.



SQL> create table t ( sz int, dur interval day to second );

Table created.

SQL>
SQL>
SQL> declare
  2    ts_start timestamp;
  3    ts_end   timestamp;
  4    iter int;
  5    dummy raw(32);
  6  begin
  7   for i in 1 .. 8 loop
  8    iter := power(10,i);
  9
 10    ts_start := systimestamp;
 11    if iter <= 10000 then
 12       select max(x) into dummy from
 13       (
 14       select sys_guid() x from
 15       ( select 1 from dual connect by level <= iter )
 16       );
 17    else
 18       select max(x) into dummy from
 19       (
 20       select sys_guid() x from
 21       ( select 1 from dual connect by level <= iter/10000 ),
 22       ( select 1 from dual connect by level <= 10000 )
 23       );
 24    end if;
 25
 26    ts_end := systimestamp;
 27    insert into t values (iter, ts_end - ts_start );
 28    commit;
 29
 30   end loop;
 31  end;
 32  /

PL/SQL procedure successfully completed.

SQL>
SQL> select * from t order by 1;

        SZ DUR
---------- ----------------------------------------------------------
        10 +00 00:00:00.000000
       100 +00 00:00:00.000000
      1000 +00 00:00:00.015000
     10000 +00 00:00:00.172000
    100000 +00 00:00:01.607000
   1000000 +00 00:00:16.083000
  10000000 +00 00:02:49.713000
 100000000 +00 00:26:46.570000
 
 

I’m not trying to scare you off GUIDs – but like any functionality or feature you’re using to build applications, make sure you test it for business requirements you need to satisfy and make an informed decision on how best to use (or not use) it.

A tribute to Natural Join

By Franck Pachot

I know that a lot of people are against the ANSI join syntax in Oracle, and the objections go to extremes when talking about NATURAL JOIN. But I like them and use them quite often.

Why is Natural Join bad?

Natural join is bad because it relies on column names, and, at the time of writing the query, you don’t know which columns will be added or removed later. Here is an example on the SCOTT schema, joining on DEPTNO which has the same name in DEPT and EMP:

SQL> select * from EMP natural join DEPT where DNAME='SALES';
 
DEPTNO EMPNO ENAME JOB MGR HIREDATE SAL COMM DNAME LOC
---------- ---------- ---------- --------- ---------- --------- ---------- ---------- -------------- -------------
30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 SALES CHICAGO
30 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 SALES CHICAGO
30 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 SALES CHICAGO
30 7900 JAMES CLERK 7698 03-DEC-81 950 SALES CHICAGO
30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 SALES CHICAGO
30 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 SALES CHICAGO

The DEPT table has a ‘LOC’ column for the location of the department. But the data model may evolve and you may add a location for each employee. And we may also call it LOC:

SQL> alter table EMP add (LOC varchar2(10));
Table altered.

But now our Natural Join adds this column to the join predicate, and the result is wrong because it shows only rows whose employee location matches the department location:

SQL> select * from EMP natural join DEPT where DNAME='SALES';
 
no rows selected

Projection

In my opinion, the problem is not the Natural Join. Column names have a meaning for their tables. But the tables have different roles in our queries. As soon as a table or view participates in our query, we should redefine the column names. If we don’t, the result can be completely wrong, as here:

SQL> select * from EMP join DEPT using(DEPTNO) where DNAME='SALES';
 
DEPTNO EMPNO ENAME JOB MGR HIREDATE SAL COMM LOC DNAME LOC
---------- ---------- ---------- --------- ---------- --------- ---------- ---------- ---------- -------------- -------------
30 7521 WARD SALESMAN 7698 22-FEB-81 1250 500 SALES CHICAGO
30 7844 TURNER SALESMAN 7698 08-SEP-81 1500 0 SALES CHICAGO
30 7499 ALLEN SALESMAN 7698 20-FEB-81 1600 300 SALES CHICAGO
30 7900 JAMES CLERK 7698 03-DEC-81 950 SALES CHICAGO
30 7698 BLAKE MANAGER 7839 01-MAY-81 2850 SALES CHICAGO
30 7654 MARTIN SALESMAN 7698 28-SEP-81 1250 1400 SALES CHICAGO

Look: the result has two columns with the same name. This is completely wrong for a relational database and I don’t even understand why this parses without raising an error.

The projection is the most important relational operation, often overlooked as if it was just a rename for aesthetic purpose. You need to name the columns of your result set. They are the metadata for the interface between SQL and the host language. ‘select *’ is a shortcut when running an interactive query, to get a glance at the result rows. But a SQL query result is not complete without proper column names. And in most cases, at least when you query more than one table, the name of the query result columns should be different than the name of the underlying table columns. A department may have a location. And an employee may have a location. But the location of the employee department is something completely different than the employee location.

Then, as you need to name each column anyway, why not doing it as soon as possible? Do it for each table involved in the query, so that you are sure that all column names are correct within the query. As soon as you introduce an new table in the FROM clause, you should actually name the columns according to their role in the query. Let’s take an example with an airline data model. Each airport is linked to a city. This can be a CITY column in the AIRPORTS table. But as soon as you join FLIGHTS with AIRPORTS, this table has a different role. You join on destination airport or source airport. Then you alias the AIRPORTS table in the FROM clause, such as DST_AIRPORTS or SRC_AIRPORTS. Within the query, you can reference the columns with the table alias, such as DST_AIRPORTS.CITY or SRC_AIRPORTS.CITY but this cannot be exposed as-is in the query result. You must name them in the SELECT clause with something like SELECT DST_AIRPORTS.CITY as DST_ARP_CITY , SRC_AIRPORTS.CITY as SRC_ARP_CITY.

Then, as I’ll need to rename them anyway, I prefer to do it as soon as I join to a new table in the FROM clause. Instead of joining to AIRPORTS DST_AIRPORTS I can join to (SELECT IATA DST_ARP_IATA, CITY DST_ARP_CITY FROM AIRPORTS) and all column names will relate to the role without table aliases and without further renaming. And when I do that correctly, I can use natural join without risk.
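Here is a sketch of that airline example written in this style – the FLIGHTS columns FLIGHT_NO, SRC_IATA and DST_IATA are my assumption, purely to make the join keys explicit:


select * from
(select FLIGHT_NO, SRC_IATA SRC_ARP_IATA, DST_IATA DST_ARP_IATA from FLIGHTS)
natural join
(select IATA SRC_ARP_IATA, CITY SRC_ARP_CITY from AIRPORTS)
natural join
(select IATA DST_ARP_IATA, CITY DST_ARP_CITY from AIRPORTS);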

Projection in the FROM clause

Let’s take an example. Here is a query in DEPT where I explicitly mention that LOC is the department location. This is implicit when the column name belongs to the DEPT table. But it will not be implicit anymore once I join this table to another table. Here is the view ready to be included in any query:


SQL> select DEPTNO,DNAME DEPT_DNAME,LOC DEPT_LOC from DEPT where DNAME='SALES';
 
    DEPTNO DEPT_DNAME     DEPT_LOC
---------- -------------- -------------
        30 SALES          CHICAGO

Now, I can join this to the EMP table. I prefix all columns from EMP with “EMP_” and all columns from DEPT with “EMP_DEPT_”, because here DEPT plays the role of the employee’s department:

SQL> select EMP_EMPNO,EMP_ENAME,EMP_DEPT_DNAME,EMP_DEPT_LOC,EMP_LOC,EMP_MGR_EMPNO
from
(select DEPTNO EMP_DEPTNO,EMPNO EMP_EMPNO,ENAME EMP_ENAME,MGR EMP_MGR_EMPNO,LOC EMP_LOC from EMP)
natural join
(select DEPTNO EMP_DEPTNO,DNAME EMP_DEPT_DNAME,LOC EMP_DEPT_LOC from DEPT)
where EMP_DEPT_DNAME='SALES';
 
 EMP_EMPNO EMP_ENAME  EMP_DEPT_DNAME EMP_DEPT_LOC  EMP_LOC    EMP_MGR_EMPNO
---------- ---------- -------------- ------------- ---------- -------------
      7521 WARD       SALES          CHICAGO                            7698
      7844 TURNER     SALES          CHICAGO                            7698
      7499 ALLEN      SALES          CHICAGO                            7698
      7900 JAMES      SALES          CHICAGO                            7698
      7698 BLAKE      SALES          CHICAGO                            7839
      7654 MARTIN     SALES          CHICAGO                            7698

As you can see, when the names clearly indicate each column’s role in the join, and how it correlates with the other tables, there is no need to mention any join predicate. I used Natural Join because the join is on EMP_DEPTNO and I’m sure that it will always be the one and only column with that name on both sides. By query design.
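
If you prefer to see the join column spelled out, the query above is equivalent to an explicit join on that single shared column (same subqueries, just a USING clause instead of natural join):

select EMP_EMPNO,EMP_ENAME,EMP_DEPT_DNAME,EMP_DEPT_LOC,EMP_LOC,EMP_MGR_EMPNO
from
(select DEPTNO EMP_DEPTNO,EMPNO EMP_EMPNO,ENAME EMP_ENAME,MGR EMP_MGR_EMPNO,LOC EMP_LOC from EMP)
join
(select DEPTNO EMP_DEPTNO,DNAME EMP_DEPT_DNAME,LOC EMP_DEPT_LOC from DEPT)
using (EMP_DEPTNO)
where EMP_DEPT_DNAME='SALES';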

And the column names in the result are correct, explicitly telling what is an Employee attribute and what is an Employee department attribute. That is easy to parse and put into an object graph in the host language. You can see that the MGR column of EMP was named EMP_MGR_EMPNO because this is actually what it is: the EMPNO of the employee’s manager. It is a foreign key to the EMP table itself.
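
If that self-reference is not declared as a constraint in your copy of the schema, it can be added with something like the following sketch (the constraint name EMP_MGR_FK is just a placeholder):

alter table EMP add constraint EMP_MGR_FK foreign key (MGR) references EMP (EMPNO);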

And then, adding more information about the manager is easy: join with EMP again, but with the proper projection of columns. EMPNO becomes EMP_MGR_EMPNO in the role of the employee’s manager, ENAME becomes EMP_MGR_ENAME, DEPTNO becomes EMP_MGR_DEPTNO, and so on:


SQL> select EMP_EMPNO,EMP_ENAME,EMP_DEPT_DNAME,EMP_DEPT_LOC,EMP_LOC,EMP_MGR_DEPTNO,EMP_MGR_ENAME
from
(select DEPTNO EMP_DEPTNO,EMPNO EMP_EMPNO,ENAME EMP_ENAME,MGR EMP_MGR_EMPNO,LOC EMP_LOC from EMP)
natural join
(select DEPTNO EMP_DEPTNO,DNAME EMP_DEPT_DNAME,LOC EMP_DEPT_LOC from DEPT)
natural join
(select DEPTNO EMP_MGR_DEPTNO,EMPNO EMP_MGR_EMPNO,ENAME EMP_MGR_ENAME from EMP)
where EMP_DEPT_DNAME='SALES';
 
 EMP_EMPNO EMP_ENAME  EMP_DEPT_DNAME EMP_DEPT_LOC  EMP_LOC    EMP_MGR_DEPTNO EMP_MGR_ENAME
---------- ---------- -------------- ------------- ---------- -------------- -------------
      7900 JAMES      SALES          CHICAGO                               30 BLAKE
      7499 ALLEN      SALES          CHICAGO                               30 BLAKE
      7654 MARTIN     SALES          CHICAGO                               30 BLAKE
      7844 TURNER     SALES          CHICAGO                               30 BLAKE
      7521 WARD       SALES          CHICAGO                               30 BLAKE
      7698 BLAKE      SALES          CHICAGO                               10 KING

No need to review the whole query when adding a new table. No need to resolve a new ‘column ambiguously defined’ error. We don’t even need to alias the tables here.

Want to add the department name of the manager? That’s easy: join to DEPT with the right column projection (all prefixed by EMP_MGR_DEPT as the new columns are all about the employee manager’s department):

SQL> select EMP_EMPNO,EMP_ENAME,EMP_DEPT_DNAME,EMP_DEPT_LOC,EMP_LOC,EMP_MGR_DEPTNO,EMP_MGR_ENAME,EMP_MGR_DEPT_DNAME
from
(select DEPTNO EMP_DEPTNO,EMPNO EMP_EMPNO,ENAME EMP_ENAME,MGR EMP_MGR_EMPNO,LOC EMP_LOC from EMP)
natural join
(select DEPTNO EMP_DEPTNO,DNAME EMP_DEPT_DNAME,LOC EMP_DEPT_LOC from DEPT)
natural join
(select DEPTNO EMP_MGR_DEPTNO,EMPNO EMP_MGR_EMPNO,ENAME EMP_MGR_ENAME from EMP)
natural join
(select DEPTNO EMP_MGR_DEPTNO,DNAME EMP_MGR_DEPT_DNAME,LOC EMP_MGR_DEPT_LOC from DEPT)
where EMP_DEPT_DNAME='SALES';
 
 EMP_EMPNO EMP_ENAME  EMP_DEPT_DNAME EMP_DEPT_LOC  EMP_LOC    EMP_MGR_DEPTNO EMP_MGR_EN EMP_MGR_DEPT_D
---------- ---------- -------------- ------------- ---------- -------------- ---------- --------------
      7698 BLAKE      SALES          CHICAGO                               10 KING       ACCOUNTING
      7900 JAMES      SALES          CHICAGO                               30 BLAKE      SALES
      7499 ALLEN      SALES          CHICAGO                               30 BLAKE      SALES
      7654 MARTIN     SALES          CHICAGO                               30 BLAKE      SALES
      7844 TURNER     SALES          CHICAGO                               30 BLAKE      SALES
      7521 WARD       SALES          CHICAGO                               30 BLAKE      SALES

This can be even easier when you generate SQL queries. When adding a new table to join to, you just prefix all its columns with their role. Check the foreign keys so that the naming stays consistent with the referenced tables. Then, when parsing the result, the naming convention helps to split the flat row back into the object hierarchy.
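
To keep the generated prefixes consistent with the data model, the foreign keys can be read from the dictionary. A minimal sketch, assuming the referenced tables are in the same schema:

select c.table_name, cc.column_name, rc.table_name referenced_table
from user_constraints c
join user_cons_columns cc on cc.constraint_name = c.constraint_name
join user_constraints rc on rc.constraint_name = c.r_constraint_name
where c.constraint_type = 'R'
order by c.table_name, c.constraint_name, cc.position;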

Additional notes

I mentioned that aliasing the subqueries is not mandatory because I do not have to prefix the column names. However, when looking at the predicate section of the execution plan, the columns may be prefixed with an internal alias:

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("from$_subquery$_006"."EMP_MGR_DEPTNO"="from$_subquery$_009"."EMP_MGR_DEPTNO")
2 - access("from$_subquery$_001"."EMP_MGR_EMPNO"="from$_subquery$_006"."EMP_MGR_EMPNO" AND "from$_subquery$_001"."EMP_DEPTNO"="from$_subquery$_003"."EMP_DEPTNO")

Then it is a good idea to add aliases, such as EMP, EMP_DEPT, EMP_MGR and EMP_MGR_DEPT in the query above, so that the predicates become:

Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("EMP_MGR"."EMP_MGR_DEPTNO"="EMP_MGR_DEPT"."EMP_MGR_DEPTNO")
2 - access("EMP"."EMP_MGR_EMPNO"="EMP_MGR"."EMP_MGR_EMPNO" AND "EMP"."EMP_DEPTNO"="EMP_DEPT"."EMP_DEPTNO")
5 - filter("DNAME"='SALES')

I also like to add a QB_NAME hint so that I can easily reference those subqueries if I have to add some hints there. Finally, this is what I generate for this query:


SQL> select EMP_EMPNO,EMP_ENAME,EMP_DEPT_DNAME,EMP_DEPT_LOC,EMP_LOC,EMP_MGR_DEPTNO,EMP_MGR_ENAME,EMP_MGR_DEPT_DNAME
from
(select /*+qb_name(EMP)*/ DEPTNO EMP_DEPTNO,EMPNO EMP_EMPNO,ENAME EMP_ENAME,MGR EMP_MGR_EMPNO,LOC EMP_LOC from EMP) EMP
natural join
(select /*+qb_name(EMP_DEPT)*/ DEPTNO EMP_DEPTNO,DNAME EMP_DEPT_DNAME,LOC EMP_DEPT_LOC from DEPT) EMP_DEPT
natural join
(select /*+qb_name(EMP_MGR)*/ DEPTNO EMP_MGR_DEPTNO,EMPNO EMP_MGR_EMPNO,ENAME EMP_MGR_ENAME from EMP) EMP_MGR
natural join
(select /*+qb_name(EMP_MGR_DEPT)*/ DEPTNO EMP_MGR_DEPTNO,DNAME EMP_MGR_DEPT_DNAME,LOC EMP_MGR_DEPT_LOC from DEPT) EMP_MGR_DEPT
where EMP_DEPT_DNAME='SALES';

So what?

My goal here is not to recommend always using natural joins. This depends on the context (ad-hoc queries, queries embedded in existing code with naming standards, …) and on whether you can control the column names exactly. There are also a few bugs with ANSI joins, and natural join is not widely used, so it is maybe not tested a lot. But when I hear that Natural Join is bad, I want to explain the why/how/when. And one of its good sides is that it forces us to do the projection/rename as soon as possible, and this makes the query easier to read, maintain and evolve. Of course, using natural join in that way requires that all tables are added to the FROM clause through a subquery which carefully names all columns in the SELECT clause, so that the correlation with the other tables is clearly defined.

 

The article A tribute to Natural Join appeared first on Blog dbi services.