Oakies Blog Aggregator

Parse Calls

When dealing with the library cache / shared pool it’s always worth checking from time to time to see if a new version of Oracle has changed any of the statistics you rely on as indicators of potential problems. Today is also (coincidentally) a day when comments about “parses” and “parse calls” entered my field of vision from two different directions. I’ve tweeted out references to a couple of quirky little posts I did some years ago about counting parse calls and what a parse call may entail, but I thought I’d finish the day off with a little demo of what the session cursor cache does for you when your client code issues parse calls.

There are two bits of information I want to highlight – activity in the library cache and a number that shows up in the session statistics. Here’s the code to get things going:

rem
rem     Script:         12c_session_cursor_cache.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem
rem     Note:
rem     start_1.sql contains the one line
rem          select * from t1 where n1 = 0;
rem

create table t1 
as
select 99 n1 from dual
;

execute dbms_stats.gather_table_stats(user,'t1')

spool 12c_session_cursor_cache

prompt  =======================
prompt  No session cursor cache
prompt  =======================

alter session set session_cached_cursors = 0;

set serveroutput off
set feedback off

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

@start_1000

set feedback on
set serveroutput on

execute snap_my_stats.end_snap
execute snap_libcache.end_snap


prompt  ============================
prompt  Session cursor cache enabled
prompt  ============================


alter session set session_cached_cursors = 50;

set serveroutput off
set feedback off

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

execute snap_libcache.start_snap
execute snap_my_stats.start_snap

@start_1000

set feedback on
set serveroutput on

execute snap_my_stats.end_snap
execute snap_libcache.end_snap

spool off

I’ve made use of a couple of little utilities I wrote years ago to take snapshots of my session statistics and the library cache (v$librarycache) stats. I’ve also used my “repetition” framework to execute a basic query 1,000 times. The statement is a simple “select from t1 where n1 = 0”, chosen to return no rows.
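I haven’t reproduced the repetition scripts here; as a sketch only (an assumption about what such a driver has to do, not my actual framework), start_1000.sql could simply be a generated file containing 1,000 “@start_1” lines, so that every execution arrives at the server as a separate parse call from the client:

rem  sketch: generate a driver script of 1,000 calls to start_1.sql
set pagesize 0 feedback off termout off
spool start_1000.sql
select '@start_1' from dual connect by level <= 1000;
spool off
set termout on feedback on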

The purpose of the whole script is to show you the effect of running exactly the same SQL statement many times – first with the session cursor cache disabled (session_cached_cursors = 0) then with the cache enabled at its default size.
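If you want to check what your session is actually running with before each pass, the session-level value of the parameter is visible in v$parameter (a quick sanity check rather than part of the test itself):

select name, value
from   v$parameter
where  name = 'session_cached_cursors';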

Here are some results from an instance of 12.2.0.1 – which I’ve edited down by eliminating most of the single-digit numbers.

=======================
No session cursor cache
=======================
---------------------------------
Session stats - 23-Apr 17:41:06
Interval:-  4 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
Requests to/from client                                                      1,002
opened cursors cumulative                                                    1,034
user calls                                                                   2,005
session logical reads                                                        9,115
non-idle wait count                                                          1,014
session uga memory                                                          65,488
db block gets                                                                2,007
db block gets from cache                                                     2,007
db block gets from cache (fastpath)                                          2,006
consistent gets                                                              7,108
consistent gets from cache                                                   7,108
consistent gets pin                                                          7,061
consistent gets pin (fastpath)                                               7,061
logical read bytes from cache                                           74,670,080
calls to kcmgcs                                                              5,005
calls to get snapshot scn: kcmgss                                            1,039
no work - consistent read gets                                               1,060
table scans (short tables)                                                   1,000
table scan rows gotten                                                       1,000
table scan disk non-IMC rows gotten                                          1,000
table scan blocks gotten                                                     1,000
buffer is pinned count                                                       2,000
buffer is not pinned count                                                   2,091
parse count (total)                                                          1,035
parse count (hard)                                                               8
execute count                                                                1,033
bytes sent via SQL*Net to client                                           338,878
bytes received via SQL*Net from client                                     380,923
SQL*Net roundtrips to/from client                                            1,003

PL/SQL procedure successfully completed.

---------------------------------
Library Cache - 23-Apr 17:41:06
Interval:-      4 seconds
---------------------------------
Type      Cache                           Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
----      -----                           ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE SQL AREA                       1,040       1,032   1.0       1,089       1,073   1.0         0         1
NAMESPACE TABLE/PROCEDURE                   17          16    .9         101          97   1.0         0         0
NAMESPACE BODY                               9           9   1.0          26          26   1.0         0         0
NAMESPACE SCHEDULER GLOBAL ATTRIBU          40          40   1.0          40          40   1.0         0         0

PL/SQL procedure successfully completed.

The thing to notice, of course, is the large number of statistics that are (close to) multiples of 1,000 – i.e. the number of executions of the SQL statement. In particular you can see the ~1,000 “parse count (total)” which is not reflected in the “parse count (hard)” because the statement only needed to be loaded into the library cache and optimized once.

The other notable statistics come from the library cache where we do 1,000 gets and pins on the “SQL AREA” – the “get” creates a “KGL Lock” (the “breakable parse lock”) that is made visible as an entry in v$open_cursor (x$kgllk), and the “pin” creates a “KGL Pin” that makes it impossible for anything to flush the child cursor from memory while we’re executing it.
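If you don’t have an equivalent of my snapshot packages you can eyeball the same figures (as before/after deltas over the test) straight from the underlying view, for example:

select namespace, gets, gethits, pins, pinhits, reloads, invalidations
from   v$librarycache
where  namespace in ('SQL AREA', 'TABLE/PROCEDURE', 'BODY');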

So what changes when we enable the session cursor cache?


============================
Session cursor cache enabled
============================

Session altered.

---------------------------------
Session stats - 23-Apr 17:41:09
Interval:-  3 seconds
---------------------------------
Name                                                                         Value
----                                                                         -----
Requests to/from client                                                      1,002
opened cursors cumulative                                                    1,004
user calls                                                                   2,005
session logical reads                                                        9,003
non-idle wait count                                                          1,013
db block gets                                                                2,000
db block gets from cache                                                     2,000
db block gets from cache (fastpath)                                          2,000
consistent gets                                                              7,003
consistent gets from cache                                                   7,003
consistent gets pin                                                          7,000
consistent gets pin (fastpath)                                               7,000
logical read bytes from cache                                           73,752,576
calls to kcmgcs                                                              5,002
calls to get snapshot scn: kcmgss                                            1,002
no work - consistent read gets                                               1,000
table scans (short tables)                                                   1,000
table scan rows gotten                                                       1,000
table scan disk non-IMC rows gotten                                          1,000
table scan blocks gotten                                                     1,000
session cursor cache hits                                                    1,000
session cursor cache count                                                       3
buffer is pinned count                                                       2,000
buffer is not pinned count                                                   2,002
parse count (total)                                                          1,002
execute count                                                                1,003
bytes sent via SQL*Net to client                                           338,878
bytes received via SQL*Net from client                                     380,923
SQL*Net roundtrips to/from client                                            1,003

PL/SQL procedure successfully completed.

---------------------------------
Library Cache - 23-Apr 17:41:09
Interval:-      3 seconds
---------------------------------
Type      Cache                           Gets        Hits Ratio        Pins        Hits Ratio   Invalid    Reload
----      -----                           ----        ---- -----        ----        ---- -----   -------    ------
NAMESPACE SQL AREA                           5           5   1.0       1,014       1,014   1.0         0         0
NAMESPACE TABLE/PROCEDURE                    7           7   1.0          31          31   1.0         0         0
NAMESPACE BODY                               6           6   1.0          19          19   1.0         0         0

PL/SQL procedure successfully completed.

The first thing to note is that “parse count (total)” still shows 1,000 parse calls. However, we also see the statistic “session cursor cache hits” at 1,000. Allowing for a little noise around the edges, virtually every parse call has turned into a short-cut that takes us through the session cursor cache directly to the correct cursor.

This difference shows up in the library cache activity, where we still see 1,000 pins – we have to pin the cursor to execute it – but we no longer see 1,000 “gets”. In the absence of the session cursor cache the session has to keep searching for the statement, then creating and holding a KGL Lock while executing it; when the cache is enabled the session very quickly recognises that the statement is one we are likely to re-use, so it continues to hold the KGL Lock after we have finished executing the statement and records the location of that lock in a session state object. After the first couple of executions of the statement we no longer have to search for it and attach a spare lock to it – we simply navigate from our session state object to the cursor.

As before, the KGL Lock will show up in v$open_cursor – though this time it will not disappear between executions of the statement. Over the history of Oracle versions the contents of v$open_cursor have become increasingly helpful, so I’ll just show you what the view held for my session by the end of the test:


SQL> select cursor_type, sql_text from V$open_cursor where sid = 250 order by cursor_type, sql_text;

CURSOR_TYPE                                                      SQL_TEXT
---------------------------------------------------------------- ------------------------------------------------------------
DICTIONARY LOOKUP CURSOR CACHED                                  BEGIN DBMS_OUTPUT.DISABLE; END;
DICTIONARY LOOKUP CURSOR CACHED                                  BEGIN snap_libcache.end_snap; END;
DICTIONARY LOOKUP CURSOR CACHED                                  BEGIN snap_my_stats.end_snap; END;
DICTIONARY LOOKUP CURSOR CACHED                                  SELECT DECODE('A','A','1','2') FROM SYS.DUAL
OPEN                                                             begin         dbms_application_info.set_module(
OPEN                                                             table_1_ff_2eb_0_0_0
OPEN-RECURSIVE                                                    SELECT VALUE$ FROM SYS.PROPS$ WHERE NAME = 'OGG_TRIGGER_OPT
OPEN-RECURSIVE                                                   select STAGING_LOG_OBJ# from sys.syncref$_table_info where t
OPEN-RECURSIVE                                                   update user$ set spare6=DECODE(to_char(:2, 'YYYY-MM-DD'), '0
PL/SQL CURSOR CACHED                                             SELECT INDX, KGLSTTYP LTYPE, KGLSTDSC NAME, KGLSTGET GETS, K
PL/SQL CURSOR CACHED                                             SELECT STATISTIC#, NAME, VALUE FROM V$MY_STATS WHERE VALUE !
SESSION CURSOR CACHED                                            BEGIN DBMS_OUTPUT.ENABLE(1000000); END;
SESSION CURSOR CACHED                                            BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END;
SESSION CURSOR CACHED                                            BEGIN snap_libcache.start_snap; END;
SESSION CURSOR CACHED                                            BEGIN snap_my_stats.start_snap; END;
SESSION CURSOR CACHED                                            select * from t1 where n1 = 0
SESSION CURSOR CACHED                                            select /*+ no_parallel */ spare4 from sys.optstat_hist_contr

17 rows selected.

The only one of specific interest is the penultimate one in the output – its type is “SESSION CURSOR CACHED” and we can recognise our “select from t1” statement.

Bitmap Index On Column With 212552698 Distinct Values, What Gives? (I’d Rather Be High)

In my previous post on Indexing The Autonomous Warehouse, I highlighted how it might be necessary to create indexes to improve the performance and scalability of highly selective queries, as it might on any Data Warehouse running on an Exadata platform. In the post, I created a Bitmap Index and showed how improve SQL performance […]

You should set OCSID.CLIENTID

each time you grab an Oracle JDBC connection from the pool

For troubleshooting and monitoring performance, you want to follow what happens from the end-user to the database. It is then mandatory to identify the end-user and application from the database session. With Oracle there are some ‘dbms_application_info’ strings to be set, like MODULE, ACTION and CLIENT_INFO. That’s about the tasks in the application code (like identifying the Java class or method from which the SQL statement is prepared) but that’s not about the end-user.

And you should forget about the CLIENT_INFO which is not very useful and rather misleading. OCSID.MODULE and OCSID.ACTION are set from JDBC with Connection.setClientInfo (One reason I find the CLIENT_INFO name misleading is that it cannot be set with setClientInfo). Of course, you can also call ‘dbms_application_info.set_module’ but that’s an additional call to the database (which means network latency, OS context switch,…). Using the JDBC setClientInfo with the OCSID namespace sends this information with the next call.
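For comparison, the server-side equivalent would look something like the following sketch (dbms_session.set_identifier sets the CLIENT_IDENTIFIER I’ll come to next); the point is that issuing these as separate calls from the client is exactly the extra round trip that the OCSID.* attributes avoid:

begin
  dbms_application_info.set_module(module_name => 'my Module', action_name => 'my Action');
  dbms_session.set_identifier('my Client ID');
end;
/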

Now, about identifying the end-user, there’s the session CLIENT_ID (aka CLIENT_IDENTIFIER) that you can also set with Connection.setClientInfo (OCSID.CLIENTID). This one is visible in many Oracle views and follows the database links. Here is an example: I create a demo user and a database link:

connect sys/oracle@//localhost/PDB1 as sysdba
drop public database link PDB1@SYSTEM;
grant dba to demo identified by demo;
create public database link PDB1@SYSTEM connect to SYSTEM
identified by oracle using '//localhost/PDB1';

The following JavaScript (run from SQLcl) connects with a JDBC Thin driver, sets OCSID.MODULE, OCSID.ACTION and OCSID.CLIENTID, and displays CLIENT_IDENTIFIER, MODULE and ACTION from V$SESSION:

script
var DriverManager = Java.type("java.sql.DriverManager");
var con = DriverManager.getConnection(
  "jdbc:oracle:thin:@//localhost/PDB1","demo","demo"
);
con.setAutoCommit(false);

function showSessionInfo(){
  var sql=con.createStatement();
  var res=sql.executeQuery("\
    select client_identifier,service_name,module,action,value \
    from v$session \
    join v$mystat using(sid) \
    join v$statname using(statistic#) \
    where name='user calls' \
  ");
  while(res.next()){
    print();
    print(" CLIENT_IDENTIFIER: "+res.getString(1));
    print(" SERVICE: "+res.getString(2));
    print(" MODULE: "+res.getString(3));
    print(" ACTION: "+res.getString(4));
    print(" User Calls: "+res.getInt(5));
    print();
  }
}

showSessionInfo();
con.setClientInfo('OCSID.CLIENTID','my Client ID');
con.setClientInfo('OCSID.MODULE','my Module');
con.setClientInfo('OCSID.ACTION','my Action');
showSessionInfo();

// run a statement through DBLINK:
var sql=con.createStatement();
sql.executeUpdate("call dbms_output.put_line@PDB1@SYSTEM(null)");

I also display the ‘user calls’ from V$MYSTAT. Here is the output:

SQL> .

CLIENT_IDENTIFIER: null
SERVICE: pdb1
MODULE: JDBC Thin Client
ACTION: null
User Calls: 4

CLIENT_IDENTIFIER: my Client ID
SERVICE: pdb1
MODULE: my Module
ACTION: my Action
User Calls: 5

The second execution sees the MODULE, ACTION and CLIENT_IDENTIFIER set by the previous setClientInfo() calls. Most important of all, the ‘user calls’ statistic has increased by only one, which means that setting them did not add any extra round trips to the database server.

Now, after the call through the database link, I display all user sessions from V$SESSION. I can see my SQLcl (java) session with nothing set, the JDBC Thin session with MODULE, ACTION and CLIENT_IDENTIFIER, and the DBLINK session (connected to SYSTEM) with only the CLIENT_IDENTIFIER set:

SQL> select username,client_identifier,module,action
2 from v$session where type='USER';
USERNAME   CLIENT_IDENTIFIER   MODULE                   ACTION
__________ ___________________ ________________________ ____________
SYSTEM     my Client ID        oracle@db192
SYS                            java@db192 (TNS V1-V3)
DEMO       my Client ID        my Module                my Action

Following the end-user down to all layers (application, database, remote databases) is great for end-to-end troubleshooting and performance analysis. Set this OCSID.CLIENTID to identify the application (micro-)service and the end-user (like a browser Session ID), for no additional cost, and you will find this information in many performance views:

select table_name, listagg(distinct column_name,', ') 
within group (order by column_name)
from dba_tab_columns
where column_name in ('CLIENT_IDENTIFIER','CLIENT_INFO','CLIENT_ID','MODULE','ACTION')
--and table_name like 'GV%'
group by table_name
order by 2;

You can see how ‘CLIENT_INFO’ is useless (except as an additional level to module/action for SQL Monitor) and how CLIENT_ID(ENTIFIER) is everywhere, including ASH (Active Session History).
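For example, here is a sketch (using the client identifier set earlier in this post) of pulling that end-user’s recent activity straight out of ASH:

select sample_time, session_id, sql_id, event
from   v$active_session_history
where  client_id = 'my Client ID'
order by sample_time;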

With a micro-services architecture you will have many connections to the database (and don’t tell me that each microservice has its own database – databases were invented decades ago when streaming data everywhere was an unmaintainable, unscalable, error-prone mess, and schemas and views were invented to provide these data micro-services within the same database system). Then the best practice is to:

  • connect with a dedicated SERVICE_NAME
  • identify the end-user with a CLIENT_ID

and then end-to-end tracing, tuning and troubleshooting will become easy.

Not Just the How of AD with Linux VM/SQL 2019, but the WHY

Azure Active Directory authentication is available with SQL Server 2019 on Linux in preview. As I was setting it up in my Azure environment on a Red Hat 7.3 Linux VM, I was, as many are, happy that the documentation lists the Azure CLI commands to set up authentication with Azure Active Directory, but I was concerned that, with so many people new to Linux, it didn’t describe WHY we were running certain commands, or set out best practices around Linux database server design.

The setup expects that you already have a Linux VM and SQL Server 2019 up and running. The first steps they go into install the AAD login extension on the VM and then assign the AD login the role of VM Administrator.

az vm extension set \
  --publisher Microsoft.Azure.ActiveDirectory.LinuxSSH \
  --name AADLoginForLinux \
  --resource-group myResourceGroup \
  --vm-name myVM

The above command expects you to replace “myResourceGroup” and “myVM” with the correct values for the resource group and Linux VM.

The next step then proceeds to run more az commands, but it also expects the DBA or administrator to be familiar with Linux and Linux scripting techniques, after NOT using them in the first command. I find this assumption leaves us all open to human error:

(Screenshot of the commands from the Microsoft documentation: https://docs.microsoft.com/en-us/azure/virtual-machines/linux/login-using-aad)

Let’s talk about variables in an OS environment. In Windows, we can set these at the command line, either for the session or persistently as part of login scripts, and we can set them in the GUI under My Computer > Advanced settings > Environment variables. The same can be done in Linux, either at the command line, via login scripts referred to as run commands (.bashrc), or in profiles owned by individual logins.

To set these, you simply enter a unique word and assign it a value. Some flavors of Linux require you to “export” or “set” the variable, not just state it:

export <variable>=<value>
set <variable>=<value>

You can then verify the variable is set by using the ECHO command and calling the variable with a $ sign before it:

echo $<variable>

You can just as easily remove them with the “unset” command, by overwriting them with new values for the variable keyword, by logging out, or via other profile/run-command scripts.

We can use this set-and-echo process to make more sense of what is being performed as part of the Linux AD authentication instructions for the VM, too:

username=$(az account show --query user.name --output tsv) 
 
echo $username 
kgorman@microsoft.com
vm=$(az vm show --resource-group SQL2019_grp --name SQL2019RH1 --query id -o tsv)
echo $vm
/subscriptions/00aa000e-xx00x-xxx-x00-x00x-xx00x/resourceGroups/SQL2019_grp/providers/Microsoft.Compute/virtualMachines/SQL2019RH1

Now we can use these variables as part of the third command and know what is being passed into the full command, knowing that we were able to dynamically push Azure CLI commands into the final command:

az role assignment create \
  --role "Virtual Machine Administrator Login" \
  --assignee $username \
  --scope $vm

The output from the command shows that it pulled the original variables into the third command to complete the requirements to build the role assignment. I did a mass replace to protect the IDs, but you get the idea of how to verify that the values were substituted:

{
  "canDelegate": null,
  "id": "/subscriptions/00aa000e-xx00x-xxx-x00-x00x-xx00x/resourceGroups/SQL2019_grp/providers/Microsoft.Compute/virtualMachines/SQL2019RH1/providers/Microsoft.Authorization/roleAssignments/0e00e0b0-bf00-0000-b000-000d0e0fe0d0",
  "name": "0e00e0b0-bf00-0000-b000-000d0e0fe0d0",
  "principalId": "d0c00cba-0c0b-00f0-b00e-d00000000000",
  "resourceGroup": "SQL2019_grp",
  "roleDefinitionId": "/subscriptions/00aa000e-xx00x-xxx-x00-x00x-xx00x/providers/Microsoft.Authorization/roleDefinitions/0c0000c0-00e0-0000-0000-ea0c00e000e0",
  "scope": "/subscriptions/00aa000e-xx00x-xxx-x00-x00x-xx00x/resourceGroups/SQL2019_grp/providers/Microsoft.Compute/virtualMachines/SQL2019RH1",
  "type": "Microsoft.Authorization/roleAssignments"
}

A good update for this doc would be to first explain how and why we set variables, and then use default configurations for the session so that fewer arguments have to be entered with each command:

username=$(az account show --query user.name --output tsv)
az configure --defaults group=SQL2019_grp
az configure --defaults vm=SQL2019RH1
vm=$(az vm show --query id -o tsv)

az vm extension set \
  --publisher Microsoft.Azure.ActiveDirectory.LinuxSSH \
  --name AADLoginForLinux


"name": "AADLoginForLinux",
   "protectedSettings": null,
   "provisioningState": "Succeeded",
   "publisher": "Microsoft.Azure.ActiveDirectory.LinuxSSH",
   "resourceGroup": "SQL2019_grp",
   "settings": null,
   "tags": null,
   "type": "Microsoft.Compute/virtualMachines/extensions",
   "typeHandlerVersion": "1.0",
   "virtualMachineExtensionType": "AADLoginForLinux"

And in the end, we can check our work to verify we’ve done everything correctly:

az vm list --query '[].{Name:name, OS:storageProfile.osDisk.osType, Admin:osProfile.adminUsername}' --output table

You should see your new AD login now in the list.

This doesn’t take you too far into Linux scripting, but hopefully it answers the WHY you are running the commands that are displayed in the instructions and the order that you’re running them in.



Copyright © DBAKevlar [Not Just the How of AD with Linux VM/SQL 2019, but the WHY], All Right Reserved. 2019.

The Late Spring Speaking Gauntlet

There are busy times for everyone and, if you speak at conferences, the busy times are March, May and November. I am recovering from the early spring rush, and now it’s time to prepare for the late spring one.

I’ve been fortunate enough to be accepted to speak at the following regional SQL Saturdays and look forward to speaking and meeting new folks, along with catching up with conference friends:

SQL Saturday Raleigh, April 27th

  • Optimizing Power BI, Taming of the Shrew, (Analytics)

SQL Saturday Jacksonville, May 5th

  • GDPR, The Buck Stops Here, (data governance)
  • Be a Part of the Solution with Automation, (DevOps)
  • Women in Technology Panel, (lunch session)

SQL Saturday Atlanta, May 18th

  • Essential Linux Skills for the DBA, (Linux)

That will finish May up and I’ll have announcements for June soon, but until then, I’ll just be over here typing furiously on a couple of book chapters and keeping the day gig going…

PL/SQL – Don’t mix and match scope

Here’s a simple little PL/SQL block where we call an inner procedure PARAMETER_TESTER from its parent block. Pay particular attention to the parameter we pass to the procedure, and its value throughout the execution of that procedure.


SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15     parameter_tester(100);
 16  end;
 17  /
Param came in as: 100
Param left as   : 100
glob_var is now : 1

PL/SQL procedure successfully completed.

Now I’ll slowly extend the code, and just by eyeballing it, see if you can predict what the output will be before looking past the end of the PL/SQL block.



SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15     local_var := glob_var;
 16     parameter_tester(local_var);
 17  end;
 18  /
Param came in as: 0
Param left as   : 0
glob_var is now : 1

PL/SQL procedure successfully completed.

So far so good I imagine. The parameter came in as zero, we incremented the global variable which of course had no impact on the parameter. Let’s now up the ante a little.



SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15     parameter_tester(glob_var);
 16  end;
 17  /
Param came in as: 0
Param left as   : 1
glob_var is now : 1

PL/SQL procedure successfully completed.

This is perhaps the first one that you might find a little unexpected. Notice that the value of the parameter passed to the procedure has changed within the inner procedure even though it was passed (implicitly) as an IN parameter. People often assume that if you pass anything to a procedure without the IN OUT or OUT specification, then the parameter is “read only” and cannot be touched by code. This is true to the extent that you cannot perform an assignment to that parameter as you can see below



SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         param := param + 1;
  9     end;
 10
 11  begin
 12     parameter_tester(glob_var);
 13  end;
 14  /
       param := param + 1;
       *
ERROR at line 8:
ORA-06550: line 8, column 8:
PLS-00363: expression 'PARAM' cannot be used as an assignment target
ORA-06550: line 8, column 8:
PL/SQL: Statement ignored

but that is not the same as saying that the parameter is fixed in value throughout the duration of the call. This behaviour is documented in the PL/SQL language manual: an IN parameter may be passed by reference rather than by value.

(Screenshot of the relevant section of the PL/SQL Language Reference describing how IN parameters may be passed by reference.)

Now I’ll explore some other examples of how you might get caught out by this. I’ll modify the example just slightly now so that I’m passing an expression rather than just “glob_var”.



SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15     parameter_tester(glob_var+1);
 16  end;
 17  /
Param came in as: 1
Param left as   : 1
glob_var is now : 1

PL/SQL procedure successfully completed.

Now “normal” service has been resumed, in that the expression is evaluated first and hence is passed by value to the procedure leaving the parameter value unchanged throughout the procedure.

So it would appear that an expression will disable the “pass by reference” mechanism? Well, let’s try two more examples:



SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15    parameter_tester(to_char(glob_var));
 16  end;
 17  /
Param came in as: 0
Param left as   : 0
glob_var is now : 1

PL/SQL procedure successfully completed.

SQL>
SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15    parameter_tester(to_number(glob_var));
 16  end;
 17  /
Param came in as: 0
Param left as   : 1
glob_var is now : 1

PL/SQL procedure successfully completed.

Notice the last one in particular. Even though I had a TO_NUMBER expression around “glob_var”, it was still passed by reference and hence the parameter value changed throughout the execution of the inner procedure. This is because the PL/SQL compiler detected that the TO_NUMBER function was redundant and optimized it out during the compilation process. That left just a parameter input of “glob_var”, which could then be passed by reference.

Bottom line – it’s generally considered poor programming practice in any language to mix and match the scope of variables; in this case, using glob_var both within and outside the inner procedure. Trying to subvert the behaviour of the PL/SQL engine by using expressions such as to_char(glob_var) is a recipe for disaster. Who knows what additional optimizations the next release of the PL/SQL compiler will have? Perhaps it will optimize out “glob_var+1” or to_char(glob_var) and hence pass them by reference too. If your functionality really demands coding with these blurred scopes, then make sure you perform an assignment to a local variable and pass that, to avoid unexpected side-effects.



SQL> set serverout on
SQL> declare
  2
  3     glob_var  int := 0;
  4     local_var int;
  5
  6     procedure PARAMETER_TESTER(param int) is
  7     begin
  8         dbms_output.put_line('Param came in as: '||param);
  9         glob_var := glob_var + 1;
 10         dbms_output.put_line('Param left as   : '||param);
 11         dbms_output.put_line('glob_var is now : '||glob_var);
 12     end;
 13
 14  begin
 15     local_var := glob_var;
 16     parameter_tester(local_var);
 17  end;
 18  /
Param came in as: 0
Param left as   : 0
glob_var is now : 1

PL/SQL procedure successfully completed.

Quick and easy masking

I had a request from a client a while back regarding masking of data. They had an application with sensitive data in the Production environment (where access and audit were very tightly controlled) but the issue was how to respect that sensitivity in non-Production environments whilst still preserving full-size data volumes for application testing.

After some conversations about requirements, it turned out that since (even in non-Production environments) all access to application components was logged and audited, the issue was simply protecting against “inadvertent” access to sensitive data. For example, in the application, if I searched for “males with black hair with an interest in technology” I should never see the name “Connor McDonald” on screen in the results, because simply viewing that data could be treated as a breach of privacy.

Addendum: Anyone adding a comment to this post with “Shouldn’t that be males with black hair with a hint of grey” will be banned forever. :-)

I suggested Data Redaction as a potential solution, but they wanted the data actually obfuscated within the database. Then I suggested Data Masking, which definitely does the job, but they felt it offered far more than their simple requirement: just obfuscate the data, and retain the distribution of the data to preserve performance characteristics as much as possible.

So ultimately here is the basis of a little routine I created for them to obfuscate their data, satisfying their requirement of not viewing meaningful data in their non-Production environments. Please read the footnote at the end of this post before you consider using this code. The logic here is straightforward:

  • Take the alphanumeric characters in a logical order as a single string.
  • Use DBMS_RANDOM to come up with 255 variations of that string and store them in an array.
  • Use the TRANSLATE function to perform a simple obfuscation of the input string.

That translation could then be used during the process of copying data from Production, or as a post-copy task.
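As a sketch of that post-copy task (the table and column names here are made up purely for illustration; the masker package itself is built below), the step is just an update with the translate-based function:

update customer_copy
set    full_name = masker.mask(full_name);

commit;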

First I’ll demo the concept of coming up with randomised strings. Here’s a simple anonymous block to create 10 random strings based on a base string of: ABC….Zabc…z0123456789


SQL> set serverout on
SQL> declare
  2    type rand_list is table of varchar2(62) index by pls_integer;
  3    source varchar2(62) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  4    perms  rand_list;
  5    source_variant varchar2(62);
  6  begin
  7    for i in 1 .. 10 loop
  8      source_variant := null;
  9      for j in ( select rownum r from dual connect by level <= 52 order by dbms_random.value )
 10      loop
 11        source_variant := source_variant || substr(source,j.r,1);
 12      end loop;
 13      for j in ( select rownum r from dual connect by level <= 10 order by dbms_random.value )
 14      loop
 15        source_variant := source_variant || substr(source,52+j.r,1);
 16      end loop;
 17      perms(i) := source_variant;
 18      dbms_output.put_line(source_variant);
 19    end loop;
 20  end;
 21  /
xrLCuowbNfvXlMJSBKctPyheZWTnQUjYAdizkFVOIHGsmDRpaEqg5637918402
OkVxApdsLqTlyvgYXQGaSeNPMCjZEJhHRtbcIBrowKUunifzDFmW8013742695
gcVxtNeMdBuRJhvYrnkCIWzoyFwfLpqSATabQGUZKmDOEilPHsXj5601892743
vOUuisITyJjbaQLqfnpFVBoCrYtzHeDAmWExPkNwgScXKRlZGMhd3104879256
VLZkXndmytCWRwKeNgYifvTESzGFhHcqaMJPoljxUsAQrubOpBDI7302189564
VfPLKmObZzBMCpgrjUaQYieRNJykDwtnuWXFxdTEIlScqvohAGsH0138924756
ABMkfozebljnPGqHJhpNwxYEUCuQLyDrOasIgtFZKVmciTXSdvRW5702139864
iNjrmVTlyzFSLswfBvaWKUAHIZOcgMPYXCGqeoDJbhpktERdQunx1976045328
hiYGzxMEZFPcmkutbDlrCeynSITKHNJaXAwfpRsdqLjvQUoWVOgB0479632158
uZpPXOHLAvzricdjtqSQBKVhYoFfmUxEDNCwWlIJsbRgkMeGynaT6451328790

So a simple obfuscation could be:


SQL> select
  2    translate(
  3      'Connor McDonald',
  4      'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
  5      'ehZAFjmtYwWilTRDnCcLrvQqEpakKGPMxXyudVUNsoJOBfzSIbHg2163095784')
  6  from dual;

TRANSLATE('CONN
---------------
ZsNNsO lKAsNaVG

Armed with this, I can create a package which extends this to choose from 255 random strings and use these as a simple obfuscator for source data.


SQL> create or replace
  2  package masker is
  3    function rand_masks(p_size int default 255, p_refresh varchar2 default 'N',p_seed number default 0) return  sys.odcivarchar2list pipelined;
  4    function standard_source return varchar2;
  5    function idx_entry(p_idx int, p_size int default 255, p_refresh varchar2 default 'N',p_seed number default 0) return varchar2;
  6    function mask(p_input varchar2) return varchar2;
  7  end;
  8  /

Package created.

SQL> create or replace
  2  package body masker is
  3    perms sys.odcivarchar2list := sys.odcivarchar2list();
  4
  5  procedure init(p_size int,p_seed int) is
  6    source varchar2(62) := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  7    source_variant varchar2(62);
  8  begin
  9    dbms_random.seed(p_seed);
 10    perms.delete;
 11    perms.extend(p_size);
 12    for i in 1 .. p_size loop
 13      source_variant := null;
 14      for j in ( select rownum r from dual connect by level <= 52 order by dbms_random.value )
 15      loop
 16        source_variant := source_variant || substr(source,j.r,1);
 17      end loop;
 18      for j in ( select rownum r from dual connect by level <= 10 order by dbms_random.value )
 19      loop
 20        source_variant := source_variant || substr(source,52+j.r,1);
 21      end loop;
 22      perms(i) := source_variant;
 23  --      dbms_output.put_line(source_variant);
 24    end loop;
 25  end;
 26
 27  function rand_masks(p_size int default 255, p_refresh varchar2 default 'N',p_seed number default 0) return sys.odcivarchar2list pipelined is
 28  begin
 29    if perms.count < p_size or p_refresh in ('Y','y') then
 30      init(p_size,p_seed);
 31    end if;
 32
 33    for i in 1 .. p_size
 34    loop
 35      pipe row ( perms(i));
 36    end loop;
 37
 38    return;
 39  end;
 40
 41  function standard_source return varchar2 is
 42  begin
 43     return 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
 44  end;
 45
 46  function idx_entry(p_idx int, p_size int default 255, p_refresh varchar2 default 'N',p_seed number default 0) return varchar2 is
 47  begin
 48    if perms.count < p_size or p_refresh in ('Y','y') then
 49      init(p_size,p_seed);
 50    end if;
 51    return perms(p_idx);
 52  end;
 53
 54  function mask(p_input varchar2) return varchar2 is
 55  begin
 56    return translate(p_input,masker.standard_source,masker.idx_entry(ascii(p_input)));
 57  end;
 58
 59  end;
 60  /

Package body created.

Now I’ll test this with some sample data


SQL> create table t ( pk int, n varchar2(50));

Table created.

SQL>
SQL> insert into t values (1,'Connor McDonald');

1 row created.

SQL> insert into t values (2,'Chris Saxon');

1 row created.

SQL> insert into t values (3,'Maria Colgan');

1 row created.

SQL> insert into t values (4,'Bobby Curtis');

1 row created.

SQL> insert into t values (5,'Joel Kallman');

1 row created.

SQL> insert into t values (6,'Steven Feuerstein');

1 row created.

SQL> insert into t values (7,'Connor McDonald');

1 row created.

SQL> select pk, n, masker.mask(n) diddled
  2  from t;

        PK N                                                  DIDDLED
---------- -------------------------------------------------- ------------------------------
         1 Connor McDonald                                    sAJJAe CvnAJjWt
         2 Chris Saxon                                        sweOy RjrAJ
         3 Maria Colgan                                       czEJz BhMbzm
         4 Bobby Curtis                                       nkjjI EpzLBS
         5 Joel Kallman                                       oYfi luiiIuj
         6 Steven Feuerstein                                  CyUrUE SUtUWQyUXE
         7 Connor McDonald                                    sAJJAe CvnAJjWt

7 rows selected.

There we go! A simple obfuscator that runs quite efficiently on source data whilst preserving the distribution of the data.

Footnote: Remember that any masking scheme that does not completely randomize the obfuscation of source data is not a complete encryption or security solution. Most mechanisms for deducing source data from a manipulated variant of that data involve distribution analysis of letters, n-grams, words and phrases. The moment you preserve the distribution of source data for (say) performance and/or optimizer plan preservation, you are allowing for source data to be derived if there is enough of it available to perform that analysis. Remember that this is not a substitute for appropriate security and encryption protections.

Microsoft Webinar on the Future of the DBA- Certifications

Thanks to those who attended, out of over 1,600 registrations, the Microsoft webinar yesterday on the future of the DBA in the cloud. This was a fun session for me, as I got to demo one of my favorite skill areas since starting at Microsoft: automation of cloud deployments in Azure.

Bash’in it

It’s natural that others would be doing this, but I’m writing all my main scripts in bash: the main “wrapper” script prepares the environment, calls the Azure CLI commands, and then calls other scripts (PowerShell, JSON templates, etc.). I’m also deploying not only the infrastructure and databases, but logical objects and data as part of my deployment, so it’s pretty fun as this evolves.

So Many Questions

While the pre-recorded webinar was presented to attendees, I was busy answering questions from them. There were so many that I and two others were barely able to keep up with the sheer onslaught! A significant number of questions from DBAs and developers were about Azure certification: how to know which certifications to get and what the path is to achieve them. I’d like to first point out this great URL from Microsoft Learning that can assist:

https://www.microsoft.com/en-us/learning/browse-all-certifications.aspx?certificationtype=role-based

This site, depending on the role and area you are interested in, will create a certification path for the specialist. I think it will provide the answers that many were looking for during the webinar, but if there are more questions around certifications, please don’t hesitate to post in the comments below.

A second set of questions was about what to study if you are going to migrate to Azure. The first recommendation is to become familiar with the SQL Server Migration Assistant. This tool is incredibly robust and versatile. I use it with customers to ease migration woes not just from SQL Server on-prem to the cloud, but also for Oracle, DB2 and even WID (Windows Internal Database; yes, those are the databases behind Windows servers) migrations.

This tool will provide you with a full review of your current environment and the definite and potential issues you need to address before you can migrate the DDL over to an Azure database in the cloud. Once you are happy, it can perform the migration. Once the migration of the DDL is complete, it can then identify how much data there is and migrate the data over. As these are single-insert processes, I recommend “unchecking” the few “whale in the pond” tables and manually migrating those tables with a different tool, such as AzCopy via blob storage, Azure Fast Data Transfer, or even a BCP process with blob storage.

Skills

Next, on professional skills outside of certification: much of Azure runs on Linux and, unlike shops on database platforms that have always run on Linux, you may not have a Linux admin who can support you, so you may need to know more starting out. Consider learning Linux administration fundamentals, not just how to manage a database on Linux.

For language skills, learn bash and Python. Both languages are going to support you for the long haul. That’s not to say that your PowerShell scripts won’t transition over to Linux hosts. Start working with the Azure Cloud Shell now, creating cloud storage to migrate your scripts over, and update them to support Azure. Learning bash, as it is a more mature scripting language (it’s just older, and a ton of other Unix scripting languages have fed into it…), will make you more effective.

Python is taking the world by storm, not just in data science but in DevOps, too. I do feel that DBAs will be more involved in DevOps in the future and should acquire the skills that make them more valuable to these teams.

Hopefully the following links help get you started and thank you again for attending the webinar. Hopefully we’ll have another one very soon!

Azure Training from Microsoft Learning

Python edx classes

edx Introduction to Linux

Linux Certification from Microsoft Learning



Copyright © DBAKevlar [Microsoft Webinar on the Future of the DBA- Certifications], All Right Reserved. 2019.

In-table predicates

This note was prompted by a recent email asking about the optimizer’s method for estimating the selectivity of a predicate which compares two columns in the same table – for example: “where orders.amount_invoiced = orders.amount_paid”. It’s been about 14 years since I wrote “Cost Based Oracle – Fundamentals”, so my memory of what I wrote (and whether I even mentioned this case) was rather hazy; I sent off a quick reply and decided to do a little checking.

It turned out that I’d already written a blog note with a throwaway comment about the estimates and a general workaround for optimizer problems caused by examples of this kind. The comment I made about the estimate was that the selectivity seems to be the smaller of the selectivities of (using the example above) “amount_paid = :unpeekable_bind” and “amount_invoiced = :unpeekable_bind”. I’m fairly sure I’ve made similar comments several times in the past, but after replying to the email I started to wonder whether this would still be true if there were histograms on the columns. So I ran up a little test and here, to start things off, is the code to generate the data I used for testing:


rem
rem     Script:         column_equality_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Apr 2019
rem     Purpose:

create table t1(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t2(
        id      number(8,0),
        n1      number(6,0)
)
;

create table t3(
        n1      number(6,0),
        n2      number(6,0),
        v1      varchar2(50)
)
;

execute dbms_random.seed(0)

insert into t1
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t2
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                                  id,
        trunc(10 * abs(dbms_random.normal))     n1
from
        generator       v1
;

insert into t3 (n1, n2, v1)
select
        t1.n1,
        t2.n1,
        rpad(rownum,50)
from
        t1, t2
where
        t1.id = t2.id
;

commit;

begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1 for columns n1 size 254'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 254 for columns v1 size 1'
        );
end;
/

select
        table_name, column_name, num_distinct, density, histogram,
        low_value, high_value
from
        user_tab_cols
where
        table_name in ('T1','T2','T3')
and     column_name in ('N1','N2')
order by
        table_name, column_name
;


TABLE_NAME      COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM       LOW_VALUE  HIGH_VALUE
--------------- --------------- ------------ ---------- --------------- ---------- ----------
T1              N1                        38     .00005 FREQUENCY       80         C128

T2              N1                        38     .00005 FREQUENCY       80         C126

T3              N1                        38     .00005 FREQUENCY       80         C128
                N2                        38     .00005 FREQUENCY       80         C126


I’ve created two sets of 10,000 rows each of normally distributed data – but taken the absolute values so I’ve only got half the bell curve, and I’ve scaled up by a factor of 10 and truncated. This has given me two similar but slightly different sets of values which happen to cover 38 distinct values each.

I’ve then generated my test set by joining these two tables on the unique (though not declared as such) id column to give a table with the same number of rows and two skewed sets of data. The calls to dbms_stats create histograms on the skewed data sets, and I’ve reported a few significant numbers about the 4 relevant columns.

Looking at the column statistics we have num_distinct = 38 across the board – so my observation from paragraph 2 above would tend to suggest that the optimizer would report 10,000/38 = 263 as the cardinality estimate for the predicate “t3.n1 = t3.n2” (I’m fairly confident that in this case 1/num_distinct will be preferred over using the density from user_tab_cols). But here’s what we get from a call to explain plan:


explain plan for
select
        v1
from
        t3
where
        n1 = n2
;

select * from table(dbms_xplan.display);

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   564 | 32148 |    18   (6)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T3   |   564 | 32148 |    18   (6)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"="N2")

The estimate is 564 – which is a pretty good estimate in this case (the actual result was 552) as the two columns were randomly generated and there’s no correlation between them. Unfortunately this is quite a long way off my assumption of 263, so where did the optimizer get that number from?

Here’s a query (with result set) that you may recognise from an earlier post.


break on report skip 1
compute count of value on report
compute sum of t1_frequency on report
compute sum of t2_frequency on report
compute sum of product on report

column product format 999,999,999

with f1 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N1'
),
f2 as (
select
        endpoint_value                                                            value,
        endpoint_number - lag(endpoint_number,1,0) over(order by endpoint_number) frequency
from
        user_tab_histograms
where
        table_name  = 'T3'
and     column_name = 'N2'
)
select
        f1.value,
        f1.frequency    t1_frequency,
        f2.frequency    t2_frequency,
        f1.frequency * f2.frequency product
from
        f1, f2
where
        f2.value = f1.value
order by
        f1.value
;



     VALUE T1_FREQUENCY T2_FREQUENCY      PRODUCT
---------- ------------ ------------ ------------
         0          777          768      596,736
         1          806          753      606,918
         2          794          779      618,526
         3          808          763      616,504
         4          752          749      563,248
         5          627          729      457,083
         6          623          628      391,244
         7          584          616      359,744
         8          544          597      324,768
         9          512          546      279,552
        10          441          439      193,599
        11          409          342      139,878
        12          345          370      127,650
        13          318          300       95,400
        14          257          282       72,474
        15          244          242       59,048
        16          214          206       44,084
        17          172          193       33,196
        18          161          140       22,540
        19          113          114       12,882
        20          108           93       10,044
        21           95           81        7,695
        22           72           55        3,960
        23           54           56        3,024
        24           43           36        1,548
        25           38           31        1,178
        26           23           18          414
        27           18           23          414
        28            7           14           98
        29            9           13          117
        30           14           11          154
        31            4            2            8
        32            5            3           15
        33            1            3            3
        35            4            1            4
        37            2            2            4
---------- ------------ ------------ ------------
        36
                   9998         9998    5,643,754


I’m querying the histogram information for the two columns, and where t3.n1 and t3.n2 have a value in common I’ve reported the two frequencies for that value and the product of the frequencies. For convenience I’ve included a count and a couple of sums to show that there isn’t a perfect match in the set of values for the two columns. The most important number at the bottom of the page, though, is the sum of the products of the frequencies of common values. Take that value and divide by 10,000 and you get 564.3754 – compare that with the cardinality estimate of the predicate “t3.n1 = t3.n2” and it’s a perfect match (allowing for rounding).

The query against user_tab_histograms is the query I used to calculate the cardinality of a join where there were frequency histograms on the columns at both ends of the join. The optimizer’s estimate for “intra-table” predicates is consistent with its estimate for joins (in the special cases of “no histograms” and “two frequency histograms”, at least). Viewing it from a slightly different angle: the selectivity of the predicate “n1 = n2” can be derived as “the cardinality estimate for joining t3 to itself” divided by “the cardinality of the cartesian join” (the latter being num_rows * num_rows, of course).
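Expressed as a formula (just restating the arithmetic above, with f1(v) and f2(v) being the frequency histogram bucket counts for a value v that appears in both histograms):

\[
\text{sel}(n_1 = n_2) \;=\; \frac{\sum_{v} f_1(v)\,f_2(v)}{\text{num\_rows}^2},
\qquad
\text{card} \;=\; \text{num\_rows} \times \text{sel}(n_1 = n_2)
\]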

Just as a closing demo – let’s generate a plan for the appropriate self-join of t3 and check the cardinality estimate:


explain plan for
select
        t3a.v1, t3b.v1
from
        t3 t3a, t3 t3b
where
        t3a.n2 = t3b.n1
;

select * from table(dbms_xplan.display);


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |  5643K|   581M|   138  (83)| 00:00:01 |
|*  1 |  HASH JOIN         |      |  5643K|   581M|   138  (83)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T3   | 10000 |   527K|    13   (8)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T3A"."N2"="T3B"."N1")


As expected the (rounded) join cardinality is reported as 5,643K.

So the selectivity of the single table predicate “n1 = n2” will be 5,643,000 / (10,000 * 10,000) = 0.05643, and the cardinality estimate of the single table query will be 10,000 * 0.05643 = 564.3. QED.

I haven’t tested any other variations of types of histogram, degree of overlap of value ranges, etc. but I suspect that the general principle is probably going to give the selectivity as (or with the appearance of): “estimated cardinality of self-join” / “square of num_rows (allowing for nulls)”.

 

APEX Upgrade redux

I posted about my APEX upgrade to 19 yesterday, and someone was quick to point out to me that they believed I hadn’t covered all of the steps.

“What if your APEX instance needs to call web services?” they said. “You need to update your Access Control Lists.”

I hadn’t thought of that, so I logged onto one of my other APEX instances that was still at version 18, and checked the current ACLs:



SYS> select acl,
  2         principal,
  3         privilege
  4  from   dba_network_acl_privileges
  5  order by acl, principal, privilege;

ACL                                                PRINCIPAL            PRIVILEGE
-------------------------------------------------- -------------------- ----------
/sys/acls/oracle-sysman-ocm-Resolve-Access.xml     ORACLE_OCM           resolve
NETWORK_ACL_192DBA180AEB40AD98A73ACCD309FF8F       APEX_180200          http
NETWORK_ACL_296C00CF7F2744BAB526D4C4E85FE189       GGSYS                resolve
NETWORK_ACL_296C00CF7F2744BAB526D4C4E85FE189       GSMADMIN_INTERNAL    resolve

4 rows selected.

I can see the potential issue here. On upgrade, I’ll have a new schema which would need the same ACLs granted as the existing APEX 18 schema. Then I went ahead and upgraded this instance to version 19, and lo and behold, check out this nice little touch in the upgrade.


SYS> select acl,
  2         principal,
  3         privilege
  4  from   dba_network_acl_privileges
  5  order by acl, principal, privilege;

ACL                                                PRINCIPAL            PRIVILEGE
-------------------------------------------------- -------------------- ----------
/sys/acls/oracle-sysman-ocm-Resolve-Access.xml     ORACLE_OCM           resolve
NETWORK_ACL_0F93A8F653EC43DC9D90457B1151A330       APEX_190100          http
NETWORK_ACL_192DBA180AEB40AD98A73ACCD309FF8F       APEX_180200          http
NETWORK_ACL_296C00CF7F2744BAB526D4C4E85FE189       GGSYS                resolve
NETWORK_ACL_296C00CF7F2744BAB526D4C4E85FE189       GSMADMIN_INTERNAL    resolve

5 rows selected.

The upgrade took care of the ACLs for me! That’s pretty cool.
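For reference, if the upgrade hadn’t done this for me, the manual step would have been something along these lines (a sketch only, using DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE, with the host pattern chosen purely as an example):

begin
  dbms_network_acl_admin.append_host_ace(
    host => '*',                           -- example only: restrict to the hosts you actually call
    ace  => xs$ace_type(
              privilege_list => xs$name_list('http'),
              principal_name => 'APEX_190100',
              principal_type => xs_acl.ptype_db
            )
  );
end;
/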