Oakies Blog Aggregator

http://asktom.oracle.com/tkyte/hexdec

Base Conversion Routines

This set of routines is useful to convert between various 'bases' in Oracle. Once you install these functions, you will be able to perform operations such as:


SQL> select to_bin( 123 ) bin, to_hex( 123 ) hex, to_oct( 123 ) oct from dual
2 /

BIN             HEX             OCT
--------------- --------------- ---------------
1111011         7B              173

SQL>
SQL> select to_dec( '1111011', 2 ) base2, to_dec( '7B' ) base16,
2 to_dec('173',8) base8
3 from dual
4 /

     BASE2     BASE16      BASE8
---------- ---------- ----------
       123        123        123

Of course, with current releases you would never use TO_HEX, since TO_CHAR already does this (and faster). Likewise, TO_NUMBER is what you would use to convert hex to decimal these days.


ops$tkyte%ORA10GR2> select to_char( '1234', 'XXXX' ) from dual;

TO_CH
-----
4D2
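
Going the other way, TO_NUMBER with a hex format mask converts a hex string
back to a decimal number; for example, the following returns 1234:


SQL> select to_number( '4D2', 'XXXX' ) from dual;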


Here is the original code:


create or replace function to_base( p_dec in number, p_base in number )
return varchar2
is
    l_str   varchar2(255) default NULL;
    l_num   number        default p_dec;
    l_hex   varchar2(16)  default '0123456789ABCDEF';
begin
    if ( p_dec is null or p_base is null )
    then
        return null;
    end if;
    if ( trunc(p_dec) <> p_dec OR p_dec < 0 ) then
        raise PROGRAM_ERROR;
    end if;
    loop
        l_str := substr( l_hex, mod(l_num,p_base)+1, 1 ) || l_str;
        l_num := trunc( l_num/p_base );
        exit when ( l_num = 0 );
    end loop;
    return l_str;
end to_base;
/


create or replace function to_dec
( p_str       in varchar2,
  p_from_base in number default 16 ) return number
is
    l_num   number       default 0;
    l_hex   varchar2(16) default '0123456789ABCDEF';
begin
    if ( p_str is null or p_from_base is null )
    then
        return null;
    end if;
    for i in 1 .. length(p_str) loop
        l_num := l_num * p_from_base + instr(l_hex,upper(substr(p_str,i,1)))-1;
    end loop;
    return l_num;
end to_dec;
/
show errors

create or replace function to_hex( p_dec in number ) return varchar2
is
begin
    return to_base( p_dec, 16 );
end to_hex;
/
create or replace function to_bin( p_dec in number ) return varchar2
is
begin
    return to_base( p_dec, 2 );
end to_bin;
/
create or replace function to_oct( p_dec in number ) return varchar2
is
begin
    return to_base( p_dec, 8 );
end to_oct;
/

http://asktom.oracle.com/tkyte/update_cascade


UPDATE CASCADE PACKAGE


Generates the needed package and triggers to support update cascade
in Oracle without removing or infringing on DECLARATIVE RI.

This package supports:

  • Tables with multi-part primary keys (primary key(a,c,b))
  • Update cascade to many child tables from one parent
  • Self-referencing integrity such as that found in the SCOTT.EMP table
    (mgr->empno)
  • Complete application transparency. The application does not
    know it is happening
  • Versions 7.0 and above of the database.
  • Tuned and optimized to avoid full table scans on all tables
    (complete with a utility to show you un-indexed foreign keys in a schema;
    cascading an update to un-indexed foreign keys can be bad).

This solution has the following restrictions:

  1. All foreign keys to the parent table must point to the primary key
    constraint of the parent table. They cannot point to a unique
    constraint on the parent table; they must point to the primary key.
  2. No other unique constraints/indexes may be in place on the parent
    table other than the primary key constraint.
  3. Updates to primary keys that do not generate 'new' primary keys
    are not currently supported. For example, take the standard DEPT
    table. The update statement "update dept set deptno = deptno+10" will
    not work, whereas the update "update dept set deptno = deptno+1" will.
    The first update changes 10->20, 20->30 and so on; the problem is
    that 10->20 does not generate a 'new' primary key, since 20 already
    exists in the table. On the other hand, deptno=deptno+1 does not have
    this problem since 10->11, 20->21 and so on produce values that did
    not already exist.

    NOTE: an update that affects a single row will never suffer from this
    problem.

  4. The owner of the parent table must also be the owner of the child
    tables.
  5. The owner of the parent table must run the following package in
    their schema. This package must be installed for each user that wants
    to generate update cascade support. It may be dropped after the cascade
    support has been generated.
  6. The owner of the parent table must have been granted
    create procedure and create trigger. These
    privileges may not be inherited from a role.



Installing, using, and seeing how the demo works


This package consists of four SQL scripts:

  • uc.sql
  • demobld.sql
  • unindex.sql
  • generate.sql

UC.SQL

UC.SQL should be run by any user wanting to implement update cascade. It will
create

  • A package spec for update_cascade
  • A package body for update_cascade

Once this package is installed, you are able to implement update cascade on any
table via the pl/sql call:


PROCEDURE update_cascade.on_table
Argument Name                  Type                    In/Out Default?
------------------------------ ----------------------- ------ --------
P_TABLE_NAME                   VARCHAR2                IN
P_PRESERVE_ROWID               BOOLEAN                 IN     DEFAULT
P_USE_DBMS_OUTPUT              BOOLEAN                 IN     DEFAULT

Input Name         Default     Usage
p_table_name       NO DEFAULT  is the name of the parent table
p_preserve_rowid   TRUE        affects the generation of the code used to implement
                               the generated packages. If set to TRUE (the default)
                               the rowid of the updated parent row will not change
                               due to the update. If set to FALSE, the rowid
                               of the updated row WILL change BUT the code executes
                               in about 66% of the time.
p_use_dbms_output  FALSE       defaults to FALSE, which means the update_cascade
                               package will execute (create) the packages/triggers.
                               If set to TRUE, the generated code will be printed
                               using dbms_output.put_line (make sure to "set
                               serveroutput on size 100000" before using TRUE if you
                               want to see the code).

For example....


SQL> REM will do the default cascade support on
SQL> REM the dept table
SQL> REM ROWIDS will be preserved (unchanged).
SQL> REM Package will be created in the database
SQL> exec update_cascade.on_table( 'dept' )

SQL> REM will do the same as above but will not
SQL> REM preserve the rowids of the parent table
SQL> REM (the rowids will change). This version
SQL> REM runs in about 2/3 of the time of the first
SQL> REM one
SQL> exec update_cascade.on_table( 'dept', false )

SQL> REM will do the same as above but will not
SQL> REM preserve the rowids of the parent table
SQL> REM (the rowids will change). This version
SQL> REM runs in about 2/3 of the time of the first
SQL> REM one. Also, the packages will be written
SQL> REM to the SCREEN, not into the database.
SQL> exec update_cascade.on_table( 'dept', false, true )

The above would generate and compile all of the packages/triggers needed
to support cascading updates on the dept table to any other table.
You would run this any time you:

  • changed the primary key definition of the dept table
  • added a child table to the schema
    (eg: executed a create table and that table points to dept)
  • removed a child table from schema

NOTE: The user executing update_cascade must have been granted

  • CREATE TRIGGER
  • CREATE PROCEDURE

directly to themselves. They cannot just have the privilege via a role.

The other mode of calling update_cascade.on_table is as follows:


SQL> set feedback off
SQL> spool tmp.sql
SQL> exec update_cascade.on_table( p_table_name => 'dept', p_use_dbms_output =>TRUE )
SQL> spool off

The above would generate and print (as opposed to compile) all of the
packages/triggers/views needed to support cascading update on the dept table
to any other table. You would use this mode to

  • Avoid having to grant CREATE VIEW, CREATE TRIGGER, CREATE PROCEDURE
    directly to the person (they can get these privileges via a role like DBA).
  • To inspect the generated code to understand what it does.
  • To generate a package that can be used to install update cascade support
    at another site without giving them the update_cascade package itself.

The final mode of calling update_cascade.on_table is as follows:


SQL> exec update_cascade.on_table( p_table_name => 'dept', p_preserve_rowid =>FALSE )

The above would generate faster versions of the update cascade packages.
They run in 2/3 of the time of the default version, but the rowids of the
updated parent records will change. This makes this version less desirable
with Forms; if you use Forms heavily, use the default mode so that rowids
are preserved.
The triggers to cascade can get away with a lot less work in this mode. The
triggers fire half the time they would in the default mode, and an update that
would normally fire and affect 2x the number of rows is not needed. The
generated package code is streamlined as well (less actual code is
generated; procedural code, not just updates).

DEMOBLD.SQL

This script creates one user and six tables in that user's schema. The user is
called UCDEMO.
WARNING: This script does a "drop user ucdemo cascade". It is
run as SYSTEM. Please review it before running it. Modify it if you want.

Once the script creates the user it will create six tables and populate them.
The tables are:

  • DEPT with a primary key
  • EMP with a primary key, DECLARATIVE RI to DEPT, DECLARATIVE RI to EMP
  • PROJECTS with a primary key, DECLARATIVE RI to EMP
  • T1 with a three part primary key
  • T2 with a three part primary key and a three part foreign key to T1
  • T3 with a three part primary key and a three part foreign key to T2

To begin the demo, you might issue:


SQL> update dept set deptno=deptno+1;
update dept set deptno=deptno+1
*
ERROR at line 1:
ORA-02292: integrity constraint (UCDEMO.SYS_C005184) violated - child record
found


Then to see the update cascade in action, you would:


SQL> @uc


Table created.

Table altered.

Package created.

Package body created.

SQL> exec update_cascade.on_table('dept');

PL/SQL procedure successfully completed.

SQL>

SQL> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON

SQL> select empno, deptno from emp where deptno = 10;

EMPNO DEPTNO
---------- ----------
7839 10
7782 10
7934 10

SQL> update dept set deptno = deptno+1;


4 rows updated.

SQL> select * from dept;

DEPTNO DNAME LOC
---------- -------------- -------------
11 ACCOUNTING NEW YORK
21 RESEARCH DALLAS
31 SALES CHICAGO
41 OPERATIONS BOSTON

SQL> select empno, deptno from emp where deptno in ( 10, 11 );

EMPNO DEPTNO
---------- ----------
7839 11
7782 11
7934 11

SQL>

As you can see, after the package is installed,
the updates cascade successfully to
the child tables. The effect of this is even more
dramatic when you do it to the
emp table. The MGR column of the EMP table points to
the EMPNO column of the EMP
table. In addition, the EMPNO column of the PROJECTS table points to the EMPNO
column of the EMP table. The following scenario is a good demo of more complex
integrity:



SQL> select empno, mgr from emp;

EMPNO MGR
---------- ----------
7839
7698 7839
7782 7839
7566 7839
7499 7698
7521 7698
7654 7698
7902 7566
7369 7902
7788 7566
7844 7698
7876 7788
7900 7698
7934 7782

14 rows selected.

SQL> update emp set empno = 8000 where empno = 7698;

update emp set empno = 8000 where empno = 7698
*
ERROR at line 1:
ORA-02292: integrity constraint (UCDEMO.SYS_C005186) violated - child record
found


SQL> exec update_cascade.on_table('emp')

PL/SQL procedure successfully completed.

SQL> update emp set empno = 8000 where empno = 7698;

1 row updated.

SQL> select empno, mgr from emp;

EMPNO MGR
---------- ----------
7839
7782 7839
7566 7839
7499 8000
7521 8000
7654 8000
7902 7566
7369 7902
7788 7566
7844 8000
7876 7788
7900 8000
7934 7782
8000 7839

14 rows selected.


UNINDEX.SQL

The lack of an index on a foreign key will adversely impact the performance
of a cascading update. For example, the emp table is created via:


CREATE TABLE EMP
(EMPNO NUMBER(4) primary key,
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4) references emp,
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2) references dept);

The MGR column and the DEPTNO column are not indexed by default. So, for
example, if one were to issue:


SQL> update emp set empno = empno+1;

This would eventually issue:


update emp set mgr = SOME_NEW_VALUE
where mgr = SOME_OLD_VALUE;

Since the MGR field is not indexed, this update would do a full scan. It
would do this full scan once for each row in EMP that was updated.

Unindex.sql will generate output such as:


SQL> @unindex.sql

STAT TABLE_NAME                     COLUMNS              COLUMNS
---- ------------------------------ -------------------- --------------------
**** EMP                            MGR
ok   EMP                            DEPTNO               DEPTNO
ok   PROJECTS                       EMPNO                EMPNO, PROJ_NO
ok   T2                             A                    A, B
ok   T3                             A, B                 A, B, C

SQL>

The **** indicates that MGR is a foreign key in the EMP table that is
not indexed. It should be (see also chapter 6 in the Server Application
Developer's Guide for other reasons why un-indexed foreign keys are bad).
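
As a rough sketch of the idea behind unindex.sql (this is a simplified,
hypothetical query, not the actual script; the real one handles multi-column
foreign keys properly), the data dictionary has everything needed to flag
foreign keys whose leading column is not the leading column of any index on
the child table:


select case when ic.column_name is null then '****' else 'ok' end stat,
       c.table_name,
       cc.column_name fk_column
  from user_constraints  c,
       user_cons_columns cc,
       user_ind_columns  ic
 where c.constraint_type     = 'R'
   and cc.constraint_name    = c.constraint_name
   and cc.position           = 1
   and ic.table_name     (+) = c.table_name
   and ic.column_name    (+) = cc.column_name
   and ic.column_position(+) = 1
 order by c.table_name;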

GENERATE.SQL

This is a simple script that will generate all of the
update_cascade.on_table( 'table_name' ) statements that need to be executed
for all parent tables in a schema. For example, in the demo account it
would create a spool file that contains:


SQL> @src/update_cascade/generate

prompt Update Cascade on table: DEPT
execute update_cascade.on_table( 'DEPT' )

prompt Update Cascade on table: EMP
execute update_cascade.on_table( 'EMP' )

prompt Update Cascade on table: T1
execute update_cascade.on_table( 'T1' )

prompt Update Cascade on table: T2
execute update_cascade.on_table( 'T2' )
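
A hypothetical sketch of the kind of query such a script could be built on
(this is not the actual generate.sql): find every table whose primary key is
referenced by at least one foreign key, and emit the prompt/execute pair for
each one.


select 'prompt Update Cascade on table: ' || pk.table_name,
       'execute update_cascade.on_table( ''' || pk.table_name || ''' )'
  from user_constraints pk
 where pk.constraint_type = 'P'
   and exists ( select null
                  from user_constraints fk
                 where fk.constraint_type   = 'R'
                   and fk.r_constraint_name = pk.constraint_name )
 order by pk.table_name;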



HOW it works:


When you update the primary key of a parent table, you might want to
cascade the update to the children. This is hard to do for many
reasons and can be problematic. This package works around the lack of
an update cascade option.

This package uses three triggers to perform its magic.

  • A before update trigger; used to reset some package variables
  • A before update, for each row trigger; used to capture the before
    and after images of the primary keys in pl/sql tables.
    It also 'undoes' the update to the primary key.
  • An After update trigger that does the following steps:
    • 'Clones' the parent records with their new primary key, eg:
      insert into parent select NEW_KEY, other_cols
      from parent where CURRENT_KEY = ( SELECT OLD_KEY
      FROM DUAL)

      for example, given "update dept set deptno=deptno+1", this would
      insert the values of 11, 21, 31, 41 into the dept table. 11 would
      have the values in the rest of the columns that 10 had. 21 would
      look like 20 and so on.

    • If p_preserve_rowid = TRUE, then the primary keys of the row that
      was cloned and the clone would be flip-flopped. For example, if you
      issue "update dept set deptno = 11 where deptno = 10", we would make 10
      become the new value 11 and 11 become the old value 10.
    • Re-Parents the child records in all subordinate tables.
      Performs the equivalent of:
      update child set fkey = ( select new_key
      from DUAL )
      where fkey = ( select old_key from DUAL )
    • It then removes the 'cloned' parent records or the record with the
      old primary key value.

A look at the code

The following annotated code shows the generated packages and triggers you
would create by generating support for the dept table.

The following package spec is what would be generated for the typical 'DEPT'
table found in the scott schema (when declarative RI is used). See the
annotations in the spec for a description of what each entry means and how it
is used. The annotations are not found in the generated code; the generated
code is not commented.

This generated code preserves rowids. The code that preserves rowids would
not be present in the generated package if rowid preservation was disabled.




SQL> exec update_cascade.on_table('dept',true,true);

The following is a typical package specification generated for a table.
The package spec name is always u || TABLE_NAME || p. The package name is
in mixed case (to prevent collisions with other user objects).


create or replace package "uDEPTp"
as
--


rowCnt is used to collect the number of rows processed by a given update
statement. It is reset by the uDEPTp.reset routine, called from a before update
trigger. The 'inTrigger' variable is used to prevent recursive firing of
triggers when p_preserve_rowid = TRUE.


rowCnt number default 0;
inTrigger boolean default FALSE;

--


For each element in the primary key, a table type will be declared and then
three arrays of that type will be declared: 1) to hold the before image, 2) to
hold the after image, and 3) to serve as an empty array used to zero out the
previous two arrays.


type C1_type is table of "DEPT"."DEPTNO"%type index by binary_integer;
--
empty_C1 C1_type;
old_C1 C1_type;
new_C1 C1_type;
--
--


Reset is the routine fired by the BEFORE UPDATE trigger that resets the
rowcnt variable and empties out the arrays from the previous invocation.


procedure reset;
--


Do cascade is the work horse routine. It performs the actual cascade when
fired from an AFTER UPDATE trigger.


procedure do_cascade;
--


Add Entry simply increments the rowcnt and collects the before/after images
of the primary keys. It also 'undoes' the update to the primary key by
accessing the :new and :old variables.


procedure add_entry
(
p_old_C1 in "DEPT"."DEPTNO"%type
,p_new_C1 in out "DEPT"."DEPTNO"%type
);
--
end "uDEPTp";
/


This is the generated package body. It implements the above specification.

create or replace package body "uDEPTp"
as
--
procedure reset
is
begin
--


This line is present in all routines when p_preserve_rowid = TRUE. It
prevents recursive firing of the triggers.


if ( inTrigger ) then return; end if;
--
rowCnt := 0;
old_C1 := empty_C1;
new_C1 := empty_C1;
end reset;
--
procedure add_entry
(
p_old_C1 in "DEPT"."DEPTNO"%type
,p_new_C1 in out "DEPT"."DEPTNO"%type
)
is
begin
--
if ( inTrigger ) then return; end if;

--


This code saves the before and after images in pl/sql tables and 'undoes'
the primary key update by setting the new columns back to the old columns.


if (
p_old_C1 <> p_new_C1
) then
rowCnt := rowCnt + 1;
old_C1( rowCnt ) := p_old_C1;
new_C1( rowCnt ) := p_new_C1;
p_new_C1 := p_old_C1;
end if;
end add_entry;
--
procedure do_cascade
is
begin
--
if ( inTrigger ) then return; end if;
inTrigger := TRUE;

--


For every row that was updated we will perform the clone, cascade and
delete....


for i in 1 .. rowCnt loop


This insert clones the parent row, duplicating the old values with the new
primary key.


insert into DEPT (
"DEPTNO"
,"DNAME","LOC") select
new_C1(i)
,"DNAME","LOC"
from "DEPT" a
where ( "DEPTNO" ) =
( select old_C1(i)
from dual );
--


This code is generated only when p_preserve_rowid = true and will flip-flop
the old and new primary keys, hence preserving the rowid of the original
parent.


update "DEPT" set
( "DEPTNO" ) =
( select
decode( "DEPTNO", old_c1(i), new_c1(i), old_c1(i) )
from dual )
where ( "DEPTNO" ) =
( select new_C1(i)
from dual )
OR ( "DEPTNO" ) =
( select old_C1(i)
from dual );

--


Do a cascade update to all child tables.


update "EMP" set
( "DEPTNO" ) =
( select new_C1(i)
from dual )
where ( "DEPTNO" ) =
( select old_C1(i)
from dual );
--


Removing the old primary key value.


delete from "DEPT"
where ( "DEPTNO" ) =
( select old_C1(i)
from dual);
end loop;
--
inTrigger := FALSE;
reset;
exception
when others then
inTrigger := FALSE;
reset;
raise;
end do_cascade;
--
end "uDEPTp";
/


Lastly, we have the three triggers placed on
the parent table to effect the update
cascade. The first trigger simply 'resets' the package variables above.


create or replace trigger "uc$DEPT_bu"
before update of
"DEPTNO"
on "DEPT"
begin "uc$DEPT_pkg".reset; end;


The next trigger, the for each row trigger,
simply calls add_entry for each changed
row.


create or replace trigger "uc$DEPT_bufer"
before update of
"DEPTNO"
on "DEPT"
for each row
begin
"uc$DEPT_pkg".add_entry(
:old."DEPTNO"
,:new."DEPTNO"
);
end;


The last trigger calls do_cascade to effect the change.


create or replace trigger "uc$DEPT_au"
after update of
"DEPTNO"
on "DEPT"
begin "uc$DEPT_pkg".do_cascade; end;





Cary on Joel on SSD

Joel Spolsky's article on Solid State Disks is a great example of a type of problem my career is dedicated to helping people avoid. Here's what Joel did:

  1. He identified a task needing performance improvement: "compiling is too slow."
  2. He hypothesized that converting from spinning rust disk drives (thanks mwf) to solid state, flash hard drives would improve performance of compiling. (Note here that Joel stated that his "goal was to try spending money, which is plentiful, before [he] spent developer time, which is scarce.")
  3. So he spent some money (which is, um, plentiful) and some of his own time (which is apparently less scarce than that of his developers) replacing a couple of hard drives with SSD. If you follow his Twitter stream, you can see that he started on it 3/25 12:15p and wrote about having finished at 3/27 2:52p.
  4. He was pleased with how much faster the machines were in general, but he was disappointed that his compile times underwent no material performance improvement.

Here's where Method R could have helped. Had he profiled his compile times to see where the time was being spent, he would have known before the upgrade that SSD was not going to improve response time. Given his results, his profile for compiling must have looked like this:

100%  Not disk I/O
  0%  Disk I/O
----  ------------
100%  Total

I'm not judging whether he wasted his time by doing the upgrade. By his own account, he is pleased at how fast his SSD-enabled machines are now. But if, say, the compiling performance problem had been survival-threateningly severe, then he wouldn't have wanted to expend two business days' worth of effort upgrading a component that was destined to make zero difference to the performance of the task he was trying to improve.

So, why would someone embark upon a performance improvement project without first knowing exactly what result he should be able to expect? I can think of some good reasons:

  • You don't know how to profile the thing that's slow. Hey, if it's going to take you a week to figure out how to profile a given task, then why not spend half that time doing something that your instincts all say is surely going to work?
  • Um, ...

Ok, after trying to write them all down, I think it really boils down to just one good reason: if profiling is too expensive (that is, you don't know how, or it's too hard, or the tools to do it cost too much), then you're not going to do it. I don't know how I'd profile a compile process on a Microsoft Windows computer. It's probably possible, but I can't think of a good way to do it. It's all about knowing; if you knew how to do it, and it were easy, you'd do it before you spent two days and a few hundred bucks on an upgrade that might not give you what you wanted.

I do know that in the Oracle world, it's not hard anymore, and the tools don't cost nearly as much as they used to. There's no need anymore to upgrade something before you know specifically what's going to happen to your response times. Why guess... when you can know.

Profiling with my Boy

We have an article online called "Can you explain Method R so even my boss could understand it?" Today I'm going to raise the stakes, because yesterday I think I explained Method R so that an eleven year-old could understand it.

Yesterday I took my 11 year-old son Alex to lunch. I talked him into eating at one of my favorite restaurants, called Mercado Juarez, over in Irving, so it was a half hour in the car together, just getting over there. It was a big day for the two of us because we were very excited about the new June 17 iPhone OS 3.0 release. I told him about some of the things I've learned about it on the Internet over the past couple of weeks. One subject in particular that we were both interested in was performance. He likes not having to wait for click results just as much as I do.

According to Apple, the new iPhone OS 3.0 software has some important code paths in it that are 3× faster. Then, upgrading to the new iPhone 3G S hardware is supposed to yield yet another 3× performance improvement for some code paths. It's what Philip Schiller talks about at 1:42:00 in the WWDC 2009 keynote video. Very exciting.

Alex of course, like many of us, wants to interpret "3× faster" as "everything I do is going to be 3× faster." As in everything that took 10 seconds yesterday will take 3 seconds tomorrow. It's a nice dream. But it's not what seeing a benchmark run 3× faster means. So we talked about it.

I asked him to imagine that it takes me an hour to do my grocery shopping when I walk to the store. Should I buy a car? He said yes, probably, because a car is a lot faster than walking. So I asked him, what if the total time I spent walking to and from the grocery store was only one minute? Then, he said, having a car wouldn't make that much of a difference. He said you might want a car for other reasons, but he wouldn't recommend it just to make grocery shopping faster.

Good.

I said, what if grocery shopping were taking me five hours, and four of it was time spent walking? "Then you probably should get the car," he told me. "Or a bicycle."

Commit.

On the back of his menu, I drew him a sequence diagram (A) showing how he, running Safari on an iPhone 3G, might look to a performance analyst. I showed him how to read the sequence diagram (time runs top-down, control passes from one tier to another), and I showed him two extreme ways that his sequence diagram might turn out for a given experience. Maybe the majority of the time would be spent on the 3G network tier (B), or maybe the majority of the time would be spent on the Safari software tier (C). We talked about how if B were what was happening, then a 3× faster Safari tier wouldn't really make any difference. Apple wouldn't be lying if they said their software was 3× faster, but he really wouldn't notice a performance improvement. But if C were what was happening, then a 3× faster Safari tier would be a smoking hot upgrade that we'd be thrilled with.

Sequence diagrams, check. Commit.

Now, to profiles. So I drew up a simple profile for him, with 101 seconds of response time consumed by 100 seconds of software and 1 second of 3G (D):


Software   100
3G           1
--------------
Total      101

I asked him, if we made the software 2× faster, what would happen to the software time? He wrote down "50" in a new column to the right of the "100." Yep. Then I asked him what would happen to total response time. He said to wait a minute, he needed to use the calculator on his iPod Touch. Huh? A few keystrokes later, he came up with a response time of 50.5.

Oops. Rollback.

He made the same mistake that just about every forty year-old I've ever met makes. He figured if one component of response time were 2× faster, then the total response time must be 2× faster, too. Nope. In this case, the wrong answer was close to the right answer, but only because of the particular numbers I had chosen.

So, to illustrate, I drew up another profile (E):


Software    4
3G         10
-------------
Total      14

Now, if we were to make the software 2× faster, what happens to the total? We worked through it together:


Software    4    2
3G         10   10
------------------
Total      14   12

Click. So then we spent the next several minutes doing little quizzes. If this is your profile, and we make this component X times faster, then what's the new response time going to be? Over and over, we did several more of these, some on paper (F), and others just talking.

Commit.

Next step. "What if I told you it takes me an hour to get into my email at home? Do I need to upgrade my network connection?" A couple of minutes of conversation later, he figured out that he couldn't answer that question until he got some more information from me. Specifically, he had to ask me how much of that hour is presently being spent by the network. So we did this over and over a few times. I'd say things like, "It takes me an hour to run my report. Should I spend $4,800 tuning my SQL?" Or, "Clicking this button takes 20 seconds. Should I upgrade my storage area network?"

And, with just a little bit of practice, he learned that he had to say, "How much of the however-long-you-said is being spent by the whatever-it-was-you-said?" I was happy with how he answered, because it illustrated that he had caught onto the pattern. He realized that the specific blah-blah-blah proposed remedies I was asking him about didn't really matter. He had to ask the same question regardless. (He was answering me with a sentence using bind variables.)

Commit.

Alex hears me talk about our Method R Profiler software tool a lot, and he knows conceptually that it helps people make their systems faster, but he's never known in any real detail very much about what it does. So I told him that the profile tables are what our Profiler makes for people. To demonstrate how it does that, I drew him up a list of calls (G), which I told him was a list of visits between a disk and a CPU. ...Just a list that says the same thing that a sequence diagram (annotated with timings) would say:


D 2
C 1
D 6
D 4
D 8
C 3

I told him to make a profile for these calls, and he did (H):


Disk    20
CPU      4
----------
Total   24

Excellent. So I explained that instead of adding up lists in our head all day, we wrote the Profiler to aggregate the call-by-call durations (from an Oracle extended SQL trace file) for you into a profile table that lets you answer the performance questions we had been practicing over lunch. ...Even if there are millions of lines to add up.
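
In SQL terms, the aggregation Alex did by hand is just a GROUP BY over the
resource name. Here is a toy sketch (the CALLS table and its columns are made
up purely for illustration; this is not how the Profiler reads or stores
trace data):


create table calls ( component varchar2(10), seconds number );

insert into calls values ( 'Disk', 2 );
insert into calls values ( 'CPU',  1 );
insert into calls values ( 'Disk', 6 );
insert into calls values ( 'Disk', 4 );
insert into calls values ( 'Disk', 8 );
insert into calls values ( 'CPU',  3 );

-- rollup() adds a grand-total row (component is null there), so this
-- returns CPU 4, Disk 20 and Total 24, the same profile Alex wrote down.
select nvl( component, 'Total' ) component, sum( seconds ) seconds
  from calls
 group by rollup( component );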

The finish-up conversation in the car ride back was about how to use everything we had talked about when you fix people's performance problems. I told him the most vital thing about helping someone solve a performance problem is to make sure that the operation (the business task) that you're analyzing right now is actually the most important business task to fix first. If you're looking at anything other than the most important task first, then you're asking for trouble.

I asked him to imagine that there are five important tasks that are too slow. Maybe every one of those things has its response time dominated by a different component than all the others. Maybe they're all the same. But if they're all different, then no single remedy you can perform is going to fix all five. A given remedy will have a different performance impact upon each of the five tasks, depending on how much of the fixed thing that task was using to begin with.

So the important thing is to know which of the five profiles it is that you ought to be paying attention to first. Maybe one remedy will fix all five tasks, maybe not. You just can't know until you look at the profiles. (Or until you try your proposed remedy. But trial-and-error is an awfully expensive way to find out.)

Commit.

It was a really good lunch. I'll look forward to taking my 9-year-old (Alex's little brother) out the week after next when I get back from ODTUG.

Alive and Prutsing

As you might have noticed I haven’t been exactly busy posting new stories on my blog. The obvious reason for this is that I have been way too busy doing other things such as prutsing with Oracle11g Release 2. Two weeks ago I taught my first NF11g class, which includes all major new release 2 [...]

Oracle OpenWorld Unconference

One of the less “traditional” tracks at Oracle OpenWorld is the Oracle Unconference. The Unconference allows for a much more informal setting for sessions and generally is more interactive and audience driven compared to the usual amounts of PowerPoint that sessions contain, which can cause eyes to bleed. This year I’ve signed up to [...]

Cost and value

I just read a post by Seth Godin entitled "If Craigslist cost $1". I've used Craigslist and I agree with him that even a small charge for the service would likely clean it up a whole lot. At $1 per listing, most people with a legitimate need would still choose to use the service. But, for scammers and those with not so virtuous purposes, the small charge and thus the requirement for verifiable identification for the money exchange, would push them out of the game. Plus, the money coming in for using the service could go towards making the service better or making the owners richer or be used for philanthropic purposes or whatever.

The post made me think about cost and value. What if the cost to use the service was higher ($5, $10, $20)? At what point would the cost outweigh the value? Well, certainly the higher the cost to use the service is, the higher the cost of the items sold on the service will be, right? I mean, I don't think I'd want to pay $10 to use the service if I'm going to sell my item for $10. Maybe if I'm selling my item for $20, I'd be willing to pay half of that to get it sold.

My actual, real-life use of Craigslist has typically been for a few basic reasons:

1) I no longer use the item(s) but they are in good enough shape that I think someone else would/could use them.
2) Having someone come pick up the item and take it away is much easier than trying to figure out how to get it to my local Goodwill (stuff like furniture and larger items).
3) I bought something on a whim and either never used it or only used it once or twice before realizing that I really didn't want or need it.

The real bottom-line for me has typically been that I just wanted to get rid of something that I was no longer using but believed the item to be in good enough condition that it "shouldn't" be thrown away. (Waste not, want not!) So, in most every case, even if I only asked a few dollars, the point was that I was able to get the items out of my house and off to someone else who wanted them. I didn't have to do anything except put up an ad, answer a few calls and then exchange the item(s) for cash when a buyer appeared.

It wasn't about the money that I received for the item. Even if I hadn't sold the item, I would have gotten rid of it somehow, so the money was just a "nice to have" benefit. The value of the service was that it was easy for me and I believed it was a win-win for both me and the buyer. I'd have been willing to pay a fee to use the service.

But really, this isn't about Craigslist. It's about the cost (price you're willing to pay) of something versus the value having or using that something has to you. Another example is software. Some of my favorite software tools have cost me little to nothing to obtain. I use them constantly and have been very happy to pay for them (most everything I'm thinking of has cost less than $30 each). But there are other products that cost *a lot* ($1,000 or more) and while I really might like to have them, I'm not willing to pay that cost. I'm particularly not willing to pay when there is a comparable product that is available for free or at a very low cost. The first example that comes to mind is TOAD vs SQL Developer. [sidebar] I am not advocating either product nor am I comparing features or virtues of either. [/sidebar] Since SQL Developer is free, I'd very likely choose it instead of the costed product and use it until I found that it didn't meet my needs to such a degree that the cost of buying TOAD would be outweighed by the benefit of having it. Even then, I'd hesitate unless the feature I needed was so key/critical that I could easily identify how having that feature would save me money in the long haul.

But, if TOAD was priced much lower, say under $300 (instead of the base edition being nearly $900), I don't think I'd hesitate to buy it. This is where I wonder how companies decide to price their products. Is it truly a "what the market will bear" pricing strategy or what? I likely have a poor way of looking at it, but if I had a software product that I think everyone should use, then I'd be willing to sell it at a lower cost in hopes that I'd sell more and thus make up the difference. In other words, I'd rather sell 1000 copies at $300 each than 100 copies at $3000 each. I'd say there is an excellent reason why pricing isn't done this way and there's also a reason why I'm not in sales and marketing. ;)

How about you? How does cost and value fit into your personal buying decisions?

SMART goals

I was just reviewing the written goals for a project I've started working on and after looking at a few of them, I was reminded of how important it is to make sure your goals are "SMART".

For example, one of the project goals is: To create a standard framework that supports high levels of service. Hmmm... When I read that goal, I found myself thinking of the concept of SMART goals that I learned a long time ago. Somehow this goal doesn't seem to fit the SMART paradigm.

So, what is a SMART goal?

SMART is just a mnemonic that can help you remember how to effectively formulate a goal. Here's a brief description:

Specific - The What, Why, and How.
What are you going to do? What do you want to accomplish?
Why is it important (reasons, purpose, benefits)?
How will you accomplish the goal?

Measurable - Concrete criteria for measuring progress.
If you can measure properly, you can see change occur as you progress toward the goal. Ask questions like "how much?" or "how many?".

Attainable - Takes enough effort to be a bit of a stretch, but not so far out of reach that it is nearly impossible to complete.

Relevant - "Do-able"...not "easy". The skills are available, the learning curve is not vertical and the goal is within bounds of the overall plan.

Time-Bound - A clear target date or time period to work towards. Without the time element, the commitment to begin action is often too vague.

I think the project goals I've been reviewing have most of the details that would cover the SMART elements elsewhere in the project docs, but I'm glad for the opportunity to review this method for stating goals.

Latency Hiding For Fun and Profit

Yep, another post with the word ‘latency’ written all over it.

I’ve talked a lot about latency, and how it is more often than not completely immutable. So, if the latency cannot be improved upon because of some pesky law of physics, what can be done to reduce that wasted time? Just three things, actually:

  1. Don’t do it.
  2. Do it less often.
  3. Be productive with the otherwise wasted time.

The first option is constantly overlooked – do you really need to be doing this task that makes you wait around? The second option is the classic ‘do things in bigger lumps between the latency’ – making fewer roundtrips being the classic example. This post is about the third option, which is technically referred to as latency hiding.

Everybody knows what latency hiding is, but most don’t realise it. Here’s a classic example:

I need some salad to go with the chicken I am about to roast. Do I:

(a) go to the supermarket immediately and buy the salad, then worry about cooking the chicken?

OR

(b) get the chicken in the oven right away, then go to the supermarket?

Unless the time required to buy the salad is much longer than the chicken’s cook-time, the answer is always going to be (b), right? That’s latency hiding, also known as Asynchronous Processing. Let’s look at the numbers:

Variable definitions:

Supermarket Trip=1800s

Chicken Cook-Time=4800s

Calculations:

Option (a)=1800s+4800s=6600s (oh man, nearly two hours until dinner!)

Option (b)=4800s (with 1800s supermarket time hidden within it)

Here’s another example: You have a big code compile to do, and an empty stomach to fill. In which order do you execute those tasks? Hit ‘make’, then grab a sandwich, right?

As a side note, this is one of my classic character flaws – I just live for having tasks running in parallel this way. Not a flaw, I hear you say? Anyone that has tried to use software parallelism (such as Oracle Parallel Execution) knows the problem – some tasks finish quicker than expected, and then there’s a bunch of idle threads. In the real world, this means that my lunch is often very delayed, much to the chagrin of my lunch buddies.

OK, so how does this kind of thing work with software? Let’s look at a couple of examples:

  1. Read Ahead
  2. Async Writes

Read ahead  from physical disk is the most common example of (1), but it equally applies to result prefetching in, say, AJAX applications. Whatever the specific type, it capitalises on parallel processing from two resources. Let’s look at the disk example for clarification.

Disk read ahead is where additional, unrequested, reads are carried out after an initial batch of real requested reads. So, if a batch job makes a read request for blocks 1,2,3 and 4 of a file, “the disk” returns those blocks back and then immediately goes on to read blocks 5,6,7,8, keeping them in cache. If blocks 5,6,7,8 are then subsequently requested by the batch job after the first blocks are processed, they can be returned immediately from cache. This hides the latency from the batch job and increases throughput as a direct result.

Async writes are essentially the exact opposite of read-ahead. Let’s take the well-known Oracle example of async writes, that of the DBWR process flushing out dirty buffers to disk. The synchronous way to do this is to generate a list of dirty buffers and then issue a series of synchronous writes (one after the other) until they are all complete. Then start again by looking for more dirty buffers. The async I/O way to do the same operation is to generate the list, issue an async write request (which returns instantly), and immediately start looking for more dirty buffers. This way, the DBWR process can spend more useful time looking for buffers – the latency is hidden, assuming the storage can keep up.

By the way, the other horror of the synchronous write is that there is no way that the I/O queues can be pushed hard enough for efficient I/O when sending out a single buffer at a time. Async writes remedy that problem.

I’ve left a lot of the technical detail out of that last example, such as the reaping of return results from the async I/O process, but didn’t want to cloud the issue. Oops, I guess I just clouded the issue, just ignore that last sentence…

4th Planboard DBA Symposium: Registration now open

On November 17 Planboard will run its 4th Dutch DBA Symposium and the registration is now open. This “for Dutch DBA’s, by Dutch DBA’s” symposium has become the place to be for the serious DBA who wants to share his or her knowledge with other DBA’s in an open environment with plenty of networking time [...]