Oakies Blog Aggregator

Little things worth knowing: OSWatcher Analyser Dashboard

I have written a few articles about Tracefile Analyzer (TFA) in the recent past. As you may recall from these posts, a more comprehensive TFA version than the one provided with the installation media is available from My Oracle Support (MOS for short). As part of this version, you get a lot of useful, additional tools, including OSWatcher. I love OSWatcher, simply because it gives me insights that would be very hard to get with sar, for example. sar tends to be the least common denominator on most (proper) operating systems and it’s better than nothing, but please read on and let me explain why I like OSWatcher so much.

Feel free to head back to my earlier posts if you’d like to get some more details about “stock TFA” and “MOS TFA”. These are terms I coined, by the way; you won’t find them in a MOS search.

Say hello to the OSWatcher Analyser dashboard

In this article I’d like to introduce the OSWatcher Analyser dashboard to you. I think it’s probably the best way (certainly a very convenient one) to get an overview of what is going on at the operating system level. And it includes pretty pictures that are so easy to understand! OSWatcher (more precisely, oswbb) should be running on the system as part of “MOS Tracefile Analyzer”. In other words, it has to be installed.

The environment for this post

As always, before beginning the write-up, here are some details about the system I’m using. My virtual RAC system is based on KVM, and I’m running Oracle Linux 7.4 with UEK 4. The database is reasonably current at version 18.3.0.0.180717 (but that shouldn’t matter as this post isn’t about the database!), and I have upgraded TFA to 18.3.3.0.0. This is the latest TFA version at the time of writing.

[oracle@rac18pri1 ~]$ tfactl print version
TFA Version : 18.3.3.0.0

[oracle@rac18pri1 ~]$ tfactl toolstatus

.------------------------------------------------------------------.
|                  TOOLS STATUS - HOST : rac18pri1                 |
+----------------------+--------------+--------------+-------------+
| Tool Type            | Tool         | Version      | Status      |
+----------------------+--------------+--------------+-------------+
| Development Tools    | orachk       |   12.2.0.1.3 | DEPLOYED    |
|                      | oratop       |       14.1.2 | DEPLOYED    |
+----------------------+--------------+--------------+-------------+
| Support Tools Bundle | darda        | 2.10.0.R6036 | DEPLOYED    |
|                      | oswbb        |        8.1.2 | RUNNING     |
|                      | prw          | 12.1.13.11.4 | NOT RUNNING |
+----------------------+--------------+--------------+-------------+
| TFA Utilities        | alertsummary |   12.2.1.1.0 | DEPLOYED    |
|                      | calog        |   12.2.0.1.0 | DEPLOYED    |
|                      | dbcheck      |   18.3.0.0.0 | DEPLOYED    |
|                      | dbglevel     |   12.2.1.1.0 | DEPLOYED    |
|                      | grep         |   12.2.1.1.0 | DEPLOYED    |
|                      | history      |   12.2.1.1.0 | DEPLOYED    |
|                      | ls           |   12.2.1.1.0 | DEPLOYED    |
|                      | managelogs   |   12.2.1.1.0 | DEPLOYED    |
|                      | menu         |   12.2.1.1.0 | DEPLOYED    |
|                      | param        |   12.2.1.1.0 | DEPLOYED    |
|                      | ps           |   12.2.1.1.0 | DEPLOYED    |
|                      | pstack       |   12.2.1.1.0 | DEPLOYED    |
|                      | summary      |   12.2.1.1.0 | DEPLOYED    |
|                      | tail         |   12.2.1.1.0 | DEPLOYED    |
|                      | triage       |   12.2.1.1.0 | DEPLOYED    |
|                      | vi           |   12.2.1.1.0 | DEPLOYED    |
'----------------------+--------------+--------------+-------------'

Note :-
  DEPLOYED    : Installed and Available - To be configured or run interactively.
  NOT RUNNING : Configured and Available - Currently turned off interactively.
  RUNNING     : Configured and Available.

[oracle@rac18pri1 ~]$ 

As you can see, oswbb is part of the “Support Tools Bundle” and actively running out of the box.

Apart from OSWatcher.sh it is important for OSWatcherFM.sh to run as well, as it prunes the archive directory where OSWatcher stores its results. Of course that doesn’t relieve you of keeping an eye on your filesystem usage numbers :) Both of these are started by default.
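Both scripts show up as ordinary operating system processes, so a quick sanity check is a simple process listing (a minimal sketch; the exact command lines you see will depend on your installation):

[oracle@rac18pri1 ~]$ ps -ef | grep "[O]SWatcher"

You should see both OSWatcher.sh and OSWatcherFM.sh in the output.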

Invoking OSWBB

Assuming there is data for analysis (eg OSWatcher has run for a bit), you can invoke oswbb via tfactl. It will go through a somewhat lengthy parse phase if you don’t narrow your search down to a specific time, but eventually presents you with options:

[oracle@rac18pri1 ~]$ tfactl oswbb

Starting OSW Analyzer V8.1.2
OSWatcher Analyzer Written by Oracle Center of Expertise
Copyright (c)  2017 by Oracle Corporation

Parsing Data. Please Wait...

Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.10.0600.dat ...
Parsing file rac18pri1_iostat_18.10.10.0700.dat ...
Parsing file rac18pri1_iostat_18.10.10.0800.dat ...

[...]

Parsing file rac18pri1_ps_18.10.12.0400.dat ...
Parsing file rac18pri1_ps_18.10.16.1900.dat ...
Parsing file rac18pri1_ps_18.10.16.2000.dat ...


Parsing Completed.


Enter 1 to Display CPU Process Queue Graphs
Enter 2 to Display CPU Utilization Graphs
Enter 3 to Display CPU Other Graphs
Enter 4 to Display Memory Graphs
Enter 5 to Display Disk IO Graphs

Enter GC to Generate All CPU Gif Files
Enter GM to Generate All Memory Gif Files
Enter GD to Generate All Disk Gif Files
Enter GN to Generate All Network Gif Files

Enter L to Specify Alternate Location of Gif Directory
Enter Z to Zoom Graph Time Scale (Does not change analysis dataset)
Enter B to Returns to Baseline Graph Time Scale (Does not change analysis dataset)
Enter R to Remove Currently Displayed Graphs

Enter X to Export Parsed Data to Flat File
Enter S to Analyze Subset of Data(Changes analysis dataset including graph time scale)
Enter A to Analyze Data
Enter D to Generate DashBoard

Enter Q to Quit Program

Please Select an option:

As you can see, there are plenty of options available. In the interactive mode you are currently using, it’s possible to display all sorts of charts for CPU, memory, disk, and network.

Another option, and one I like a lot, is to generate all the data in a dashboard.

Creating the dashboard

If you choose option “D” oswbb will go ahead and create a dashboard. You are prompted to enter a name before it goes off and creates it. The location where it saves it is not immediately obvious … In my case – using RAC – the result is stored in $ORACLE_BASE/tfa/repository/suptools/$(hostname)/oswbb/oracle/oswbb/analysis

I suppose that’s the same location in Oracle Restart and single instance, although I haven’t been able to verify that claim.

Within the “analysis” directory you find all your information combined in a subdirectory. It has the same name you assigned earlier when you requested the dashboard generation. It’s probably easiest to transfer the entire directory to a workstation for analysis.

Within the top-level directory you find two entries: analysis.txt and a directory containing the HTML representation of the data gathered.

[martin@host dashboard1]$ ls -l
total 14588
-rw-r--r-- 1 martin martin 14931077 Oct 16 22:07 analysis.txt
drwxr-xr-x 7 martin martin     4096 Oct 16 22:07 dashboard
[martin@host dashboard1]$

Let’s start with the latter to get a first impression.

The HTML output

I have included a couple of screenshots that you might find interesting about the system under observation. If this was a “real” system and not an undersized set of VMs I’d probably be quite nervous by just looking at this data … there is no reason for concern though, this is just my lab! And I did my best to hurt the VMs, otherwise there wouldn’t be anything to show here :)

[Image: overview (https://martincarstenbach.files.wordpress.com/2018/10/overview.png)]

As you can see, all the main components are colour-coded. A number of important charts are presented as well. CPU looks particularly bad. You can enlarge the images by clicking on them.

The dashboard provides you with detailed information for CPU, memory, I/O and networking. I’d like to show you a few of these now.

CPU

Clicking on the red CPU button, I get more details about CPU utilisation on the system.

[Image: cpu (https://martincarstenbach.files.wordpress.com/2018/10/cpu.png)]

There are actually a few more charts, but they didn’t fit on the screen. I like the fact that I am immediately presented with critical findings at the top of the page, and there is more information in the form of charts. Again, clicking on a chart provides a larger picture.

Heading over to the text file I mentioned earlier (or clicking on the “details” button, not shown in the figure), I get more details:

############################################################################
# Section 4.2: CPU UTILIZATION: PERCENT BUSY
# CPU utilization should not be high over long periods of time. The higher 
# the cpu utilization the longer it will take processes to run.  Below lists 
# the number of times (NUMBER) and percent of the number of times (PERCENT) 
# that cpu percent busy was High (>95%) or Very High (100%). Pay attention 
# to high spanning multiple snaps as this represents the number of times cpu
# percent busy remained high in back to back snapshots
                                       NUMBER  PERCENT
------------------------------------------------------
Snaps captured in archive                 5568   100.00
High (>95%)                                 48     0.86
Very High (100%)                            42     0.75
High spanning multiple snaps                24     0.43

CPU UTILIZATION: The following snaps recorded cpu utilization of 100% busy:
SnapTime                      
------------------------------
Wed Oct 10 11:08:56 BST 2018
Wed Oct 10 11:09:29 BST 2018
Wed Oct 10 20:14:54 BST 2018
[...]

Using these data points I can further drill down into the information gathered by OSWatcher in the archive directory to look at specific output.

Memory

The memory button is also in bright red, so something must be wrong in that area.

[Image: memory (https://martincarstenbach.files.wordpress.com/2018/10/memory.png)]

The system doesn’t seem to be in best shape. This is entirely my fault: I have assigned too little memory to my VMs, and here is proof!

Changing the scope

After the first investigation using all the data, you may want to narrow the scope down a bit. Or you might already know that an issue was reported last night between 2 AM and 4 AM. Narrowing the scope down for the purpose of creating a (more detailed) dashboard can be done using menu option “S”, as shown here:

Please Select an Option:S

Specify Analysis Start Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 10 07:00:03 2018  :Oct 11 21:00:00 2018

Specify Analysis End Time. Valid entry between Oct 10 07:00:03 2018 and Oct 16 21:31:59 2018
Example Format To Enter Time: Oct 16 21:31:59 2018  :Oct 12 01:00:00 2018

Dates accepted. Verifying valid begin/end data points...

Validating times in the archive...


Recalibrating data...
Scanning file headers for version and platform info...

Parsing file rac18pri1_iostat_18.10.11.2000.dat ...

[...]

Parsing file rac18pri1_ps_18.10.12.0000.dat ...

Enter a unique analysis directory name or enter  to accept default name:

There are other ways to narrow the scope down: menu option “Z” lets you zoom in on a date/time range for interactive use without changing the analysis dataset. After zooming you can create any chart of interest (menu items 1-5), and you can reset the zoom by pressing “B”.

Summary

Did I say I really like OSWatcher? It’s really helpful. If you install it together with TFA you don’t even need to concern yourself with writing a startup script; it’ll just be started. If you have to, you can go back in time (within reason) and investigate O/S performance statistics. Unlike sar, it gives you 30-second granularity, which is a lot better than the rather coarse 10-minute interval.

Admittedly, sar keeps a month’s worth of data, so the two tools complement each other quite nicely.

Happy troubleshooting!

Problem Solving

Here’s a little question that popped up on the Oracle-L list server a few days ago:

I am facing this issue running this command in 11.2.0.4.0 (also in 12c R2 I got the same error)

SQL> SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL;
SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL
ORA-01873: a precisão precedente do intervalo é pequena demais

 

How do you go about finding out what’s going on? In my case the first thing is to check the translation of the error message (two options):

SQL> execute dbms_output.put_line(sqlerrm(-1873))
ORA-01873: the leading precision of the interval is too small

SQL> SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL;
SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL
                                                                                                       *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small

That didn’t quite match my guess, but it was similar: I had been guessing that it was saying something about precision. It doesn’t really strike me as an intuitively self-explanatory message, though, so maybe a quick check in $ORACLE_HOME/rdbms/mesg/oraus.msg to find the error number with its cause and action will help:


01873, 00000, "the leading precision of the interval is too small"
// *Cause: The leading precision of the interval is too small to store the
//  specified interval.
// *Action: Increase the leading precision of the interval or specify an
//  interval with a smaller leading precision.

Well, that doesn’t really add value – and I can’t help feeling that if the leading precision of the interval is too small it won’t help to make it smaller. So all I’m left to go on is that there’s a precision problem of some sort and it’s something to do with the interval, and probably NOT with adding the interval to the timestamp. So let’s check that bit alone:


SQL> SELECT NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL;
SELECT NUMTODSINTERVAL(2850166802000/1000, 'SECOND') FROM DUAL
                                    *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small


So the interval bit is the problem. Since the problem is about “precision”, let’s try messing about with the big number. First I’ll do a bit of cosmetic tidying by doing the division to knock off the trailing zeros, then I’ll see what happens when I divide by 10:

SQL> SELECT NUMTODSINTERVAL(285016680, 'SECOND') from dual;

NUMTODSINTERVAL(285016680,'SECOND')
---------------------------------------------------------------------------
+000003298 19:18:00.000000000

So 285 million works, but 2.85 billion doesn’t. The value that works gives an interval of about 3,298 days, getting on for 10 years, so maybe there’s an undocumented limit of 100 years on the input value; on the other hand, the jump from 285 million to 2.85 billion does take you through a critical computer-oriented limit: 2^31 - 1, the maximum signed 32-bit integer (2,147,483,647), so let’s try using that value, and that value plus 1, in the expression:


SQL> SELECT NUMTODSINTERVAL(power(2,31), 'SECOND') from dual;
SELECT NUMTODSINTERVAL(power(2,31), 'SECOND') from dual
                       *
ERROR at line 1:
ORA-01873: the leading precision of the interval is too small


SQL> SELECT NUMTODSINTERVAL(power(2,31)-1, 'SECOND') from dual;

NUMTODSINTERVAL(POWER(2,31)-1,'SECOND')
---------------------------------------------------------------------------
+000024855 03:14:07.000000000

1 row selected.

Problem identified: it’s a numeric limit of the numtodsinterval() function. Interestingly it’s not documented in the Oracle manuals; in fact the SQL Reference manual suggests that this shouldn’t be a limit because it says that “any number value or anything that can be cast as a number is legal”, and in Oracle-speak a number allows for roughly 38 digits of precision.

Whilst we’ve identified the problem we still need a way to turn the input number into the timestamp we need – the OP didn’t need help with that one: divide by sixty and convert using minutes instead of seconds:


SQL> SELECT TO_TIMESTAMP('1970-01-01 00:00:00.0','YYYY-MM-DD HH24:MI:SS.FF') + NUMTODSINTERVAL(2850166802000/1000/60, 'MINUTE') FROM DUAL;

TO_TIMESTAMP('1970-01-0100:00:00.0','YYYY-MM-DDHH24:MI:SS.FF')+NUMTODSINTER
---------------------------------------------------------------------------
26-APR-60 01.00.02.000000000 AM

1 row selected

Job done.

Compressed LOB–my table got bigger?

We had an interesting question on AskTOM the other day about LOB compression. Someone was very concerned that after applying compression to the LOBs in a column of their table, the table actually got larger! Cue much confusion and panic, but after a little exploration, the explanation was pretty simple.

By default, when you create a LOB column in a table, the default storage definition is ENABLE STORAGE IN ROW. As per the documentation:

If you enable storage in row, then the LOB value is stored in the row (inline) if its length is less than approximately 4000 bytes minus system control information

After getting some more information from the person who asked the question, the majority of their LOBs (they were CLOBs in this instance) sat around the 10 to 20 kilobyte mark. What this means is that in uncompressed form, the LOB data would be deemed too large to be stored alongside the rest of the row – it would be moved to the separate LOB segment that holds the data for the LOB column. But after compressing the LOB, some of those LOBs then dropped below the ~4 kilobyte threshold and thus would now be stored in the table itself. If you have a lot of such LOBs, then the table segment could indeed grow, even though the overall size of the table (i.e., table data plus LOB data) should indeed shrink with the compression.
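As an aside, you can check how a LOB column is defined, and whether it is compressed, straight from the data dictionary. A minimal sketch (the table name is a placeholder):

select column_name, in_row, securefile, compression
from   user_lobs
where  table_name = 'MY_TABLE';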

We can see that with a simple demo – I’ll put some very compressible data into the LOB, but start with the segment being the default of not compressed:


SQL> create table t1
  2  ( x int,
  3    c clob
  4  )
  5  lob (c) store as securefile (
  6    enable      storage in row
  7    nocompress);

Table created.

SQL>

SQL> insert into t1
  2  select rownum, rpad('b',12000,'b')
  3  from   dual
  4  connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

Now I’ll copy that table into a replica, the only difference being that the LOB column will be marked as compressed.


SQL> create table t2
  2  ( x int,
  3    c clob
  4  )
  5  lob (c) store as securefile (
  6    enable      storage in row
  7    compress);

Table created.

SQL>

SQL> insert into t2
  2  select * from t1;

10000 rows created.

SQL>
SQL> commit;

Commit complete.

Now we can compare the size of the LOB segments as well as the size of the tables:


SQL>  select segment_name, bytes from user_segments
  2   where segment_name = 'T1' or
  3   segment_name = ( select segment_name from user_lobs
  4                    where table_name = 'T1' );

SEGMENT_NAME                        BYTES
------------------------------ ----------
T1                                 655360
SYS_LOB0000268696C00002$$       277020672

2 rows selected.

SQL>
SQL>  select segment_name, bytes from user_segments
  2   where segment_name = 'T2' or
  3   segment_name = ( select segment_name from user_lobs
  4                    where table_name = 'T2' );

SEGMENT_NAME                        BYTES
------------------------------ ----------
T2                                3145728
SYS_LOB0000268699C00002$$          131072

2 rows selected.

SQL>
SQL>
SQL>

So yes, the table segment has grown, but the size of the overall table (where “overall” means table data plus LOB data) has shrunk thanks to the compression of the LOB data.
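If you want that “overall” figure per table directly, here is a minimal sketch that sums the table segment and its LOB segment (it assumes the demo tables above; segment names will differ on your system):

select l.table_name,
       sum(s.bytes) total_bytes
from   user_lobs     l,
       user_segments s
where  l.table_name in ('T1','T2')
and    s.segment_name in (l.table_name, l.segment_name)
group  by l.table_name;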

Ansible tips’n’tricks: assessing your runtime environment

One thing that I frequently need to do is test for a certain condition, and fail if it is not met. After all, I want to write those playbooks in a safe way.

Here is an example: I need to ensure that my playbook only runs on Oracle Linux 7. How can I do this? Ansible offers a shell and a command module (make sure you read the notes in the command module documentation!), so I could simply write something testing for the output of, let’s say, /etc/os-release.
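Such a task might look roughly like the following hedged sketch (this is the approach I’m about to argue against; the grep pattern is an assumption about how Oracle Linux 7 identifies itself in /etc/os-release):

    # shell out and parse /etc/os-release by hand; the task fails if grep finds no match
    - name: check for Oracle Linux 7 the hard way
      shell: grep -q 'Oracle Linux Server 7' /etc/os-release
      changed_when: false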

This is totally possible, although I believe it’s a bit messy and there is a more elegant way requiring far less coding. Ansible maintains a whole raft of variables it gathers when you run a playbook. Here is an example (I am again using the “debug” stdout_callback as described in my earlier blog posts):

[martin@controller ansible]$ ansible-playbook -i inventory.yml test.yml

PLAY [blogpost] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server1]

TASK [say hello] ***************************************************************
ok: [server1] => {}

MSG:

hello world

PLAY RECAP *********************************************************************
server1                    : ok=2    changed=0    unreachable=0    failed=0   

When invoking the play “blogpost” (I have defined my host “server1” to be part of a group named “blogpost” in the local inventory file), I can see that the first task is to gather facts. This might seem unusual, since I haven’t defined any such task:

[martin@controller ansible]$ cat test.yml 
---
- hosts: blogpost

  tasks:
    - name: say hello
      debug:
        msg: hello world

Nevertheless, Ansible gathers these facts, and they are most useful. But which of these variables exist, and what are their values? It is not too hard to find out. The following command invokes the setup module (using a local inventory mapping “server1” to group “blogpost”) specifically against host “server1”:

[martin@controller ansible]$ ansible -i inventory.yml -m setup server1 | head -n 25
server1 | SUCCESS => {
    "ansible_facts": {
        "ansible_all_ipv4_addresses": [
            "192.168.abc.24"
        ],
        "ansible_all_ipv6_addresses": [
            "fe80::5054:ff:fe91:abcd"
        ],
        "ansible_apparmor": {
            "status": "disabled"
        },
        "ansible_architecture": "x86_64",
        "ansible_bios_date": "04/01/2014",
        "ansible_bios_version": "1.10.2-2.fc27",
        "ansible_cmdline": {
            "BOOT_IMAGE": "/vmlinuz-4.1.12-94.3.9.el7uek.x86_64",
            "LANG": "en_US.UTF-8",
            "crashkernel": "auto",
            "quiet": true,
            "rd.lvm.lv": "rootvg/swaplv",
            "rhgb": true,
            "ro": true,
            "root": "/dev/mapper/rootvg-rootlv"
        },
        "ansible_date_time": {

There is a lot more output! You can query pretty much everything using these built-in variables. So what about our distribution? Let’s check the output of the setup module for a string containing the word “distribution”:

[martin@controller ansible]$ ansible -i inventory.yml -m setup server1 | grep -i distribution
        "ansible_distribution": "OracleLinux", 
        "ansible_distribution_file_parsed": true, 
        "ansible_distribution_file_path": "/etc/oracle-release", 
        "ansible_distribution_file_search_string": "Oracle Linux", 
        "ansible_distribution_file_variety": "OracleLinux", 
        "ansible_distribution_major_version": "7", 
        "ansible_distribution_release": "NA", 
        "ansible_distribution_version": "7.5", 

So it looks like there is something we could use here. I can re-write my playbook now to check for Oracle Linux 7 as follows:

[martin@controller ansible]$ cat test2.yml 
---
- hosts: blogpost

  tasks:
    - name: print debuginfo
      debug:
        var: "{{ item }}" 
      with_items:
        - ansible_distribution
        - ansible_distribution_major_version

    - name: ensure we run Oracle Linux 7
      fail:
        msg: This playbook only runs on Oracle Linux 7
      # fail unless the host is Oracle Linux AND its major version is 7
      when: (ansible_distribution != "OracleLinux" or ansible_distribution_major_version|int != 7)

This does work all right when running against an Oracle Linux 7 system.

[martin@controller ansible]$ ansible-playbook -i inventory.yml test2.yml

PLAY [blogpost] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [server1]

TASK [print debuginfo] *********************************************************
ok: [server1] => (item=ansible_distribution) => {
    "ansible_distribution": "OracleLinux", 
    "item": "ansible_distribution"
}
ok: [server1] => (item=ansible_distribution_major_version) => {
    "ansible_distribution_major_version": "7", 
    "item": "ansible_distribution_major_version"
}

TASK [ensure we run Oracle Linux 7] ********************************************
skipping: [server1]

PLAY RECAP *********************************************************************
server1                    : ok=2    changed=0    unreachable=0    failed=0   

The “skipping” message indicates that the task [ensure we run Oracle Linux 7] has not fired, and we can proceed with life. Pointing this playbook at another environment, one not running Oracle Linux 7, the play fails:

[martin@controller ansible]$ ansible-playbook -i inventory.yml test2.yml

PLAY [blogpost] ****************************************************************

TASK [Gathering Facts] *********************************************************
ok: [linuxdev]

TASK [print debuginfo] *********************************************************
ok: [linuxdev] => (item=ansible_distribution) => {
    "ansible_distribution": "Fedora", 
    "item": "ansible_distribution"
}
ok: [linuxdev] => (item=ansible_distribution_major_version) => {
    "ansible_distribution_major_version": "28", 
    "item": "ansible_distribution_major_version"
}

TASK [ensure we run Oracle Linux 7] ********************************************
fatal: [linuxdev]: FAILED! => {
    "changed": false
}

MSG:

This playbook only runs on Oracle Linux 7


PLAY RECAP *********************************************************************
linuxdev                   : ok=2    changed=0    unreachable=0    failed=1

So there we go! This example can easily be extended to check for other things, such as a combination of running UEK on Oracle Linux, etc.
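For instance, here is a minimal sketch (my own addition, not part of the original playbook) using the assert module; it assumes the UEK kernel release reported in ansible_kernel contains the string "uek", as the el7uek kernel shown earlier does:

    # combine the distribution check with a UEK check in a single assertion
    - name: ensure we run Oracle Linux 7 with UEK
      assert:
        that:
          - ansible_distribution == "OracleLinux"
          - ansible_distribution_major_version | int == 7
          - "'uek' in ansible_kernel"
        msg: This playbook requires Oracle Linux 7 running UEK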

Happy scripting!

Ignorance is Bliss, Until it Isn’t

My blog was hacked last week, deliberately and maliciously. I was hacked because of my lacking knowledge in the area of website management, and saved by my control issues as a Database Administrator. It was a valuable lesson, both about being a woman in technology and about new technical skills.

If you ask most women, they’ll most likely say that they haven’t had too many difficulties with men harassing them, especially if they’re over 40. In truth, it’s not that they haven’t; we often become so immune to it that it takes other women recalling their own stories before we suddenly recall issues of our own. We’ve also learned that when we bring it up, someone will most likely ask us if we’re over-reacting or if we could have misunderstood the situation, so we’re unsure why we should bother with it.

Due to this, I under-estimated the anger of someone that was causing difficulty for a woman I was mentoring and had advised. When the woman, we’ll call her “Sarah”, first started her recent job, “John” was really nice to her. He went out of his way to help her get to know the lay of the land at the company and was the first to answer questions. She admitted she isn’t sure how she would have gotten along without him. After six months on the job, she started dating the man she’s about to marry. As soon as they started dating, John began to behave differently around Sarah. He went from brushing her off to speaking in a condescending manner in meetings, but she thought she might be over-reacting. She finally asked to meet with him and he admitted he was hurt that she was dating someone and thought they’d had a “connection”. She carefully and clearly set the boundary that they were coworkers, ONLY. The negative interactions escalated until she reached out to me asking for advice on finding a new position. She really loved her job and had been there for over a year at this point. I worked with her, advising her how to track the issues in writing and report them to HR. Due to this, a number of meetings occurred to address the behavior with John. Unfortunately, he wasn’t willing to address his behavior with Sarah and he was terminated. As the conversations were extensive, Sarah did let it slip that I was the one assisting her in working through the challenge, and John soon chose to start blaming me for why Sarah hadn’t done what he wanted.

I didn’t put two and two together when I received my first email and honestly, I get weird emails from time to time. I simply deleted it. The next one, I sent to spam and ignored them. Seriously, guys in tech, these idiots are why you can’t have a nice conversation with women everywhere. We never know when one of these freaks is going to come out of the woodwork or how far they’re going to go off the deep end. If you get frustrated that women won’t have a simple conversation with you in public, don’t get upset with her, but with these guys that go to ridiculous lengths to get vengeance and justify their behavior because they didn’t get what they wanted.

This guy had just enough skills to discover I had an out-of-date PHP version on my blog. He used this to get into my blog and hack it with malware. The old PHP version is my fault, a result of my lacking skills in website administration. My saving grace is that I am a damn good DBA and had multiple backup tools on dbakevlar.com because… well, I’m a DBA. The backup software that had saved me in previous challenges, Site Backup Pro, failed me and my website provider in this scenario. After the full, fresh install of WordPress, it wasn’t able to recover to a working state. Luckily, my second daily backup, Updraft Pro, DID work and I was able to recover everything back to before the damage.

What John didn’t have was skills good enough to cover his tracks. After they filtered out all the varying IP addresses from my travels that had logged into the site, they were able to pinpoint John’s and, after two days, had the evidence they needed to go after him for putting people at risk online. John is out of a job at Sarah’s company and now in trouble with my provider, Bluehost.

Lessons learned?  I really needed to up my website administration skills.  Check.

I learned a lot over the last week about how to manage my own website, and realized that after eight years I had a lot of garbage that needed to be cleaned up on my site, and proceeded to do so. I also lost all of danceswithwinnebagos.com, as it didn’t have the Updraft backup configured, only the Site Backup Pro, and I may have to build that one back up from scratch…

Faking Histograms

This is a short index of articles I’ve written on how to create the different types of histogram that the optimizer uses:

  • Faking a frequency histogram    How to create frequency histograms (using a numeric column for the example)
  • Histogram Tip  An example of creating a simple character-based frequency histogram (published in the IOUG Tips booklet 2014).
  • Faking a height-balanced histogram  How to create a height-balanced histogram (using a numeric column for the example).
  • Hybrid Fake: How to create a hybrid histogram (using a character column for the example).
  • Extended Histogram:  faking values into a histogram for a column group – only special because we need to derive the value stored.
  • Top frequency:  I haven’t yet worked out how to fake a Top Frequency histogram. Since it’s little more than a frequency histogram where the optimizer knows there’s a further small percentage (less than one bucketful) of other data, this doesn’t worry me; if necessary I’ll just create a “good enough” frequency histogram and set a suitable density for the remainder.

And a couple of miscellaneous things about histograms

  • Big number problem – older versions of Oracle (pre 12c) can go wrong with data values more than 15 digits long
  • Long strings problem – until 12c Oracle stored at most 32 bytes of a string in the endpoint_actual_value column.
  • Hybrid/Top-N problem – a bug, fixed in 12.2 with a patch for 12.1.
  • Upgrade threat – a step you need to take to upgrade from 11.2.0.3 if you have histograms on char() columns
  • Upgrade threat 2 (Oracle-L) – if you’ve got a big history of histograms then the upgrade from 11.2.0.3 (or earlier) could take a long time

 

Soup

When my mother-in-law comes round to Sunday lunch we often have roast chicken – and a serious error in estimating the requirement for the vegetable bed I was roasting on led to the discovery of home-made soup. (I did warn you that my post-operative posts would be light-weight)

Ingredients

  • Remnants of cooked chicken
    • (or 250ml, 1/3rd pint, one cup of  boiled water with a vegetable or chicken stock cube)
  • One large carrot
  • One small parsnip
  • One medium onion
    • The three vegetables should be similar in weight: roughly 200g, a bit less than 1/2 lb.
    • (Or just about any leftover vegetables from a roast dinner – I’ve even used leeks in cheese sauce)
  • Greek yoghurt (One tablespoon)
  • Choice of herbs and seasoning.

 

Method

  • Discard any large areas of fatty skin from the chicken carcase
  • Break up the carcase and place in saucepan with 500ml (2 cups, 3/4 pint) water
  • Bring to the boil and simmer for about 30 minutes with saucepan lid on.
  • Strain into a fresh saucepan, discard the remnants of chicken

 

  • Top, tail and peel the carrot, cut into discs
  • Top, tail and peel the parsnip, cut into small chunks
  • Peel and dice the onion.
  • Mix the stock and vegetables in a fresh saucepan
  • add herbs and seasoning
    • (I like a teaspoon of chopped tarragon – fresh from the garden)
  • Simmer with lid on for about 20 minutes
    • (until the carrots are softened)

 

  • Tip contents into blender
  • Blend until smooth.
  • Add a rounded tablespoon of plain Greek yoghurt, and blend
  • Serve – 2 large or 3 small portions

Despite looking a bit boring before it goes into the blender the soup tends to come out a surprisingly cheerful sunshine yellow thanks to the carrot. If you’ve managed to get the same results as I do then, because of the yoghurt I think, the texture will be almost like a thick foam or very light mousse.

I typically end up making the stock immediately after lunch is over, then keep it in the fridge for a couple of days before making the soup; that does mean you can skim off any excess fat before using the stock for the soup. And if there’s any gravy left over from lunch that’s a bonus to go in the soup.

 

Accurately Interpreting Real Application Clusters IOPS with Automatic Workload Repository. So Easy to Get So Wrong.

This blog post has been necessary for quite some time but I just now finally got around to posting it. What I’m going to blog about is a common problem I run into in my dealings with Oracle Database Administrators (DBAs). It’s about IOPS data in Automatic Workload Repository (AWR) reports. Please don’t roll your eyes. Not everyone gets this right. I’ll explain…

I cannot count how many times I’ve had DBAs cite some IOPS number from their workload only to later receive an AWR report from them that shows a mere fraction of what they think their IOPS load is. This happens very frequently!

I’m going to explain why this happens and then show how to stop getting confused about the data.

At issue is the AWR report generated by awrgrpt.sql, which is a RAC AWR report. DBAs will run this script, generate a report, open it and scroll down to the System Statistics – Per Second section so they can see “Physical Reads/s”. That’s where the problem starts. That doesn’t mean it’s the DBA’s fault, per se. Instead, I blame Oracle for using the column heading “Physical Reads/s” because, well, that’s 100% erroneous.

A Case Study

To make this topic easier to understand, I set up a small SLOB test. The SLOB driver script (runit.sh) produces both the RAC (awrgrpt.sql) and non-RAC (awrrpti) AWR reports. First, I’ll describe why SLOB can make this so easy to understand.

The following screenshot shows the slob.conf I used. You’ll notice that I loaded a “scan table” of 1GB for each schema. Not shown is the fact that I loaded 64 schemas.

For this type of testing I used the SLOB “batch” approach as opposed to a fixed-time test.  As Figure 1 shows, I set the WORK_LOOP parameter to 10, which establishes that each of the sessions will perform 10 iterations of the “work loop”–a “batch” of work as it were. To make the SLOB test perform nothing but table scans, I set SCAN_PCT to 100 and UPDATE_PCT to 0.

[Image: https://kevinclosson.files.wordpress.com/2018/10/slob_conf1.png]

Figure 1: SLOB Configuration File.

The runit.sh script produces both the RAC and non-RAC flavors of AWR report in compressed HTML format. The files are called awr.html.gz and awr_rac.html.gz.

The RAC report differs from the non-RAC AWR report in both obvious and nonobvious ways. Obviously, a RAC report includes per-instance statistics for multiple instances; however, the RAC report also includes categories of statistics not seen in the non-RAC report. It is, in fact, one of these classes of statistics, present only in the RAC AWR report, that causes the confusion for so many DBAs.

Understandable Confusion – Words Matter!

Figure 2 shows a snippet of the RAC AWR report generated during the SLOB test. The screenshot shows the crux of the matter. To check on IOPS, many DBAs open the RAC AWR report and scroll to the Global Activity Load Profile where the per-second system statistics section supposedly reports physical reads per second. I’m not even going to spend a moment of your time to split the hairs that need splitting in this situation because the rest of the RAC report clarifies the erroneous column heading.

Simply put, the simple explanation for this simple problem is that Oracle simply chose an incorrect–if not simply over-simplified–column heading in this section of the RAC AWR report.

Logical and Physical I/O

Figure 2 shows the erroneous column heading. In this section of the RAC AWR report, both logical I/O and physical I/O are reported. In this context, all logical I/O in Oracle are single-block operations. What’s being reported in this section is that there were 68,115 logical reads per second, of which 64,077 required a fetch from storage as the result of a cache miss. If Oracle suffers a miss on a logical I/O (which is an operation to find a cached, single block in the SGA block buffer pool), the resultant action is to get that block from storage. Whether the missed block is “picked up” in a multi-block read or a single-block read is not germane in this section of the report. That said, I still think the column heading is very poorly worded. It should be “Blocks Read/s”.

 

[Image: https://kevinclosson.files.wordpress.com/2018/10/rac-awr-system-statistics2.png]

Figure 2: RAC AWR Report. Global Activity Load Profile.

What are IOPS? Operations! IOPS are Operations!

That heading made me think of Charlton Heston swinging off the back of a garbage truck yelling, “It’s Operations, IOPS is Operations”. Some may get the Soylent Green reference. Enough of my feeble attempt at humor.

The acronym IOPS stands for I/O operations per second. The word operations is critical. An I/O operation, in Oracle Database, is a system call to transfer data to or from storage. Further down in the RAC AWR is the IOStat by Function section, which uses a term that maps precisely to IOPS: requests. Another good term for an I/O operation is a request, a request of the operating system to transfer data, be it a read or a write. An operation (or a request) to transfer data comes in widely varying shapes and sizes. Figure 3 shows that the SLOB test described above actually performed approximately 8,000 requests, or operations, per second from storage in the read path.

I’ll reiterate that. This workload performed 8,000 IOPS.

That’s a far cry less than suggested in Figure 2.

[Image: https://kevinclosson.files.wordpress.com/2018/10/p3.png]

Figure 3: RAC AWR Report. IOStat by Function.

Further down in the RAC AWR report is the Global Activity Statistics section. This section appears in both the RAC and non-RAC reports. Figure 4 shows a snippet of the Global Activity Statistics section from the RAC report generated during the SLOB test. Here again we see the unfortunate misnomer of the statistic that actually represents the number of blocks retrieved from storage. Figure 4 shows the same 64,000 “physical reads” seen in Figure 2; however, there are adjacent statistics in the report to help make information out of this data. Figure 4 ties it all together. The workload performed some 8,000 read requests from storage. What was read from storage? Well, 64,000 blocks per second were physically read from storage.
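The AWR figures are derived from cumulative instance counters, so you can also see the distinction between requests and blocks directly. A minimal sketch against v$sysstat (values are cumulative since instance startup, so deltas over an interval give you IOPS and blocks per second):

select name, value
from   v$sysstat
where  name in ('physical reads',
                'physical read total IO requests',
                'physical read total bytes');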

Let’s Go Shopping

I have a grocery-shopping analogy in mind. Think of a request as each time you transfer an item from the shelf to your cart. Some items are simple, single items like a bottle of water, and others are multiple items in a package, such as a pallet of bottled water. In this analogy, the pallet of bottled water is a multi-block read and the single bottle is, unsurprisingly, a single-block read. The physical reads statistic is the total number of water bottles (not items) placed into the cart. The number of items placed in the cart per second was 8,000, by way of plopping something into the cart 8,000 times per second. I hope the cart is huge.

#000000;" src="https://kevinclosson.files.wordpress.com/2018/10/p2.png?w=500&h=83" alt="" width="500" height="83" srcset="https://kevinclosson.files.wordpress.com/2018/10/p2.png?w=500&h=83 500w, https://kevinclosson.files.wordpress.com/2018/10/p2.png?w=150&h=25 150w, https://kevinclosson.files.wordpress.com/2018/10/p2.png?w=300&h=50 300w, https://kevinclosson.files.wordpress.com/2018/10/p2.png?w=768&h=128 768w, https://kevinclosson.files.wordpress.com/2018/10/p2.png 843w" sizes="(max-width: 500px) 100vw, 500px" />

Figure 4: Global Activity Statistics. RAC AWR Report.

As mentioned earlier in this post, the RAC AWR report is really just a multi-instance report. In this case I generated a multi-instance report that happened to capture a workload from a single-instance. That being the case, I can also read the non-RAC AWR report because it too covers the entire workload. If everyone, always, read the non-RAC AWR report there would never be any confusion about the difference between reads and payload (read requests and blocks read)–at least not since Oracle Database 11.2.0.4.

Starting in the final patchset release of 11g (11.2.0.4), the Load Profile section (which does not appear in the RAC AWR report) clearly spells out the situation with the “Physical read” misnomer. Figure 5 shows the Load Profile in the non-RAC report generated by the SLOB test. Here we can see that late in 11g some critical, parenthesized words were added to the output. What was once reported only as “Physical reads” became “Physical read(blocks)”, and the word requests was added to help DBAs determine both IOPS (requests per second are IOPS) and the payload. Again, we can clearly see that the test had an I/O workload that consisted of 8,000 requests per second for 64,000 blocks per second, for a read throughput of 500MB/s.

 

[Image: https://kevinclosson.files.wordpress.com/2018/10/p5.png]

Figure 5: Non-RAC AWR. Load Profile Section. Simple and Understandable I/O Statistics.

Wrapping It All Up

I hope a few readers make it to this point in the post because there is a spoiler alert. Below you’ll find links to the actual AWR reports generated by the SLOB test. Before you read them I feel compelled to throw this spoiler alert out there:

I set db_file_multiblock_read_count to 8 with the default Oracle Database block size.

That means processes were issuing 64KB read requests to the operating system.

With that spoiler alert I’m sure most readers are seeing a light come on. The test consists of 64 sessions performing 64KB reads. The reports tell us there are 8,000 requests per second (IOPS) and 64,077 blocks read per second for a read throughput of 500MB/s.

500MB / 8,000 == 64KB.

 

Links to the AWR reports:

https://github.com/therealkevinc/AWRS/blob/master/awr.html

https://github.com/therealkevinc/AWRS/blob/master/awr_rac.html

Click the following link and GitHub will give you a zipped copy of these AWR reports: https://github.com/therealkevinc/AWRS/archive/master.zip

 

 

LOBs vs Data – when compression occurs

Just a quick tip for SECUREFILE lobs.

You may be familiar with basic compression on data in tables. It is a dictionary attribute of the table, which is then taken into account when you perform:

  • a SQL Loader direct load, or
  • an INSERT with the APPEND hint, or
  • a bulk bind INSERT in PLSQL using the APPEND_VALUE hint.

Whichever of the above you perform does not really matter – the key thing is that when you set the COMPRESS attribute on a table, this only applies to operations subsequent to the moment at which you altered the table. It does not compress the existing data. Hence when you alter a table to add the attribute, it is instantaneous.


SQL> create table t (x int, c clob ) lob (c) store as securefile;

Table created.

SQL>
SQL> insert into t
  2  select rownum, dbms_random.string('x',20000)
  3  from dual
  4  connect by level < 100000;

99999 rows created.

SQL>
SQL> commit;

Commit complete.

SQL>
SQL> set timing on
SQL> alter table t compress;

Table altered.

Elapsed: 00:00:00.00

But do not make the mistake of assuming this is the case if you opt to set the compress attribute for a SECUREFILE LOB. Setting this attribute on a table will immediately read and update all of the existing LOB data in the table, which of course could be a very resource intensive operation.


SQL> alter table t modify lob (c ) ( compress );

Table altered.

Elapsed: 00:00:23.32

Whilst your best option is probably to specify LOB compression when you create the table, if you need to do it after the fact, be aware that you’ll probably want to schedule it in a quiet time.
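For completeness, a minimal sketch of specifying it at creation time, using the same securefile syntax shown in the earlier examples:

create table t ( x int, c clob )
  lob (c) store as securefile ( compress );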

dbms_log

I’ve been a long time (though occasional) user of the undocumented dbms_system package, typically using it to write messages or insert break lines in trace files (or the alert log). Thanks to an email from Cary Millsap I’ve recently discovered that the procedures for writing to trace files have been copied to a separate dbms_log package – which is nice because some of the things in dbms_system shouldn’t be made available to general code, for example the procedure kcfrms, which resets a number of the “max time” columns in various dynamic performance views. It can be very useful occasionally – you might even want to call it just before or just after every AWR snapshot – but I’d rather that no-one else was able to call it if I thought that I needed to do so.

The dbms_log package is also (currently – even in 18.3) undocumented but maybe one day soon it will appear in the PL/SQL Packages and Types reference manual. The procedures available in the package are as follows:

SQL> desc dbms_log
PROCEDURE KSDDDT
PROCEDURE KSDFLS
PROCEDURE KSDIND
 Argument Name			Type			In/Out Default?
 ------------------------------ ----------------------- ------ --------
 LVL				BINARY_INTEGER		IN
PROCEDURE KSDWRT
 Argument Name			Type			In/Out Default?
 ------------------------------ ----------------------- ------ --------
 DEST				BINARY_INTEGER		IN
 TST				VARCHAR2		IN

  • ksdddt moves to a new line in the trace file, writes the date, and moves to the next line, but won’t do anything if the trace file has not already been opened; so if you want a datestamp at the top of the output you actually have to start with a ksdwrt or ksdind call.
  • ksdfls flushes any pending writes to the trace file / alert log and closes the file – which isn’t relevant in my example but would make a difference to when you see the text that has been written from inside a pl/sql block.
  • ksdind writes an “indent” of lvl colon (‘:’) symbols to the trace file. This is a one-off effect; it doesn’t set an indent for all future lines, it merely writes the ‘:’ characters so that the next call to ksdwrt appends its text after the colons.
  • ksdwrt writes tst to the trace file if dest = 1, to the alert log if dest = 2, and to both if dest = 3, adding a new-line at the end of the text.

Here’s a fragment of code calling the procedures to write something to my trace file:


execute dbms_log.ksdddt
execute dbms_log.ksdwrt(1,'Starting')
execute dbms_log.ksdddt
execute dbms_log.ksdwrt(1,'Underlining')
execute dbms_log.ksdind(20)
execute dbms_log.ksdwrt(1,'Indented')
execute dbms_log.ksdwrt(1,'Not Indented')
execute dbms_log.ksdind(30)
execute dbms_log.ksdddt
execute dbms_log.ksdwrt(1,'Finished')
execute dbms_log.ksdind(30)
execute dbms_log.ksdfls


Here’s the text that appears in the trace files:


Starting

*** 2018-10-04T16:31:15.525515+01:00 (ORCL(3))
Underlining
::::::::::::::::::::Indented
Not Indented
::::::::::::::::::::::::::::::
*** 2018-10-04T16:31:15.532881+01:00 (ORCL(3))
Finished
::::::::::::::::::::::::::::::

Note how the call to ksdddt in line 1 of code didn’t write a date into the trace file because it wasn’t yet open. The call to ksdwrt in line 2 writes ‘Starting’ and moves to a new line so we get a blank line when the call to ksdddt in line 3 moves to a new line and writes the date. At line 5 we “indent 20”, so the ksdwrt at line 6 starts after the string of colons, then moves to a new line where the indent is not repeated. We indent again at line 8, which leaves us at the end of a line, so when we call ksdddt it moves to the start of the next line and writes the date there – we don’t get a blank line.

Footnote: when I saw Cary Millsap’s note I assumed that the procedures had been copied across in a recent version of Oracle; in fact dbms_log has been around since at least 11.2.0.4

Footnote 2: While checking my library for references to dbms_system I came across a script I’d used to create a local version of dbms_system that allowed me to execute a call to “dbms_system.set_bool_param_in_sesssion(‘#_SILVER_BULLET’, true)”. I used it at the IOUG conference in 2006 to demonstrate that if you set this “very hidden” parameter to true then some of your queries could run faster.  (What the call actually did was enable query rewrite and disable function-based indexes so that a special time-wasting index I’d created couldn’t get used.)