Oakies Blog Aggregator

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part III: Link To The Full README file for Beta pgio v0.9.

If you are interested in a head start on pgio, the following is a link to the full README file, which includes some loading and testing how-to:

The pgio text README file version 0.9 Beta

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part II: Bulk Data Loading.

Bulk Data Loading With pgio Version 0.9 Beta

Now that pgio (the SLOB Method for PostgreSQL) is in Beta users’ hands I’m going to make a few quick blog entries with examples of pgio usage. The following are screen grabs taken while loading 1 terabyte into the pgio schemas. As the example shows, pgio (on a system with ample storage performance) can ready a 1 terabyte data set for testing in only 1014 seconds (roughly 3.5TB/h loading rate).

$ cat pgio.conf
UPDATE_PCT=0
RUN_TIME=120
NUM_SCHEMAS=4
NUM_THREADS=2
WORK_UNIT=255
UPDATE_WORK_UNIT=8
SCALE=1G

DBNAME=pg10
CONNECT_STRING="pg10"

CREATE_BASE_TABLE=TRUE

$ echo "
> UPDATE_PCT=0
> RUN_TIME=120
> NUM_SCHEMAS=64
> NUM_THREADS=16
> WORK_UNIT=255
> UPDATE_WORK_UNIT=8
> SCALE=16G
> DBNAME=pg10
> CONNECT_STRING=\"pg10\"
> CREATE_BASE_TABLE=TRUE " > pgio.conf
$
$ sh ./setup.sh

Job info: Loading 16G scale into 64 schemas as per pgio.conf->NUM_SCHEMAS.
Batching info: Loading 16 schemas per batch as per pgio.conf->NUM_THREADS.
Base table loading time: 23 seconds.
Waiting for batch. Global schema count: 16. Elapsed: 0 seconds.
Waiting for batch. Global schema count: 31. Elapsed: 243 seconds.
Waiting for batch. Global schema count: 46. Elapsed: 475 seconds.
Waiting for batch. Global schema count: 61. Elapsed: 697 seconds.
Waiting for batch. Global schema count: 64. Elapsed: 935 seconds.

Group data loading phase complete. Elapsed: 1014 seconds.
$
$ psql pg10
psql (10.0)
Type "help" for help.

pg10=# \d+
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+----------+------------+-------------
public | pgio1 | table | postgres | 16 GB |
public | pgio10 | table | postgres | 16 GB |
public | pgio11 | table | postgres | 16 GB |
public | pgio12 | table | postgres | 16 GB |
public | pgio13 | table | postgres | 16 GB |
public | pgio14 | table | postgres | 16 GB |
public | pgio15 | table | postgres | 16 GB |
public | pgio16 | table | postgres | 16 GB |
public | pgio17 | table | postgres | 16 GB |
public | pgio18 | table | postgres | 16 GB |
public | pgio19 | table | postgres | 16 GB |
public | pgio2 | table | postgres | 16 GB |
public | pgio20 | table | postgres | 16 GB |
public | pgio21 | table | postgres | 16 GB |
public | pgio22 | table | postgres | 16 GB |
public | pgio23 | table | postgres | 16 GB |
public | pgio24 | table | postgres | 16 GB |
public | pgio25 | table | postgres | 16 GB |
public | pgio26 | table | postgres | 16 GB |
public | pgio27 | table | postgres | 16 GB |
public | pgio28 | table | postgres | 16 GB |
public | pgio29 | table | postgres | 16 GB |
public | pgio3 | table | postgres | 16 GB |
public | pgio30 | table | postgres | 16 GB |
public | pgio31 | table | postgres | 16 GB |
public | pgio32 | table | postgres | 16 GB |
public | pgio33 | table | postgres | 16 GB |
public | pgio34 | table | postgres | 16 GB |
public | pgio35 | table | postgres | 16 GB |
public | pgio36 | table | postgres | 16 GB |
public | pgio37 | table | postgres | 16 GB |
public | pgio38 | table | postgres | 16 GB |
public | pgio39 | table | postgres | 16 GB |
public | pgio4 | table | postgres | 16 GB |
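
As a quick sanity check of the quoted ~3.5TB/h loading rate (my own arithmetic, not part of the screen grab above): 64 schemas at 16GB each is 1024GB, and 1TB in 1014 seconds works out to:

$ echo "scale=2; (64 * 16) / 1024" | bc    # terabytes loaded
1.00
$ echo "scale=2; 3600 / 1014" | bc         # TB per hour at 1TB per 1014 seconds
3.55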

Sneak Preview of pgio (The SLOB Method for PostgreSQL) Part I: The Beta pgio README File.

The pgio kit is the only authorized port of the SLOB Method for PostgreSQL. I’ve been handing out Beta kits to some folks already but I thought I’d get some blog posts underway in anticipation of users’ interest.

The following is part of the README.txt for pgio v0.9 (Beta). SLOB users will find it all easy to understand. This is the section of the README that discusses pgio.conf parameters:

UPDATE_PCT

The percentage of SQL that will be UPDATE DML

RUN_TIME

runit.sh run duration in seconds

NUM_SCHEMAS

pgio data is loaded into either a single large schema or multiple schemas. NUM_SCHEMAS directs setup.sh to create and load NUM_SCHEMAS schemas.

NUM_THREADS

For setup.sh:  This parameter controls the number of concurrent
data loading streams.

For runit.sh:  This parameter controls how many sessions will attach to
each of the NUM_SCHEMAS schemas.

For example, if NUM_SCHEMAS is set to 32 and NUM_THREADS is set
to 8, setup.sh will load data into 32 schemas in batches of
8 concurrent data loading streams. On the other hand, if set to the
same 32 and 8 respectively for NUM_SCHEMAS and NUM_THREADS then
runit.sh will run 8 sessions accessing each of the 32 schemas for a
total of 256 sessions.
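
Expressed as quick shell arithmetic (my own check of the example above, not part of the README):

$ echo $(( 32 * 8 )) total runit.sh sessions
256 total runit.sh sessions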

WORK_UNIT

Controls the bounds of the BETWEEN clause for each  SELECT statement as it executes. For example, if set to 255 each SELECT will visit 255 random blocks. Smaller values require more SQL executions to drive the same IOPS.
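
Purely as an illustration of the shape of such a statement (the column names below are placeholders of my own, not necessarily what pgio generates), a WORK_UNIT of 255 corresponds to a range predicate along these lines:

$ psql pg10 -c "SELECT sum(scratch) FROM pgio1 WHERE mykey BETWEEN 4000 AND 4000 + 255"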

UPDATE_WORK_UNIT

The UPDATE DML corollary for WORK_UNIT. This allows for a mixed SELECT/UPDATE workload where SELECT statements can visit more blocks than UPDATES.

SCALE

The amount of data to load into each schema. Values can be N as a number of 8KB blocks, or N suffixed with [MG] for megabytes or gigabytes. For example, if set to 1024, setup.sh will load 8MB (1024 * 8192 bytes) into each schema, whereas 1024M will load 1024 megabytes into each schema.
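
The arithmetic is easy to check from the shell (my own example, not part of the README):

$ echo $(( 1024 * 8192 )) bytes per schema     # SCALE=1024, i.e. 1024 8KB blocks
8388608 bytes per schema
$ echo $(( 64 * 16 )) GB in total              # SCALE=16G with NUM_SCHEMAS=64, as loaded earlier
1024 GB in total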

DBNAME

The PostgreSQL database that holds the pgio objects

CONNECT_STRING

This parameter is passed to the psql command. The pgio kit expects that .pgpass and all other authentication is configured. For example, if CONNECT_STRING is set to “pg10” then the following command must succeed in your pgio environment:

$ psql pg10
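
For instance, a matching ~/.pgpass entry would look like the following (the host, port, user and password here are placeholders of my own); note that libpq ignores the file unless its permissions are restricted:

$ cat ~/.pgpass
localhost:5432:pg10:postgres:mysecretpassword
$ chmod 600 ~/.pgpass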

CREATE_BASE_TABLE

The loader, setup.sh, creates a dense “seed” table as the source from  which to load the test tables. This seed table persists after setup.sh exits. If this parameter is set to true the seed table will not be regenerated.

Running Code as SYS From Another User not SYSDBA

I have been embroiled in a Twitter thread today about the post I made in this blog yesterday around granting privileges to a user and who should do the granting. Patrick today asked a further question: How do you make....[Read More]

Posted by Pete On 22/05/18 At 08:42 PM

Who Should Grant Object Rights?

Patrick Jolliffe posted a question via a tweet back in April but, due to personal health pressures with a close relative of mine, I have not had the time to deal with much over the last few months. I did....[Read More]

Posted by Pete On 21/05/18 At 07:08 PM

5 years

I joined Pythian exactly 5 years ago. Back then I was a performance guy, with no real DBA or operational support experience. I knew enough about the technologies, though. Still, getting up to speed with common admin tasks took me almost a year. At some point later it became a bit boring: dealing with out-of-space issues and other noise during on-call can’t be appealing for long. After ~2 years I had an opportunity to join a big project which required experience in things I was familiar with. It lasted for almost a year, and it was a useful experience.

Then I got back to normal DBA work for a short period of time, after which I joined another big project. This one was tricky at the beginning, and it has remained one of a kind for almost 2 years. Initially I was tasked with helping with performance issues in an application I had never seen before, running on top of an application server I had never touched. It took me a few weeks to figure out the unknown bits and pieces, and a few months to get things fixed (as usually happens, negotiating how to make the necessary changes takes time).

After the app was fixed, I joined another part of the big project, one dealing with Oracle eBS. The work I did there was mostly operational support, helping with whatever I could handle: almost anything but installation/patching/cloning, which seem to be the high-priority things in the eBS world. I did performance troubleshooting tasks as well, and as it turned out, I have been doing mostly performance work for more than half a year now. I really like it: I use my top skills to get to the bottom of issues, fix them, get feedback pretty quickly, and sometimes learn something along the way.

Such a long time on a dedicated project led to a move to the eBS team within Pythian.
Again, I need to catch up on stuff I only partially know … same as 5 years ago.

Announcing July/August Australian Dates: “Oracle Indexing Internals and Best Practices” Seminar

I’m very excited to announce new Australian dates for my highly acclaimed “Oracle Indexing Internals and Best Practices” seminar. This is a must attend seminar of benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications. It’s a fun, but intense, […]

Attribute clustering….super cool

I’ve spoken about attribute clustering before here, here and here. So from that you can probably glean that I’m a fan.

I recently spoke about an example of this as well during my AskTOM Office Hours session which you can watch below:

After posting that video, I had a follow-up question which I thought would be worth exploring. My examples so far on attribute clustering were either loading a table from empty, or issuing an “alter table move” on a table that already contained data. In both those situations, it is reasonable to assume that clustering is going to help because it aligns conceptually with what we’d expect to happen, namely

  • read all of the data
  • re-organize all of the data so that it is clustered by the nominated clustering key(s)
  • reload all of the data

But what if we do not have that luxury of moving all of the data around? What if I have a table that is already clustered, but now I need to load more data into it? As we know, we only see clustering in effect if we perform a direct load operation. But a direct load operation only creates new blocks. That would suggest we cannot cluster the data, no?

Let’s explore this scenario with an example.

As a control to this experiment, I’ll perform some index lookups to a “conventional” table, that is, random data without any clustering attributes.

I’ll create a table with the data arranged in random order, and then add the same amount of data again. (I could do this in a single operation, but I’m keeping it in alignment with the experiments I’ll perform shortly, to mimic the concept of loading data and then appending new data.)


SQL> create table t as
  2  select rownum id, rpad('x',100) data, mod(rownum,100) cust_id
  3  from dual
  4  connect by level <= 100000
  5  order by dbms_random.value;

Table created.

SQL>
SQL> insert /*+ APPEND */ into t
  2  select rownum+100000 id, rpad('x',100) data, mod(rownum,100) cust_id
  3  from dual
  4  connect by level <= 100000
  5  order by dbms_random.value;

100000 rows created.

SQL>
SQL> commit;

Commit complete.

I’ve got 200,000 randomly distributed rows for 100 distinct CUST_ID values. Now I’ll add an index and perform a simple lookup to examine the cost. (Note: The statistics you see are not from an initial execution, but from subsequent executions to eliminate parsing and other costs)


SQL> create index ix on t ( cust_id );

Index created.

SQL>
SQL>
SQL> set autotrace on stat
SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
----------------------------------------------------------------------------------------------------
x

1 row selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1502  consistent gets
          0  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

So our control figure is approximately 1500 logical I/Os.

In the previous blog posts on this topic, I then added attribute clustering definitions and performed an ‘alter table move’ to reorganize all of the data into a nicely clustered arrangement. This time, I’ll repeat the above experiment in a different manner. The sequence of steps will be:

  1. load the data,
  2. add the clustering attributes,
  3. reorganize the data,
  4. add an index,
  5. observe the expected benefits of clustering,
  6. add more data, which we might expect could not be clustered because the table is already clustered now,
  7. perform more index lookups and see what the cost is.

If clustering is only useful for an initial data load, then we’d expect to see a significant degradation when the final index lookups are performed in (7). First of all, just to reinforce the coolness of attribute clustering one more time, here are steps (1) through (5). Remember: the control figure here is 1500 LIOs.


SQL> create table t as
  2  select rownum id, rpad('x',100) data, mod(rownum,100) cust_id
  3  from dual
  4  connect by level <= 100000
  5  order by dbms_random.value;

Table created.

SQL>
SQL> create table t2 as select * from t;

Table created.

SQL>
SQL> alter table t ADD clustering by linear order(cust_id);

Table altered.

SQL> alter table t move;

Table altered.

SQL>
SQL> create index ix on t ( cust_id );

Index created.

SQL>
SQL> set autotrace on stat
SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
----------------------------------------------------------------------------------------------------
x

1 row selected.


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         20  consistent gets
          2  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Ooooh yeahhhhh… 20 LIOs!!! But now the real test comes. You can see I also created a table called T2, a clone of T in its randomised data state. Now I am going to append the contents of T2 to my (clustered) T table.


SQL> insert /*+ APPEND */ into t select * from t2;

100000 rows created.

SQL> commit;

Commit complete.

One thing that does look promising is the "SORT ORDER BY" step in the execution plan. That suggests that even though this is appending new data to an existing set of data, some clustering might be taking place.


SQL> explain plan for insert /*+ APPEND */ into t select * from t2;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------
Plan hash value: 528765404

-------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |      |   100K|    10M|       |  2905   (1)| 00:00:01 |
|   1 |  LOAD AS SELECT                  | T    |       |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |      |   100K|    10M|       |  2905   (1)| 00:00:01 |
|   3 |    SORT ORDER BY                 |      |   100K|    10M|    11M|  2905   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL            | T2   |   100K|    10M|       |   442   (1)| 00:00:01 |
-------------------------------------------------------------------------------------------------

11 rows selected.

But the real test comes in the index lookup. Let’s repeat that.


SQL>
SQL> set autotrace on stat
SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
----------------------------------------------------------------------------------------------------
x

1 row selected.


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         43  consistent gets
          0  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

And we’re still awesomely good. The amount of data for CUST_ID = 17 has doubled, and the LIOs approximately doubled as well. So we are still seeing benefits of clustering even though we are appending to a table with previously clustered data.

Just how close to optimal is this? I’ll perform an ‘alter table move’ against the entire table, and re-measure:


SQL> alter table t move online;

Table altered.

SQL> select /*+ index(t (cust_id)) */ max(data)
  2  from t
  3  where cust_id = 17;

MAX(DATA)
---------------------------------------------------------------------------
x

1 row selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         37  consistent gets
          0  physical reads
          0  redo size
        641  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Hence optimal clustering would have yielded 37 LIOs for this data, so 43 is very close to that, and a huge step up in efficiency over the 1500 LIOs from the base experiment. In effect, we have “clumps” of clustered data in the table now – so larger, low frequency dataset loads will give better results than smaller, higher frequency loads. But still…very cool stuff that you should be taking advantage of in your databases.
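
As an aside, and purely as a sketch of my own rather than something from the test above, the clustering clause also lets you spell out the load-time and data-movement behaviour explicitly when you add it, and the clustering metadata is exposed in dictionary views such as USER_CLUSTERING_TABLES (12c onwards):

-- (On a table without an existing clustering clause.) Equivalent in spirit to the
-- ADD CLUSTERING used earlier, but with the on-load and on-data-movement
-- behaviour stated explicitly.
alter table t add clustering by linear order (cust_id)
  yes on load yes on data movement;

-- Inspect the clustering metadata recorded in the data dictionary.
select * from user_clustering_tables;
select * from user_clustering_keys;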

You need scaling huh? Maybe it’s just ego

I’ve just come back from OracleCode Singapore. It was a great event; the venue was awesome and the attendees were engaged and interested in the content. But one thing I found amusing (disturbing, perhaps?) was the number of times I had people approach me on the topic of scaling. Conversations would typically run along the lines of:

“What is your recommendation for scaling?”

which almost suggests that scaling is, of itself, the end solution here. Not “Here is function X, and I need it to scale”, or “My business requirement is X, and it needs to scale”, but just “I need to scale”.

So I’d push back and ask more questions:

  • Scale what?
  • What data are you capturing?
  • Where is it coming from?
  • What speed? What volume?
  • What are you plans with the data you are capturing?
  • How do you intend to process the data?
  • Is it transient? Are you planning on storing it forever? Is it sensitive information?

And the scary thing is that, more often than not, those were questions to which they did not have answers (yet?). I’d ask the questions, and very quickly the conversation would return to:

“Do I need sharding?”
“Should I use a NoSQL solution?”
“What ‘aaS’ option should I be using to achieve my scaling needs?”
“How many nodes do I need?”
“What server configuration is best?”

I’m seeing this more and more: the technological approach to achieving a business requirement is seen AS the business requirement. I hate to be brutal (well… that’s a lie, I like being brutal :-)) but here’s the thing: stop being so damn focussed on scaling until you have an idea of what your true performance requirements are!

Don’t get me wrong: there are systems out there that need to be architected from the ground up and that will have to deal with scaling challenges that have perhaps never been tackled before. But read those last few words again: “never been tackled before”. Do you know what that also means? It means it applies to an intsy wintsy tiny percentage of IT systems. If it wasn’t, then surprise surprise, those challenges have been tackled before. Why does everyone in the IT industry think that the system they are about to build will need the same architectural treatment as the 0.00001% of systems in the world that truly do?

Because in almost all of my time in IT, for the other 99.999% of systems out there, the two critical solutions for scaling systems to meet (and well and truly exceed) the performance requirements of the business are pretty simple:

1) don’t write crappy code,

2) don’t store data in a crappy way

That’s it.  When you can definitively demonstrate that

a) your code is well written,

b) your data is being stored in a way that best serves business requirements,

and your application still cannot meet performance needs, then yes, it’s time to talk about architectural options for scaling. But more and more I see folks ignoring (a) and (b), or worse, just assuming that they are implicit and guaranteed to happen, and leaping straight into “I need a 10,000 node, geo-dispersed, NoSQL, cached, compressed, mem-optimized, column-based, non-ACID, distributed blah blah blah for my system to work”.

Here’s a reality check: you don’t. Save yourself a lot of hassle; start simple and focus on quality. You’ll find things will probably scale just fine.

If you’ve made it this far through the post and you think I’m just ranting… well, that’s true :-) but let me also answer the next obvious question:

“So how do we make sure we write good code? How do we make sure we store our data intelligently?”

That’s why we (developer advocates) are here. We’re here to help you succeed. So check out our resources, reach out to us via social media channels, and we’ll help you every step of the journey.

My Performance & Troubleshooting scripts (TPT) for Oracle are now in GitHub and open sourced

I have uploaded my TPT-oracle scripts to GitHub and have formally open sourced them under the Apache 2.0 license as well. This allows companies to embed this software in their toolsets and processes & distribute them without a worry from legal departments.

The repository is here:

Now you can "git clone" this repository once and just "git pull" every now and then to see what updates & fixes I have made.
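
For example (the repository path below is my assumption based on the name TPT-oracle; use the link in this post as the authoritative source):

$ git clone https://github.com/tanelpoder/tpt-oracle.git    # assumed repository location
$ cd tpt-oracle
$ git pull    # run later, from inside the clone, to pick up new scripts and fixes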

Also, if you like my scripts, make sure you "Star" this repository on GitHub too; the more stars it gets, the more updates I will commit! ;-)


While "git clone" is the recommended method for getting your own workstation copy of the repository, your servers might not have git installed (or direct internet access), so you can also download a zipfile of everything in this repo:

You can also access individual scripts directly using links like the ones below. For example, if you want to run fish.sql to display an awesome SQL fish in sqlplus, you can download this:


Or if you want to run something from a subdirectory, like ash/dashtop.sql for showing ASH top from the historical ASH data in DBA_HIST views, you can download this script from the ASH subdirectory:

Example output below:

SQL> @ash/dashtop sql_opname,event2 username='SYS' DATE'2018-04-19' DATE'2018-04-20'

    Total
  Seconds     AAS %This   SQL_OPNAME           EVENT2                                     FIRST_SEEN          LAST_SEEN
--------- ------- ------- -------------------- ------------------------------------------ ------------------- -------------------
     4930      .1   83%                        ON CPU                                     2018-04-19 18:00:04 2018-04-19 23:48:08
      430      .0    7%   SELECT               ON CPU                                     2018-04-19 18:01:04 2018-04-19 23:49:48
      290      .0    5%   SELECT               acknowledge over PGA limit                 2018-04-19 18:00:34 2018-04-19 23:23:50
       60      .0    1%   UPSERT               ON CPU                                     2018-04-19 18:00:04 2018-04-19 22:00:15
       50      .0    1%   UPSERT               acknowledge over PGA limit                 2018-04-19 18:00:04 2018-04-19 23:13:47
       30      .0    1%   CALL METHOD          ON CPU                                     2018-04-19 18:00:24 2018-04-19 21:03:19
       30      .0    1%                        control file sequential read               2018-04-19 18:56:42 2018-04-19 21:47:21
       30      .0    1%                        log file parallel write                    2018-04-19 21:03:19 2018-04-19 22:13:39
       20      .0    0%   CALL METHOD          acknowledge over PGA limit                 2018-04-19 18:00:24 2018-04-19 22:01:55
       20      .0    0%   DELETE               db file sequential read                    2018-04-19 20:46:54 2018-04-19 22:00:35
       20      .0    0%   SELECT               db file sequential read                    2018-04-19 22:01:05 2018-04-19 22:01:35
       10      .0    0%   INSERT               ON CPU                                     2018-04-19 19:50:28 2018-04-19 19:50:28
       10      .0    0%   INSERT               acknowledge over PGA limit                 2018-04-19 20:43:12 2018-04-19 20:43:12
       10      .0    0%   SELECT               db file scattered read                     2018-04-19 23:03:55 2018-04-19 23:03:55
       10      .0    0%                        LGWR any worker group                      2018-04-19 21:03:19 2018-04-19 21:03:19
       10      .0    0%                        control file parallel write                2018-04-19 21:05:59 2018-04-19 21:05:59

16 rows selected.

Now that I have this stuff in GitHub, I plan to update my scripts a bit more regularly, and you can follow the repository to get real-time updates whenever I push something new.

As a next step I’ll convert my blog from WordPress to static hosting (Hugo) hopefully over this weekend, so you might see a few blog template/webserver glitches in the next few days.