
Oakies Blog Aggregator

Update to RANDOM_NINJA: create even more random, production-like test data using PL/SQL.

I had a chance to finish the data generators that I had on my list for release 1.4 of RANDOM_NINJA. New data domain additions include random science data, random game data, investment data and medical data.

Read below for a full list of the current functionality. More will be added for release 1.5.

Oracle Database 18c

Yup…it’s arrived!

New name obviously, because we’ve jumped to our new naming model to align with the calendar year as opposed to version number.  You might be thinking “So what?” but it’s a significant change in the way we get software to customers.  Mike Dietrich blogged about this at length here so I won’t repeat what has been said, but in TL;DR form:

More frequent releases, with smaller amounts of change per release

In this way, the approach lets us focus more on solidifying existing features, and on being able to quickly respond to bugs that arise.

So 18c is more an incremental release on 12.2 (in fact, internally it was referred to as “12.2.0.2” for most of its build cycle) focused on stability and hardening of existing features.

Don’t worry, we still managed to pack some cool new things in there – which you can read about in the New Features guide.

I’ll be doing some blog posts and videos on the 18c content soon, but here’s something to whet your appetite.

SQL Server Operations Studio

Since I reformatted my Surface Pro 4 so I could have my Docker running again, I also needed to recover my local SQL Server instance I like to have available on my PC.

This was a perfect time to get everything up and running, but instead of my standard way of doing this, I decided to use SQL Server Operations Studio (SSOS).

The Lesson

Most SQL Server DBAs have a GUI tool to assist in managing their database environment and, unlike Oracle DBAs (at least before multi-tenant and the cloud), it was common for SQL Server DBAs to have considerably more databases to manage than Oracle DBAs have.  Some of this is due to architectural differences, such as a single SQL Server having many user databases vs. Oracle having one database instance with many schemas inside a database.  If you visualize this difference, you realize that the maintenance and backup effort for each user database would be higher than for the Oracle DBA’s single database instance.  These types of differences required the SQL Server DBA to rely on automation and tools to ease their day-to-day demands.

From Microsoft, SQL Server Management Studio (aka SSMS, whose ancestor was called Enterprise Manager) is provided with the installation, or as an add-on in recent versions.  Numerous vendors have provided incredible products to monitor, manage and support hundreds to thousands of databases, including Idera, who I proudly serve as a 2018 ACE.

There are still presentations at SQL Saturdays offering tips and tricks with SSMS, even though it’s a 32-bit product, which tells you how important a management tool is to a DBA’s sanity (remember, I made my name on the Oracle side with their Enterprise Manager, so Oracle DBAs may be hesitant to admit it, but they depend on one almost as often!).  Given that SSMS is a 32-bit tool and how important such a tool is, Microsoft launched SQL Server Operations Studio.  It reminds me of the love child of SSMS and the original Oracle SQL Developer, which isn’t a bad thing.  Consider what Oracle SQL Developer is today, and that’s where I see this going.

Not Your Father’s SSMS

The first thing you notice is that you are now working with a modern application that is 64-bit and built for extensibility.  No, it doesn’t have the legacy features that many DBAs will require to get them to transition over immediately, but you can build metrics and widgets to do much of it right now.  It also supports today’s hybrid world, both Azure and on-premises environments, which is something SSMS just isn’t prepared for.

I’ve already reached out to the SSOS group on Twitter to find out how I can offer my feedback to help.  I was part of the Customer Advisory Board for Enterprise Manager before I joined Oracle on the EM team, so I’m aware that this experience could be exceptionally helpful.  It is crucial to have customer feedback to build a product that delivers what a cloud management and infrastructure tool needs to offer.

So, if you haven’t downloaded it and tested it out, consider it.  I’ll be blogging a few posts here and there to offer some insight on my own experiences with the product going forward.



Copyright © DBA Kevlar [SQL Server Operations Studio], All Rights Reserved. 2018.

The post SQL Server Operations Studio appeared first on DBA Kevlar.

Will I Be The Next President Of The UK Oracle User Group?

I’ve decided to put myself forward to be President Elect of the UK Oracle User Group (UKOUG). The position of President Elect is, in effect, President-in-waiting. You shadow the current president before taking over the role when their term comes to an end. In this case, that will be in a year.


I think this is a very sensible manner in which to introduce a new person into the role of President. The UKOUG is one of the largest Oracle user groups in the world. It is in effect a small company with permanent staff and a large number of interested parties, the members. About 1000 companies have at least one membership with the UKOUG, some hold several (as each membership comes with conference passes). The position of President comes with 3 main duties:

  • Representing all members of the user group – end users, partners, sponsors. There are two other Member Advocate positions on the board, so the President is one of three (out of a total of six) representing the membership.
  • Being the ambassador for the UKOUG. This is partly being the “friendly public face” of the organisation but, as President, you represent the UKOUG to other user groups, Oracle Corporation and the press.
  • To ensure that the UKOUG meets its requirements as a company and has the correct governance in place. For the UKOUG a lot of the governance is about ensuring the board is selected or appointed correctly, legal requirements are met, and that the user group is run in an open and fair manner.

Why would I want to take this on? It is not a paid position, it is voluntary.

(I should maybe be a little clearer here on pay – voted positions on the board, i.e. member advocate and president, are not salaried. But expenses are paid and there is provision for some payment for specific project work, or if the demands of a role exceed a certain number of hours in a given month. But you would be unable to live on it, no matter how frugal you are!)

Well, as many of you know, I’ve been an active volunteer for the UKOUG for a long time, it’s actually over 10 years. I present at nearly every annual conference, at a couple of the Special Interest Groups (SIGs) each year and I’ve chaired or deputy chaired SIGs since 2009. I don’t just do the “standing up and being noticed” stuff, I help out with the organisational work. I was in charge of the Database content at Tech14 & Tech15 and all the content of Tech16. I’ve sat on strategy committees, reviewed submissions, analysed speaker scores… I’m currently editor of the UKOUG magazine, Oracle Scene. I know some people think of me as “that guy from the UKOUG”. Maybe being President would be less work!


When the UKOUG announced that the position of President Elect was open, it seemed natural to try and take that final step up the Volunteer ladder to become a member of the board.

When it comes down to it, I love being in the Oracle community. I’ve made so many friends across the globe through not just the UKOUG but by going to the conferences & meetings of other national Oracle User Groups. I have learnt so much from user groups, not just from lectures but directly from the people I meet. The majority of people who get involved in user groups are not only intelligent and wanting to learn, they are also willing to share and teach.

Another part of my wanting to be the President (eventually) is that I don’t think the UKOUG is perfect. The organisation does evolve and change as the technology and market shifts. But I’d like to try and shake things up a bit and slightly alter where its focus currently is. I won’t say any more on that for now.

There are also big changes for some Oracle customers, namely Cloud, Chatbots, AI and the fact that hardware is shifting. Solid State storage and Oracle’s own in-memory tech are making some things possible that were impossible with the old physical storage and row-based processing. But soon we will have storage that is an order of magnitude faster than current SSD, almost as fast as main memory.

Oddly enough, one problem I see a lot is that there is too much focus on some of those new areas. Many people are still running systems where cloud and SSD are not part of their world. Yes, they would probably all like to move forward but if the systems they have can’t move on, they still need to get the most out of them now. User groups are not just for those chasing the latest-greatest, they are just as much for those who need help keeping the wheels on. I think the user group needs to reach slightly back before we can help them forward.

Many of you won’t be able to vote for me as only members of the UKOUG can vote. But if you can, I’d appreciate your vote. And I will need those votes.


There is one slight oddity. I am the only person standing for the position of President Elect (the position of Member Advocate is also open and being voted for at the moment, for which there are three candidates). However, there is still a vote; I will not take the position uncontested. The vote is a yes/no/abstain one, so you can either support my bid to be the President Elect or voice your opposition. There are issues with yes/no votes, but overall the UKOUG board felt that, as the user group is run on democratic principles, the members should be able to have their say over whether they feel I am suitable to eventually become their President or not. If the number of votes is low, it edges things in favour of “no”, so I still need to campaign.

(If you can vote, you can do so Here)

As for the contest for the position of Member Advocate, I’ve voted for Neil Chandler. I know Neil well and he is just as passionate about the UKOUG as I am and I know he will work hard to keep it moving forward and improving.

Let’s see what happens come the conclusion of voting in March.

Docker and Windows 10 1709 Patch, The End

If you read my last post on my challenges with Docker containers running on Windows 10 after the 1709 patch, you know I thought I was close to solving it once I was granted the admin password to disable and uninstall Sophos Endpoint.

Once I uninstalled Sophos, I noted that I ended up with an error on a different line number.  The question was, had the problem shifted from Sophos to a new issue, or was it more complex than just the one application?  We’d already seen that it was complicated, with Hyper-V and the 1709 patch being part of the problem.

As I researched the network issue further, I noticed that, although I’d uninstalled Docker and Sophos, there were considerable leftover files and registry entries with roots throughout the system.  I also became aware of how many components were cross-referencing and sharing the same DLLs.

I’m an expert at scouring Oracle off a Windows server, but there’s nothing like a reminder of the ease of management on Linux and the power of an “rm -rf” command vs. the challenges of shared files and registry entries.

In the end, it became apparent that I could be shooting myself in the foot trying to find the trip wire that caused the complex problem I was experiencing (along with the lack of logging from Docker), considering the time it was taking to get my containers back up and running.

At 4pm last night, I decided to back up my files to my external SSD and then reformat my Microsoft Surface Pro 4.  All my licensing for the PC is stored in my account when I simply erase and reload the same machine.  I chose to erase all of my data and apps.

Once I reloaded, I chose the following:

  1.  Locked the Windows updates before the 1709 patch, and I’m not letting any patches be applied until a new patch comes out that fixes this.  Trust me, after all this research, I know where to look…

AskTOM Office Hours for DBAs

We had the first AskTOM Office Hours Q&A for Database Administrators yesterday.  Thanks to everyone who showed up, and thanks for the questions.

If you missed it, you can catch a replay here.

Pete Finnigan Presented About Oracle Database Vault and Oracle Security

I have not added much here on my site for some time due to a serious health issue taking a lot of my time with a close family member. So please bear with me if you email or contact me....[Read More]

Posted by Pete On 15/02/18 At 08:44 PM

(DESCRIPTION_LIST=(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=NO)

Do you have complex connection strings with DESCRIPTION_LIST, DESCRIPTION, ADDRESS_LIST, ADDRESS and a nice combination of FAILOVER and LOAD_BALANCE? You probably checked the documentation, which tells you that FAILOVER=YES is the default at all levels, but LOAD_BALANCE=YES is the default only for DESCRIPTION_LIST. But when disaster recovery and availability are concerned, the documentation is not sufficient. I want to test it. And here is how I do it.

I don’t want to test it with the real configuration and stop the different instances. And I don’t need to. My way to test an address list is to define a tnsnames.ora with the connection string, such as the following:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=
(DESCRIPTION=
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

I used localhost because I know it’s there and I don’t want to wait for the TCP timeout. But I use fake ports, which do not exist. So finally, a connection will never be established, but I will be able to see all the addresses that are tried. I check them with strace on the connect() system call, with the following script:


for i in {1..10}
do
TNS_ADMIN=/tmp strace -T -e trace=connect sqlplus -s -L sys/oracle@NET_SERVICE_NAME as sysdba <<< "" 2>&1 | awk '
/sa_family=AF_INET, sin_port=htons/{
gsub(/[()]/," ") ; printf "%s ",$5
}
END{
print ""
}
'
done | sort | uniq

So, I used meaningful numbers for my fake ports: 101 and 102 for the addresses in the first description of the description list, and 201 and 202 for the address list in the second description. The awk script shows the sequence of ports that was tried. And, because of the random round-robin, I run it in a loop several times to see all patterns, aggregated by sort|uniq.

So here is the result from the connection string above using the defaults for load balancing and failover:

101 102 201 202
201 202 101 102

The sequence within each address list is always in order (101,102 and 201,202) because LOAD_BALANCE=NO is the default there. But I have two combinations for the descriptions because LOAD_BALANCE=YES is the default in DESCRIPTION_LIST. Finally, all addresses are tried because FAILOVER=YES is the default at all levels.

LOAD_BALANCE

If I define LOAD_BALANCE at all levels, such as:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(DESCRIPTION=(FAILOVER=YES)(LOAD_BALANCE=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=(FAILOVER=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

The result shows that all combinations can be tried in any order:

101 102 201 202
101 102 202 201
102 101 201 202
102 101 202 201
201 202 101 102
201 202 102 101
202 201 101 102
202 201 102 101

By running it in a large loop you will confirm that any address will be tried at most once.

FAILOVER

Now, If I set FAILOVER=NO within the first description:

NET_SERVICE_NAME=
(DESCRIPTION_LIST=(FAILOVER=YES)(LOAD_BALANCE= NO)
(DESCRIPTION=(FAILOVER= NO)(LOAD_BALANCE=YES)
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=101))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=102))
)
(DESCRIPTION=(LOAD_BALANCE=NO )
(CONNECT_DATA=(SERVICE_NAME=pdb1))
(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=YES)
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=201))
(ADDRESS=(PROTOCOL=TCP)(HOST=localhost)(PORT=202))
)
)
)

the first attempt can be 101 or 102 (because of LOAD_BALANCE=YES) but only one of them will be tried, because of FAILOVER=NO in the first description. Then, the second description is attempted (because FAILOVER=YES at the description list level), with all addresses there (because of FAILOVER=YES and LOAD_BALANCE=YES in its address list). The result of all possible combinations is:


101 201 202
102 201 202
102 202 201

So here it is. You can test any complex connection description to check which connections are possible and in which order they will be tried. From this, you can infer what will happen with a real configuration: the wait for TCP timeout for addresses tried on hosts that are not up, and the load balancing given by the different possible combinations.

 

The article (DESCRIPTION_LIST=(DESCRIPTION=(ADDRESS_LIST=(FAILOVER=YES)(LOAD_BALANCE=NO) appeared first on Blog dbi services.

Project RICO2 and the story of an APEX upgrade that went terribly wrong.

In my last blog post I explained the XOR algorithm that is used to compute the Oracle database block checksum.

I also wrote that sometimes you face problems that are unresolvable without low-level knowledge. This is the story of that kind of situation. The story of misread documentation. The story of haste and hex.

About a year ago, I got a call from one company that did an APEX upgrade for one of their customers. Very big customer. Quite a big database and a very important one, working 24/7.

They told me that they upgraded APEX on a database with one PDB and a week later they tried to apply some patches and restarted the database. After the restart they got the following error:

SQL> alter pluggable database kowalsky open;
alter pluggable database kowalsky open
*
ERROR at line 1:
ORA-00604: error occurred at recursive SQL level 1
ORA-01405: fetched column value is NULL

Oracle Support told them that they had messed up the APEX upgrade and that they should restore the database from before the upgrade and perform it correctly. They were right. But the customer was not happy to find out that they were about to lose a week of production data.

And this is where the true fun begins!

To investigate which query is the problem, you can use the following syntax:

SQL> alter session set events '1405 trace name errorstack level 1';

Session altered.
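To locate the trace file that the errorstack dump is written to, a query like the following helps (standard V$DIAG_INFO, shown here for illustration; it is not taken from the original troubleshooting session):

SQL> select value from v$diag_info where name = 'Default Trace File';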

The trace file showed that this query was a problematic one:

----- Error Stack Dump -----
ORA-01405: fetched column value is NULL
----- Current SQL Statement for this session (sql_id=) -----
select version from registry$ where status not in (9, 99) and namespace='SERVER' and cid='APEX'

After tracing the session with event 10046 on level 12 I found the execution plan for this query:

STAT #140575638426528 id=1 cnt=1 pid=0 pos=1 obj=1508 op='TABLE ACCESS BY INDEX ROWID REGISTRY$ (cr=2 pr=0 pw=0 time=10 us cost=1 size=25 card=1)'
STAT #140575638426528 id=2 cnt=1 pid=1 pos=1 obj=1509 op='INDEX UNIQUE SCAN REGISTRY_PK (cr=1 pr=0 pw=0 time=6 us)'

Here we can see that the column "version" should contain some value for 'APEX' but it is null. We could try to put this value into the datafile, but it is not easy to extend a row manually. The other thing we could do is to locate the blocks of the index REGISTRY_PK and make sure that the query does not return any rows (for example, change the value APEX to APEK or something like that).

Or just try to change the status of this row to value 9 or 99.

Back then I used BBED to do it, but there were other cases when I had to do it with a bunch of scripts and a hex editor, because I couldn’t compile BBED. So I decided to write my own version of BBED to have something useful when needed.

Full page logging in Postgres and Oracle

In my opinion, the volume of logging (aka redo log, aka xlog, aka WAL) is the most important factor for OLTP performance, availability and scalability, for several reasons:

  • This is the only structure where disk latency is a mandatory component of response time
  • This is a big part of the total volume of backups
  • This is sequential by nature, and very difficult to scale by parallelizing

In this post, I look at the volume of logging generated by some DML in Postgres and Oracle. I know Oracle quite well and am just starting to look at Postgres. The comparison here is not a contest but a way to understand better. For example, the default behavior of Postgres, with full_page_writes=on, is very similar to Oracle ‘begin backup’ mode. The comparison makes no sense for most Postgres DBAs, but probably helps Oracle DBAs to understand it.

Measure WAL segment writes

Here is how I measured the volume of transaction log written: start the Postgres server with ‘strace -f’ and parse with ‘awk’ the open(), write() and close() calls:

sudo su postgres <<'END'
export PGDATA=/u01/pgdata
/usr/pgsql-10/bin/pg_ctl stop
strace -e trace=open,close,write,recvfrom -f /usr/pgsql-10/bin/pg_ctl start 2>&1 | awk '
/^[^[]/{
$0="[pid MAIN] "$0
}
/strace: Process [0-9][0-9]* attached/{
sub(/^.*strace: /,"strace: ") ; "ps -o cmd -hp " $3 |& getline proc[$3"]"] ; print "" ; print $0,proc[$3"]"]
}
/open[(].*pg_wal[/].* = [0-9]*$/{
z=$0 ; gsub(qq," ") ; fd_wal[$2 $NF]=$4
}
/checkpoint;/{
total_written_wal=0
}
/write[(]/{
#pid=$2 ; sub("]","",$2) ; "ps -o cmd -hp " p |& getline proc[p"]"]
z=$0 ; gsub("[(,]"," ") ; if ( fd_wal[$2 $4]>0 ) { written_wal[$2 $4]=written_wal[$2 $4]+$NF ; total_written_wal=total_written_wal+$NF } next
}
/close[(]/{
pid=$2 ; sub("[^0-9]","",pid) ;
z=$0 ; gsub("[()]"," ") ; if ( ( fd_wal[$2 $4]!="" ) && ( written_wal[$2 $4] > 0 ) ) {
printf " ( written %d bytes to %s -> total WAL segments: %.2f MB ) cmd=%s\n",written_wal[$2 $4],fd_wal[$2 $4],total_written_wal/1024/1024 , proc[$2] ; fd_wal[$2 $4]=""
} next
}
' qq='"'
END

Do not do that in production. This is experimentation in a lab. Do not attach strace to a critical process in production.

There’s probably an easier way to get the same information, maybe with the Postgres activity statistics, or through a size-counting archive_command, so please don’t hesitate to comment. Anyway, from the ‘write()’ calls I am sure that I’m counting exactly what I want: the volume of logging written to disk. As an Oracle DBA used to LogWriter and its slave threads managing all writes, I started to trace only the WAL writer process, but quickly realized that part of the logging is directly written by my server process.
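One such easier way, without strace, is to ask Postgres itself for the WAL position before and after the statement. This is a minimal sketch, assuming the PostgreSQL 10 function names; the LSN shown is only a placeholder for whatever the first query returns:

select pg_current_wal_lsn();
-- note the value returned, e.g. 0/A5000028 (placeholder), run the DML to measure, and then:
select pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), '0/A5000028'));

This counts the WAL generated by all sessions, so it is best done on an otherwise idle instance.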

Postgres: insert

I create a table with some numbers and a 100 bytes character string.

create table demo as select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(0,0);

The first operation I test is the insert of 1 million rows.

insert into demo select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);
( written 4349952 bytes to pg_wal/000000010000000A0000005F -> total WAL segments: 4.16 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 8192 bytes to pg_wal/000000010000000A0000005F -> total WAL segments: 9.00 MB ) cmd=postgres: wal writer process
( written 17735680 bytes to pg_wal/000000010000000A00000060 -> total WAL segments: 20.07 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 3309568 bytes to pg_wal/000000010000000A00000060 -> total WAL segments: 31.40 MB ) cmd=postgres: wal writer process
( written 33783808 bytes to pg_wal/000000010000000A00000061 -> total WAL segments: 36.03 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 3997696 bytes to pg_wal/000000010000000A00000061 -> total WAL segments: 39.80 MB ) cmd=postgres: wal writer process
( written 49676288 bytes to pg_wal/000000010000000A00000062 -> total WAL segments: 51.19 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 65273856 bytes to pg_wal/000000010000000A00000063 -> total WAL segments: 66.06 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 79364096 bytes to pg_wal/000000010000000A00000064 -> total WAL segments: 82.04 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 6660096 bytes to pg_wal/000000010000000A00000064 -> total WAL segments: 82.39 MB ) cmd=postgres: wal writer process
( written 88285184 bytes to pg_wal/000000010000000A00000065 -> total WAL segments: 98.02 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 14491648 bytes to pg_wal/000000010000000A00000065 -> total WAL segments: 106.82 MB ) cmd=postgres: wal writer process
( written 101703680 bytes to pg_wal/000000010000000A00000066 -> total WAL segments: 113.99 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 17825792 bytes to pg_wal/000000010000000A00000066 -> total WAL segments: 117.19 MB ) cmd=postgres: wal writer process
( written 115769344 bytes to pg_wal/000000010000000A00000067 -> total WAL segments: 128.20 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 18661376 bytes to pg_wal/000000010000000A00000067 -> total WAL segments: 135.09 MB ) cmd=postgres: wal writer process
( written 19824640 bytes to pg_wal/000000010000000A00000068 -> total WAL segments: 144.17 MB ) cmd=postgres: wal writer process
( written 131350528 bytes to pg_wal/000000010000000A00000068 -> total WAL segments: 148.16 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 27435008 bytes to pg_wal/000000010000000A00000069 -> total WAL segments: 159.80 MB ) cmd=postgres: wal writer process
( written 140132352 bytes to pg_wal/000000010000000A00000069 -> total WAL segments: 159.80 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
INSERT 0 1000000

You can see that my ‘strace|awk’ script is running in the background and has counted about 160 MB of logging, partially from the ‘postgres: wal writer process’ and partly from ‘postgres: demo demo 192.168.56.122(38013)’ serving my connection.

The relation size as stored on disk is about 150 MB:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 19231 | 1e+06 | 0 | 150
(1 row)

This makes sense. An insert has to write all new data into the log in order to be able to recover the pages until they are checkpointed.

Note that I have no index on this table for this test.

Postgres: update

I’m now updating one column for all rows.

update demo set b=b+1;
( written 150528000 bytes to pg_wal/000000010000000A0000006A -> total WAL segments: 4.01 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 162693120 bytes to pg_wal/000000010000000A0000006B -> total WAL segments: 17.84 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 29769728 bytes to pg_wal/000000010000000A0000006B -> total WAL segments: 28.44 MB ) cmd=postgres: wal writer process
...
( written 84287488 bytes to pg_wal/000000010000000A00000081 -> total WAL segments: 343.65 MB ) cmd=postgres: wal writer process
( written 453705728 bytes to pg_wal/000000010000000A00000082 -> total WAL segments: 347.36 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
UPDATE 1000001

I touched only a small part of the volume in bytes, but I touched all rows and all pages. And, even if only a few bytes are modified, Postgres logs the whole page to protect against fractured blocks in case of a crash (pages partially written). So that’s about 150 MB. But Postgres does not update rows in place. The whole row is inserted as its new version, which means the whole volume again, another 150 MB. If we look at the size of the table, we can see 300 MB of pages:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-------------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 38462 | 1.21882e+06 | 0 | 300
(1 row)

So this update has generated even more logging: 347 MB.

Postgres: sparse update

Now updating only 1 row out of ten, still one column only:

update demo set b=b+1 where mod(a,10)=1;
( written 89923584 bytes to pg_wal/000000010000000A00000083 -> total WAL segments: 13.88 MB ) cmd=postgres: wal writer process
( written 469123072 bytes to pg_wal/000000010000000A00000084 -> total WAL segments: 22.98 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
...
( written 563576832 bytes to pg_wal/000000010000000A0000008D -> total WAL segments: 151.07 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 130940928 bytes to pg_wal/000000010000000A0000008D -> total WAL segments: 151.27 MB ) cmd=postgres: wal writer process
UPDATE 100000
analyze demo;

So, 10% of the rows had to be copied to their new version, which adds another 15 MB to the table size.

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-------------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 40385 | 1.07267e+06 | 0 | 315
(1 row)

For these additional 15 MB, half of the table pages had to be modified (the current version having to point to the new version), and the logging generated was 150 MB. Because of MVCC at tuple level, doing something similar to ‘chained rows’ and ‘row migration’ for all updates, and because of full page logging, even sparse updates generate a lot of log writes.
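As a side note, Postgres can at least avoid creating new index entries when the new row version fits into the same page and no indexed column changes (a HOT update); the new version is still written, so the row volume in WAL remains. The standard statistics view shows how many updates were HOT (a sketch, not part of the original test):

select relname, n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'demo';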

Postgres: delete

Here is a delete of those million rows:

delete from demo;
( written 576364544 bytes to pg_wal/000000010000000A0000008E -> total WAL segments: 6.44 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
( written 134930432 bytes to pg_wal/000000010000000A0000008E -> total WAL segments: 6.73 MB ) cmd=postgres: wal writer process
( written 589225984 bytes to pg_wal/000000010000000A0000008F -> total WAL segments: 18.70 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
...
( written 162054144 bytes to pg_wal/000000010000000A00000099 -> total WAL segments: 184.70 MB ) cmd=postgres: wal writer process
( written 740352000 bytes to pg_wal/000000010000000A0000009A -> total WAL segments: 189.80 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
DELETE 1000001
( written 163217408 bytes to pg_wal/000000010000000A0000009A -> total WAL segments: 196.22 MB ) cmd=postgres: wal writer process

Marking tuples as deleted does not increase the table:

analyze demo;
ANALYZE
select relname, relnamespace, reltype, reloftype, relowner, relam, relfilenode, reltablespace, relpages, reltuples, relallvisible, relpages*8/1024 MB from pg_class where relname = 'demo';
relname | relnamespace | reltype | reloftype | relowner | relam | relfilenode | reltablespace | relpages | reltuples | relallvisible | mb
---------+--------------+---------+-----------+----------+-------+-------------+---------------+----------+-----------+---------------+-----
demo | 2200 | 25157 | 0 | 16385 | 0 | 25155 | 0 | 40385 | 275837 | 0 | 315
(1 row)

But all current tuples have to be marked as deleted and not visible once the transaction is committed. This touches all pages for the current version, which is more than 150 MB of logging here.

Postgres: vacuum

After two updates and a delete, I have old tuples in this table. It seems that VACUUM does not generate any logging:

vacuum demo;
( written 762445824 bytes to pg_wal/000000010000000A0000009B -> total WAL segments: 14.67 MB ) cmd=postgres: demo demo 192.168.56.122(38013)
VACUUM

My guess (but remember that I am a newbie in Postgres) is that in case of a crash occurring before the next checkpoint we will just have to vacuum again. But this is not what was answered in the postgres-general list a few years ago.

Note that full page logging is not necessary for all changes, but only for the first change to a page after a checkpoint. This is sufficient to cover future write failures because recovery will start from there. Once the full page is logged, a change vector is sufficient for further recovery. However, I had the same amount of WAL, 15 MB, when vacuuming after a checkpoint.
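A small experiment makes this ‘first change after a checkpoint’ behaviour visible: force a checkpoint, update one row twice, and measure the WAL for each update. This is only a sketch (CHECKPOINT requires a superuser, and the LSN value is a placeholder for what the query actually returns):

checkpoint;
select pg_current_wal_lsn();                                 -- say it returns 0/B0000000 (placeholder)
update demo set b=b+1 where a=1;                             -- first change to this page since the checkpoint
select pg_wal_lsn_diff(pg_current_wal_lsn(), '0/B0000000');  -- roughly a full 8k page image plus the record
-- repeat the update and diff again: the second change should log only a small change vector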

Oracle: insert

Let’s do some similar things in Oracle, whose MVCC implementation is completely different: at block level, with undo logging.

SQL> create table demo as select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('0 to 0');
Table created.

I have shown in a previous post how I get the delta values from V$MYSTAT joined to V$STATNAME using (STATISTIC#) for ‘redo size’, so no need for strace here. But we could see the same result by measuring the writes to the redo log groups (do not double count the multiplexed members).
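For reference, the underlying query of such a snapshot script looks roughly like this (a sketch of the idea only, not the author’s actual _mystat_diff.sql, which diffs two consecutive snapshots):

SQL> select name, value from v$mystat join v$statname using (statistic#)
     where name in ('db block changes','redo size','undo change vector size','redo entries');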

SQL> insert into demo select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('1 to 1000000');
1000001 rows inserted.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
141,342 155,218,876 4,380,448 104,411

This is about 150MB, which is the volume of the table:

SQL> exec dbms_stats.gather_table_stats(user,'DEMO');
PL/SQL procedure successfully completed.
SQL> select table_name,num_rows,blocks,(blocks*block_size/1024/1024) MB, avg_row_len from user_tables join dba_tablespaces using(tablespace_name) where table_name='DEMO';
 
TABLE_NAME NUM_ROWS BLOCKS MB AVG_ROW_LEN
------------------------------ ---------- ---------- ---------- -----------
DEMO 1000001 19280 150.625 131

Conclusion for inserts: all databases have to log all the data inserted in order to be protected from an instance crash. Note that Oracle has a way to insert directly into the datafiles, bypassing the buffer cache, and thereby reduce the logging required for crash recovery, but I’m not doing bulk inserts here.
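That direct-path alternative would look roughly like this (a sketch; the redo saving only applies when the database is not in FORCE LOGGING mode, and the newly inserted blocks are not recoverable from the redo afterwards, so a backup is needed):

SQL> alter table demo nologging;
SQL> insert /*+ append */ into demo
     select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g
     from xmltable('1 to 1000000');
SQL> commit;
SQL> @ _mystat_diff.sql

With this, ‘redo size’ should drop to a small fraction of the 150 MB seen above, because only the space management changes are logged.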

Oracle: update

The update in Oracle is done in place. There is no need to copy the whole row (except in the rare cases where the row grows and does not fit into the block anymore). However, the old value of the column must be copied into the UNDO segment for MVCC. This is why we see 46 MB of ‘undo change vector size’ here.

SQL> update demo set b=b+1;
1000001 rows updated.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
170,777 105,301,308 48,641,772 82,221

The UNDO is only the change vector, not the full block. If you read about copies of full blocks to rollback segments, it is a confusion either from veterans of Oracle 5 or a misunderstanding of the flashback features. The UNDO is stored in segments and written first into the buffer cache, so it is protected by redo logging, which means about 46 MB of the redo is actually the redo vectors of the undo vectors. The other 54 MB of redo is the new values of the update.
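If you want to see the undo consumed by the current transaction while it is still open, the transaction views expose it (a sketch using the standard V$TRANSACTION / V$SESSION join; this was not part of the original measurements):

SQL> update demo set b=b+1;
SQL> select used_ublk, used_urec from v$transaction
     where addr = (select taddr from v$session where sid = sys_context('userenv','sid'));
SQL> rollback;

USED_UBLK is the number of undo blocks and USED_UREC the number of undo records used by the transaction.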

Oracle: sparse update

The logging of change vectors rather than full pages is even cheaper with sparse updates:

SQL> update demo set b=b+1 where mod(a,10)=1;
100001 rows updated.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
56,583 15,414,328 6,111,608 36,921

The volume of undo and redo generated is only 15 MB here, including 6 MB of undo vectors. This is really optimized, and it is one reason why you should update only the columns that changed (and not use Hibernate’s default non-dynamic update, for example).

Oracle: delete

The delete has to mark all rows as deleted, and because the space can immediately be reused, the whole row must be logged into the UNDO, and this in turn has to be logged into the REDO, so the delete generates a lot of logging:

SQL> delete from demo;
1000001 rows deleted.
SQL> commit;
Commit complete.
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
2,124,823 403,755,892 240,302,088 1,093,821

I have no indexes here. With indexes, all index entries have to be marked as deleted as well, and this generates undo and redo vectors because MVCC in Oracle is at block level: each block modification, for table or index, has to be logged.

Deleting a lot of rows is an expensive operation in Oracle. For bulk purges, it is often better to truncate and insert /*+ append */ when possible (as in a non-atomic materialized view refresh). Partitioning also helps here, for example to purge old data by dropping partitions when the table is partitioned on date.
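As an illustration of that purge pattern (the predicate on column a is only for the demo table used here; adapt it to your real retention rule):

SQL> create table demo_keep as select * from demo where a > 900000;
SQL> truncate table demo;
SQL> insert /*+ append */ into demo select * from demo_keep;
SQL> commit;
SQL> drop table demo_keep purge;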

Postgres without full page logging

Given the huge overhead, is full page logging really required? There are plans to avoid it, mentioned in the Postgres ToDo wiki, or at least to keep it only short term for crash recovery and not media recovery. Another possibility is to implement a checksum on the blocks so that fractured blocks can be detected. Then, when detected, the fractured blocks may not need full page logging to recover them if we can restore a previous backup. This takes longer to recover, but can be acceptable given the low probability of this kind of failure. In addition to that, when you have a physical standby synchronized with log-shipping, you have an easy way to recover without having to restore files. But you need a checksum to detect the problem.

Without a checksum, the problem is the detection of partial writes. But if you trust your storage and if you fail over to the standby in case of a crash, you may accept setting full_page_writes=off, and this is what I did here.
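For reference, full_page_writes can be changed without a restart because it is reloadable (a sketch, assuming a superuser session; as said above, only do this if you trust your storage or can fail over to a standby):

alter system set full_page_writes = off;
select pg_reload_conf();
show full_page_writes;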


insert into demo select generate_series a,generate_series b,generate_series c,generate_series d,generate_series e,generate_series f,lpad('x',100,'x') g from generate_series(1,1000000);
...
( written 125255680 bytes to pg_wal/000000010000000A000000E3 -> total WAL segments: 140.65 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
INSERT 0 1000000

The insert still has to log all the new data: 140 MB.


update demo set b=b+1;
...
( written 72613888 bytes to pg_wal/000000010000000A000000F2 -> total WAL segments: 213.02 MB ) cmd=postgres: wal writer process
UPDATE 1000001

The update has to log only what is modified, but because of the Postgres MVCC implementation, the whole row has to be written as its new version, and the old one has its pointer updated: 210 MB here.


update demo set b=b+1 where mod(a,10)=1;
( written 305709056 bytes to pg_wal/000000010000000A000000F3 -> total WAL segments: 1.96 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 72613888 bytes to pg_wal/000000010000000A000000F3 -> total WAL segments: 5.62 MB ) cmd=postgres: wal writer process
( written 75718656 bytes to pg_wal/000000010000000A000000F4 -> total WAL segments: 9.65 MB ) cmd=postgres: wal writer process
( written 310665216 bytes to pg_wal/000000010000000A000000F4 -> total WAL segments: 9.65 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
UPDATE 100000

The sparse update benefits from logging only the changed rows: 10 MB here. This one is even smaller than with Oracle because there’s no UNDO to write here: the old values stay in-place.


delete from demo;
( written 323256320 bytes to pg_wal/000000010000000A000000F5 -> total WAL segments: 11.27 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 338829312 bytes to pg_wal/000000010000000A000000F6 -> total WAL segments: 26.92 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 76562432 bytes to pg_wal/000000010000000A000000F6 -> total WAL segments: 31.41 MB ) cmd=postgres: wal writer process
( written 345415680 bytes to pg_wal/000000010000000A000000F7 -> total WAL segments: 39.73 MB ) cmd=postgres: demo demo 192.168.56.122(38109)
( written 83410944 bytes to pg_wal/000000010000000A000000F7 -> total WAL segments: 40.41 MB ) cmd=postgres: wal writer process
DELETE 1000001

The delete is cheap when full_page_writes=off because only the visibility is changed; the data remains (until the delete is committed and the table is vacuumed). If you have a lot of rows to delete, then consider setting full_page_writes=off, and be sure to have a backup to restore in case of a crash.

Oracle full page logging in backup mode

So, Oracle by default does not need to protect from fractured blocks, because they can be detected. If the storage crashes while a block is partially written, the block is corrupt. Thanks to the checksum, this corruption will be detected during recovery (or even earlier depending on DB_BLOCK_CHECKSUM and DB_LOST_WRITE_PROTECT). The redo is not sufficient, as it contains only change vectors, but you can recover the block from the last backup and Oracle can do a simple block recovery. This recovery can also be done from the standby database.

However, full page logging exists in Oracle as well. When running a backup with a non-Oracle tool that is not aware of block checksums, you need to enclose the copy or snapshot between ‘begin backup’ and ‘end backup’. You do this because an online backup may read partially updated blocks and, without the checksum, cannot detect it. A corrupt backup is not very useful, and this is why this backup mode generates more redo, in order to be able to recover those blocks. This is very similar to full page logging: the redo generated for the first modification of a buffer will store the whole block. Subsequent modifications, until the buffer is checkpointed, will need only the change vectors.
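For reference, backup mode is entered and left with ALTER DATABASE, and V$BACKUP shows which datafiles are currently in it (a minimal sketch of the commands; this is not taken from the test run below):

SQL> alter database begin backup;
SQL> select file#, status from v$backup;   -- datafiles show ACTIVE while in backup mode
-- take the storage snapshot or file copy here, then:
SQL> alter database end backup;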

I think the first article I’ve ever written was a description of the Oracle backup mode. And it is still visible thanks to archive.org only because it was published on… Google Knol!

So, here is the same run with Oracle in backup mode.

The insert does not change much, as it fills full blocks:

SQL> insert into demo select rownum a,rownum b,rownum c,rownum d,rownum e,rownum f,lpad('x',100,'x') g from xmltable('1 to 1000000');
1000000 rows created.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
141,376 156,527,072 4,380,448 124,195

The full update of one column generates the same undo, but more than 2x the redo because of full page logging:

SQL> update demo set b=b+1;
1000001 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
170,778 238,317,632 48,641,772 104,640

The sparse update generates about the same redo as the full update, because those 10% of the rows touch all pages:

SQL> update demo set b=b+1 where mod(a,10)=1;
100001 rows updated.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
319,622 240,502,284 17,832,196 192,815

The delete generates even more, because there is all the UNDO in addition to all the data pages:

SQL> delete from demo;
1000001 rows deleted.
 
SQL> commit;
Commit complete.
 
SQL> @ _mystat_diff.sql
 
db block changes redo size undo change vector size redo entries
---------------- ---------------- ----------------------- ----------------
2,125,285 558,510,928 240,303,768 1,143,131

So what?

Beyond the very different implementations of Postgres and Oracle, we can see that we have flexibility: the large volume of logging generated by Postgres by default may be reduced in some cases, and the minimal logging which is the default for Oracle may be larger in some situations. The most important thing, as with all technologies, is to understand how it works. Only then can you make the right choice to balance performance, availability, and cost. Understanding how it works means: read the docs (how it is supposed to work) and test (how it actually works). With Oracle there’s additional information from a huge community that has been testing and using it for decades. With Postgres, as with all open source projects, the source code with its comments is amazing documentation.

 

The article Full page logging in Postgres and Oracle appeared first on Blog dbi services.