Postgres unique constraint

I’ll start a series on Postgres vs. Oracle access paths because I know Oracle and I learn Postgres. While preparing it, I came upon some surprises because I’m so used to Oracle that I take some behavior as granted for any SQL databases. I recently posted a tweet about one of them, comparing latest Postgres version to earliest Oracle version I have on my laptop.
The goal of the tweet was exactly what I said above: show my surprise, using Oracle 7 as a reference because this is the version where I started to learn SQL. And there’s no judgment behind this surprise: I can’t compare a software I use for more than 20 years with one I’m just learning. I have a big admiration for the Oracle design and architecture choices. But I’ve also a big admiration for what the Postgres community is doing.

In my tweet I’ve updated a primary key. I think I’ve never designed in real life a primary key that has to be updated later. For each table we need a key that is immutable to identify rows for referential integrity constraints, or for replication. The value must be known from the first insert (which means the columns are declared not null) and the value is never updated. It makes sense to use a primary key for that as it is unique and not null.

Actually, a better case would be a simple unique constraint where we just exchange two rows. A real-life example is a list of items, having probably a surrogate key as the primary key, and a unique key including an item number. When the user wants to move up one item, we just run an update on two rows, exchanging their numbers. The unique constraint just ensures that we have only distinct values so that a select … order by will always return the values in the same order.

All similar cases have the same consequence: when you process row by row the update, the uniqueness may be violated. But at the end of the statement, the constraint is still valid.

Here is the initial example with updating all rows:


create table demo as select generate_series n from generate_series(1,2);
SELECT 2
alter table demo add constraint demo_pk primary key(n);
ALTER TABLE
select * from demo;
n
---
1
2
(2 rows)
 
begin transaction;
BEGIN
update demo set n=n-1;
UPDATE 2
select * from demo;
n
---
0
1
(2 rows)

This works. I’ve inserted the rows in ascending order of n. Decreasing the value doesn’t violate the uniqueness at any time because it reads rows from the beginning to the end.

However, when we increase the value, we have a duplicate value until we process the next row. And by default, Postgres fails:

update demo set n=n+1;
ERROR: duplicate key value violates unique constraint "demo_pk"
DETAIL: Key (n)=(1) already exists.

The bad thing is that the behavior of the application depends on the physical order of the rows and the order where they are processed. This violates the Codd rule about physical independence. In addition to that, the SQL statements should behave as processing the set of rows rather than low-level row-by-row processing.

But there is also a very good thing: because the constraint is validated row by row, you know which value violates the constraint (here: “DETAIL: Key (n)=(1) already exists” ).

So my statement failed and this in Postgres seems to fail the whole transaction:

commit;
ROLLBACK

My second surprise is that the failure of one statement cancels the whole transaction. I see no error at commit, but it simply tells me that it has done a rollback instead of the commit.

deferrable

So, I compared with Oracle where this statement is always successful, because temporary violations that are resolved later, within the same statement, do not violate the constraint. I compared it with the oldest version I have on my laptop (Oracle 7.3) to show that it is something I’ve never seen as a new feature because I started with Oracle 7. And this kind of thing is the reason why I like SQL. Doing the same with a procedural language requires an intermediate update to be sure that there is no duplicate at any time.

The Postgres community is very responsive, especially when we may think that something works better in Oracle than Postgres (which was not the case here and which was not the goal of my tweet anyway – but tweets are short and may not express the tone properly).

Quickly a solutions were proposed: deferred constraint (example in this blog post).

I know deferred constraints in Oracle. They are similar in Postgres and here is the solution proposed:


alter table demo drop constraint demo_pk;
ALTER TABLE
alter table demo add constraint demo_pk primary key(n) deferrable initially deferred;
ALTER TABLE
begin transaction;
BEGIN
update demo set n=n-1;
UPDATE 2
select * from demo;
n
---
0
1
(2 rows)
 
update demo set n=n+1;
UPDATE 2

That seems good. Because the constraint validation is deferred, the update is successful.

However, this is not what I want. I want the previous statement to succeed, but I want the following statement to fail:

insert into demo values(1);
INSERT 0 1

Because constraint is deferred, this statement is successful and it is only at commit that it fails:

commit;
ERROR: duplicate key value violates unique constraint "demo_pk"
DETAIL: Key (n)=(1) already exists.

Why do I think this is not the good solution? First, because I want the statement to fail as soon as possible. And in addition to that, I want the commit to be fast. Doing expensive things at commit should be avoided, if possible. It is the point where all work is supposed to be done and you just want to save it (make it durable and visible to others).

deferrable initially immediate

Actually, the solution is to declare the constraint as deferrable, but not deferred.

alter table demo drop constraint demo_pk;
ALTER TABLE
alter table demo add constraint demo_pk primary key(n) deferrable initially immediate;
ALTER TABLE

This says that it is deferrable, but not deferred (except if you decide to set the constraint deferred for your transaction). That way it accepts temporary constraint violation if they are resolved at the end of the statement.

Now, my update statement is sucessful:

begin transaction;
BEGIN
update demo set n=n-1;
UPDATE 2
select * from demo;
n
---
0
1
(2 rows)
update demo set n=n+1;
UPDATE 2

Any other statement that violates the constraint fails immediately:

insert into demo values(1);
ERROR: duplicate key value violates unique constraint "demo_pk"
DETAIL: Key (n)=(1) already exists.
commit;
ROLLBACK

Documentation

The nice thing is that this is documented! I didn’t find it immediately because it is in the ‘Compatibility’ part of the ‘create table’ documentation. I’m not yet used to the Postgres documentation. I stopped at the ‘DEFERRED’ definition which mentions: A constraint that is not deferrable will be checked immediately after every command

But later Compatibility adds something more specific to the unique constraint:

Non-deferred Uniqueness Constraints
When a UNIQUE or PRIMARY KEY constraint is not deferrable, PostgreSQL checks for uniqueness immediately whenever a row is inserted or modified. The SQL standard says that uniqueness should be enforced only at the end of the statement; this makes a difference when, for example, a single command updates multiple key values. To obtain standard-compliant behavior, declare the constraint as DEFERRABLE but not deferred (i.e., INITIALLY IMMEDIATE). Be aware that this can be significantly slower than immediate uniqueness checking.

That’s another good point. Postgres documentation is clear and gives the right solution. We just have to read it to the end.

A side note for my French speaking readers here to mention that the Postgres documentation has been translated into French by Guillaume Lelarge, who also translated Markus Winand book and website. Translation is as good as the original in both cases.

Performance

The documentation mentions ‘significantly slower’. Here is a test on 100000 rows with non deferable constraint:

create table demo as select generate_series n from generate_series(1,100000);
SELECT 100000
alter table demo add constraint demo_pk primary key(n);
ALTER TABLE
vacuum demo;
VACUUM
select * from pgstatindex('demo_pk');
version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
2 | 1 | 2260992 | 3 | 1 | 274 | 0 | 0 | 89.83 | 0
(1 row)

Here is the update n=n-1 where all rows are updated but none violates the constraint at any time:

explain (analyze,verbose,costs,buffers)update demo set n=n-1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Update on public.demo (cost=0.00..1693.00 rows=100000 width=10) (actual time=425.699..425.699 rows=0 loops=1)
Buffers: shared hit=578646 read=1202 dirtied=1267
-> Seq Scan on public.demo (cost=0.00..1693.00 rows=100000 width=10) (actual time=0.013..16.186 rows=100000 loops=1)
Output: (n - 1), ctid
Buffers: shared hit=443

This update has read 578646+1202=579848 buffers.

Now creating the deferrable constraint:

alter table demo drop constraint demo_pk;
ALTER TABLE
alter table demo add constraint demo_pk primary key(n) deferrable initially immediate;
ALTER TABLE
vacuum demo;
VACUUM
select * from pgstatindex('demo_pk');
version | tree_level | index_size | root_block_no | internal_pages | leaf_pages | empty_pages | deleted_pages | avg_leaf_density | leaf_fragmentation
---------+------------+------------+---------------+----------------+------------+-------------+---------------+------------------+--------------------
2 | 1 | 2260992 | 3 | 1 | 274 | 0 | 0 | 89.83 | 0

And do the n=n+1 update:

explain (analyze,verbose,costs,buffers)update demo set n=n+1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Update on public.demo (cost=0.00..2135.00 rows=100000 width=10) (actual time=481.868..481.868 rows=0 loops=1)
Buffers: shared hit=679405 read=760 dirtied=825
-> Seq Scan on public.demo (cost=0.00..2135.00 rows=100000 width=10) (actual time=0.268..16.329 rows=100000 loops=1)
Output: (n + 1), ctid
Buffers: shared hit=885
Planning time: 0.237 ms
Trigger PK_ConstraintTrigger_75314 for constraint demo_pk: time=174.976 calls=99999
Execution time: 663.799 ms

This read more buffers and we can see that an internal trigger (PK_ConstraintTrigger_75314) has been run to re-check the unique constraint at the end of the statement. But only 17% more here for this special case where all rows are updated.

However, a more realistic test case exchanging only two values is much cheaper:


explain (analyze,verbose,costs,buffers) update demo set n=case when n=2 then 2000 when n=2000 then 2 end where n in (2,2000);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Update on public.demo (cost=8.85..16.60 rows=2 width=10) (actual time=0.079..0.079 rows=0 loops=1)
Buffers: shared hit=23
-> Bitmap Heap Scan on public.demo (cost=8.85..16.60 rows=2 width=10) (actual time=0.016..0.055 rows=2 loops=1)
Output: CASE WHEN (n = 2) THEN 2000 WHEN (n = 2000) THEN 2 ELSE NULL::integer END, ctid
Recheck Cond: (demo.n = ANY ('{2,2000}'::integer[]))
Heap Blocks: exact=3
Buffers: shared hit=9
-> Bitmap Index Scan on demo_pk (cost=0.00..8.85 rows=2 width=0) (actual time=0.009..0.009 rows=4 loops=1)
Index Cond: (demo.n = ANY ('{2,2000}'::integer[]))
Buffers: shared hit=6
Planning time: 0.137 ms
Trigger PK_ConstraintTrigger_75322 for constraint demo_pk: time=0.005 calls=1
Execution time: 0.120 ms

In my opinion, the overhead here is totally acceptable, especially given the fact that this re-check displays exactly which value violates the constraint in case there is a duplicate.

But I’m going too fast here. I’ve not even started my blog series about access paths where I’ll explain the cost of the execution plans, starting from the most simple: Seq Scan. Follow my blog or twitter to get informed. There will be nothing about ‘which is better, Oracle or Postgres?’. But I’m convinced that knowing the difference helps to understand how it works, and to design an application that has the correct behavior if ported from one to the other.

 

Cet article Postgres unique constraint est apparu en premier sur Blog dbi services.