Row Locks With Joins Can Produce Surprising Results in PostgreSQL

Here's a database riddle: you have two tables with data connected by a foreign key. The foreign key field is set as not null and the constraint is valid and enforced. You execute a query that joins these two tables and you get no results! How is that possible? We thought it wasn't possible, but a recent incident revealed an edge case we never thought about.

In this article I show how under some circumstances row locks with joins can produce surprising results, and suggest ways to prevent it.

Weird Join<br><small>Image by abstrakt design</small> — Weird Join
Image by abstrakt design

Table of Contents

The Problem
Possible Solutions
Prevention

The Problem

Imagine you work in the DMV and you are in charge of managing car ownership. You have two tables:

db=# CREATE TABLE owner (
    id int PRIMARY KEY,
    name text NOT NULL
);
CREATE TABLE

db=# CREATE TABLE car (
    id int PRIMARY KEY,
    owner_id int NOT NULL,
        CONSTRAINT car_owner_id_fk FOREIGN KEY (owner_id) REFERENCES owner(id)
);
CREATE TABLE

Add a car and some potential owners:

db=# INSERT INTO owner (id, name) VALUES
    (1, 'haki'),
    (2, 'jerry'),
    (3, 'george')
RETURNING *;
 id │ name
────┼───────
  1 │ haki
  2 │ jerry
  3 | george
(3 rows)

INSERT 0 2

db=# INSERT INTO car (id, owner_id) VALUES(1, 1) RETURNING *;
 id │ owner_id
────┼──────────
  1 │        1
(1 row)

INSERT 0 1

You have three owners - "haki", "jerry" and "george", and a single car with id 1 which is currently owned by "haki".

Changing Ownership

When a car is sold, both parties need to log in to some system and change the ownership of the car. However, before they can do that, the system needs to do some checks on the current owner. For example, the system might need to verify that there are no outstanding fines or court orders. The system might also want to make sure the current owner and the new owner are not the same person!

To start the ownership change process you start by fetching the car along with its current owner, and obtain a lock on the row:

db=# BEGIN;
BEGIN

db=*# SELECT *
FROM car JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1
FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 1
id            │ 1
name          │ haki

You obtain a lock to make sure that no other session is attempting to change the ownership of the car at the same time. You can now safely check whatever you need to check about the current owner of the car and decide if you can proceed with the transfer.

Row locks in PostgreSQL

Check out How to Get Foreign Keys Horribly Wrong for more about SELECT ... FOR UPDATE and the NO KEY option and Handling Concurrency Without Locks for more on managing concurrency in databases.

After you made sure the request is legitimate, to actually change the owner of the car from "haki" (1) to "jerry" (2) you update the owner_id of the car and commit the transaction:

db=*# UPDATE car SET owner_id = 2 WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2

db=*# COMMIT;
COMMIT

Great, so far so good!

Changing Ownership Concurrently

This is the process we have so far for changing ownership of a car:

Fetch the car along with its current owner and obtain a lock.
Do some checks on the owner.
Update the car ownership and commit.

An actual DMV application will process many requests like this. So let's go through the same process again, but this time with two concurrent sessions attempting to update the ownership of the same car at the same time:

-- First session
db=# BEGIN;
BEGIN

db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2
id            │ 2
name          │ jerry

db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3

db=*# COMMIT;
COMMIT

-- Second session
db=# BEGIN;
BEGIN;

db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
(0 rows)

That's very strange! The first session locks the car row and updates the owner to "jerry". In the meantime, the second session attempts to lock the same car and is forced to wait - that's fine. However, when the first session commits and the lock is released, the second session gets nothing in return! We know that the car exists in the database. We know that both the new and the old owner exists in the database as well, so no matter what version of the row we get, the join should hold. Why aren't we getting results?

Replace the inner join with an outer join to see if that gives a clue about what's going on:

db=# BEGIN;
BEGIN

db=*# SELECT * FROM car
LEFT JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2
id            │ 2
name          │ jerry

db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3

db=*# COMMIT;
COMMIT

db=# BEGIN;
BEGIN

db=*# SELECT * FROM car
LEFT JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3
id            │ ¤
name          │ ¤

We now get results, but they don't make a lot of sense. Like before, the first session locks the row, sets the car owner to 3 and commits. The second session waits, and when the lock is released we get the new version of the car row (owner is 3), but no data from the join to the owner table. We know that owner 3 exists in the owner table, so how come we get this weird partial results?

To truly understand why this is highly unexpected, consider the constraints we have on the car table:

car.owner_id has a not null constraint
car.owner_id has a foreign key constraint

Both of these constraints are valid and enforced by the database. This essentially means that under no circumstances do we expect the inner join between car and owner to return nothing, yet it does!

The Illusive Concurrency Issue

We run our application with the default isolation level Read Committed. The documentation on the Read Committed isolation level explains:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time.

Under Read Committed a statement can see changes committed by other transactions, and uncommitted changes executed previously in the same transaction. But what happens when another transaction changes a row we are attempting to lock?

However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress).

This makes perfect sense. When we attempt to acquire a lock on a row that is currently being locked by another transaction, we are forced to wait. OK, so what happens when the lock is released by the other transaction?

If the first updater commits, [...] it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition

Once the lock is released, the search condition is re-evaluated based on the new version of the row, and if it's still a match, we get the result back.

Here is a breakdown of what's actually happened in our query:

First process obtains a lock on car 1 and changes owner from 1 to 3.
Second process attempts the same - it executes the query and finds car 1 and owner 1. However, there is a lock on car 1 so it waits.
First process commits, releasing the lock on car 1.
Second process re-evaluates the result-set - the owner for car 1 is no longer 1, it's 3. The owner it has in the result-set is 1, which is no longer a match. Now, instead of fetching the new owner 3, it discards the row (or returns nulls if it's an outer join).

After the lock is released by the first session, my intuition was that "now the second session can proceed to execute the query", but that is not what happens. What actually happens here is that part of the query executes before the lock, and another part after! The query is essentially paused mid-execution until the lock is released.

PostgreSQL Internals

Re-evaluation of the predicate after the lock is released is happening in EvalPlanQual.

This is why we get the updated owner from the car table, but the fields from the owner table are all nulls:

─[ RECORD 1 ]─┬────────
id            │ 1   -- car.id
owner_id      │ 3   -- car.owner_id
id            │ ¤   -- owner.id
name          │ ¤   -- owner.name

The documentation does not specifically address how different isolation levels handle joined tables, but it does offer a warning about complex searches in Read Committed:

it is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database. This behavior makes Read Committed mode unsuitable for commands that involve complex search conditions

💡 If you lock a row in a query that includes a join, and in the same transaction you also update one of the foreign keys you joined to, a concurrent transaction might get partial or no results.

Possible Solutions

Now that we have a some understanding of what's going on, we can explore possible solutions. We'll also present things we thought about, but didn't work out.

To change ownership we started by fetching the car and the associated owner, but we only lock the car. What if we also lock the associated owner, would that give us the updated row?

db=# BEGIN;
BEGIN;

db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2
id            │ 2
name          │ jerry

db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3

db=*# COMMIT;
COMMIT

db=# BEGIN;
BEGIN;

db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner;
-- Session is locked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
(0 rows)

Nope! Locking both the car and the associated owner row does not solve the problem at all. This makes sense because by the time the second session gets a hold of the data, the owner is already different. Locking the original owner should not have any effect.

As a side note, if we wanted to allow a car to be without an owner, we would have made car.owner_id nullable. In this case, our query would have had to use an outer join, and SELECT FOR UPDATE cannot be used with outer joins:

db=# BEGIN;
BEGIN

db=*# SELECT * FROM car
LEFT JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car, owner;
ERROR:  FOR NO KEY UPDATE cannot be applied to the nullable side of an outer join

Locking the related row doesn't solve nor prevent the problem.

Use a Stricter Isolation Level

We mentioned before that our system is running in the default isolation level Read Committed. At this level, the only guarantee we get is that we can't have dirty reads, that is, we can't see uncommitted rows by another session. The Read Committed isolation level is the weakest isolation level provided by PostgreSQL and it is what most systems use. What if we use a stricter isolation level?

The Repeatable Read isolation level never sees either uncommitted data (dirty reads) or changes committed by concurrent transactions during the transaction's execution (repeatable reads). This is different from Read Committed that can in fact see changes committed by other sessions during its own execution. Let's try changing ownership with the Repeatable Read isolation level:

db=# BEGIN ISOLATION LEVEL REPEATABLE READ;
BEGIN

db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2
id            │ 2
name          │ jerry

db=*# UPDATE car SET owner_id = 3
WHERE id = 1 RETURNING *;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3

db=*# COMMIT;
COMMIT

db=# BEGIN ISOLATION LEVEL REPEATABLE READ;
BEGIN

db=*# SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
ERROR:  could not serialize access due to concurrent update

Using the Repeatable Read isolation level we get an error saying that something has changed in the row we are attempting to update. Getting an error is better than operating on inaccurate data, but it also means we need to handle this very specific error.

The strongest isolation level is the Serializable isolation level. At this level the database is operating as if everything is executed in a serial order. The results of running with the serializable isolation level is the same as Repeatable Read - the potential concurrency issue is detected and the database raises an error.

We wanted to avoid changing the isolation level because we didn't want to handle these specific errors and we were worried that the strict guarantees provided by the stronger isolation levels may cause some unexpected issues in other places.

Split the Query

At this point we realize that the problem we are having is not with the data on the car table - we lock the row and we get the updated data, which is what we expect. Our problem is with the data from the joined table. The most straight forward solution is to remove the join and issue two separate queries instead:

db=# BEGIN;
BEGIN

db=*# SELECT * FROM car WHERE car.id = 1
FOR NO KEY UPDATE;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2

db=*# SELECT * FROM owner WHERE id = 2;
─[ RECORD 1 ]─┬────────
id            │ 2
name          │ jerry

db=*# UPDATE car SET owner_id = 3 WHERE id = 1
RETURNING *;
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3

db=*# COMMIT;
COMMIT

db=# BEGIN;
BEGIN

db=*# SELECT * FROM car WHERE car.id = 1
FOR NO KEY UPDATE;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 3

db=*# SELECT * FROM owner WHERE id = 3;
─[ RECORD 1 ]─┬────────
id            │ 3
name          │ george

Using this approach we first acquire a lock on the car row. If the row is locked we wait and get the updated value. Only then, using the updated owner id, we query the details of the owner and carry on with the process.

Use a Sub-Query

Splitting the query works but it requires two round-trips to the database - one to lock the row and another to fetch the owner. We usually like to avoid unnecessary round-trips to the database, so our next step was to inspect the execution plan of the query:

db=# BEGIN;
BEGIN

db=*# EXPLAIN SELECT * FROM car
JOIN owner ON car.owner_id = owner.id
WHERE car.id = 1 FOR NO KEY UPDATE OF car

                                    QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────
LockRows  (cost=0.31..16.37 rows=1 width=56)
  ->  Nested Loop  (cost=0.31..16.36 rows=1 width=56)
        ->  Index Scan using car_pkey on car  (cost=0.15..8.17 rows=1 width=14)
              Index Cond: (id = 1)
        ->  Index Scan using owner_pkey on owner  (cost=0.15..8.17 rows=1 width=42)
              Index Cond: (id = car.owner_id)

The execution plan provides some interesting details on the lock! Notice the "LockRows" node - this is where the rows are locked and marked for re-revaluation. This confirms our initial suspicion - the lock is obtained during the execution of the query, after the join was performed.

What if we can re-arrange the query so that the lock is obtained earlier, before the join is executed. Let's move the lock to a sub-query:

db=# BEGIN;
BEGIN;

db=*# EXPLAIN
WITH c AS (
    SELECT * FROM car WHERE id = 1
    FOR NO KEY UPDATE
)
SELECT * FROM c
JOIN owner ON c.owner_id = owner.id;
                                    QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=8.33..16.39 rows=1 width=44)
   CTE c
     ->  LockRows  (cost=0.15..8.18 rows=1 width=14)
           ->  Index Scan using car_pkey on car  (cost=0.15..8.17 rows=1 width=14)
                 Index Cond: (id = 1)
   ->  CTE Scan on c  (cost=0.00..0.02 rows=1 width=8)
   ->  Index Scan using owner_pkey on owner  (cost=0.15..8.17 rows=1 width=36)
         Index Cond: (id = c.owner_id)

The lock is obtained on the results of the CTE (common table expression) which is only fetching the car. Only once the lock is obtained, the database proceeds with the nested loop to join data from the owner table. Looks promising!

Let's execute this concurrently and see if we get the expected results:

db=# BEGIN;
BEGIN

db=*# WITH c AS (
    SELECT * FROM car WHERE id = 1
    FOR NO KEY UPDATE
)
SELECT * FROM c
JOIN owner ON c.owner_id = owner.id;

─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 1
id            │ 1
name          │ haki

db=*# UPDATE car SET owner_id = 2 WHERE id = 1;
UPDATE 1

db=*# COMMIT;
COMMIT

db=# BEGIN;
BEGIN

db=*# WITH c AS (
    SELECT * FROM car WHERE id = 1
    FOR NO KEY UPDATE
)
SELECT * FROM c
JOIN owner ON c.owner_id = owner.id;
-- Session is blocked
--
--
--
--
--
--
--
--
--
--
-- Lock is released by first session
─[ RECORD 1 ]─┬────────
id            │ 1
owner_id      │ 2
id            │ 2
name          │ jerry

Hello Jerry! We now get exactly the result we expected. Using a sub-query we forced the database to lock the row before joining the owners table, therefore, we get the up-to-date owner after the first session updated the owner and the lock was released.

If for some reason you are unable to use the WITH clause, you can achieve the same using a sub-query:

db=# BEGIN;
BEGIN;

db=*# EXPLAIN
SELECT * FROM (
    SELECT * FROM car
    WHERE id = 1 FOR NO KEY UPDATE
) c
JOIN owner ON c.owner_id = owner.id;

                                      QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────
 Nested Loop  (cost=0.31..16.38 rows=1 width=44)
   ->  Subquery Scan on c  (cost=0.15..8.19 rows=1 width=8)
         ->  LockRows  (cost=0.15..8.18 rows=1 width=14)
               ->  Index Scan using car_pkey on car  (cost=0.15..8.17 rows=1 width=14)
                     Index Cond: (id = 1)
   ->  Index Scan using owner_pkey on owner  (cost=0.15..8.17 rows=1 width=36)
         Index Cond: (id = c.owner_id)

Using a sub-query, the lock is obtained on the inner query, and only after the lock is acquired the database can proceed with the join.

Prevention

This issue didn't come out of thin air, we actually ran into it in production. The funny thing is that it didn't even happen in a process where we normally expect highly concurrent load. The issue happened because the user could accidentally click a button in the UI very quickly which triggered multiple similar requests to the server. We have lots of assertions sprinkled in the code so both the system and the users weren't actually affected in any way.

Once we figured out the bad pattern we started to think about ways to prevent it. In the past we've used Django checks to detect and report on specific patterns, but this time it was harder to do. This pattern is not easy to detect - it requires advanced understanding of the code and the context in which every statement is executed. This sounds like a good job for you know what...

After some back and forth with an LLM we were able to identify several places that can potentially be impacted, and patched them. In all cases the solution was to issue separate queries instead of a join. Small price to pay for correct processes!

Row Locks With Joins Can Produce Surprising Results in PostgreSQL

The strange edge case that triggered our assertions

The Problem

Changing Ownership

Changing Ownership Concurrently

The Illusive Concurrency Issue

Possible Solutions

Use a Stricter Isolation Level

Split the Query

Use a Sub-Query

Prevention

Discussions

Similar articles

How to Get or Create in PostgreSQL

Future Proofing SQL with Carefully Placed Errors

Practical SQL for Data Analysis

How to Get the First or Last Value in a Group Using Group By in SQL

The Many Faces of DISTINCT in PostgreSQL

Want me to send you an email when I publish something new?

Share to show you care

Discussions

Similar articles

How to Get or Create in PostgreSQL

Future Proofing SQL with Carefully Placed Errors

Practical SQL for Data Analysis

How to Get the First or Last Value in a Group Using Group By in SQL

The Many Faces of DISTINCT in PostgreSQL

Want me to send you an email
when I publish something new?