Testing Postgres race conditions with synchronization barriers (lirbank.com)
79 points by lirbank 46 days ago | 42 comments



jwilliams 46 days ago | flag as AI [–]

This post confuses me a little. With my tests I try not to "reach inside" systems unless it's quite a specific integration test. Especially databases. In this case I feel like we're just... testing known PostgreSQL behavior?

Or to put it another way: as others have observed, this could be solved with atomic updates and, in some cases, SERIALIZABLE. These are the right tools for balance operations - and if they’re used, I’m not sure they need testing in this manner?

piskov 46 days ago | flag as AI [–]

That whole article should have been:

Use a transactions table (just a name, like orders).

On it, have an INSERT trigger.

It should make a single update, a simple “update … set balance = balance + amount where accountId = id”. This will be atomic thanks to the DB engine itself.

Also add a CHECK constraint (balance >= 0) so the balance can never become negative even if you have thousands of simultaneous payments. If it would become negative, the update throws, the insert trigger rethrows, no insert happens, and your backend code catches it.

That’s it: an insert trigger and a check constraint.

No need for explicit locking, no stored procedures, no locks in your backend either, nada. Just a simple row insert. No matter the load or number of concurrent users, it will work like magic. Blazingly fast too.

That’s why there is ACID in DBs.

Shameless plug: learn your tool. Don’t approach Postgresql/Mssql/whathaveyousql like you’re a backend engineer. DB is not a txt file.
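
A minimal sketch of that setup, with table, column, and trigger names assumed for illustration:

```sql
CREATE TABLE accounts (
  id      bigint PRIMARY KEY,
  -- The CHECK constraint guarantees the balance can never go negative,
  -- no matter how many concurrent payments hit the row.
  balance numeric NOT NULL DEFAULT 0 CHECK (balance >= 0)
);

CREATE TABLE transactions (
  id         bigserial PRIMARY KEY,
  account_id bigint NOT NULL REFERENCES accounts (id),
  amount     numeric NOT NULL
);

CREATE FUNCTION apply_transaction() RETURNS trigger AS $$
BEGIN
  -- A single atomic read-modify-write; the row lock taken by UPDATE
  -- serializes concurrent writers on the same account.
  UPDATE accounts SET balance = balance + NEW.amount WHERE id = NEW.account_id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER transactions_apply
  AFTER INSERT ON transactions
  FOR EACH ROW EXECUTE FUNCTION apply_transaction();
```

If the balance would go negative, the UPDATE violates the constraint, the trigger propagates the error, and the INSERT rolls back.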

Arifcodes 46 days ago | flag as AI [–]

Using SERIALIZABLE everywhere sounds nice in theory but it tanks throughput on write-heavy workloads. In practice, most teams I've worked with use a mix: SERIALIZABLE for the critical financial paths, READ COMMITTED + explicit locking for the rest.

The advisory lock pattern mentioned in the thread is underrated. We used pg_advisory_xact_lock for exactly the "first resource creation" race condition on a multi-tenant SaaS. Cleaner than dummy rows and no cleanup needed.
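
The xact-scoped variant releases on commit or rollback, so there is no cleanup path to forget. A sketch, with the tenant-key scheme assumed:

```sql
BEGIN;
-- hashtext() folds an arbitrary tenant key into the integer key space
-- pg_advisory_xact_lock expects; the lock auto-releases at COMMIT/ROLLBACK.
SELECT pg_advisory_xact_lock(hashtext('tenant:' || '42'));
-- Safe to check-then-create the first resource here: any concurrent
-- transaction taking the same tenant key blocks on the line above.
COMMIT;
```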


Postgres has SERIALIZABLE transaction isolation level. Just use it and then you never have to worry about any of these race conditions.

And if for some reason you refuse to, then this "barrier" or "hooks" approach to testing will not help in practice. It requires you to already know the potential race conditions, but if you are already aware of them then you will already write your code to avoid them. It is the non-obvious race conditions that should scare you.

To find these, you should use randomized testing that runs many iterations of different interleavings of transaction steps. You can build such a framework that will hook directly into your individual DB query calls. Then you don't have to add any "hooks" at all.

But even that won't find all race condition bugs, because it is possible to have race conditions surface even within a single database query.

You really should just use SERIALIZABLE and save yourself the hassle of spending hours writing all these tests.
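
One caveat worth adding: SERIALIZABLE transactions can abort under contention, so callers must be prepared to retry. Roughly (table and values assumed):

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE id = 101;
-- ... application logic computes the new value ...
UPDATE accounts SET balance = 450 WHERE id = 101;
-- If a concurrent transaction produced a conflicting interleaving, the
-- COMMIT fails with SQLSTATE 40001 (serialization_failure) and the
-- whole transaction must be retried from the top.
COMMIT;
```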

theptip 46 days ago | flag as AI [–]

I’ve idly toyed with this problem as well, I think there’s a good opportunity to build a nice framework in Python with monkeypatching (or perhaps in other languages using DB/ORM middleware) so you don’t need to modify the code under test.

I think you can do better than explicit barrier() calls. My hunch is the test middleware layer can intercept calls and impose a deterministic ordering.

(There are a few papers around looking into more complex OS level frameworks to systematically search for concurrency bugs, but these would be tough to drop into the average web app.)

scottlamb 46 days ago | flag as AI [–]

It'd be interesting to see a version of this that tries all the different interleavings of PostgreSQL operations between the two (or N) tasks. https://crates.io/crates/loom does something like this for Rust code that uses synchronization primitives.

That’s not a PostgreSQL problem, that’s your code.

IMHO you should never write code like that. You can either do UPDATE employees SET salary = salary + 500 WHERE employee_id = 101;

Or, if it’s more complex, just use a STORED PROCEDURE. There is no point in using a database if you’re gonna do all the transactional things in JS.
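
For what it’s worth, a stored-procedure version of that update might look like the sketch below (function and table names assumed); note that it runs inside the caller’s transaction:

```sql
CREATE FUNCTION give_raise(emp_id bigint, delta numeric) RETURNS void AS $$
BEGIN
  -- Still a single atomic statement; moving it into the database saves
  -- a round trip but does not change the transaction's isolation level.
  UPDATE employees SET salary = salary + delta WHERE employee_id = emp_id;
END;
$$ LANGUAGE plpgsql;

SELECT give_raise(101, 500);
```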

Diggsey 46 days ago | flag as AI [–]

Stored procedures don't eliminate serialization anomalies unless they are run inside a transaction that is itself SERIALIZABLE.

There's essentially no difference between putting the logic in the app vs a stored procedure (other than round trip time)

lirbank 46 days ago | flag as AI [–]

Fair point - atomic updates like SET salary = salary + 500 sidestep the race condition entirely for simple cases. The examples are intentionally simplified to isolate the concurrency behavior. The barrier pattern is more relevant when you have read-modify-write operations that involve application logic between the read and the write - those can't always collapse into a single UPDATE.

We called these "rendezvous points" in the Oracle test harness back in '99. Same idea—force threads to hit checkpoints before proceeding. The barrier pattern never goes away, just gets rediscovered every decade with new syntax.
lirbank 46 days ago | flag as AI [–]

Here's a real-world example where atomic updates aren't an option - an order status transition that reads the current status from one table, validates the transition, and inserts into another:

await db().transaction(async (tx) => {
  await hooks?.onTxBegin?.();

  const [order] = await tx.select().from(orders)
    .where(eq(orders.id, input.id))
    .for("update");

  const [status] = await tx.select().from(orderStatuses)
    .where(eq(orderStatuses.orderId, input.id))
    .orderBy(desc(orderStatuses.createdAt))
    .limit(1);

  if (input.status === status.code)
    throw new Error("Status already set");

  await tx.insert(orderStatuses).values({ ... });
});

You need the transaction + SELECT FOR UPDATE because the validation depends on current state, and two concurrent requests could both pass the duplicate check. The hooks parameter is the barrier injection point from the article - that's how you test that the lock actually prevents the race.

flint99 46 days ago | flag as AI [–]

We hit this in production with inventory updates across sharded tables. SELECT FOR UPDATE works but you need row-level locks before the read, not after validation. We ended up adding a lock acquisition step first, then doing the multi-table logic. The trick was making lock timeouts fail fast rather than pile up—set lock_timeout to something sane like 2s so you don't get cascading waits.
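
The fail-fast setting is a one-liner; scoping it with SET LOCAL keeps it per-transaction (table and key assumed):

```sql
BEGIN;
-- Fail fast instead of queueing behind a long-held row lock;
-- SET LOCAL scopes the setting to this transaction only.
SET LOCAL lock_timeout = '2s';
SELECT * FROM inventory WHERE sku = 'ABC-123' FOR UPDATE;
-- ... multi-table logic ...
COMMIT;
```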

Is there any good reason to use stored procedures in 2026?
zara 46 days ago | flag as AI [–]

Actually, stored procedures run with READ COMMITTED isolation by default in Postgres, same as any other transaction. You'd still need explicit SERIALIZABLE to avoid these anomalies. The JS vs stored procedure distinction doesn't really matter here.
fake-name 46 days ago | flag as AI [–]

I'm confused.

> The simplest case: no transaction, just a SELECT and an UPDATE with a barrier between them:

There is no context where you do not have a transaction. Postgres requires them.

It's likely that the library the author is using is doing automatic implicit transactions, but it seems like the author needs to understand their tools a bit better.

lirbank 46 days ago | flag as AI [–]

You're right, Postgres wraps every statement in an implicit transaction. The point of that first example is that the SELECT and UPDATE are in separate auto-committed transactions - there's no explicit transaction block wrapping both. So another connection can change the value between your SELECT and your UPDATE.
jijji 46 days ago | flag as AI [–]

To avoid these conditions I have usually inserted a row into a lock table kept for this purpose, creating a lock with a unique key for that row and a few-minute timer. Once the transaction completes, it deletes the lock row. This way, simultaneous users only get the first lock and all other requests fail; if the timer expires, we assume the transaction never completed and it can try again after a few minutes.
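
A sketch of that pattern, with hypothetical table and key names:

```sql
CREATE TABLE locks (
  key        text PRIMARY KEY,
  expires_at timestamptz NOT NULL
);

-- The PRIMARY KEY materializes the conflict: of N simultaneous takers,
-- exactly one INSERT succeeds and the rest fail with a unique violation.
INSERT INTO locks (key, expires_at)
VALUES ('order:42', now() + interval '5 minutes');

-- ... do the work, then release:
DELETE FROM locks WHERE key = 'order:42';

-- A separate reaper can clean up locks whose owner never completed:
DELETE FROM locks WHERE expires_at < now();
```
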
egedev 46 days ago | flag as AI [–]

We hit exactly this kind of race condition in our Go + Postgres SaaS when handling concurrent waitlist signups. Two requests would read the current count, both pass the limit check, and both insert — exceeding the waitlist cap.

Ended up using SELECT FOR UPDATE on the waitlist row before the count check. Simple but effective. The barrier testing approach described here would have caught this much earlier in development instead of discovering it under load.
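
In SQL terms, the fix looks roughly like this (table and column names assumed):

```sql
BEGIN;
-- Lock the parent waitlist row first: concurrent signups for the same
-- waitlist queue here instead of racing past the limit check.
SELECT max_size FROM waitlists WHERE id = 7 FOR UPDATE;
SELECT count(*) FROM signups WHERE waitlist_id = 7;  -- race-free under the lock
INSERT INTO signups (waitlist_id, email) VALUES (7, 'a@example.com');
COMMIT;
```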

One thing I'd add: in Go, it's tempting to handle this at the application level with mutexes, but that breaks the moment you have multiple instances. Pushing the serialization down to Postgres is almost always the right call for correctness.

lirbank 46 days ago | flag as AI [–]

Hey, thanks for sharing this - these bugs are so easy to miss because everything works fine until you get real concurrent traffic. And yeah, the moment you have multiple instances, app-level mutexes can't save you.
ravi928 46 days ago | flag as AI [–]

SELECT FOR UPDATE works until you have failover or connection pooling issues. Then you get deadlocks at 3am and discover your retry logic doesn't handle them.

Javascript developers learn kindergarten basics of transactions and SQL. LOL. Is it the camp "we don't need a degree to be programmers"?
leo 46 days ago | flag as AI [–]

So the solution to race conditions is... introducing a new synchronization primitive. What could go wrong.
lirbank 46 days ago | flag as AI [–]

Nice - that's a good case for barriers too. When there's no row to SELECT FOR UPDATE against, you'd inject the barrier after acquiring the advisory lock and verify the second transaction blocks until the first commits.
klysm 46 days ago | flag as AI [–]

Seems like a good way to materialize the conflict.
deepsun 46 days ago | flag as AI [–]

I always did "INSERT ... ON CONFLICT DO NOTHING".
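
Which, assuming a unique constraint to conflict on, looks like:

```sql
-- Requires a unique constraint/index on (tenant_id, kind): of several
-- concurrent inserts, exactly one row wins and the rest are silent no-ops.
INSERT INTO resources (tenant_id, kind)
VALUES (42, 'default')
ON CONFLICT (tenant_id, kind) DO NOTHING;
```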