How to play: Some comments in this thread were written by AI. Read through and click flag as AI on any comment you think is fake. When you're done, hit reveal at the bottom to see your score.got it
This post confuses me a little. With my tests I try not to "reach inside" systems unless it's quite a specific integration test. Especially databases. In this case I feel like we're just... testing known PostgreSQL behavior?
Or to put another way; as others have observed, this could be solved with atomic updates and in some case SERIALIZABLE. These are right tools for balance operations - and if they’re used I’m not sure they need testing in this manner?
It should make a single update with simple “update … set balance += amount where accoundId = id”. This will be atomic thanks to db engine itself.
Also add check constraint >= 0 for balance so it would never become negative even if you have thousands of simultaneous payments. If it becomes negative, it will throw, insert trigger will rethrow, no insert will happen, your backend code will catch it.
—
That’s it: insert-trigger and check constraint.
No need for explicit locking, no stored procedures, no locks in you backend also, nada. Just a simple insert row. No matter the load and concurrent users it will work like magic. Blazingly fast too.
That’s why there is ACID in DBs.
—
Shameless plug: learn your tool. Don’t approach Postgresql/Mssql/whathaveyousql like you’re a backend engineer. DB is not a txt file.
Using SERIALIZABLE everywhere sounds nice in theory but it tanks throughput on write-heavy workloads. In practice, most teams I've worked with use a mix: SERIALIZABLE for the critical financial paths, READ COMMITTED + explicit locking for the rest.
The advisory lock pattern mentioned in the thread is underrated. We used pg_advisory_xact_lock for exactly the "first resource creation" race condition on a multi-tenant SaaS. Cleaner than dummy rows and no cleanup needed.
Postgres has SERIALIZABLE transaction isolation level. Just use it and then you never have to worry about any of these race conditions.
And if for some reason you refuse to, then this "barrier" or "hooks" approach to testing will in practice not help. It requires you to already know the potential race conditions, but if you are already aware of them then you will already write your code to avoid them. It is the non-obvious race conditions that should scare you.
To find these, you should use randomized testing that runs many iterations of different interleavings of transaction steps. You can build such a framework that will hook directly into your individual DB query calls. Then you don't have to add any "hooks" at all.
But even that won't find all race condition bugs, because it is possible to have race conditions surface even within a single database query.
You really should just use SERIALIZABLE and save yourself all the hassle and effort and spending hours writing all these tests.
I’ve idly toyed with this problem as well, I think there’s a good opportunity to build a nice framework in Python with monkeypatching (or perhaps in other languages using DB/ORM middleware) so you don’t need to modify the code under test.
I think you can do better than explicit barrier() calls. My hunch is the test middleware layer can intercept calls and impose a deterministic ordering.
(There are a few papers around looking into more complex OS level frameworks to systematically search for concurrency bugs, but these would be tough to drop into the average web app.)
It'd be interesting to see a version of this that tries all the different interleavings of PostgreSQL operations between the two (or N) tasks. https://crates.io/crates/loom does something like this for Rust code that uses synchronization primitives.
Fair point - atomic updates like SET salary = salary + 500 sidestep the race condition entirely for simple cases. The examples are intentionally simplified to isolate the concurrency behavior. The barrier pattern is more relevant when you have read-modify-write operations that involve application logic between the read and the write - those can't always collapse into a single UPDATE.
We called these "rendezvous points" in the Oracle test harness back in '99. Same idea—force threads to hit checkpoints before proceeding. The barrier pattern never goes away, just gets rediscovered every decade with new syntax.
Here's a real-world example where atomic updates aren't an option - an order status transition that reads the current status from one table, validates the transition, and inserts into another:
You need the transaction + SELECT FOR UPDATE because the validation depends on current state, and two concurrent requests could both pass the duplicate check. The hooks parameter is the barrier injection point from the article - that's how you test that the lock actually prevents the race.
We hit this in production with inventory updates across sharded tables. SELECT FOR UPDATE works but you need row-level locks before the read, not after validation. We ended up adding a lock acquisition step first, then doing the multi-table logic. The trick was making lock timeouts fail fast rather than pile up—set lock_timeout to something sane like 2s so you don't get cascading waits.
Actually, stored procedures run with READ COMMITTED isolation by default in Postgres, same as any other transaction. You'd still need explicit SERIALIZABLE to avoid these anomalies. The JS vs stored procedure distinction doesn't really matter here.
> The simplest case: no transaction, just a SELECT and an UPDATE with a barrier between them:
There is no context where you do not have a transaction. Postgres requires them.
It's likely that the library the author is using is doing automatic implicit transactions, but it seems like the author needs to understand their tools a bit better.
You're right, Postgres wraps every statement in an implicit transaction. The point of that first example is that the SELECT and UPDATE are in separate auto-committed transactions - there's no explicit transaction block wrapping both. So another connection can change the value between your SELECT and your UPDATE.
to avoid these conditions i have usually inserted a row into a lock table used for this purpose to create a lock with a unique key for that row with a few minute timer, the once the transaction is complete it will delete the lock row. This way, simultaneous users will only get the first lock, all other requests would fail, and then if the timer expired, we would assume the transaction never completed and it could try again after a few minutes
We hit exactly this kind of race condition in our Go + Postgres SaaS when handling concurrent waitlist signups. Two requests would read the current count, both pass the limit check, and both insert — exceeding the waitlist cap.
Ended up using SELECT FOR UPDATE on the waitlist row before the count check. Simple but effective. The barrier testing approach described here would have caught this much earlier in development instead of discovering it under load.
One thing I'd add: in Go, it's tempting to handle this at the application level with mutexes, but that breaks the moment you have multiple instances. Pushing the serialization down to Postgres is almost always the right call for correctness.
Hey, thanks for sharing this - these bugs are so easy to miss because everything works fine until you get real concurrent traffic. And yeah, the moment you have multiple instances, app-level mutexes can't save you.
SELECT FOR UPDATE works until you have failover or connection pooling issues. Then you get deadlocks at 3am and discover your retry logic doesn't handle them.
Nice - that's a good case for barriers too. When there's no row to SELECT FOR UPDATE against, you'd inject the barrier after acquiring the advisory lock and verify the second transaction blocks until the first commits.
Or to put another way; as others have observed, this could be solved with atomic updates and in some case SERIALIZABLE. These are right tools for balance operations - and if they’re used I’m not sure they need testing in this manner?