I wanted to try doing something similar to this in our dev environment (think shared dev database but per branch clones), but this limitation seemed tricky to accept:
> The source database can't have any active connections during cloning.
I wouldn't mind some lock contention, but having to kill all connections seemed a bit harsh
The connection constraint isn't arbitrary - Postgres needs a clean checkpoint and consistent file state before any file-level copy can be safe. With MVCC you'd think it could do better, but the physical copy approach bypasses the logical layer entirely. Whether "kill all connections" is acceptable really depends on whether you control the client code well enough to reconnect cleanly.
ZFS had zfs clone doing instant copy-on-write dataset clones since 2005. The whole filesystem-level trick predates Postgres 18 by about 20 years. XFS reflinks came later and honestly work fine, but it's funny watching the database world rediscover what storage engineers figured out decades ago.
“Never impacts production data” is impossible to guarantee. Playing with real world data often has side effects outside of the database. For example if you store oauth tokens to external services in your DB (customer integrations) it’s easy to mess up your customers data through a bad API call (been there done that).
There is still value in carefully testing on your prod DB, but for that you could just easily maintain a read replica. I don’t see the need for a SaaS here.
One of the main things people use us for is ease of testing writes on a per dev/agent basis which would be difficult on a read replica!
On the real world data impact I absolutely agree. We added something called "branch hooks" which essentially let you define SQL to run against the branch before it's returned
This lets you essentially anonymize and modify the branch to scrub unintended external side effects.
It's something that we're still working on though and trying to design the right abstractions around because we want to get that part right.
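To make the "branch hooks" idea concrete, here is a minimal sketch of running scrub SQL against a branch before handing it out. It is not Ardent's actual API: `BRANCH_HOOKS`, `apply_branch_hooks`, `make_demo_branch`, and the table and column names are all hypothetical, and an in-memory SQLite database stands in for the Postgres branch so the example is self-contained.

```python
import sqlite3

# Hypothetical hooks: SQL run against a branch before it is returned.
# Table and column names are made up for illustration.
BRANCH_HOOKS = [
    "UPDATE users SET email = 'user' || id || '@example.invalid'",
    "UPDATE integrations SET oauth_token = NULL",
]


def apply_branch_hooks(conn, hooks):
    """Run all hooks in a single transaction.

    If any hook fails, the transaction is rolled back and the exception
    propagates, so a half-scrubbed branch is never handed out silently.
    """
    with conn:  # commits on success, rolls back on exception
        for sql in hooks:
            conn.execute(sql)


def make_demo_branch():
    """Build a tiny stand-in 'branch' with sensitive-looking data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE integrations (user_id INTEGER, oauth_token TEXT);
        INSERT INTO users VALUES (1, 'alice@corp.com'), (2, 'bob@corp.com');
        INSERT INTO integrations VALUES (1, 'secret-token');
        """
    )
    return conn


if __name__ == "__main__":
    branch = make_demo_branch()
    apply_branch_hooks(branch, BRANCH_HOOKS)
    print(branch.execute("SELECT id, email FROM users").fetchall())
```

Nulling out stored OAuth tokens is the part that addresses the external-side-effect concern raised above: a branch that can't authenticate to a customer's integration can't make a bad API call against it.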
I’ll bite. You’re a dev at random mid size company and tasked with using this newfangled agentic tech to implement an intranet feature everybody wants and nobody else wants to build.
How do you get a staging and dev db together that’s going to let you test your migrations?
IIRC the sandbox part means you're not supposed to be using production data at all — the pitch is synthetic or anonymized copies. Though if you're piping real prod data in anyway, yeah, your concern stands.
LLMs are amazing and we use them heavily ourselves - but not for modifying text that is to be posted to HN. Doing so leaves imprints on the language that readers are increasingly allergic to, and we want HN to be a place for human conversation.
Looks interesting, curious what your moat here is. What prevents Supabase/Neon from doing this? Actually don't they already do this? How does this differ from the branching Neon and Supabase already offer?
We enable branching on any postgres DB through our architecture. So if you're on RDS, Planetscale, etc you can keep your DB where it is but also get the ability to branch with a full clone of the DB.
Neon does support copy on write branching natively and autoscaling compute but you make certain performance tradeoffs. A lot of the folks we've talked to that use RDS or Planetscale are reliant on things like query latencies supported by that platform's specific architecture but also want the ability to test on branches. We let you get the best of both worlds (branch but leave your DB where it is and freely choose your production environment based on prod concerns)
Supabase does have branching but they do not branch the data so you can't test any interactions that rely on the data. You can restore from backup as an option but this slows down based on data size since you're actually moving data as opposed to copy on write.
Longer term we want to be the place you branch all your data infra. So expanding to S3, Snowflake, MySQL etc.
For now though we're focusing on just postgres and getting it right!
Drop-the-database events occur whenever the review process is bypassed. It’s “Just run it; the agent wrote it” that undermines trust. The migration may be semantically correct yet practically incorrect, such as renaming a column to break a service that is in use or creating an index that locks the table during heavy use.
What Ardent does makes sense for a team setting where several agents/developers require their own environment before deploying code. But from a one-man-show founder’s perspective, the constraint is not about isolation but rather self-discipline.
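A rough sketch of what an automated pre-review for the failure modes above could look like. The rule list and the `review_migration` helper are assumptions for illustration, not part of any tool mentioned in this thread, and the regex matching is deliberately naive (it splits on `;` and does not understand string literals or comments).

```python
import re

# Hypothetical rules, one per failure mode mentioned above.
RISKY_PATTERNS = [
    (re.compile(r"\bDROP\s+(?:DATABASE|TABLE)\b", re.I),
     "destructive drop"),
    (re.compile(r"\bRENAME\s+COLUMN\b", re.I),
     "column rename may break running services"),
    # A plain CREATE INDEX blocks writes to the table while it builds;
    # CREATE INDEX CONCURRENTLY avoids that.
    (re.compile(r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)", re.I),
     "index build without CONCURRENTLY blocks writes"),
]


def review_migration(sql):
    """Return (statement, reason) pairs for statements a human should look at."""
    findings = []
    # Naive statement split: fine for a sketch, wrong for ';' inside literals.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    for statement in statements:
        for pattern, reason in RISKY_PATTERNS:
            if pattern.search(statement):
                findings.append((statement, reason))
    return findings


if __name__ == "__main__":
    migration = """
        ALTER TABLE users RENAME COLUMN name TO full_name;
        CREATE INDEX idx_users_full_name ON users (full_name);
    """
    for statement, reason in review_migration(migration):
        print(f"{reason}: {statement}")
```

A check like this only catches the obvious cases. It does not replace review, it just makes "just run it; the agent wrote it" a little harder to do by accident.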
A true read replica won't let you write! So if you need to test something like a backfill and see if anything goes wrong you wouldn't be able to quite as easily.
We'd let you instantly clone prod + user defined auto-anonymization so you can test writes. The architecture also somewhat takes the place of an existing read replica if you want to use it like that to make it more cost efficient.
Also since we're using copy on write for the clones they're incredibly storage efficient and the autoscaling compute helps minimize cost on clones by minimizing excess compute uptime
It's not uncommon (hex.ai, etc all do this, as do developers, MCP tools, etc). One thing we do at Ardent is enable obfuscated read replicas. We can strip PII in the replicas, so your agents are operating on realistic (but not sensitive) data. Moreover, they can do so in a way that doesn't impact your production database and is fast enough to wire into your CI/CD processes.
Jeremy is correct, though. The main concern is agents with write access. There have been two high-profile instances in the last year of agents dropping production databases (in one case, even after being given explicit instructions never to do such a thing). While read replicas of a primary DB solve the "agents can't destroy things" problem, they don't solve things like testing schema migrations (in particular) or updates to the data.
Jedberg... Wow an internet legend replied to me! ><
> I'm much more worried about people who give full write access to their agents! But at least this solves that problem.
Yeah it goes without saying that write access would be crazy... But, it seems like people don't really care about the fact that they are just giving their private data to companies like Anthropic, OpenAI and Google.
> Branch anonymization
Branches default to a full copy of your production data.
<-- This doesn't seem a safe default to me...
Perhaps a data policy should be required to be in place before a branch can be cloned... Giving the LLM full prod data access by default is a bad standard to set, I think.
Congrats on the launch! DB clones have been a game changer for my team, allowing us to build isolated workspaces for agents to do work ranging from optimizing queries/views to building UI/UX that works for the actual combinations of data we have.
We self-host DBLab since we had trouble getting Xata, Neon, and hosted DBLab configured.
I ran into this exact problem building browser automation agents that needed to test DB migrations. The real killer wasn't just getting a sandbox quickly, it was that reverting changes after a failed test would take forever with traditional backup/restore. One thing I'm curious about though - how do you handle agents that need to test against production data patterns but can't actually touch real user data? Do you have a synthetic data layer or is that on the user to solve?
Doesn't look open-source. If you are interested in having a Neon or git-like branching for PostgreSQL experience, have a look at Xata, which is based on ZFS like Delphix was:
ZFS clone approach is clever, but has anyone benchmarked restore times against Neon's page-server model at scale? My intuition says ZFS wins on small databases but the copy-on-write metadata overhead compounds past a few hundred GB. Would love to see actual numbers.
There's no reason why it shouldn't, Delphix primarily targeted Oracle, but there is of course not as much open-source enthusiasm for supporting a proprietary database as an open-source one.
The concept is cool but what value are you adding on top of the Neon Twin infra it’s built on? It seems the same can be done just using Neon directly for half the cost?
Totally makes sense. We do offer a pure PAYG tier (starter) that scales completely dynamically to workload
But seems like this may be less about the absolute price but more about the way the 100/month of credit feels?
What do you think could be better? The 250/month scale tier was intended for companies scaling up that want BYOC for data residency etc., giving them enough to test things internally without worrying about an overage bill before running it directly on prod, but this might be able to be implemented better.
LLMs are amazing and we use them heavily ourselves - but not for modifying text that is to be posted to HN. Doing so leaves imprints on the language that readers are increasingly allergic to, and we want HN to be a place for human conversation.
What happens when the sandbox diverges from prod schema mid-sprint? You're debugging against stale data and don't know it. That's the quiet failure mode nobody talks about.
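One cheap way to notice that kind of drift is to compare a fingerprint of each side's schema. The sketch below assumes you can get `(table, column, data_type)` tuples from both databases, for example from `information_schema.columns`. The helper names are hypothetical, and it ignores indexes, constraints, and anything else not in those tuples.

```python
import hashlib


def schema_fingerprint(columns):
    """Hash a schema description into a short, comparable string.

    `columns` is an iterable of (table, column, data_type) tuples.
    Sorting first makes the result independent of row order.
    """
    digest = hashlib.sha256()
    for table, column, data_type in sorted(columns):
        digest.update(f"{table}.{column}:{data_type}\n".encode("utf-8"))
    return digest.hexdigest()


def has_drifted(prod_columns, sandbox_columns):
    """True when the sandbox schema no longer matches prod."""
    return schema_fingerprint(prod_columns) != schema_fingerprint(sandbox_columns)


if __name__ == "__main__":
    prod = [("users", "id", "integer"), ("users", "email", "text")]
    sandbox = [("users", "id", "integer")]  # missing a column added mid-sprint
    print(has_drifted(prod, sandbox))
```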
`CREATE DATABASE clankerdb TEMPLATE sourcedb STRATEGY=FILE_COPY;`.
But Ardent can be useful for many, because cloud providers use heavily restricted Postgres. And many use Aurora, which doesn't even let you configure the `log_line_prefix`.
Though if cloud providers add file_copy_method=CLONE compatible managed pg ...
ref: https://boringsql.com/posts/instant-database-clones/
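For anyone wanting to script the self-hosted route, here is a small sketch that builds the statements involved. It only generates SQL and does not connect to anything. `clone_statements` and `quote_ident` are hypothetical helpers, the `%s` placeholder assumes a psycopg-style driver, and the sketch assumes `file_copy_method` is already set to `clone` on a server and filesystem that support it. The first statement reflects the "no active connections on the source" limitation quoted earlier in the thread.

```python
def quote_ident(name):
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def clone_statements(source, target):
    """Return (sql, params) pairs to run in order, outside a transaction block.

    Assumes file_copy_method=clone is already configured; this sketch does
    not set it.
    """
    return [
        # The template database must have no other sessions connected,
        # so terminate them first. This is the harsh part discussed above.
        (
            "SELECT pg_terminate_backend(pid) FROM pg_stat_activity "
            "WHERE datname = %s AND pid <> pg_backend_pid()",
            (source,),
        ),
        # Identifiers cannot be bound as parameters, so they are quoted inline.
        (
            f"CREATE DATABASE {quote_ident(target)} "
            f"TEMPLATE {quote_ident(source)} STRATEGY = FILE_COPY",
            (),
        ),
    ]


if __name__ == "__main__":
    for sql, params in clone_statements("sourcedb", "clankerdb"):
        print(sql, params)
```

`CREATE DATABASE` cannot run inside a transaction block, so whatever executes these needs autocommit enabled on the connection.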