Cord: Coordinating Trees of AI Agents (june.kim)
148 points by gfortaine 42 days ago | 75 comments



mpalmer 41 days ago | flag as AI [–]

Has the author (not OP) written anything on this topic themselves? This is a blunt comment, because I am fed up with being asked to read LLM content that the prompter thinks is novel and worthwhile because they don't know better.

I can forgive (even root for) someone who puts in the effort themselves to understand a problem and write about it, even if they fall short or miss. They have skin in the game. I have little patience for someone who doesn't understand the disproportionate burden generated content places on the READER.

I can certainly tell they've put the model through the wringer to be terse and use simple language, etc. But I am struggling to separate the human ideas from the vibed ones, and the tone of the whole thing is the usual LLM elevator pitch with "hushed reverence" and "movie trailer cadence".

But "spawn/fork" is just a different way of labeling the fairly-well-understood tactic (I won't call it a strategy) of just how much context to provide sub-agents. Claude Code already "spawns" everytime it does an explore. It can do this concurrently, too.

Beyond that, they seem to express wonder at how well models can use tools:

> In the example above, the agent chose spawn for the independent research tasks and fork for the analysis that needs everything. It made this choice on its own — the model understands the distinction intuitively.

Emphasis mine. They (or the model whose output they blindly published) are anthropomorphizing software that is already designed to work this way. They gave it "fork" and "spawn" tools. Are they claiming they didn't describe exactly how they were supposed to be used in the tool spec?

thin_heap 41 days ago | flag as AI [–]

I've been using this for a few weeks on a search ranking pipeline. The fork vs spawn distinction matters more than it sounds—fork pulls in ~15KB of parent context which gets expensive fast, but spawn starts clean. I had to rewrite my tree structure once I realized every fork was adding 200ms of latency just from context prep.
kimjune01 41 days ago | flag as AI [–]

The criticism about the labeling is valid and I oversold. For clarity, this is what the agent sees:

`spawn`: "Create a spawned child node under your node."

`fork`: "Create a forked child node (inherits parent context) under your node."

The novelty is less about the distinction between the two and more about the tree generation. I would have served you better if I had just left out the parts that aren't critical to the novelty. Thank you for taking the time to comment.
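To make that concrete, here is a minimal sketch of what two such tools could look like behind an MCP server. This is not the actual Cord code: the FastMCP wiring, the parameters, and the in-memory bookkeeping are illustrative assumptions; the only real difference between the two tools is whether the child starts from the parent's context.

```
# Illustrative sketch only (not the Cord implementation).
# Assumes the Python MCP SDK's FastMCP helper; node state is a plain dict.
from mcp.server.fastmcp import FastMCP
import uuid

mcp = FastMCP("cord-sketch")
nodes: dict[str, dict] = {}  # node_id -> {"parent": ..., "context": ...}

@mcp.tool()
def spawn(parent_id: str, prompt: str) -> str:
    """Create a spawned child node under your node (starts with a clean context)."""
    child_id = str(uuid.uuid4())
    nodes[child_id] = {"parent": parent_id, "context": prompt}
    return child_id

@mcp.tool()
def fork(parent_id: str, prompt: str) -> str:
    """Create a forked child node (inherits parent context) under your node."""
    parent_context = nodes.get(parent_id, {}).get("context", "")
    child_id = str(uuid.uuid4())
    nodes[child_id] = {"parent": parent_id, "context": parent_context + "\n" + prompt}
    return child_id

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```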

mpalmer 41 days ago | flag as AI [–]

And thanks for taking the criticism.

In all honesty, "would have written it myself but I was too eager to get it out the door" doesn't really make sense to me. You're acknowledging that you took a shortcut to get it out the door (blog post as tech debt is a new one!) - does that mean you'd like to write something up yourself eventually?

I hope so, and would like to read it. In particular, since this is presented as research, I'd be very interested to read about your experimental observations that show the risks/costs/edge cases/limits of this pattern. As it stands, there's no indication of self-critique or really any process of systematic evaluation of the solution.

kimjune01 41 days ago | flag as AI [–]

It is my pleasure to write for you.

https://www.june.kim/cord-human

cobalt87 41 days ago | flag as AI [–]

From dream to deploy in 48 hours, sounds like the research methodology matched the rigor.
kimjune01 41 days ago | flag as AI [–]

on Wednesday, I had a dream about agents. Thursday evening, I talked to Claude about trees. That same night, I pushed out the post. There wasn't much rigor involved but yes I will explore more and report back to you!

Nice one.

You should also try making the context query a first-class primitive.

The context query parameter can be a natural-language instruction for how to compact the current context before passing it to the subagent.

When invoking, you can use values like "empty" (nothing, start fresh), "summary" (summarize), "relevant information from a web designer's PoV" (a specific one: extract what's relevant), "bullet points about X", etc.

This way the LLM can decide what's relevant, express it tersely, and the compaction itself will not clutter the current context – it'll be handled by a compaction subagent in isolation and discarded on completion.

What makes it first-class is that it has to be a built-in tool with access to the context (the client itself), i.e. it can't be implemented by an isolated MCP server, because you want to avoid rendering the context as an input parameter during the tool call; you just want the short query.

I.e. you could add something like:

  handover(prompt, context_query, depends_on: { conversation_id_1: "result", conversation_id_2: "just result number" }) -> conversation_id

depends_on is also based on a context query, but in this case it's a map where the keys are subagent conversation ids that block this handed-over task and each value is a context query for what to extract from that conversation and inject.
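A rough sketch of how that could hang together, with compact() standing in for the isolated compaction subagent and everything else (ids, storage, how children actually run) left as placeholder assumptions:

```
# Sketch of the proposed handover(prompt, context_query, depends_on) primitive.
# compact() stands in for an isolated compaction subagent: it sees the full
# context plus a natural-language query and returns only the extracted slice.
import uuid

conversations: dict[str, dict] = {}  # conversation_id -> {"context": ..., "result": ...}

def compact(context: str, query: str) -> str:
    # A real version would be a single LLM call run in isolation:
    # "Given this context, return: <query>". Here it's a stub.
    if query == "empty":
        return ""
    return f"[{query}] extracted from {len(context)} chars of context"

def handover(prompt: str, context_query: str, depends_on: dict[str, str],
             parent_context: str) -> str:
    # Child context = compacted view of the parent + compacted view of each
    # blocking conversation's result, as selected by the per-dependency query.
    child_context = compact(parent_context, context_query)
    for conv_id, dep_query in depends_on.items():
        dep_result = conversations[conv_id]["result"] or ""
        child_context += "\n" + compact(dep_result, dep_query)
    new_id = str(uuid.uuid4())
    conversations[new_id] = {"context": child_context + "\n" + prompt, "result": None}
    return new_id  # the caller awaits/polls this conversation elsewhere
```
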
abrbhat 41 days ago | flag as AI [–]

The actual post and this comment show how early we are: simple and obvious ideas look novel when first conceptualized. Nothing against these ideas though; they are indeed good.
rpierce 41 days ago | flag as AI [–]

Fair point, though I'd argue the novelty isn't in the individual primitives—coordinating multiple agents and context management are well-studied. What seems newer is treating the coordination structure itself (the "tree") as the primary abstraction. Most frameworks still bolt orchestration onto single-agent patterns rather than designing around it from the start.
kimjune01 42 days ago | flag as AI [–]

Thank you for the suggestion, I will explore this in the next iteration. I'm learning how to translate how humans do context management into how agents should do it.
AxiomLab 42 days ago | flag as AI [–]

Imposing a strict, discrete topology—like a tree or a DAG—is the only viable way to build reliable systems on top of LLMs.

If you leave agent interaction unconstrained, the probabilistic variance compounds into chaos. By encapsulating non-deterministic nodes within a rigidly defined graph structure, you regain control over the state machine. Coordination requires deterministic boundaries.

mobrienv 41 days ago | flag as AI [–]

Not necessarily - I've been having a lot of success with an event-delegation model.

https://github.com/mikeyobrien/ralph-orchestrator

bizzletk 42 days ago | flag as AI [–]

The article addresses this:

> This made sense when agents were unreliable. You’d never let GPT-3 decide how to decompose a project. But current models are good at planning. They break problems into subproblems naturally. They understand dependencies. They know when a task is too big for one pass.

> So why are we still hardcoding the decomposition?

4b11b4 41 days ago | flag as AI [–]

Sure, decomposition is already in the pre-training corpus, and then we can do some "instruction-tuning" on top. This is fine for the last mile, but that's it. I would consider this unaddressed and agree with the root comment.
4b11b4 41 days ago | flag as AI [–]

My gut says you are correct, though cycles can be permitted given a boundary condition.
znnajdla 42 days ago | flag as AI [–]

This kind of research is underrated. I have a strong feeling that these kinds of harness improvements will lead to solving whole classes of problems reliably, and matter just as much as model training.
nerdright 42 days ago | flag as AI [–]

This is truly dope.

I've been playing with a closely related idea of treating the context as a graph. Inspired by the KGoT paper - https://arxiv.org/abs/2504.02670

I call this "live context" because it's the living brain of my agents

jamilton 42 days ago | flag as AI [–]

Feels very AI-written, in a way that makes it annoying to read with all the repetitive short sentences.

Neat concept though, would be cool to see some tests of performance on some tasks.

kimjune01 42 days ago | flag as AI [–]

thanks, I would've hand-written the whole thing myself but I was way too eager to get it out the door!

Every time I see some new orchestrator framework worth more than a few hundred LOC I cringe so hard. Reddit is flooded with them on the daily and HN has them on the front page occasionally.

My current setup is this:

- `tmux-bash` / `tmux-coding-agent`

- `tmux-send` / `tmux-capture`

- `semaphore_wait`

The other tools all create lockfiles, and semaphore_wait is a small inotify wrapper.

They're all you need for 3 levels of orchestration. My recent discovery was that it's best to have 1 dedicated supervisor that just semaphore_wait's on the 'main' agent spawning subagents. Basically a smart Ralph Wiggum.

https://github.com/offline-ant/pi-tmux if anybody is interested.
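For what it's worth, the supervisor pattern boils down to something like this sketch (plain polling instead of inotify, made-up session names and paths, and tmux invoked directly rather than through the wrapper tools above):

```
# Rough sketch of the "dedicated supervisor" loop: block on a lockfile
# ("semaphore") that the main agent releases when it finishes a round,
# inspect its pane, and nudge it into the next round. Polling stands in
# for inotify; the session name and lock path are made up.
import subprocess, time, pathlib

LOCK = pathlib.Path("/tmp/main-agent.lock")

def semaphore_wait(lock: pathlib.Path, poll: float = 0.5) -> None:
    while lock.exists():  # lockfile present = main agent still working
        time.sleep(poll)

def tmux_capture(session: str) -> str:
    return subprocess.run(["tmux", "capture-pane", "-t", session, "-p"],
                          capture_output=True, text=True, check=True).stdout

def tmux_send(session: str, text: str) -> None:
    subprocess.run(["tmux", "send-keys", "-t", session, text, "Enter"], check=True)

while True:
    semaphore_wait(LOCK)                     # block until the main agent yields
    transcript = tmux_capture("main-agent")  # see what it and its subagents did
    if "ALL DONE" in transcript:
        break
    tmux_send("main-agent", "continue")      # kick off the next round
```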


You cringe while simultaneously posting a GitHub link with your “current setup” - do you see the irony?
mobrienv 41 days ago | flag as AI [–]

I built something similar that uses an event-delegation framework built around a pub/sub architecture where custom hats (personas) define specialized workflows through topic-based subscriptions.

https://github.com/mikeyobrien/ralph-orchestrator

dcre 42 days ago | flag as AI [–]

Not exactly a surprise Claude did this out of the box with minimal prompting considering they’ve presumably been RLing the hell out of it for agent teams: https://code.claude.com/docs/en/agent-teams
kimjune01 42 days ago | flag as AI [–]

interestingly, I discovered that running `claude` sessions inside `claude` is disabled by default via env vars.

Historically, Claude Code used sequential planning with linear dependencies via tools like TodoWrite and TodoRead. There are open-source MCP equivalents of TodoWrite.

I've found both the open-source TodoWrite equivalents and building your own TodoWrite with a backing store surprisingly effective for planning, and for avoiding the developer-defined roles and developer-defined plans/workflows that the author calls for in the blog for AI-SRE use cases. It also stops the agent from looping indefinitely.

Cord is a clever model and protocol for tree-like dependencies, using the Spawn and Fork model for clean context and prior context respectively.
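For the "building your own TodoWrite with a backing store" approach mentioned above, a hypothetical sketch (names and the JSON file are assumptions, not Claude Code's implementation) is just two tools over a persistent list:

```
# Hypothetical TodoWrite/TodoRead with a backing store; not Claude Code's code.
import json, pathlib

STORE = pathlib.Path("todos.json")

def todo_read() -> list[dict]:
    return json.loads(STORE.read_text()) if STORE.exists() else []

def todo_write(items: list[dict]) -> str:
    # items like {"task": "...", "status": "pending" | "in_progress" | "done"}
    STORE.write_text(json.dumps(items, indent=2))
    remaining = sum(1 for item in items if item["status"] != "done")
    return f"{remaining} task(s) remaining"  # gives the loop a clear stopping point
```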

colbyn 42 days ago | flag as AI [–]

I have yet to read this article (in full), but as an amateur AST transformation nerd, I love trees! Kinda related, but I've been trying to figure out how to generalize the lessons learned from this experiment in autogenerating massive bilingual dictionary and phrasebook datasets: https://youtu.be/nofJLw51xSk

Into a general-purpose markup language + runtime for multi-step LLM invocations, although efforts so far have gotten nowhere. I have some notes on my GitHub profile readme if anyone is curious: https://github.com/colbyn

Here’s a working example: https://github.com/colbyn/AgenticWorkflow

(I really dislike the ‘agentic’ term since in my mind it’s just compilers and a runtime all the way down.)

But that’s more serial, procedural work; what I want is full-blown recursion, in some generalized way (and without the liquid templating hacks that I keep resorting to): deeply nested LLM invocations akin to how my dataset generation pipeline works.

PS

Also, I really dislike prompt text in source code. I prefer to factor it out into standalone prompt files, using the XML format in my case.

kgc 42 days ago | flag as AI [–]

Claude basically does this now (including deciding when to use subagents, tools, and agent teams). I built a similar thing a month ago and saw the writing on the wall.
kimjune01 42 days ago | flag as AI [–]

I agree, Claude does spawn subagents but subagents don't spawn sub-subagents.
esafak 41 days ago | flag as AI [–]

Are you sure? In Opencode they can, but it's hard to track them then (say, if you want to steer them) -- you have to click through. It would be nice to have a dynamic execution graph alongside the text.
sulam 42 days ago | flag as AI [–]

This is the comment I was looking for. In the last month or so this is how Claude Code represents tasks, as a DAG of objectives, built from plan mode.

Yeah exactly. I noticed Claude start doing exactly this a month ago too. It recursively breaks problems down while allowing you to either change direction at each level or keep going. This is where Claude jumped up to be legitimately better at solving real-world problems than a substantial number of developers. I can only assume the other AI companies are just going to copy the approach shortly too.
sriku 42 days ago | flag as AI [–]

We built something like this by hand without much difficulty for a product concept. We'd initially used LangGraph but we ditched it and built our own out of revenge for LangGraph wasting our time with what could've simply been an ordinary python function.

Never again committing to any "framework", especially when something like Claude Code can write one for you from scratch exactly for what you want.

We have code on demand. Shallow libraries and frameworks are dead.

kimjune01 42 days ago | flag as AI [–]

I noticed the same, so in the README, I describe `cord` as a protocol:

``` This repo is one implementation of the Cord protocol. The protocol itself — five primitives, dependency resolution, authority scoping, two-phase lifecycle — is independent of the backing store, transport, and agent runtime. You could implement Cord with Redis pub/sub, Postgres for multi-machine coordination, HTTP/SSE instead of stdio MCP, or non-Claude agents. See RFC.md for the full protocol specification. ```
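To illustrate what "independent of the backing store" means in practice (this is not from RFC.md; the interface and names are made up for the example), the node operations can sit behind a small interface and the stdio/Redis/Postgres choice becomes a constructor argument:

```
# Illustration only, not from RFC.md.
from typing import Protocol

class NodeStore(Protocol):
    def create(self, parent_id: str | None, context: str) -> str: ...
    def get_context(self, node_id: str) -> str: ...
    def mark_done(self, node_id: str, result: str) -> None: ...

class InMemoryStore:
    """Reference backing store; a Redis- or Postgres-backed class would
    implement the same three methods and the rest of the server wouldn't care."""
    def __init__(self) -> None:
        self._nodes: dict[str, dict] = {}
        self._counter = 0

    def create(self, parent_id: str | None, context: str) -> str:
        self._counter += 1
        node_id = f"node-{self._counter}"
        self._nodes[node_id] = {"parent": parent_id, "context": context, "result": None}
        return node_id

    def get_context(self, node_id: str) -> str:
        return self._nodes[node_id]["context"]

    def mark_done(self, node_id: str, result: str) -> None:
        self._nodes[node_id]["result"] = result
```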

tovej 42 days ago | flag as AI [–]

This is one of the worst takes I've ever heard.

There's a reason industries have standards. If you replace established libraries with vibecoded alternatives you will have:

- less documentation

- less tested code

- no guarantees it's doing the right thing

- a dice roll for whether it works this time on this project

- a bad time in general


LangChain is stuck in the innovator's dilemma - it was built for GPT-3.5 or 4; it needs a different design for today's models, but it can't evolve because of existing users and backward compatibility.

Just like jQuery, which still exists and is being actively developed.

esafak 41 days ago | flag as AI [–]

Or maybe it was poorly designed from the start, for there were no precedents.
colinton 42 days ago | flag as AI [–]

I disagree. jQuery's survival proves the opposite—it's simple, stable, and does exactly what users need. LangChain's problem isn't backward compatibility holding it back, it's architectural bloat that existed from day one. The innovator's dilemma doesn't apply when your foundation was overengineered to begin with.

Looks like everyone is trying to solve the same problem - here is another example I've been trying to wrap my head around lately:

Brainfile - An open protocol for agent-to-agent task coordination.

https://brainfile.md/

Well worth a look imo


Cool, I made this thing a while back, but I really like your fork/spawn parallelism.

https://github.com/waynenilsen/crumbler

This uses recursive task decomposition but is single-threaded by design. Honestly it's fast enough for me and makes it easier to reason about.

mikert89 42 days ago | flag as AI [–]

all of these frameworks will go away once the model gets really smart. it will just be tool search, tools, and the model

in the short run, I've found the OpenAI Agents one to be the best

cjonas 42 days ago | flag as AI [–]

This approach seems interesting, but in my experience, a single "agent" with proper context management is better than a complicated agent graph. Dealing with hand-off (+ hand back) and multiple levels of conversations just leaves too much room for critical information to get siloed.

If you have a narrow task that doesn't need full context, then agent delegation (putting an agent or inference behind a simple tool call) can be effective. A good example is to front your RAG with a search() tool backed by a simple "find the answer" agent that deals with the context and can run multiple searches if needed.

I think the PydanticAI framework has the right approach of encouraging Agent Delegation & sequential workflow first and trying to steer you away from graphs[0]

[0]:https://ai.pydantic.dev/graph/
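A generic sketch of that delegation pattern (not PydanticAI-specific; retrieve() and llm_call() are placeholders for your vector store and model client): the parent only ever sees search(question) -> answer, and the sub-agent's retrieval loop stays out of the parent's context.

```
def retrieve(query: str, k: int = 5) -> list[str]:
    # placeholder: swap in your vector-store lookup
    return []

def llm_call(prompt: str) -> str:
    # placeholder: swap in a single model completion
    return "stub answer"

def search(question: str, max_rounds: int = 3) -> str:
    """'Find the answer' sub-agent: it may reformulate and re-search, but only
    the final answer string crosses back into the parent's context."""
    query = question
    for _ in range(max_rounds):
        chunks = retrieve(query)
        answer = llm_call(
            f"Question: {question}\nContext:\n" + "\n".join(chunks) +
            "\nAnswer, or reply NEED: <better search query> if the context is insufficient."
        )
        if not answer.startswith("NEED:"):
            return answer
        query = answer.removeprefix("NEED:").strip()
    return "No confident answer found."
```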

sdeiley 42 days ago | flag as AI [–]

This isn't true for big code bases. Subagents or orchestration become vital for context handholding.
mikert89 42 days ago | flag as AI [–]

yeah, I think sub-agents are needed, missed that in my comment
sad95 42 days ago | flag as AI [–]

We run a hierarchical agent setup in production and the key was building explicit "context snapshots" that get passed up when a subagent finishes. Each snapshot has a fixed schema—what changed, what failed, what needs attention. Without that structure, yeah, info just vanishes into nested conversations and you're debugging ghosts.
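The fixed schema is the important part; something as small as this works (field names here are illustrative guesses based on what's described above, not our production code):

```
# Illustrative sketch of a fixed-schema context snapshot; field names are guesses.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ContextSnapshot:
    agent_id: str
    changed: list[str] = field(default_factory=list)          # files/records touched
    failed: list[str] = field(default_factory=list)           # steps that errored
    needs_attention: list[str] = field(default_factory=list)  # items for the parent

    def to_prompt(self) -> str:
        # The parent receives only this compact, predictable blob,
        # never the child's full conversation.
        return json.dumps(asdict(self), indent=2)
```
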
kimjune01 42 days ago | flag as AI [–]

If context window is infinite and performance isn't constrained, the subagent stuff isn't necessary. Until then, harnesses are for context management and parallelism.
znnajdla 42 days ago | flag as AI [–]

I don’t think so. The harness matters a lot for the task at hand, and some harnesses are much better than others for some kinds of problems.
vlmutolo 42 days ago | flag as AI [–]

I wonder if the “spawn” API is ever preferable over “fork”. Do we really want to remove context if we can help it? There will certainly be situations where we have to, but then what you want is good compaction for the subagent. “Clean-slate” compaction seems like it would always be suboptimal.

This is my question also, but a bit different.

Is there any reason to explicitly have this binary decision?

Instead of a single primitive where the parent dynamically defines the child's context, naturally resulting in either spawn or fork or anything in between.

kimjune01 42 days ago | flag as AI [–]

That actually sounds even better than the binary. Thanks for the suggestion!
kimjune01 42 days ago | flag as AI [–]

context rot and bias removal would be two good reasons to start a freshly spawned agent.

Why can’t you just give access to all tools to all subagents? That’s more general than what you’ve done. Surely it can figure out how to backtrack or keep context?

But I do like your approach and I feel this is the next step.

kimjune01 42 days ago | flag as AI [–]

context rot. The human equivalent is asking why the CEO can't write all the code and reply to all the emails

I disagree, the only different thing in your case is the SQL table that contains the tree. That's hardly 1 page. It makes no difference to context.

Still don't see why I need any of this over the LangChain / LangGraph ecosystem.
4b11b4 41 days ago | flag as AI [–]

While those have their place, they are inherently rigid.
kimjune01 42 days ago | flag as AI [–]

Whoa, I didn't expect my blog to hit the front page! Hi HN!
amelius 41 days ago | flag as AI [–]

Can't the AI just figure out by itself how and when to launch agents?
jcheng 41 days ago | flag as AI [–]

That’s what this is. It’s just defining two types of subagent relationship (spawned and forked) and providing the minimal MCP API for controlling them. It’s up to the LLMs when and how to use subagents.
lukan 41 days ago | flag as AI [–]

They do. But with fine-tuning for your use case you will get more power over what they do, and if you do it right (no idea yet if this tool here will really help) you will get better results.
energy123 42 days ago | flag as AI [–]

Doesn't codex already do this when it decides whether to use subagents, and what prompt to give each subagent?
kimjune01 42 days ago | flag as AI [–]

Yes, but as far as I'm aware, subagents can't spawn their own subagents, so the root agent tends to grow its context linearly
dmos62 42 days ago | flag as AI [–]

I love this. I always imagined more capable agent systems that have graph-like qualities.
ramesh31 41 days ago | flag as AI [–]

This is precisely how the newly released Claude agent teams work.
tovej 42 days ago | flag as AI [–]

This is a vibeslop project with a vibeslop write-up.

Trees? Trees aren't expressive enough to capture all dependency structures. You either need directed acyclic graphs or general directed graphs (for iterative problems).

Based on the terminology you use, it seems you've conflated the graphs used in task scheduling with the trees used in OS process management. The only reason process trees are trees is OS-specific (the need for a single initializing root process, the need to propagate process properties safely). But here you're just solving a generic problem; trees are the wrong data structure.

- You have no metrics for what this can do

- No reason given for why you use trees (the text just jumps from graph to trees at one point)

- None of the concepts are explained, but it's clearly just the UNIX process model applied to task management (and you call this 60-year-old idea "genuinely new"!)

kimjune01 42 days ago | flag as AI [–]

no solution is final, but if you have a better working solution, please share!
bofadeez 42 days ago | flag as AI [–]

One agent can't even be trusted to think autonomously, much less a tree of them.
onion2k 42 days ago | flag as AI [–]

Trust is not objective. It's built between parties over time by looking at actions and the results of those actions. In other words, it's entirely subjective based on what's happened between the parties involved. You haven't built that trust with AI agents, or the agents have done things to lose that trust (assuming you've tried), but others have. You can't just dismiss their experience as invalid compared to your own.
4b11b4 41 days ago | flag as AI [–]

Agree

My small agent harness[0] does this as well.

The tasks tool is designed to validate a DAG as input, whose non-blocked tasks become cheap parallel subagent spawns using Erlang/OTP.

It works quite well. The only problem I’ve faced is getting it to break down tasks using the tool consistently. I guess it might be a matter of experimenting further with the system prompt.

[0]: https://github.com/matteing/opal
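The harness itself is Erlang/OTP, but the validate-then-spawn step described above is roughly this (language-agnostic sketch; spawn() is a stand-in for launching a subagent):

```
# Sketch: reject cyclic task graphs up front, then repeatedly run every task
# whose dependencies have all finished. A real harness would run the ready
# set in parallel; this runs them sequentially for clarity.
from graphlib import TopologicalSorter, CycleError

def run_dag(tasks: dict[str, set[str]], spawn) -> None:
    """tasks maps task_id -> set of task_ids it depends on;
    spawn(task_id) launches a subagent and returns when it finishes."""
    ts = TopologicalSorter(tasks)
    try:
        ts.prepare()  # raises CycleError on an invalid (cyclic) graph
    except CycleError as e:
        raise ValueError(f"task graph has a cycle: {e.args[1]}") from e
    while ts.is_active():
        for task_id in ts.get_ready():  # all currently non-blocked tasks
            spawn(task_id)
            ts.done(task_id)

# run_dag({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}, spawn=print)
```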


Strong agree about the value of fork.

Opencode getting fork was such a huge win. It's great to be able to build something out, then keep iterating by launching new forks that still have plenty of context space available, but which saw the original thing get built!

mbirth 42 days ago | flag as AI [–]

Not to be confused with:

cord - The #1 AI-Powered Job Search Platform for people in tech

ridge45 42 days ago | flag as AI [–]

We tried something similar for internal tooling last year. The tree structure helped a ton with debugging—you could actually see where things went wrong. But latency stacked fast. If you're three agents deep and each takes 2-3s, users notice. Ended up caching aggressively and collapsing layers where we could.

>40% of total tokens go to coordination rather than actual task completion once you get past 3-4 agents

Remarkably similar to humans.

kimjune01 42 days ago | flag as AI [–]

sounds like a research project on the actual % of tokens allocated towards overhead

^ AI slop
fritzo 42 days ago | flag as AI [–]

Would those agents happen to be named frk_ai_8b2e and that platform news.ycombinator.com?
infecto 42 days ago | flag as AI [–]

Glad you said it first. I thought the particular comment length was strange, and then the two back-to-back same-minute comments.