Running Multiple AI Coding Agents in Parallel: A System for Safe, Self-Merging PRs

Jeff Liu · Jun 14, 2026 · 6 min read

Running one AI coding agent is easy. It edits files, you check the result, done.

Running several at once on the same repo is where it gets interesting, and where it almost bit me. The bottleneck stops being how good the code is. It becomes how the agents coordinate.

The catch is that the agents can't see each other. Each one runs in its own session with no idea the others exist. The only thing they share is the repo itself: the git history and the open pull requests.

So in one session I had three agents working, and they ended up sharing one branch, each working from a different point in its history. One of them was about to open a pull request that would have quietly deleted a feature another agent had already shipped. Not because its code was wrong: it had branched from a stale commit that predated a squash-merge, so its diff read as a deletion. Every automated check was green, and no human reviewer would have caught it from the checkmarks alone.

That near-miss is the whole lesson. Multi-agent engineering fails at coordination, not capability. The models write fine code. They trip over each other. So I stopped trying to make the agents smarter and started treating it like managing a team of contractors who can't see each other's screens.

Step 1: one agent, one worktree, off fresh main

This is the single most important rule. Two agents on one branch is the number one source of divergence and silent regressions.

The rule: each agent gets its own git worktree, branched off a freshly fetched origin/main. It opens a small PR, the PR merges, and the branch gets deleted. Never share a branch.

Why a worktree and not just a branch? A git worktree is a second working copy of the same repo, checked out to its own branch, in its own folder, sharing one underlying history. It's like giving each contractor a separate desk with the same filing cabinet behind them. They physically cannot collide on the filesystem.

In Claude Code this is a setting:

```json
{
  "worktree": {
    "bgIsolation": "worktree",
    "baseRef": "fresh",
    "symlinkDirectories": ["node_modules"]
  }
}
```

bgIsolation: worktree gives each background agent an isolated checkout.
baseRef: fresh branches from origin/main, never a stale local ref. This is the line that prevents the regressive diff from the story above.
symlinkDirectories means worktrees share installed packages instead of each running a slow reinstall.
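Without that setting, the same rule can be followed with plain git. The sketch below is a minimal illustration, assuming each agent is identified by a short name; the `agent/<name>` branch scheme and the `../wt-<name>` folder layout are hypothetical choices, not part of any tool's API. It fetches once, then gives every agent its own worktree on a new branch cut from `origin/main`, and it refuses duplicate names so two agents are never handed the same branch.

```python
import subprocess


def plan_worktrees(agents, base="origin/main", root=".."):
    """Return the git commands that give each agent its own worktree.

    Hypothetical layout: branch agent/<name>, folder <root>/wt-<name>.
    """
    if len(set(agents)) != len(agents):
        # One agent, one branch: a repeated name would mean a shared branch.
        raise ValueError("duplicate agent name; agents must not share a branch")
    commands = [["git", "fetch", "origin"]]  # refresh origin/main first
    for name in agents:
        commands.append(
            ["git", "worktree", "add", "-b", f"agent/{name}", f"{root}/wt-{name}", base]
        )
    return commands


def run(commands):
    # Execute in order from inside the main checkout; stop on the first failure.
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in plan_worktrees(["docs", "billing"]):
        print(" ".join(cmd))
```

Printed as a dry run, `plan_worktrees(["docs", "billing"])` yields `git fetch origin` followed by one `git worktree add -b agent/<name> ../wt-<name> origin/main` per agent. Because `-b` creates a new branch and fails if that name already exists, git itself adds a second guard against reusing a branch.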

Step 2: recover a stale branch without nuking anyone's work

When a branch has already diverged and carries useful commits, do not force-push it. Another agent might be on it. Instead:

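```bash
git fetch origin
# make a fresh worktree off the latest main, on a new branch
git worktree add -b feature/clean ../recovery origin/main
cd ../recovery
# bring over only the genuinely new commits; leave the already-merged ones behind
git cherry-pick <new-commit-sha>
# list any files this branch deletes relative to main; each one should be intentional
git diff --name-status --diff-filter=D origin/main HEAD
# then push under the new name
git push -u origin feature/clean
```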

This is exactly how I turned that dangerous branch (the one whose PR would have deleted a live feature) into a clean five-file PR with zero regression.
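The "no unexpected deletions" check is the step that would have caught the original near-miss, and it is easy to automate. A minimal sketch, assuming the input is the output of `git diff --name-status origin/main HEAD` (one tab-separated status and path per line); the `allowed` set is a hypothetical list of files the change is meant to remove.

```python
import subprocess


def unexpected_deletions(name_status, allowed=frozenset()):
    """Return paths the diff deletes that were not expected to go away."""
    deleted = []
    for line in name_status.splitlines():
        if not line.strip():
            continue
        status, *paths = line.split("\t")
        # "D" is a plain deletion; edits ("M") and renames ("R100") are not flagged.
        if status == "D" and paths[0] not in allowed:
            deleted.append(paths[0])
    return deleted


def check_branch(base="origin/main", allowed=frozenset()):
    # Compare the branch's tree directly with main, so anything on main that
    # the branch lacks shows up as deleted: the stale-base symptom.
    out = subprocess.run(
        ["git", "diff", "--name-status", base, "HEAD"],
        check=True, capture_output=True, text=True,
    ).stdout
    return unexpected_deletions(out, allowed)
```

A non-empty result from `check_branch()` is a reason to stop and look before pushing, even when every other check is green.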

Step 3: make "done" mean the whole system stays true

The quieter failure is drift. Code changes, but the docs, the ticket, and the changelog don't. Multiply that across a bunch of fast agents and the project's memory rots.

The fix is to make those updates a requirement to merge, not a thing you remember. A change isn't done when the code works. It's done when the living system stays coherent, all in the same pull request:

Docs: if the code changes something a doc describes, that doc is updated too.
The conventions file (AGENTS.md or CLAUDE.md): no policy now contradicts the change.
Changelog: an entry, or a conventional-commit title it can be derived from.
Ticket: referenced, acceptance criteria checked, status moved.
Tech debt: touched files scanned, orphaned code removed.
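The mechanical parts of that list can run as a merge gate. This is a minimal sketch, assuming the PR title, body, and changed-file paths are available; the ticket-key pattern and the `DOC_MAP` from code areas to docs are hypothetical and would need to match your own tracker and repo layout. The conventional-commit types follow the common Conventional Commits convention.

```python
import re

# Conventional-commit title, e.g. "feat(auth): add SSO" or "fix!: drop v1 API".
CONVENTIONAL = re.compile(
    r"^(feat|fix|docs|chore|refactor|perf|test|build|ci|style|revert)(\([\w./-]+\))?!?: \S"
)
# Hypothetical ticket key format such as "ENG-142"; adjust to your tracker.
TICKET = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")
# Hypothetical map from code areas to the doc that describes them.
DOC_MAP = {"src/billing/": "docs/billing.md", "src/auth/": "docs/auth.md"}


def missing_for_done(title, body, changed):
    """List the definition-of-done items a PR has not satisfied yet."""
    changed = set(changed)
    missing = []
    for prefix, doc in DOC_MAP.items():
        touched = any(path.startswith(prefix) for path in changed)
        if touched and doc not in changed:
            missing.append(f"docs: update {doc}")
    if "CHANGELOG.md" not in changed and not CONVENTIONAL.match(title):
        missing.append("changelog: add an entry or use a conventional-commit title")
    if not TICKET.search(f"{title}\n{body}"):
        missing.append("ticket: reference it in the title or body")
    # The conventions-file and tech-debt items need judgment, so they are
    # left to the reviewer rather than checked mechanically here.
    return missing
```

An empty list means the mechanical checks pass; anything else blocks the merge with a message the agent can act on.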

Step 4: never let an agent be its own only reviewer

This holds beyond AI, too.

When the same agent that wrote the code also reviews it, it brings the exact blind spots that produced the bug. Same context, same assumptions, correlated errors. Self-review feels productive and catches almost nothing structural.

So I use a fresh-context reviewer plus adjudication:

The building agent self-verifies first. Typecheck, lint, tests. Don't hand back broken basics.
A separate agent with no memory of building it reviews the change adversarially.
I adjudicate. Fresh eyes catch blind spots, but they also lack the builder's context, so they raise false alarms. I keep the real findings and overrule the wrong ones.
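The builder's self-check can be a short script it must pass before anything goes to the reviewer. A minimal sketch; the commands in `CHECKS` are assumptions for a TypeScript project (the `node_modules` setting earlier suggests one), so substitute your own typecheck, lint, and test commands.

```python
import subprocess

# Assumed commands for a TypeScript repo; replace with your project's own.
CHECKS = [
    ("typecheck", ["npx", "tsc", "--noEmit"]),
    ("lint", ["npx", "eslint", "."]),
    ("tests", ["npm", "test"]),
]


def self_verify(checks=CHECKS):
    """Run each check in order; return the name of the first failure, or None."""
    for name, cmd in checks:
        try:
            result = subprocess.run(cmd)
        except FileNotFoundError:
            return name  # the tool itself is missing, which also counts as a failure
        if result.returncode != 0:
            return name  # stop early: no point reviewing broken basics
    return None
```

Only when `self_verify()` returns `None` does the change move on to the fresh-context reviewer.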

On the last change, the fresh reviewer found one genuine bug (a component forcing underlines on nested links) and two false positives (it flagged valid code it didn't have the version context for). Applying all three blindly would have introduced a regression. The judgment step is what makes independent review safe.

Review style	Catches deep bugs?	Risk
Agent reviews its own work	No, it shares its blind spots	False confidence
Fresh agent, no judgment	Yes, but flags non-issues	Applies wrong fixes
Fresh agent plus you adjudicate	Yes	Low, best of both
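The adjudication step can be enforced mechanically, even though the verdicts themselves are human. A minimal sketch with a hypothetical `Finding` record: reviewer findings start as pending, and nothing goes back to the builder until every one is marked real or false positive, so only confirmed findings get fixed.

```python
from dataclasses import dataclass

VERDICTS = {"pending", "real", "false_positive"}


@dataclass
class Finding:
    summary: str
    verdict: str = "pending"  # set by the human: "real" or "false_positive"


def fixes_to_apply(findings):
    """Return only confirmed findings; refuse while any are still undecided."""
    for f in findings:
        if f.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict {f.verdict!r} on: {f.summary}")
    pending = [f.summary for f in findings if f.verdict == "pending"]
    if pending:
        raise RuntimeError(f"adjudicate these first: {pending}")
    return [f for f in findings if f.verdict == "real"]
```

With the three findings from the story above (one genuine underline bug, two false positives), only the underline fix would come out of `fixes_to_apply`.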

Step 5: let the pipeline review and merge itself

Once each agent is isolated and independently reviewed, you can wire it so PRs are self-merging and you only step in on the exceptions:

Deterministic CI gates: typecheck, lint, unit tests, database tests, secret scanning.
An independent AI review on every PR, run as a loop-auditor that checks correctness plus the definition of done above.
Branch protection that requires the always-on checks. I leave admin bypass on so trivial docs changes can still go straight to main.
Auto-merge and auto-delete, so the PR merges itself the moment the gates and the review pass.

The AI review can be routed through an AI gateway, using the gateway's Anthropic-compatible endpoint, so it runs on one key with unified billing and observability:

```yaml
- uses: anthropics/claude-code-action@v1
  env:
    ANTHROPIC_BASE_URL: https://ai-gateway.vercel.sh
    ANTHROPIC_AUTH_TOKEN: ${{ secrets.AI_GATEWAY_KEY }}
  with:
    claude_args: "--model anthropic/claude-sonnet-4.6 ..."
```
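On GitHub, the branch-protection and auto-merge pieces map onto documented REST endpoints: `PUT /repos/{owner}/{repo}/branches/{branch}/protection` for required checks, and `PATCH /repos/{owner}/{repo}` for the `allow_auto_merge` and `delete_branch_on_merge` repository settings. The sketch below only builds the request bodies; the check names are hypothetical placeholders for whatever your CI jobs report. Setting `enforce_admins` to false is what keeps the admin bypass open. Sending the bodies (for example with `gh api --method PUT ... --input -`) and enabling auto-merge on each PR with `gh pr merge --auto --squash` are left to you.

```python
import json

# Hypothetical CI job names; use the exact status-check names your workflows report.
REQUIRED_CHECKS = ["typecheck", "lint", "unit-tests", "db-tests", "secret-scan", "ai-review"]


def branch_protection(checks=REQUIRED_CHECKS):
    """Body for PUT /repos/{owner}/{repo}/branches/main/protection."""
    return {
        "required_status_checks": {
            "strict": True,  # branch must be up to date with main before merging
            "checks": [{"context": name} for name in checks],
        },
        "enforce_admins": False,  # admins can still push trivial docs changes to main
        "required_pull_request_reviews": None,  # no human approval; the AI review is a status check
        "restrictions": None,
    }


def repo_settings():
    """Body for PATCH /repos/{owner}/{repo}: allow auto-merge, delete merged branches."""
    return {"allow_auto_merge": True, "delete_branch_on_merge": True}


if __name__ == "__main__":
    print(json.dumps(branch_protection(), indent=2))
```

The `strict` flag complements the fresh-main rule: a PR whose branch has fallen behind main cannot merge until it is brought up to date.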

Where it lands

An agent finishes work, opens a PR, a fresh agent reviews it and audits the definition of done, it self-merges when green, and the branch auto-deletes. Nothing piles up. The docs and tickets and changelog stay current because the gate won't let them drift. I step in only on the ones that go red.
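Supervising by exception comes down to one routing decision per PR. A minimal sketch, assuming each gate (the CI jobs and the AI review) reports "success", "failure", or "pending", and that a definition-of-done check returns a list of missing items; the function and its return labels are hypothetical.

```python
def next_action(checks, dod_missing):
    """Route a PR: merge it, keep waiting, or hand it to a human.

    checks maps each gate (CI jobs and the AI review) to "success",
    "failure", or "pending"; dod_missing lists unmet definition-of-done items.
    """
    if not checks:
        return "wait"  # no gate has reported yet
    states = set(checks.values())
    if "failure" in states or dod_missing:
        return "needs-human"  # a red PR: the only case that needs a person
    if states - {"success"}:
        return "wait"  # at least one gate is still running
    return "auto-merge"  # all green; the repo setting deletes the branch afterwards
```

Everything that lands in "auto-merge" or "wait" needs no attention; only "needs-human" reaches a person.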

What I'd tell you if you're starting

The models write fine code. Coordination is what breaks. Solve it the way you'd manage a team that can't see each other's screens.
Isolation is non-negotiable. One agent, one worktree, fresh main, short-lived branch.
Self-review is a blind-spot trap. Use a fresh-context reviewer and adjudicate what it finds.
Make the system self-healing by turning docs, tickets, and changelog updates into merge gates, not afterthoughts.
Then let the pipeline merge itself, so you supervise by exception instead of babysitting every change.

Start with two agents on two clearly separate parts of a project and watch them each open a clean PR. That's the whole thing, scaled down to where you can see it work.
