Different Strokes for Different Folks

Two projects born ten days apart grew into wildly different things — a small tool that became a product, and a framework for getting AI agents to check each other's work. Both came down to the same instinct: match the stroke to the job, whether the stroke is a tool or an agent.

Two of my projects were born ten days apart. One started as a throwaway script and turned into a production system I run every week. The other was independent from day one — a public framework for getting AI agents to collaborate without stepping on each other. They were never the same codebase and they grew into completely different things. What ties them together is quieter: the lessons I learned building the first shaped the second, and the second ended up running inside the first.

The instinct underneath both is the same one a tradesperson has about a brush: you don’t use the same stroke for every job. Sometimes the right answer is the lightest thing that could possibly work. Sometimes it’s a heavyweight, gated pipeline. And — increasingly — “the right stroke” isn’t even a tool. It’s which agent you hand the work to.

Start with the smallest stroke that works

The first project began as the most boring thing imaginable: a command-line tool and a couple of AI agent “skills” to stop me hand-writing the same documents over and over. No web app, no database, no infrastructure. A script, some templates, and a prompt or two. It did one job, and it did it on my laptop.

That restraint was deliberate. The fastest way to kill a useful tool is to build the cathedral before you know anyone wants to pray in it. So it stayed small — until reality started asking for more. More people wanted to use it. The output needed to be consistent across all of them. “Runs on my laptop” became “needs to run somewhere everyone can reach.” Each of those pressures earned a bigger stroke, one at a time, until the thing that started as a CLI was a real multi-tenant product with all the unglamorous production concerns that come with one: authentication, persistence, access control, a deployment pipeline, the lot.

The lesson I keep relearning is that the ceremony should trail the stakes, not lead them. The first project, my work tool, didn’t deserve a production deployment on day one. It earned it on day forty, when the cost of it breaking finally outweighed the cost of running it properly.

The process is the real product

Here’s the part that surprised me: the most valuable thing that project produced wasn’t the product. It was the way I built it.

Every meaningful change went through the same spec-driven loop — explore the problem, write down what’s changing and why, implement against that, then fold it back into the canonical spec. Early on that was just discipline. But the loop itself kept growing teeth. Review gates got added before anything could land. A branch-and-worktree lifecycle replaced “edit the files and hope.” Changes started carrying priority and area metadata so the queue could order itself.
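
To make the self-ordering queue concrete, here is a minimal sketch of a change record that carries priority and area metadata. It is an illustration only, not the project's actual code: the `Change` class, the `next_up` helper, and the "lower number means more urgent" convention are all assumptions.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_arrival = count()  # tie-breaker so equal-priority changes keep arrival order


@dataclass(order=True)
class Change:
    """Hypothetical change record; only priority and seq take part in ordering."""
    priority: int  # assumed convention: lower number = more urgent
    seq: int = field(default_factory=lambda: next(_arrival))
    area: str = field(default="", compare=False)
    title: str = field(default="", compare=False)


def next_up(queue: list[Change], area: Optional[str] = None) -> list[Change]:
    """Return changes in priority order, optionally limited to one area."""
    picked = [c for c in queue if area is None or c.area == area]
    return sorted(picked)


queue = [
    Change(2, area="ui", title="tidy form layout"),
    Change(1, area="auth", title="rotate session keys"),
    Change(2, area="auth", title="add login audit"),
]
for change in next_up(queue):
    print(change.priority, change.area, change.title)
```

Because the ordering lives in the metadata rather than in someone's head, the same queue can be filtered by area without being re-sorted by hand.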

By the time I looked up, the change process had more engineering in it than some of the features. That’s not a complaint — it’s the tell. Building this way taught me what disciplined coordination actually requires: independent review, an honest record of who decided what, and changes that can’t quietly collide. Those lessons turned out to matter most somewhere else entirely — the messiest coordination problem I had, which was more than one agent in the room.

Different folks: when one agent isn’t the right stroke

I don’t build with a single AI agent. I build with several, and they’re not interchangeable. One model is a better architect; another is a better implementer; a third is the one I trust to find the security hole. Asking one generalist agent to do all of that is like asking one person to be author, editor, and fact-checker on the same sentence at the same time — you get something plausible and unverifiable.

The obvious move is to give each agent the role it’s actually good at. The hard part is coordination: how do independent agents review each other’s work, disagree productively, and converge on something a human can sign off — without quietly agreeing with whoever spoke first?

That coordination problem deserved a project of its own — and it had one. agentic-mcp-teaming is a small, standalone open-source framework I built for exactly this, independent of any product it might end up serving. It’s shaped by everything the work tool had taught me about disciplined change: named specialists — an architect, a security reviewer, an implementer, a tester — collaborate through a shared coordination bus, review each other in parallel, and have to reach unanimous agreement before the work advances. Later, I wired it into the work tool itself — but the framework was never carved out of that product; the product became one of its consumers.

How the bus actually works

A few design decisions in that framework are worth pulling out, because they’re where “different agents for different roles” stops being a slogan and becomes mechanics.

One bus for everything. The coordinator is itself an MCP server, and every interaction — reading a file, invoking an agent, advancing the workflow, recording a decision — is a tool call on that one server. There are no side channels. That means a single, honest audit trail: every action flows through one place and is logged before the coordinator responds. Any MCP-capable client — Claude, a human at a CLI, some future agent — can plug into the same bus and play.
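
The "log before responding" rule is easier to see in code. What follows is a plain-Python stand-in, assuming a hypothetical `Coordinator` class with a `call_tool` method. It does not use the MCP SDK and is not the framework's implementation; it only shows the shape of routing every action through one entry point that writes the audit record first.

```python
import json
import time
from typing import Any, Callable


class Coordinator:
    """Stand-in for a single coordination bus (plain Python, not the MCP SDK)."""

    def __init__(self) -> None:
        self._tools: dict[str, Callable[..., Any]] = {}
        self.audit_log: list[str] = []  # append-only here; a real system would persist it

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call_tool(self, caller: str, name: str, **args: Any) -> Any:
        fn = self._tools.get(name)
        if fn is None:
            self._log(caller, name, args, error="unknown tool")
            raise KeyError(name)
        try:
            result = fn(**args)
        except Exception as exc:
            # Failures are recorded too, then re-raised to the caller.
            self._log(caller, name, args, error=repr(exc))
            raise
        # The audit entry is written before the result is returned.
        self._log(caller, name, args, result=result)
        return result

    def _log(self, caller: str, name: str, args: dict, **outcome: Any) -> None:
        entry = {"ts": time.time(), "caller": caller, "tool": name,
                 "args": args, **outcome}
        self.audit_log.append(json.dumps(entry, default=str))


bus = Coordinator()
bus.register("record_decision", lambda text: {"recorded": text})
bus.call_tool("architect", "record_decision", text="use one worktree per task")
print(len(bus.audit_log))
```

The point of the design is that there is no second way in: a caller that wants something done has to go through `call_tool`, so the log cannot miss it.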

Review the same snapshot, independently. When an artifact goes up for review, the coordinator captures one snapshot — the artifact plus the exact tool outputs — and hands that identical snapshot to every reviewer in parallel. They each respond before any feedback is shared. This is the bit I care about most: it stops the agents from anchoring on each other. Nobody gets to read “looks good to me” before forming their own opinion. You get genuinely independent first-pass reviews, not an echo.
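
A rough sketch of that fan-out, assuming reviewers can be modeled as plain functions: one immutable snapshot goes to every reviewer, and nothing comes back until all of them have answered. The `Snapshot` and `Review` types, the verdict vocabulary, and the use of a thread pool are illustrative assumptions, not the framework's API.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snapshot:
    artifact: str
    tool_outputs: tuple[str, ...]  # a tuple, so reviewers cannot mutate it


@dataclass(frozen=True)
class Review:
    reviewer: str
    verdict: str  # assumed vocabulary: "approve", "revise", or "block"
    notes: str


Reviewer = Callable[[Snapshot], Review]


def review_in_parallel(snapshot: Snapshot, reviewers: list[Reviewer]) -> list[Review]:
    """Give every reviewer the same snapshot; return once all have answered."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        futures = [pool.submit(reviewer, snapshot) for reviewer in reviewers]
        # No reviewer is handed another reviewer's output: results are only
        # collected here, in submission order, after each future completes.
        return [future.result() for future in futures]


def security(snap: Snapshot) -> Review:
    risky = "eval(" in snap.artifact
    return Review("security", "block" if risky else "approve", "")


def architect(snap: Snapshot) -> Review:
    return Review("architect", "approve", "")


snap = Snapshot("def add(a, b): return a + b", ("tests: 12 passed",))
for review in review_in_parallel(snap, [security, architect]):
    print(review.reviewer, review.verdict)
```

The frozen snapshot is what prevents anchoring in this sketch: each function sees the artifact and tool outputs, and has no argument through which a peer's opinion could arrive.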

Consensus, but honest about how it was reached. If every reviewer approves, the artifact is marked consensus-reached. If they deadlock — someone blocks, or they burn through the revision cap — it escalates to me, and if I push it through, it’s marked human-approved. Both let the work move forward, but the record never pretends a human override was a unanimous agreement. The audit log stays truthful about why each thing advanced.
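
The distinction between the two labels can be sketched as a small decision function. The `consensus-reached` and `human-approved` labels come from the description above; everything else here (the `resolve` function, the `revise` and `escalated` states, the verdict strings, the parameter names) is a hypothetical filling-in for illustration.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    CONSENSUS_REACHED = "consensus-reached"
    HUMAN_APPROVED = "human-approved"
    REVISE = "revise"        # hypothetical: go around for another round
    ESCALATED = "escalated"  # hypothetical: waiting on the human


def resolve(verdicts: list[str], rounds_used: int, revision_cap: int,
            human_decision: Optional[str] = None) -> Outcome:
    """Decide why, or whether, an artifact may advance."""
    # Unanimous approval is the only path to the consensus label.
    if verdicts and all(v == "approve" for v in verdicts):
        return Outcome.CONSENSUS_REACHED
    # Deadlock: someone blocks, or the revision cap is used up.
    deadlocked = "block" in verdicts or rounds_used >= revision_cap
    if not deadlocked:
        return Outcome.REVISE
    # A human override advances the work under its own, distinct label.
    if human_decision == "approve":
        return Outcome.HUMAN_APPROVED
    return Outcome.ESCALATED


print(resolve(["approve", "approve"], rounds_used=0, revision_cap=3).value)
print(resolve(["approve", "block"], rounds_used=1, revision_cap=3,
              human_decision="approve").value)
```

Note that no code path here turns an override into `consensus-reached`, which is what keeps the record truthful about why something advanced.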

Isolation by filesystem, not by lock. During implementation, each task runs in its own Git worktree built from a recorded base commit. An agent produces a diff; the coordinator applies it inside that worktree; a reviewing agent checks it; only then does it merge back. No two agents ever write to the same checkout at the same time. The isolation is physical, so the coordination doesn’t have to be clever about locking — it just has to be honest about ordering.
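
As a sketch of that lifecycle, the steps map onto a handful of ordinary Git commands. The helper names, the `task/<id>` branch naming, and the commit message below are assumptions made for illustration; the Git subcommands themselves (`worktree add`, `apply --index`, `merge --no-ff`, `worktree remove`) are standard. The functions only build argument lists, so the plan can be inspected before anything runs.

```python
import subprocess
from pathlib import Path


def prepare_cmds(task_id: str, base_commit: str, root: Path) -> list[list[str]]:
    """Create a dedicated worktree and branch from the recorded base commit."""
    path = str(root / task_id)
    return [["git", "worktree", "add", "-b", f"task/{task_id}", path, base_commit]]


def apply_cmds(task_id: str, root: Path, diff_file: Path) -> list[list[str]]:
    """Apply the agent's diff inside the task's worktree and commit it.

    diff_file should be an absolute path, because `git -C` changes directory
    before the path is resolved.
    """
    path = str(root / task_id)
    return [
        ["git", "-C", path, "apply", "--index", str(diff_file)],
        ["git", "-C", path, "commit", "-m", f"task {task_id}: apply agent diff"],
    ]


def merge_cmds(task_id: str, root: Path) -> list[list[str]]:
    """Merge the task branch back and drop the worktree (only after review)."""
    path = str(root / task_id)
    return [
        ["git", "merge", "--no-ff", f"task/{task_id}"],
        ["git", "worktree", "remove", path],
    ]


def run(cmds: list[list[str]], repo: Path) -> None:
    """Execute the commands in order from the main checkout; stop on failure."""
    for argv in cmds:
        subprocess.run(argv, cwd=repo, check=True)


for argv in prepare_cmds("t42", "abc1234", Path("worktrees")):
    print(" ".join(argv))
```

In use, a coordinator would call `run(prepare_cmds(...), repo)`, then `run(apply_cmds(...), repo)`, and call `run(merge_cmds(...), repo)` only once the reviewing agent has approved. Each task gets its own directory, so two agents never share a working copy.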

The loop closes

The satisfying part is how the two met in the middle. The work tool taught me what good multi-agent coordination needs; that learning went into a framework that was always its own separate project; and then I wired that framework back into the work tool, sharpening how its own changes get proposed, reviewed, and merged. Lessons flow one way and capability flows back the other, but the two were never the same thing. Two independent projects, each leaving the other better than it found it.

That’s really all “different strokes for different folks” means to me. The right tool for a job might be fifty lines or fifty thousand. The right reviewer might be a model that’s brilliant at architecture and useless at finding injection bugs, so you hand the injection bugs to the one that’s wired for paranoia. The skill isn’t owning the biggest brush. It’s knowing which stroke the work in front of you actually needs — and being willing to use a small one when a small one is right.


A note on authorship: this post was co-written with Claude — one of the very agents described above. Which is either perfectly on-brand or slightly too on-brand. Both, probably.