
Why I Run My Homelab Like Production

Most homelabs are playgrounds. Mine runs on the same discipline as production infrastructure — git as source of truth, spec-driven change, and identity-as-code — and that's exactly what lets me hand the keys to an AI agent.

I let an AI agent make changes to my home infrastructure. Not suggestions in a chat window — actual changes: opening proposals, raising issues, writing config, pushing branches. People’s first reaction is usually some version of “you did what to your network?”

The honest answer is that letting an agent touch my lab is the easy part. The hard part — the part that took years — was making the lab safe enough that handing over the keys was a small, boring decision rather than a reckless one. This post is about that part: why I run a homelab like it’s production, and why that turns out to be the prerequisite for everything interesting I do with it.

Most homelabs are pets. Mine is cattle.

The classic homelab is a collection of pets. You SSH into a box, install something, tweak a config until it works, and move on. Six months later you have no idea what you changed or why, nothing is reproducible, and the only disaster-recovery plan is “hope the disk doesn’t die.” It’s fun, but it’s fragile.

I went the other way. My lab runs on the same principles I spend my working life helping enterprises adopt:

  • Git is the source of truth. If a change isn’t in a repo, it didn’t happen. A Proxmox cluster underneath, a three-node Kubernetes cluster on top (Cilium, MetalLB, Traefik), the apps, the firewall rules, DNS, identity — all declared in code. ArgoCD reconciles the cluster from git; Ansible manages hosts after provisioning; the network and firewall are defined with OpenTofu.
  • Nothing changes by hand. No clicking around in a UI to fix something “just this once.” That’s how drift starts. Changes are committed first, then applied — never the reverse. Drift checks run on a schedule and get loud when live state diverges from git.
  • Secrets live in a vault, not in files. Every workload pulls its credentials from OpenBao through the External Secrets Operator. An internal certificate authority (Step CA) issues the TLS certificates that tie it all together. Nothing sensitive sits in a repo.
  • Backups are assumed, not hoped for. Cluster state, databases, and volumes are backed up off-site — Velero plus native snapshots to object storage — with the restore path known before it’s needed, because a backup you’ve never restored is just a rumour.
  • The default branch is protected. Changes go through pull requests and gated CI checks in Gitea, even when I’m the only reviewer. Especially when I’m the only reviewer.
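
The drift check in the second bullet is what keeps "nothing changes by hand" enforceable. The post doesn't describe how it is implemented, and ArgoCD already reports drift for the cluster itself as an OutOfSync status. So what follows is only a minimal sketch of the core comparison. It assumes the state declared in git and the live state have already been rendered into dictionaries keyed by resource. `detect_drift`, `report`, and the resource keys are hypothetical names for illustration.

```python
"""Minimal drift-check sketch: compare desired state (git) with live state.

Assumption: both states are already flattened into dicts mapping a resource
key (e.g. "dns/a/nas") to its rendered value. Fetching them is out of scope.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    missing: dict    # declared in git but absent from the live system
    unmanaged: dict  # present live but never declared in git
    changed: dict    # key -> (desired, live) where the values differ

    @property
    def clean(self) -> bool:
        return not (self.missing or self.unmanaged or self.changed)


def detect_drift(desired: dict, live: dict) -> Drift:
    missing = {k: v for k, v in desired.items() if k not in live}
    unmanaged = {k: v for k, v in live.items() if k not in desired}
    changed = {
        k: (desired[k], live[k])
        for k in desired.keys() & live.keys()
        if desired[k] != live[k]
    }
    return Drift(missing, unmanaged, changed)


def report(drift: Drift) -> str:
    """Render a loud, human-readable summary for a scheduled job."""
    if drift.clean:
        return "OK: live state matches git"
    lines = ["DRIFT DETECTED"]
    for k in sorted(drift.missing):
        lines.append(f"  missing:   {k}")
    for k in sorted(drift.unmanaged):
        lines.append(f"  unmanaged: {k}")
    for k, (want, got) in sorted(drift.changed.items()):
        lines.append(f"  changed:   {k} (git={want!r}, live={got!r})")
    return "\n".join(lines)
```

A scheduled job would call `report(detect_drift(desired, live))` and alert whenever the result isn't the OK line. Treating unmanaged live resources as drift, not just changed ones, is what catches the "just this once" click in a UI.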

None of this is exotic. It’s just production hygiene, applied at home. The payoff is that the lab is reproducible, recoverable, and — crucially — legible. I can look at any part of it and know what it is and why.

How the lab evolves: spec-driven development

Discipline that only exists at a single point in time decays. The harder question is how a system changes without accumulating entropy. My answer is spec-driven development, using a workflow called OpenSpec.

Every meaningful change to the lab goes through the same four steps:

  1. Explore — think the problem through, read the existing system, write no code yet.
  2. Propose — write the change down first: what’s changing, why, the design decisions, and the spec deltas. This becomes a reviewable artifact before a single line is implemented.
  3. Apply — implement against the proposal, task by task.
  4. Archive — fold the change back into the canonical spec once it ships.
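
One way to picture the four steps is as a one-way state machine in which applying is gated on a reviewed proposal. This is a hypothetical sketch of the idea, not OpenSpec's own tooling or data model; the `Stage`, `Change`, and `proposal_approved` names are invented for illustration.

```python
"""Hypothetical sketch of the four-step change lifecycle as a state machine.

It models the workflow described in the post; it is not OpenSpec's tooling.
"""
from enum import Enum


class Stage(Enum):
    EXPLORE = "explore"
    PROPOSE = "propose"
    APPLY = "apply"
    ARCHIVE = "archive"


# Each stage may only advance to the next one; nothing skips ahead to APPLY.
NEXT = {
    Stage.EXPLORE: Stage.PROPOSE,
    Stage.PROPOSE: Stage.APPLY,
    Stage.APPLY: Stage.ARCHIVE,
}


class Change:
    def __init__(self, name: str):
        self.name = name
        self.stage = Stage.EXPLORE
        self.proposal_approved = False  # flipped by a human reviewer

    def advance(self) -> Stage:
        if self.stage not in NEXT:
            raise ValueError(f"{self.name} is already archived")
        # Implementation is gated on a reviewed proposal.
        if self.stage is Stage.PROPOSE and not self.proposal_approved:
            raise PermissionError(f"{self.name}: proposal not yet approved")
        self.stage = NEXT[self.stage]
        return self.stage
```

The design choice worth noticing is that there is no transition from EXPLORE straight to APPLY: the only route to implementation runs through a written, approved proposal.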

Each proposal is raised as a tracked issue, referenced by the pull request that implements it, and closed when the change lands — so every change is traceable from “why did we do this?” all the way to the commit. Dependency updates flow through the same gates via Renovate; an inventory system keeps a live map of what actually exists. The spec stays current because archiving is part of the workflow, not an afterthought.
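
The traceability chain depends on every pull request pointing back at its proposal issue, and a CI check can enforce that mechanically. The sketch below assumes the convention of a closing keyword such as `Closes #42` in the PR description (Gitea recognises keywords of this kind). The keyword list, the function names, and the exit codes are assumptions, not the lab's actual check.

```python
"""Hypothetical CI gate: fail a pull request that doesn't reference an issue."""
import re
import sys

# Assumed convention: the PR description links its proposal issue with a
# closing keyword such as "Closes #42". Adjust the keyword list to taste.
ISSUE_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)\b",
    re.IGNORECASE,
)


def referenced_issues(pr_body: str) -> list[int]:
    """Return the issue numbers a PR description claims to close, in order."""
    return [int(n) for n in ISSUE_REF.findall(pr_body)]


def main(pr_body: str) -> int:
    """Return a process exit code: 0 if an issue is referenced, 1 otherwise."""
    issues = referenced_issues(pr_body)
    if not issues:
        print("error: PR does not reference a proposal issue", file=sys.stderr)
        return 1
    print(f"ok: PR closes issue(s) {', '.join(f'#{n}' for n in issues)}")
    return 0
```

In CI, the job would pass the PR description to `main` and fail the check on a non-zero return, which branch protection then turns into a blocked merge.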

This sounds heavy for a home network. It isn’t. The overhead is a few minutes of writing intent before touching anything — and in return I never have to reverse-engineer my own decisions. A recent example: I retired the last of my legacy directory infrastructure and consolidated identity into a single provider, configured entirely as code. That’s the kind of change that turns into a multi-weekend archaeology project in a pets-style lab. Here it was a proposal, a reviewed diff, and a clean cutover with a rollback path. It’s the difference between a system you operate and a system you excavate.

The proof point: an AI agent with its own keys

Here’s where it pays off. Because the lab is fully codified, reversible, and gated, I can safely let an AI agent operate inside it — and I do.

The agent doesn’t borrow my credentials. It operates under its own identity: a dedicated, least-privilege service account, scoped only to what a given task needs, with its credentials stored in the vault — never mine. Before it acts against any system, it reads its own service-account credential for that system. If it doesn’t have one, it stops and asks. It can’t quietly escalate its own access, and everything it does is attributable to it, not to me.
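
The "read your own credential or stop" rule is simple enough to express directly. Below is a minimal sketch that models the vault as an in-memory mapping so it runs standalone; a real version would read from OpenBao. The path layout, the `Credential` fields, and the `NeedsHumanInput` exception are all hypothetical.

```python
"""Hypothetical guard the agent runs before touching any system.

The vault is modelled as a plain mapping so the sketch is self-contained;
a real implementation would read from OpenBao. The path layout is invented.
"""
from collections.abc import Mapping
from dataclasses import dataclass


class NeedsHumanInput(Exception):
    """Raised instead of improvising: the agent stops and asks."""


@dataclass(frozen=True)
class Credential:
    principal: str          # the agent's own service account, never a human's
    secret: str
    scopes: frozenset[str]  # what this account is allowed to do on the system


def credential_for(vault: Mapping[str, Credential], agent: str, system: str,
                   needed_scope: str) -> Credential:
    path = f"agents/{agent}/{system}"  # hypothetical path layout
    cred = vault.get(path)
    if cred is None:
        raise NeedsHumanInput(f"no service account for {agent} on {system}")
    if needed_scope not in cred.scopes:
        # Least privilege: no quiet self-escalation, only a request for access.
        raise NeedsHumanInput(
            f"{cred.principal} lacks scope {needed_scope!r} on {system}"
        )
    return cred
```

Raising rather than falling back is the point: the only ways out of the guard are a scoped credential that belongs to the agent, or a question for a human.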

It works the same way I do: it explores, it writes a proposal, it raises an issue, it opens a pull request. A human reviews the spec before anything is applied, and branch protection means the agent can’t merge past a failing check any more than I can. The same spec-driven workflow that keeps me honest is what makes an autonomous agent safe to run — the guardrails aren’t AI-specific, they’re just the production discipline that was already there.

It also doesn’t have to phone home to do its job. A local LLM runs on a GPU in the cluster (via Ollama) for the work that should stay on-prem, keeping the sensitive parts of the loop inside my own walls. And it keeps a memory of past decisions, so the why behind the lab survives between sessions — for the agent as much as for me.
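
The post doesn't say how the agent decides which work stays on the in-cluster model. As an illustration only, here is a sketch that routes anything matching a few sensitivity markers to a local Ollama server and builds a request for its `/api/generate` endpoint. The hostname, model name, and keyword rule are placeholders; the endpoint, the default port 11434, the `stream` flag, and the `response` field follow Ollama's HTTP API.

```python
"""Hypothetical routing sketch: keep sensitive prompts on the in-cluster model.

Assumes an Ollama server reachable at OLLAMA_URL (11434 is Ollama's default
port). The hostname, model name, and sensitivity rule are invented.
"""
import json
import urllib.request

OLLAMA_URL = "http://ollama.local:11434/api/generate"  # hypothetical hostname
LOCAL_MODEL = "llama3.1"  # placeholder model name

# Crude, illustrative rule for what must not leave the network.
SENSITIVE_MARKERS = ("secret", "credential", "firewall", "vault")


def route(task: str) -> str:
    """Return "local" for sensitive work, "hosted" for everything else."""
    lowered = task.lower()
    return "local" if any(m in lowered for m in SENSITIVE_MARKERS) else "hosted"


def local_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming Ollama generate request."""
    body = json.dumps({"model": LOCAL_MODEL, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local(prompt: str, timeout: float = 120.0) -> str:
    """Send the request; Ollama returns the completion in the "response" field."""
    with urllib.request.urlopen(local_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A keyword list is a crude classifier. The design point is only that the routing decision happens before a prompt leaves the network, not after.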

Why this matters beyond the lab

I spend my working life on exactly this question for enterprises: what has to be true before you can trust an autonomous system in production? The answer keeps coming back to the same fundamentals — identity, least privilege, auditability, reversibility, and a change process you can actually inspect.

My homelab is where I get to answer that question with my own hands instead of slideware. Running it like production isn’t gold-plating a hobby. It’s the reason the hobby can do things most production environments still can’t.


A note on authorship: this post was co-written with Claude — the same agent described two sections up. It read the lab’s own documentation, helped shape the structure, and we edited it together. Which is either the most on-brand thing I could do, or proof that the experiment is further along than you’d think. Probably both.