May 30, 2026

Living with a Harness — Notes from Ralph Loop

I built homeos, a CLI tool, using a Harness methodology called Ralph Loop. This was my first time bringing an AI agent into development in earnest. As of May 2026, Ralph Loop is increasingly treated as a somewhat classical approach, but I still gained enough observations from the past few months that I want to organize them while they are fresh.

Even though it has become classical, it still works well. For the foreseeable future I plan to keep developing with this Harness. As a starting point for my next project, I also published a minimal repository called ralph-loop-starter. This post summarizes what I learned in operation, following the conventions of that starter.

The essence of Ralph Loop

As I worked with Ralph Loop, its essence gradually came into view. Below are the elements I see as its heart.

Three roles

In Ralph Loop, the responsibilities of development are split cleanly across three actors.

Actor	Responsibility
Human	Sets direction, holds the requirements, approves, corrects
Conversational LLM	Writes the README, drafts PRD tasks, keeps SPEC and CONVENTIONS current
Ralph (the executor LLM)	Picks up one PRD task, implements it, and commits

At first I expected a simple division of labor: the human writes the PRD, Ralph implements it. After a few weeks of operation, the conversational LLM had taken over PRD drafting as well. The amount I personally type on my keyboard has steadily fallen — first source code went to zero, then PRD did, and now even SPEC and README are approaching zero.

The setup ends up being three windows side by side. An editor — though in the way I use it, it is mostly a reader now — a terminal where Ralph is running, and a chat window with the conversational LLM. While Ralph is implementing in the background, the conversational LLM and I can discuss a different problem.

Externalizing memory

PRD, README, SPEC, CONVENTIONS, report, Git history: every decision, state, and piece of history is stored as plain text. As a result, I can persist things before they become code, starting from the stage where an idea is still just “let’s make this.” The cost of writing things down has also dropped, so writing has become a habit. Ideas no longer vanish inside a session.

Documenting the how

Because everything reduces to text, switching editors or AI models does not break the operation.

Repositories used to be where you stored what a piece of software is — the implementation and the README. How you actually build it was scattered, across developers’ heads and internal wikis. With Ralph Loop, the how itself sits inside the repository, as PRD / SPEC / CONVENTIONS / prompt.md / AGENTS.md. The what and the how now live on the same git history. Even when a new developer takes over, they can have the AI explain the how back to them and pick up the project’s context.

The shape of Ralph Loop Starter

I extracted the setup that took shape through operation into a minimal repository called ralph-loop-starter. The files it ships with are:

File / Directory	Purpose
`README.md`	What this software is and how to use it (user-facing)
`SPEC/`	Internal behavior and invariants (developer-facing)
`PRD.md`	What and why to build, plus what to do now (Tasks)
`CONVENTIONS.md`	How to write the code (test pattern, lint, commit style)
`AGENTS.md` / `CLAUDE.md`	The philosophy of Ralph Loop
`prompt.md`	The procedure Ralph follows each loop
`ralph.sh` / `ralph.ps1`	The loop driver that starts Ralph
`reports/report.html`	The human-facing report Ralph writes each loop

Why SPEC became a directory

At first I had SPEC as a single file, SPEC.md, but as scope grew it bloated. Some projects want a yaml in OpenAPI form; others want a dot-format ER diagram for the data model. Putting all of that into one Markdown gets hard to read — formats clash and the file grows large.

So I converted SPEC.md into a SPEC/ directory. The directory is permissive enough that you can toss in any related document, at the price that some specs go a little stale and the context window gets a bit squeezed.
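One way to keep an eye on that trade-off is to inventory the directory from time to time. The following is a minimal Python sketch, not something ralph-loop-starter ships: the function names `spec_inventory` and `total_chars` are hypothetical, and character counts are only a crude proxy for how much context window the specs would occupy.

```python
from collections import defaultdict
from pathlib import Path


def spec_inventory(spec_dir: str) -> dict:
    """Group files under spec_dir by extension, counting files and characters.

    Hypothetical helper: it only reports sizes, it does not judge staleness.
    """
    totals = defaultdict(lambda: {"files": 0, "chars": 0})
    for path in sorted(Path(spec_dir).rglob("*")):
        if not path.is_file():
            continue
        # Files without an extension are grouped under a placeholder key.
        suffix = path.suffix.lower() or "(none)"
        text = path.read_text(encoding="utf-8", errors="replace")
        totals[suffix]["files"] += 1
        totals[suffix]["chars"] += len(text)
    return dict(totals)


def total_chars(inventory: dict) -> int:
    """Sum the character counts across all extensions."""
    return sum(entry["chars"] for entry in inventory.values())
```

Calling `spec_inventory("SPEC")` returns one entry per extension (for example `.yaml`, `.dot`, `.md`), which makes it easy to see which format is growing fastest.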

Why HTML reports instead of progress.md

The standard Ralph Loop setup has Ralph append to progress.md and read it back at the top of each loop. I followed that pattern at first.

But as development continued, I lost sight of whether progress.md was actually doing anything. The information Ralph can draw on when starting a task is the task description, the relevant sources, and the recent commit log — and I came to feel that even without progress.md, the implementation was good enough. I also wanted to save tokens and limit how much of the context window it took up, so I gradually stopped feeding it to Ralph and eventually retired it.

In its place, I had Ralph start writing reports/report.html. Ralph appends to it on every completed task but never reads it back. It is designed as a document for humans and the conversational LLM. Each entry has five required sections:

Judgement points — non-trivial choices made during implementation, and the reasoning
Unresolved / workarounds — places Ralph got stuck or sidestepped
Next PRD suggestions — follow-up tasks that surfaced while working on this one
Change summary — a description short enough to read in fifteen seconds
Review highlights — anything Ralph wants a human to look at directly

I use this HTML report when defining the next round of PRD work with the conversational LLM. I am still measuring how well it works in practice, and the structure may yet evolve.
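To make the "five required sections" rule concrete, here is a minimal Python sketch of an append-only report writer. It is an illustration under assumptions, not the starter's actual implementation: the HTML layout, the names `render_entry` and `append_entry`, and the choice to reject entries with a missing section are all hypothetical. Only the five section names come from the list above.

```python
from html import escape
from pathlib import Path

# The five required sections, in the order they are listed in this post.
REQUIRED_SECTIONS = (
    "Judgement points",
    "Unresolved / workarounds",
    "Next PRD suggestions",
    "Change summary",
    "Review highlights",
)


def render_entry(task_title: str, sections: dict) -> str:
    """Render one report entry; raise if a required section is missing or empty."""
    missing = [n for n in REQUIRED_SECTIONS if not sections.get(n, "").strip()]
    if missing:
        raise ValueError(f"missing required sections: {', '.join(missing)}")
    parts = [f"<section>\n<h2>{escape(task_title)}</h2>"]
    for name in REQUIRED_SECTIONS:
        # Escape free text so task output cannot break the report markup.
        parts.append(f"<h3>{escape(name)}</h3>\n<p>{escape(sections[name])}</p>")
    parts.append("</section>\n")
    return "\n".join(parts)


def append_entry(report_path: str, task_title: str, sections: dict) -> None:
    """Append an entry to the report; the file is never read back (assumed layout)."""
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as report:
        report.write(render_entry(task_title, sections))
```

Opening the file in append mode mirrors the write-only design: the writer never needs the earlier entries, so it costs nothing in context.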

Operating Ralph Loop

Let me walk through the standard one-cycle workflow, along with the patterns and observations that have surfaced over a few months of running it.

The standard cycle

One full cycle settles into six steps:

1. Work through the spec with the conversational LLM (what to build, for whom, how it is used)
2. Polish README.md and drop the internal spec into SPEC/ or CONVENTIONS.md
3. Have the conversational LLM draft Tasks in PRD.md at a small granularity
4. Run Ralph (ralph.sh)
5. Read report.html and, in conversation with the LLM, fold spec updates back into README / SPEC / PRD
6. Once enough updates have accumulated, run again

Spec-first PRD

A pattern that emerged from operation: fill in the SPEC before writing the PRD task. When SPEC is written thoughtfully, even if the PRD ends up a bit hasty Ralph still produces accurate work. On top of that, the PRD itself tends to be higher quality.

PRD task granularity moves through dialogue

Not locking in PRD task granularity has become the most important discipline of operation. If you decide “tasks are this size” up front, you erase the conversational LLM’s room for judgment. I think preserving that room is what produces the quality of Ralph Loop operation.

Small steps

When Ralph runs at the scale of one task = one commit, a misjudgment by the AI does limited damage, and the cost to roll back is low. I find that this low-fear state is what lets me delegate more to the AI.

I know some argue we should be delegating more widely to AI, and I agree. But at the moment, the operational risks of widening that range — diffs becoming hard to predict, review load piling up — feel too large, so I do not, for instance, run a huge batch overnight and find myself drowning in review the next morning. I am waiting for some breakthrough that improves predictability and review speed before I push the loop size further.

Parallelism

Ralph’s loop count is adjustable. Pass a maximum iteration count to ralph.sh and it will work through up to that many tasks in sequence. I switch between two modes.

My default is a small loop, reviewing on the spot every time. Because I confirm each task’s result individually, I catch drift early.

When task results are predictable — for instance when I am repeating similar implementation patterns — I run a larger batch and do something else. Sometimes I have run a bigger batch precisely because I wanted to do chores. Ralph contributes to the upkeep of my home.
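The bounded loop described above can be sketched in a few lines of Python. This is a hedged illustration, not a port of ralph.sh: it assumes tasks in PRD.md are Markdown checkboxes (`- [ ] ...`), which this post does not specify, and `run_task` is a hypothetical callable standing in for "invoke the executor LLM on one task and commit." In this sketch the driver selects the task only to keep the example self-contained; in practice Ralph may pick the task itself.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional

# ASSUMPTION: open tasks look like "- [ ] Add list command" at the start of a line.
OPEN_TASK = re.compile(r"^- \[ \] (?P<title>.+)$", re.MULTILINE)


def next_open_task(prd_text: str) -> Optional[str]:
    """Return the title of the first unchecked task, or None if none remain."""
    match = OPEN_TASK.search(prd_text)
    return match.group("title").strip() if match else None


def mark_done(prd_text: str, title: str) -> str:
    """Check off the first open task with this title."""
    pattern = re.compile(r"^- \[ \] " + re.escape(title), re.MULTILINE)
    return pattern.sub(lambda _m: "- [x] " + title, prd_text, count=1)


def run_loop(prd_path: str, max_iterations: int,
             run_task: Callable[[str], bool]) -> List[str]:
    """Work through at most max_iterations tasks, one task per iteration.

    run_task is a hypothetical hook: it should implement and commit one task
    and return True on success. The loop stops at the first failure so a
    misjudgment stays confined to a single task.
    """
    completed: List[str] = []
    path = Path(prd_path)
    for _ in range(max_iterations):
        # Re-read each time so tasks added to the PRD mid-run are picked up.
        text = path.read_text(encoding="utf-8")
        title = next_open_task(text)
        if title is None or not run_task(title):
            break
        path.write_text(mark_done(text, title), encoding="utf-8")
        completed.append(title)
    return completed
```

With `max_iterations=1` this behaves like the small review-every-time loop; a larger value gives the batch mode, and re-reading the PRD on each iteration is what lets new tasks be added while the loop is running.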

Spec growth and development become a single act

There is no need to finalize the spec up front. Define just the core feature, get something running, then add the next task while checking the result one task at a time. The moment I try Ralph’s implementation and think “oh, I want this too,” the PRD grows.

In traditional development, those ideas often got shelved for later and sometimes evaporated. In Ralph Loop, jotting that idea down as a PRD task right away does not slow development, because Ralph keeps running even while I edit the PRD.

The result is that growing the spec and advancing the implementation run in parallel rather than one after the other.

Stepping back

Ralph Loop is not a silver bullet. If anything, it surfaces new kinds of challenges.

Losing sight of what I am building

I think a real problem in AI-driven development is that at some point you stop knowing what shape the thing you are building actually has. To prevent that, I try not to run large batches that change too much at once.

As a further defense, I have settled on a combination of two practices: re-reading the README periodically, and having the AI itself explain what it is building.

I used to think the AI explaining itself was a trust problem. Lately my view has shifted. I can no longer clearly say what the difference is between a human review and an AI review. Human reviews are said to bring guarantees or peace of mind — but what exactly counts as a guarantee or peace of mind? Is it actually better than an AI review? When pressed, I cannot answer. These days I just read the README and let the AI’s self-explanation give me the overall picture.

Harness is not a tool

There are plenty of Ralph-style Harness tools in the wild. I never adopted any of them. I have come to feel that a Harness is the kind of artifact that gets optimized to one person’s mental model; the incentive to consolidate it into a shared tool is weak.

If you depend on a framework, you cannot improve your Harness without waiting for it to update. When you decide that another approach is better, you also have to pay the cost of switching frameworks. In an era when models and Harness methods evolve quickly, a Harness you can change freely is more valuable. The reason I shaped ralph-loop-starter as “a collection of Markdown and shell scripts,” not as a tool, is precisely this. Users are free to rewrite it. It is a starting point, not a destination.

Scaffold moved from Framework to Harness

This is a larger generational shift that I noticed as I continued with Ralph Loop.

“Scaffold” used to mean things like the framework-bundled setup tools — Spring Initializr, for example — or design approaches like Clean Architecture. They all offered the same thing: a “correct” initial structure. Development started by adding domain-specific code on top of that structure.

In the age of AI agents, I feel that the meaning of Scaffold has shifted from Framework to Harness. The code itself can be generated on the spot by the AI reading SPEC and PRD. What you need instead is the scaffolding for how to drive the AI.

Ralph Loop is not tied to a language or a framework. Whether the project is Java or Rust, it boots from the same scaffold. I believe the era when Scaffold was tied to a specific language or framework is over.

Closing

I gained two big things from developing with Ralph Loop: the habit of externalizing memory, and an intuitive trust in handing judgment off to AI. The two reinforce each other. The more externalization progresses, the easier it becomes to delegate to AI; the more I delegate, the more useful the externalized memory becomes.

The Harness will keep improving. Over these past few months, I have come to feel that improving the Harness has itself become a software engineer’s job.