The Self-Assembling Harness

On the difference between building systems around models and models building systems around shared reality

The mainstream consensus on agent architecture has crystallised into a clean equation:

Agent = Model + Harness

The model contains the intelligence. The harness makes that intelligence useful. A harness is every piece of code, configuration, and execution logic that isn't the model itself — system prompts, tools, filesystems, sandboxes, orchestration, memory, compaction. The human engineer designs the harness. The model inhabits it. The harness serves the model.

This is correct. It describes Claude Code, Codex, Devin, and every serious coding agent in production today. The harness is where the engineering happens. Models get better; harnesses get better; the product is the combination.

sync proposes a different equation:

Agent = Model + Room

where the room is not designed for the model but constructed by the models (and humans) that inhabit it.

This is not a rejection of harnesses. It is a question about where harness construction happens: before the agent starts, or while the agent works.

The harness inventory

A good harness provides:

Durable storage — filesystem, database, workspace
Tools — bash, code execution, browser, APIs
Memory — injected context, AGENTS.md, conversation history
Context management — compaction, summarisation, progressive disclosure
Execution environment — sandboxes, runtimes, dependency management
Verification — test runners, linters, screenshot comparison
Orchestration — subagent spawning, handoffs, routing

These are real needs. Every agent system must solve them somehow.

sync solves them too, but the mechanism is different at every level:

Harness component	sync equivalent	Who builds it
Filesystem	Scoped state (key-value substrate)	The substrate provides it
Tools	Registered actions (declarative transitions)	Agents register them
Memory	Context reads + computed views	Agents define the projections
Context management	depth/only/include parameters + _context envelope	The substrate provides shaping; agents choose depth
Sandbox	Room (isolated, scoped, agent-bound)	The substrate provides isolation
Verification	Views that evaluate post-write state; result expressions	Agents define the checks
Orchestration	(absent)	Nobody. Stigmergic coordination replaces it

The rightmost column is the interesting one. In a harness, the human engineer builds every row before the agent starts. In sync, the substrate provides the primitive capabilities (storage, isolation, context shaping) and the agents build the specific vocabulary (what tools exist, what memory projections matter, what verification looks like) at runtime.

What "self-assembling" means concretely

An agent arrives in an empty sync room. It faces four built-in actions:

_register_action — propose a write capability
_register_view — propose a read capability
_send_message — communicate
help — read guidance

There is no filesystem. No tool set. No memory structure. No verification strategy. There is a substrate — scoped, versioned, observable state — and four operations for constructing vocabulary over it.

The agent reads help({ key: "standard_library" }) and gets a set of ready-to-register action templates: set, delete, increment, append, claim, submit_result, vote. It picks what it needs. It registers actions with domain-specific names, preconditions, and write templates. It registers views that compute derived facts. The "harness" — the tool set, memory structure, and coordination protocol — assembles itself through use.

This is not a hypothetical. It is how sync rooms work today. The MCP integration means Claude reads the room's context (state, views, available actions) and invokes actions through tool calls. The tool set is the registered vocabulary. The memory is the state. The context management is the _context envelope. None of it was pre-designed for this specific task.

When a second agent arrives, it reads the same context. It sees the vocabulary the first agent registered. It can invoke existing actions, register new ones, or contest existing ones by proposing competing vocabulary. The coordination protocol is not orchestrated — it emerges from agents reading shared state and acting on what they see.

What a harness knows that a room doesn't

The self-assembling story has real limits, and they are worth naming honestly.

A harness knows the task. Claude Code's harness is designed for coding. Its filesystem is structured for codebases. Its tools include git, grep, compilers. Its verification strategy is "run the tests." A sync room has no task knowledge until agents bring it. The first agent in an empty room must bootstrap from scratch. This is powerful for generality but expensive for cold start. A pre-designed harness is faster to productive work than a self-assembled one.

A harness knows the model. Claude Code's harness is co-trained with the model — the model was post-trained to work well with specific tool interfaces like apply_patch. This coupling improves performance but creates brittleness: change the tool format and performance degrades. sync's vocabulary is not coupled to any model's training data. Actions are registered at runtime with arbitrary IDs, parameters, and write templates. This is more flexible but may miss the performance gains of harness-model co-training.

A harness has opinions about quality. A well-designed harness includes lint checks, test runners, compilation gates — deterministic verification that catches errors before they propagate. sync rooms have no built-in quality checks on vocabulary or state. A broken view persists indefinitely. A stale action accumulates without signal. The room's _context envelope tells you what's there but not whether it's good.

This last point is the most important gap. The harness model's verification story is mature: the harness designer builds quality gates, the model operates within them. The substrate model's verification story is nascent. Views can evaluate post-write state. Action preconditions can enforce invariants. The _contested synthetic view surfaces write-target conflicts. But there is no systematic assessment of whether the room's vocabulary is serving its purpose or accumulating cruft.

This is an open problem, not a solved one. The adaptive salience design document describes what such an assessment might look like — computing room-level signals from the trajectory (broken view count, stale action ratio, message-to-invocation ratio) and surfacing them in the context envelope. But the right mechanism is not yet clear. The substrate can count broken views. Whether it should compute a "vocabulary health" assessment, and what form that should take, is a question the current implementation has not yet answered. The data exists. The aggregation strategy doesn't.

The filesystem analogy and where it breaks

The harness literature identifies the filesystem as "arguably the most foundational harness primitive." Files provide durable storage. Agents read data, write outputs, coordinate through shared files. Git adds versioning. The filesystem is a natural collaboration surface.

sync's state is not a filesystem. The differences matter:

Files are opaque; state entries are structured. To read a file, you open it and parse its contents. To read state, you access a typed value at a scoped key. Views compute derived facts across entries without parsing. An agent can ask "how many findings are recorded?" and get 2 from a view, rather than reading a file, parsing JSON, and counting array elements.

Files have no authority model; state has scopes. Anyone with filesystem access can read or write any file. State entries are scoped: agent alice cannot read agent bob's private scope. This means the substrate can enforce privacy without relying on filesystem permissions, which in practice most agent harnesses don't use.

Files have no transitions; state has actions. Writing a file is unconditional. Writing state requires invoking a registered action, which may have preconditions, parameter validation, and scope authority checks. The registration is the commitment. There is no _set_state.

Files have no computed projections; state has views. A file sitting on disk doesn't tell you anything about the state of the system unless you read and interpret it. Views compute and return derived facts on every context read. The room is self-describing.

These differences are not about quality — filesystems are excellent and proven. They are about what the primitive makes easy. Filesystems make storage and retrieval easy. The state-plus-actions-plus-views substrate makes structured coordination easy. A harness built on a filesystem needs orchestration logic to coordinate multiple agents. A room built on the substrate gets coordination from agents reading shared state and acting on what they see.

Progressive disclosure: harness-level vs substrate-level

The harness model has developed a sophisticated approach to progressive disclosure. "Skills" are a harness primitive that load tool descriptions on demand rather than stuffing everything into context at start. Tools are progressively disclosed as the task requires them. This protects against context rot — the degradation of model performance as the context window fills with irrelevant material.

sync's progressive disclosure operates at a different level. Actions have if preconditions: an action is available only when its predicate holds against current state. Views have enabled expressions: a view is visible only when its condition is met. The vocabulary space itself grows as agents register new actions and views. The disclosure is driven by state, not by task stage.

The distinction: in a harness, progressive disclosure is designed by the engineer ("show the database tools after the schema is loaded"). In sync, progressive disclosure is a consequence of vocabulary design ("this action is available when phase == synthesizing"). The agent who registered the action chose the disclosure predicate. The substrate evaluates it. Nobody external decides the disclosure schedule.

This makes sync's disclosure more flexible (any CEL expression can gate any action) but less curated (nobody ensures the disclosure sequence makes pedagogical sense). The harness model can create a carefully designed onboarding flow. The substrate model creates disclosure conditions that are locally coherent but may not globally compose into a sensible progression.

The orchestration gap

The harness model treats multi-agent coordination as an engineering problem: "orchestration logic (subagent spawning, handoffs, model routing)." Someone designs the coordination. The harness implements it. AutoGen does multi-turn conversation. MetaGPT does SOP-based role sequences. CrewAI does event-driven flows. In each case, a human decided the interaction pattern.

sync has no orchestration. This is by design, and the "Isn't This Just ReAct?" essay argues it is the core architectural claim. But the absence of orchestration means the absence of guarantees about coordination quality. Two agents in a room might coordinate beautifully through stigmergic traces, or they might register competing vocabulary, spam messages, and produce incoherent state.

The harness model would solve this by designing better orchestration. sync's answer is that the room surfaces coordination problems (the _contested view, directed messages, visible agent objectives) and trusts agents to resolve them. This works when agents are capable enough to read the room and negotiate. It fails when they aren't. The current architecture has no fallback for coordination failure — no escalation, no deadlock detection, no "the agents have been messaging in circles for 20 turns and nothing has changed."

Whether the substrate should have an opinion about coordination quality — and what form that opinion should take — is part of the same open question as vocabulary health. The data is there. The trajectory records everything. What to compute from it, and what to surface, is unresolved.

The training coupling

The harness model has a specific prediction about the future: "as models get more capable, some of what lives in the harness today will get absorbed into the model." Planning, verification, long-horizon coherence will improve natively. Harnesses will shrink.

sync's trajectory is different. If the "harness" is co-constructed by agents at runtime, more capable models don't shrink it — they produce richer vocabulary. A more capable agent registers more precise actions, more expressive views, more nuanced preconditions. The room grows with agent capability rather than being replaced by it.

This is a testable prediction, not a proven one. But the direction matters: the harness model assumes a convergence where the model absorbs the infrastructure. The substrate model assumes a divergence where better models produce better infrastructure. In one future, the harness disappears. In the other, it flourishes.

The coupling story also runs differently. Harness-model co-training creates performance but also brittleness — models overfit to specific tool interfaces. sync's vocabulary is arbitrary: action IDs, parameter names, write templates are all agent-chosen at runtime. There is no training-time coupling to overfit to. But there may also be no training-time optimisation to benefit from. Whether an agent that constructs its own vocabulary outperforms one that uses a pre-designed tool set is an empirical question that depends on model capability, task complexity, and time horizon. For short tasks with known tool sets, the pre-designed harness almost certainly wins. For long-horizon tasks in novel domains with multiple agents, the self-assembled vocabulary may win — because nobody could have pre-designed the right harness.

What sync actually is, relative to harnesses

sync is not a harness. A harness is designed by engineers for models. sync is a substrate where agents construct their own operational vocabulary.

sync is not a meta-harness — that implies a system that generates harnesses, which isn't quite right either. The room doesn't generate a tool set and hand it to an agent. The room is the environment, and the vocabulary that agents register within it is the tool set, discovered and refined through use.

The closest framing: sync is a substrate for self-assembling coordination infrastructure.

The substrate provides: scoped state, context shaping, scope authority, append-only trajectory, version-stamped entries, declarative transitions.

The agents provide: the vocabulary (what actions exist, what views compute, what preconditions gate), the conventions (phase management, role definitions, negotiation patterns), and the work (invoking actions, reading context, sending messages).

The resulting room — state plus vocabulary plus trajectory — is the thing a harness engineer would have designed in advance, except nobody designed it and it emerged from agent activity within the substrate's constraints.

The honest boundaries

The substrate model has real advantages over the harness model for multi-agent coordination in open-ended tasks: no pre-designed interaction patterns, no orchestration bottleneck, vocabulary that evolves with the task, coordination that emerges from shared state.

It has real disadvantages for single-agent, well-defined tasks: slower cold start, no training-time coupling, no pre-designed verification strategy, no curated progressive disclosure.

And it has open questions that the harness model has mostly answered:

How should the substrate assess vocabulary quality? The data exists in the trajectory. The aggregation strategy doesn't.
How should the substrate signal that coordination is failing? Contested targets and message volume are mechanical signals. Whether they compose into a meaningful assessment is unclear.
How should the substrate encourage vocabulary revision? Registration is idempotent. Nothing in the current system makes revision easier or more visible than accretion.

These are not rhetorical questions. They are the implementation frontier. The adaptive salience design document describes possible approaches — computed room-level signals, observation/interpretation splits, evaporating salience. Some of those approaches will prove right, some will prove wrong, and some will prove to be the wrong question entirely.

What the substrate model offers that the harness model doesn't is a place for those answers to live. In a harness, quality assessment is an engineering decision made at design time. In a room, quality assessment could be a computed property of the trajectory — a view over the room's own history. The assessment would be legible to the agents, revisable through the same vocabulary mechanisms as everything else, and subject to the same coordination dynamics. The room wouldn't just contain the work. It would contain the assessment of the work. And agents would act on both.

That is the aspiration. The implementation has not yet caught up. The room is the world model — but the world model is still missing an opinion about its own health. Whether that opinion should be computed in the substrate, expressed as agent-registered views, or something else entirely is the question we're building toward, not the question we've answered.

Christopher · Edinburgh · March 2026

c15r

sync