◆ Supervised Worker
Takes a well-specified ticket (a dependency bump, a flaky-test fix, a small feature behind a flag), implements it, writes the tests, runs the build-and-test loop in a sandbox, and opens a PR for the review agent to gate. It works on bounded, scoped changes and queries the originating agent when a ticket is underspecified.
Memory
Working: The ticket, the plan, the files changed and the test results.
Episodic: Similar past changes in this codebase.
Semantic: The codebase architecture, conventions and build system.
Procedural: Implementation patterns that passed review in this repo.
Store: Repo-context retrieval + change-history store
Orchestration
pipeline · MCP · A2A
Harness · Managed Agents: session per ticket; sandboxed code-exec for build + test loops; structured note-taking across multi-file changes.
Tools
- Git / repo API
- Build + test sandbox (code exec)
- Issue tracker API
- Originating-agent clarification channel (A2A)
Evals & guardrails
- Every change goes through the Code Review Agent and its judge gate before merge; never self-merges.
- Must pass CI in the sandbox before opening a PR; red builds aren't submitted.
- Scope guardrail: refuses to touch files outside the ticket's stated blast radius.
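The CI and scope guardrails above could be combined into a single pre-PR check. The following is a minimal sketch, not the agent's actual implementation: the `Ticket` class, its `allowed_globs` field (standing in for the ticket's stated blast radius), and the `may_open_pr` helper are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Ticket:
    """Hypothetical ticket shape; the real schema is not given in the source."""
    ticket_id: str
    allowed_globs: tuple[str, ...]  # assumed encoding of the "blast radius"


def out_of_scope(ticket: Ticket, changed_files: list[str]) -> list[str]:
    """Return the changed files that match none of the ticket's allowed globs."""
    return [
        path
        for path in changed_files
        if not any(fnmatchcase(path, glob) for glob in ticket.allowed_globs)
    ]


def may_open_pr(ticket: Ticket, changed_files: list[str], ci_passed: bool) -> tuple[bool, str]:
    """Gate PR creation on a green sandbox build and an in-scope diff."""
    if not ci_passed:
        return False, "CI is red in the sandbox"
    stray = out_of_scope(ticket, changed_files)
    if stray:
        return False, "files outside blast radius: " + ", ".join(stray)
    return True, "ok"
```

Note that in `fnmatchcase` a `*` also matches path separators, so a glob such as `src/billing/*` covers nested directories. A stricter sketch might compare resolved paths against a list of allowed directories instead.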
Frontier edge
- ▲ Long-horizon autonomy: drives a multi-file change end to end (implement, test, iterate on red builds) across a multi-hour run, well along the METR time-horizon curve.
- ▲ World-model simulation: runs the build-and-test loop in a sandbox to verify the change before opening a PR, so red builds aren't submitted.
- ▲ Proactive scoping: detects an underspecified ticket and asks before coding, rather than guessing and producing a plausible-but-wrong patch.
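Two of the behaviours above, asking before coding and iterating on red builds, can be illustrated with a short sketch. Everything here is an assumption made for illustration: the `REQUIRED_FIELDS` tuple, the dict-shaped ticket, and the `run_build` / `revise` callables are hypothetical stand-ins for whatever the harness actually provides.

```python
from typing import Callable

# Assumed minimum for a "well-specified" ticket; the real criteria are not in the source.
REQUIRED_FIELDS = ("title", "acceptance_criteria", "allowed_paths")


def missing_fields(ticket: dict) -> list[str]:
    """List required fields that are absent or empty, in a fixed order."""
    return [name for name in REQUIRED_FIELDS if not ticket.get(name)]


def build_test_loop(
    run_build: Callable[[], bool],
    revise: Callable[[int], None],
    max_attempts: int = 3,
) -> bool:
    """Run the sandbox build up to max_attempts times, revising after each red build.

    Returns True as soon as a build is green, False if every attempt is red.
    """
    for attempt in range(1, max_attempts + 1):
        if run_build():
            return True
        if attempt < max_attempts:
            revise(attempt)  # hypothetical hook: patch the change using the failure output
    return False
```

In this sketch the worker would call `missing_fields` first and route a non-empty result to the originating-agent clarification channel. It would enter `build_test_loop` only once the ticket is complete, and open a PR only if the loop returns `True`.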
In numbers
58% · Scoped tickets auto-implemented
3.5x · Backlog toil-ticket throughput
Handoffs
Hands to → Code Review Agent