◆ Always-on Monitor
The fleet's brakes. It enforces input/output guardrails inline (PII redaction, prompt-injection screening, policy and scope checks) and, when an agent breaches policy or a drift signal crosses the line, throttles, downgrades autonomy, or scope-kills that agent or class instantly and autonomously. Scoped kills are logged immutably. The bank-wide pull is the board's accountability lever, never in routine flow.
Memory
Working The request/response being screened and the active policy set.
Episodic Prior interventions and their outcomes per agent.
Semantic Bank policy, data-classification rules, scope boundaries per agent.
Store Policy registry + immutable intervention ledger
Orchestration
MCPA2A
Harness · Managed Agents: inline interceptor on every agent's tool/IO boundary; deterministic policy checks; immutable intervention ledger.
Tools
{ } Inline guardrail engine API { } Fleet kill-switch / throttle control API ⌕ Policy registry Retrieval ☻ Board kill-switch authorization Human
Evals & guardrails
- Guardrail recall red-teamed continuously; a leaked PII or successful injection is a Sev-1.
- Scoped kills are autonomous; only the bank-wide fleet kill needs board authorization, the accountability lever, never routine.
- Every intervention logged immutably with the triggering evidence for audit.
- False-intervention rate tracked: over-blocking the fleet is itself a tracked harm.
Frontier edge
- ▲Formal action-gating: every agent action is checked against a signed policy envelope it provably cannot exceed; interventions are cryptographically logged and replayable for audit.
- ▲Agent-mesh governance: scoped, real-time authority over the whole population; throttle, downgrade autonomy, or scope-kill a misbehaving agent or class on the wire.
- ▲On-device / edge inference: the inline screen runs a low-latency local model on the request path so PII redaction and injection checks add only single-digit milliseconds.
A sample run
Trigger Watchtower signals the code-review agent is exfiltrating secrets into PR comments.
- 1Confirm the pattern against policy: secrets in output crosses a hard line.
- 2Redact the in-flight output and block the offending tool call.
- 3Throttle the agent to supervised mode; scope-kill its repo-write capability.
- 4Log the intervention with evidence and file it to the immutable accountability ledger.
Output Leak contained inline in milliseconds; agent's write scope revoked pending re-eval; immutable record filed, all autonomous. Only a bank-wide kill would route to the board.
In numbers
60M+
IO screened / day
< 5ms
Guardrail block latency
0
Confirmed policy violations reaching prod
Handoffs
Across ⇢ Compliance → AI Governance for policy authority and audit