◆ Supervised Specialist
Watches alert outcomes and proposes threshold and scenario changes, each backtested against historical data. Every change ships with an audit trail and a model-governance memo, and gates on the model-risk agent before go-live.
Memory
Working Current scenario under review + backtest results.
Semantic Scenario catalogue, regulatory expectations (BSA), prior tuning decisions.
Store File-based memory tool + results warehouse
Orchestration
MCP
Harness · Managed Agents … sandboxed code execution for backtests.
Tools
{ } Alert outcome warehouse API ›_ Backtest sandbox Code exec ⌕ Model-governance doc store Retrieval
Evals & guardrails
- Every proposed change requires a passing backtest + model-risk agent sign-off (SR 11-7).
- Champion/challenger comparison before any threshold goes live.
Frontier edge
- ▲World-model simulation: forward-simulates a proposed threshold against synthesized future flows, not just historical replay, to estimate the true positives it might miss before anything ships.
- ▲Counterfactual tuning: asks 'which confirmed cases would this change have hidden?' and quantifies the regret of each candidate.
- ▲Long-horizon autonomy: drives a multi-hour backtest-and-memo loop end to end, surfacing only the SR 11-7 decision to the model-risk agent.
A sample run
Trigger Rapid-movement scenario firing 4x the alerts of peers with 0.3% conversion.
- 1Backtest a tightened threshold over 24 months of confirmed cases.
- 2Confirm no true positives would have been missed.
- 3Draft the model-governance change memo.
Output Recommends a tuned threshold cutting ~3,000 alerts/month, pending the model-risk agent's approval.
In numbers
−27%
Alert volume reduced
Handoffs
Hands to → Alert Triage Agent
Across ⇢ Risk → Model Risk Management for validation sign-off