Scenario Tuning Agent

◆ Supervised Specialist

Watches alert outcomes and proposes threshold and scenario changes, each backtested against historical data. Every change ships with an audit trail and a model-governance memo, and gates on the model-risk agent before go-live.

Memory

Working Current scenario under review + backtest results.

Semantic Scenario catalogue, regulatory expectations (BSA), prior tuning decisions.

Store File-based memory tool + results warehouse

Orchestration

MCP

Harness · Managed Agents … sandboxed code execution for backtests.

Tools

{ } Alert outcome warehouse API ›_ Backtest sandbox Code exec ⌕ Model-governance doc store Retrieval

Evals & guardrails

Every proposed change requires a passing backtest + model-risk agent sign-off (SR 11-7).
Champion/challenger comparison before any threshold goes live.

Frontier edge

▲World-model simulation: forward-simulates a proposed threshold against synthesized future flows, not just historical replay, to estimate the true positives it might miss before anything ships.
▲Counterfactual tuning: asks 'which confirmed cases would this change have hidden?' and quantifies the regret of each candidate.
▲Long-horizon autonomy: drives a multi-hour backtest-and-memo loop end to end, surfacing only the SR 11-7 decision to the model-risk agent.

A sample run

Trigger Rapid-movement scenario firing 4x the alerts of peers with 0.3% conversion.

1Backtest a tightened threshold over 24 months of confirmed cases.
2Confirm no true positives would have been missed.
3Draft the model-governance change memo.

Output Recommends a tuned threshold cutting ~3,000 alerts/month, pending the model-risk agent's approval.

In numbers

−27%

Alert volume reduced

Handoffs

Hands to → Alert Triage Agent

Across ⇢ Risk → Model Risk Management for validation sign-off