The Agentic Bank

Model Validation Agent

⬡ Second-Opinion Runs independent model validation analyses and drafts the validation report.
◆ Supervised Orchestrator

Reproduces the model developer's results, runs the benchmarking and outcomes analysis, probes the assumptions, and drafts the validation report to SR 11-7 structure. Spawns parallel test sub-agents for replication, benchmarking and sensitivity; an independent validation oversight agent owns the challenge and the sign-off.

Memory

Working The model under validation, its documentation, and test results so far.
Episodic Prior validations of similar models and recurring findings.
Semantic SR 11-7 expectations, validation methodology, the model inventory.
Procedural Validation-test playbooks per model class.
Store File-based memory tool + model-documentation store
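
A minimal sketch of how the four memory types could sit in a file-based store, assuming one directory per memory type and one JSON file per entry. The class name `FileMemory`, the directory layout and the key names are hypothetical; the card does not describe the memory tool's actual interface.

```python
import json
from pathlib import Path

# The four memory types named on the card.
MEMORY_KINDS = ("working", "episodic", "semantic", "procedural")


class FileMemory:
    """Hypothetical file-backed store: <root>/<kind>/<key>.json."""

    def __init__(self, root):
        self.root = Path(root)
        for kind in MEMORY_KINDS:
            (self.root / kind).mkdir(parents=True, exist_ok=True)

    def _path(self, kind, key):
        if kind not in MEMORY_KINDS:
            raise ValueError(f"unknown memory kind: {kind}")
        return self.root / kind / f"{key}.json"

    def write(self, kind, key, value):
        """Serialise value as JSON and return the path written."""
        path = self._path(kind, key)
        path.write_text(json.dumps(value))
        return path

    def read(self, kind, key, default=None):
        """Return the stored value, or default if the entry is missing."""
        path = self._path(kind, key)
        return json.loads(path.read_text()) if path.exists() else default
```

Under this assumed layout, a per-class playbook would live under `procedural/` and the test results for the model currently under validation under `working/`.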

Orchestration

orchestrator-worker · MCP · A2A

Harness · Managed Agents … orchestrator spawning parallel test sub-agents (replication, benchmarking, sensitivity); sandboxed code execution; fresh context per sub-agent.
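
The orchestrator-worker pattern above can be sketched with the standard library. This is an illustration only: `run_sub_agent`, `orchestrate` and `SubAgentResult` are hypothetical names, the worker body is a placeholder for the sandboxed test run, and the real harness is not described on the card.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

TESTS = ("replication", "benchmarking", "sensitivity")


@dataclass
class SubAgentResult:
    test: str
    passed: bool
    notes: str


def run_sub_agent(test: str, model_id: str) -> SubAgentResult:
    # Fresh context per sub-agent: the context is built here from scratch
    # and never shared with sibling workers.
    context = {"model_id": model_id, "test": test, "history": []}
    context["history"].append(f"started {test}")
    # Placeholder: a real worker would run the test in a sandbox here.
    return SubAgentResult(test=test, passed=True,
                          notes=f"{test} completed for {model_id}")


def orchestrate(model_id: str, tests=TESTS) -> dict:
    """Spawn one worker per test in parallel and collect results by test name."""
    with ThreadPoolExecutor(max_workers=len(tests)) as pool:
        futures = {t: pool.submit(run_sub_agent, t, model_id) for t in tests}
        return {t: f.result() for t, f in futures.items()}
```

Building the context inside the worker, rather than passing one shared object, is what keeps one test's intermediate state from leaking into another.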

Tools

Model inventory + documentation · Retrieval
Validation test harness · Code exec
Model + data environment · API
Independent validation oversight agent · A2A

Evals & guardrails

  • Every validation requires independent validation-oversight-agent challenge and sign-off (SR 11-7).
  • Test reproducibility checked; results traced to the model environment.
  • Agent-as-judge review of report completeness vs. the validation standard.
  • Independence guardrail: cannot validate a model it helped develop.
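
A minimal sketch of how the independence guardrail and the sign-off gate might be enforced in code. All names here (`ValidationRequest`, `check_independence`, `can_close`) are hypothetical, and the rule is simplified to a membership check on a recorded list of developers.

```python
from __future__ import annotations

from dataclasses import dataclass


class IndependenceError(Exception):
    """Raised when the validator also helped develop the model."""


@dataclass(frozen=True)
class ValidationRequest:
    model_id: str
    developers: frozenset[str]  # people or agents that built the model
    validator: str              # person or agent assigned to validate it


def check_independence(req: ValidationRequest) -> None:
    # Independence guardrail: refuse before any validation work starts.
    if req.validator in req.developers:
        raise IndependenceError(
            f"{req.validator} helped develop {req.model_id}")


def can_close(req: ValidationRequest, reproducible: bool,
              oversight_signed_off: bool) -> bool:
    """A validation closes only if it is independent, reproducible
    and signed off by the oversight agent."""
    check_independence(req)
    return reproducible and oversight_signed_off
```

Raising on an independence breach, rather than returning False, keeps that failure distinct from a validation that is simply still awaiting sign-off.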

Offline reflection

Offline self-reflection over closed validations refines which tests surface model weaknesses for each model class … sharpening the validation playbook.
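
One way to picture an eval-gated playbook refinement is sketched below. It assumes a deliberately simple score (the share of past findings whose detecting test is in the playbook) and accepts a candidate playbook only if that score strictly improves. The functions `coverage` and `eval_gated_update`, and the test names used with them, are hypothetical; this is not the card's actual reflection method.

```python
def coverage(playbook, past_findings):
    """Share of past findings whose detecting test is in the playbook.

    past_findings maps a finding label to the test that surfaced it.
    """
    if not past_findings:
        return 0.0
    hits = sum(1 for test in past_findings.values() if test in playbook)
    return hits / len(past_findings)


def eval_gated_update(playbook, candidate, evaluate):
    """Return the candidate only if it scores strictly better than the
    current playbook; otherwise keep the current playbook."""
    return candidate if evaluate(candidate) > evaluate(playbook) else playbook
```

With findings recorded as `{"data-quality gap": "input_data_checks", "stale segment": "segment_stability"}` (illustrative labels), a candidate that adds `input_data_checks` to a playbook lacking it would pass the gate, while one that drops `segment_stability` would be rejected.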

Frontier edge

  • Long-horizon autonomy: orchestrates a checkpointed, multi-day validation across parallel replication, benchmarking and sensitivity sub-agents, surviving model-environment stalls.
  • Eval-gated continual learning: each closed validation feeds a SEAL-style self-edit to the per-class test playbook, so the next validation probes the weaknesses that bit last time.
  • Reads model documentation, derivations and developer notebooks natively (multimodal), checking the maths on the page against the code it reproduces.
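
The checkpointing idea in the first bullet can be sketched as follows, assuming a JSON checkpoint file and a fixed step order. `run_validation`, `load_checkpoint` and the step names are hypothetical; the point is only that a stall in one step does not force completed steps to rerun.

```python
import json
from pathlib import Path

STEPS = ("replication", "benchmarking", "sensitivity", "report_draft")


def load_checkpoint(path: Path) -> dict:
    """Return saved state, or an empty state if no checkpoint exists yet."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed": []}


def run_validation(path: Path, run_step, steps=STEPS) -> list:
    """Run steps in order, persisting progress after each one.

    run_step(step) may raise (for example on a model-environment stall);
    calling run_validation again resumes after the last completed step.
    """
    state = load_checkpoint(path)
    for step in steps:
        if step in state["completed"]:
            continue  # already done in an earlier attempt
        run_step(step)
        state["completed"].append(step)
        path.write_text(json.dumps(state))  # checkpoint after every step
    return state["completed"]
```

Writing the checkpoint after each step, not only at the end, is what bounds the rework after a stall to a single step.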

A sample run

Trigger Annual revalidation due on the retail PD scorecard.
  1. Spawn sub-agents: replicate the developer's results, benchmark, run sensitivity tests.
  2. Probe assumptions and check outcomes analysis against realised defaults.
  3. Identify findings and rate their severity.
  4. Draft the validation report to SR 11-7 structure with cited evidence.
Output A draft validation report with two medium findings (a data-quality gap and a stale segment), routed to the independent validation oversight agent for challenge and sign-off.
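
Step 2's check of outcomes against realised defaults could look something like the sketch below: per segment, compare the mean predicted PD with the observed default rate and flag gaps beyond a tolerance. The function name, the record layout and the 0.02 tolerance are assumptions for illustration, not the agent's actual test or threshold.

```python
from collections import defaultdict


def outcomes_by_segment(records, tolerance=0.02):
    """Compare predicted PD with realised defaults, segment by segment.

    records: iterable of (segment, predicted_pd, defaulted) tuples.
    tolerance: hypothetical absolute gap above which a segment is flagged.
    """
    sums = defaultdict(lambda: [0.0, 0, 0])  # [sum of PDs, defaults, count]
    for segment, predicted_pd, defaulted in records:
        entry = sums[segment]
        entry[0] += predicted_pd
        entry[1] += int(defaulted)
        entry[2] += 1

    report = {}
    for segment, (pd_sum, defaults, count) in sums.items():
        mean_pd = pd_sum / count
        observed = defaults / count
        report[segment] = {
            "mean_predicted_pd": mean_pd,
            "observed_default_rate": observed,
            "flag": abs(mean_pd - observed) > tolerance,
        }
    return report
```

A flagged segment here is only a prompt for a finding; rating its severity and writing it up remain separate steps in the run.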

In numbers

~6 weeks
Median validation turnaround
~900, whole inventory on cycle
Models validated / yr

Handoffs

Across ⇢ Every model-owning desk in the bank (incl. the AI agents here)
⇢ Financial Crime … Scenario Tuning for tuning-model sign-off
