Data Quality Agent

◆ Autonomous Monitor

Profiles every feed, learns a per-feed baseline of 'normal', and detects schema drift, unit changes and null-rate spikes against it. It quarantines the bad batch before it reaches a risk model or regulatory report and notifies the feed owner with the diff and suspected root cause.
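
The profile-versus-baseline comparison described above could look roughly like the following. This is a minimal sketch, not the agent's actual implementation: `FeedProfile`, `profile`, `check` and the 10-point `null_tolerance` are hypothetical names and values chosen for illustration, and a batch is assumed to arrive as a list of dictionaries.

```python
from dataclasses import dataclass


@dataclass
class FeedProfile:
    columns: dict[str, str]      # column name -> type name of first non-null value
    null_rate: dict[str, float]  # column name -> fraction of null values


def profile(rows: list[dict]) -> FeedProfile:
    """Compute observed column types and null rates for one batch."""
    columns: dict[str, str] = {}
    nulls: dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            nulls.setdefault(name, 0)
            if value is None:
                nulls[name] += 1
            else:
                columns.setdefault(name, type(value).__name__)
    n = max(len(rows), 1)
    return FeedProfile(columns, {k: v / n for k, v in nulls.items()})


def check(batch: FeedProfile, base: FeedProfile,
          null_tolerance: float = 0.10) -> list[str]:
    """Return human-readable findings for schema drift and null-rate spikes."""
    findings = []
    for name, typ in base.columns.items():
        if name not in batch.null_rate:
            findings.append(f"schema drift: missing column {name}")
        elif batch.columns.get(name, typ) != typ:
            findings.append(
                f"schema drift: {name} is {batch.columns[name]}, expected {typ}")
    for name in batch.null_rate:
        if name not in base.columns:
            findings.append(f"schema drift: unexpected column {name}")
    for name, rate in batch.null_rate.items():
        expected = base.null_rate.get(name)
        if expected is not None and rate - expected > null_tolerance:
            findings.append(
                f"null-rate spike: {name} {rate:.0%} vs {expected:.0%}")
    return findings
```

A non-empty list from `check` would be the signal to quarantine the batch and attach the findings to the owner notification as the diff.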

Memory

Working: The current batch profile vs. the learned baseline.

Episodic: Prior quality incidents per feed and their root causes.

Semantic: Data contracts, expected schemas, business validation rules.

Store: Profile/statistics store + data-contract registry

Orchestration

swarm · MCP

Harness · Managed Agents: event-driven on each load; sandboxed code-exec for profiling and stats.

Tools

Data pipeline / warehouse API (API)
Profiling + anomaly sandbox (Code exec)
Data-contract registry (Retrieval)
Owner notification channel (API)

Evals & guardrails

Quarantine actions are reversible and logged; false-quarantine rate tracked.
Drift detection on the baseline itself: alert if 'normal' shifts too fast.
Critical regulatory feeds (BCBS 239) require an oversight-agent gate to release after quarantine.
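
One way to picture the first and third guardrails together is a quarantine log in which every action is recorded, releases are possible, and releases on critical feeds require an explicit approval flag. The sketch below is illustrative only: `QuarantineLog`, its fields, and the `oversight_approved` flag are hypothetical, and a real oversight-agent gate would be a separate approval step rather than a boolean argument.

```python
from dataclasses import dataclass, field


@dataclass
class QuarantineLog:
    critical_feeds: set[str] = field(default_factory=set)
    entries: list[dict] = field(default_factory=list)

    def quarantine(self, feed: str, batch_id: str, reason: str) -> None:
        """Record a quarantine; the batch is held, not deleted."""
        self.entries.append({"feed": feed, "batch": batch_id, "reason": reason,
                             "state": "quarantined", "false_positive": False})

    def release(self, batch_id: str, false_positive: bool,
                oversight_approved: bool = False) -> None:
        """Reverse a quarantine; critical feeds need oversight approval."""
        entry = next(e for e in self.entries if e["batch"] == batch_id)
        if entry["feed"] in self.critical_feeds and not oversight_approved:
            raise PermissionError(
                f"{entry['feed']} is critical: oversight approval required")
        entry["state"] = "released"
        entry["false_positive"] = false_positive

    def false_quarantine_rate(self) -> float:
        """Share of all quarantines later confirmed as false positives."""
        if not self.entries:
            return 0.0
        return sum(e["false_positive"] for e in self.entries) / len(self.entries)
```

Keeping the false-positive flag on the log entry itself means the tracked rate can be recomputed from the audit trail at any time.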

Offline reflection

Offline consolidation of confirmed incidents into sharper, lower-false-positive quality checks per feed.

Frontier edge

▲ Continual learning: the per-feed 'normal' baseline updates online from every clean load, so drift detection sharpens without a retrain cycle.
▲ Causal reasoning: distinguishes a real market move from a unit error by reasoning about cause, not just flagging a statistical outlier.
▲ Self-improving fleet: a confirmed silent-killer pattern on one feed propagates as a new check to every structurally similar feed.
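
The online baseline update in the first point can be illustrated with a running mean and variance that absorb one observation at a time. The sketch uses Welford's algorithm as a stand-in; the source does not say which statistic or update rule the agent uses, and `RunningBaseline` is a hypothetical name.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningBaseline:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one value from a clean load into the baseline (Welford)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def zscore(self, x: float) -> float:
        """Distance of x from the baseline in sample standard deviations."""
        if self.n < 2:
            return 0.0  # not enough history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0 if x == self.mean else math.inf
        return (x - self.mean) / std
```

Calling `update` only for loads that passed their checks keeps a bad batch from contaminating the definition of 'normal', which is the property the continual-learning claim depends on.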

A sample run

Trigger: Nightly market-data feed loads; one instrument's prices arrive 100x expected.

1. Profile the batch; detect a 100x scale anomaly on a single symbol vs. baseline.
2. Confirm it's a unit error (cents vs. dollars), not a real move, via cross-source check.
3. Quarantine the affected rows before they hit the valuation model.
4. Notify the feed owner with the diff and suspected root cause.

Output: Bad batch quarantined pre-ingestion; the risk model runs on clean data; the feed owner receives a root-caused alert with the diff.
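
The cross-source check in step 2 can be sketched as a comparison of the feed price against an independent reference price. Everything here is an assumption made for illustration: the function name `classify_price_jump`, the list of candidate scale factors, the 2% tolerance, and the prices in the usage note.

```python
def classify_price_jump(feed_price: float, reference_price: float,
                        scale_factors: tuple[float, ...] = (100.0, 1000.0),
                        tolerance: float = 0.02) -> str:
    """Classify an anomalous feed price using an independent reference source."""
    # If the independent source agrees with the feed, the move is real.
    if abs(feed_price - reference_price) / reference_price <= tolerance:
        return "real_move"
    # If the feed is off by a clean scale factor (e.g. cents vs. dollars),
    # a unit error is the most likely cause.
    ratio = feed_price / reference_price
    for factor in scale_factors:
        for candidate in (factor, 1.0 / factor):
            if abs(ratio - candidate) / candidate <= tolerance:
                return "unit_error"
    # Anything else needs a human or a deeper investigation.
    return "unknown"
```

With a reference price of 152.30, a feed price of 15230.0 is classified as `unit_error`, while a feed price of 152.50 is classified as `real_move` because the two sources agree.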

In numbers

100%

Feeds monitored

97%

Bad batches caught pre-ingestion

Handoffs

Hands to → Lineage & Catalogue Agent

Across ⇢ Markets / Risk → data owners for upstream fixes