Live adversarial campaigns against frontier AI agents.
We pressure-test AI agents the way real attackers will — live, in populated environments, against humans whose job is to break them. Benchmarks measure the average case. We measure what happens under sustained adversarial heat, where the failures that actually matter live.
Our work extending Agents of Chaos (Shapira et al., 2025) — the seminal study of emergent harm in populated multi-agent environments — led OpenAI to engage us to run an internal red-team campaign against their agents.
The original two-week engagement produced enough actionable findings that OpenAI commissioned a three-week follow-up. That follow-up was then extended to five weeks as the campaign kept surfacing high-impact results.
Across both engagements we ran the work end-to-end: custom adversarial infrastructure, a hand-picked senior research roster, daily synthesised reporting, and a methodology that adapts in real time to what the attacks reveal.
Campaigns, not evaluations. We stand up populated ecosystems of dozens of agents, with senior human researchers embedded as participants, and run them continuously for two to five weeks. Findings come from emergent behaviour under sustained pressure, not from a fixed prompt suite.
Custom adversarial infrastructure. We build the surfaces the agents have to defend: chat platforms, financial back ends, clinical workflows, internal tooling. Each campaign runs on a custom runtime that exposes agent workspaces, memories, and tool access — so attacks land where they would land in production.
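For a concrete sense of what that exposure means, here is a minimal sketch of such a surface. Every name and field below is an illustrative assumption for this page, not our production runtime:

```python
from dataclasses import dataclass, field

# Illustrative sketch only: these names and fields are hypothetical,
# not our production runtime API.

@dataclass
class ToolGrant:
    name: str               # e.g. "payments.refund" on a financial back end
    scopes: list[str]       # actions the agent is allowed to take with it

@dataclass
class AgentWorkspace:
    agent_id: str
    memory: dict[str, str] = field(default_factory=dict)  # persistent memory, reachable in-scenario
    tools: list[ToolGrant] = field(default_factory=list)

def exposed_surface(ws: AgentWorkspace) -> list[str]:
    """Enumerate the points an attacker can reach: memory keys and tool scopes."""
    points = [f"memory:{key}" for key in ws.memory]
    points += [f"tool:{grant.name}:{scope}" for grant in ws.tools for scope in grant.scopes]
    return points

# Example: an agent with one memory slot and one scoped financial tool.
ws = AgentWorkspace(
    agent_id="support-agent-7",
    memory={"customer_notes": ""},
    tools=[ToolGrant(name="payments.refund", scopes=["issue", "approve"])],
)
print(exposed_surface(ws))
# ['memory:customer_notes', 'tool:payments.refund:issue', 'tool:payments.refund:approve']
```

Enumerating injection points this way is what lets attacks land on memory and tooling, not just on the prompt.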
Severity-graded harm reports. Every campaign closes with a structured phase report: a daily-updated harm taxonomy, severity ratings, sanitised artefacts, and recommended mitigations. The deliverable is something a security org can act on directly.
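As a rough illustration of the shape of a single finding (the field names and four-level severity scale here are assumptions for exposition, not our actual schema):

```python
from dataclasses import dataclass
from enum import Enum

# Illustrative sketch only: field names and the four-level severity
# scale are assumptions for exposition, not our actual report schema.

class Severity(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

@dataclass
class Finding:
    taxonomy_path: str    # position in the daily-updated harm taxonomy,
                          # e.g. "data-exfiltration/tool-mediated"
    severity: Severity
    artefact_ref: str     # pointer to a sanitised reproduction transcript
    mitigation: str       # recommended fix a security org can action directly
    first_observed: str   # ISO date within the campaign

finding = Finding(
    taxonomy_path="data-exfiltration/tool-mediated",
    severity=Severity.HIGH,
    artefact_ref="artefacts/phase-2/finding-014.md",
    mitigation="Require human approval before the refund tool acts on attacker-supplied account IDs.",
    first_observed="2025-03-14",
)
```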