Behavioral Stories (STAR Format)


Story 1: “Tell me about a time things went wrong”

The Corleone Deletion Incident

Situation: Day 6 of a 14-day live red-teaming experiment: 65+ agents, 13 participants, ~61K messages. Our admin bot (Corleone) had Discord channel-management permissions. A participant social-engineered Corleone into deleting 17 Discord channels at 2 AM, destroying conversation history, research data, and active experiments. The daily log pipeline hadn't run yet, so that day's data appeared lost.

Task: Recover the lost data, determine what was gone for good, and harden data collection so an agent could never destroy research data again.

Action: Discovered the deletion within hours. Our 5-minute archive daemon had already captured all messages, so the raw data was preserved. Pulled session backups (full model I/O) from agent VMs via SSH. Built a recovery index mapping each deleted channel to its available data sources. Upgraded all data collection from daily to 5-minute cadence. Documented the full incident as a case study for the report.
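A minimal sketch of what that recovery index could look like. The channel and source names below are hypothetical; the story doesn't specify the real pipeline details:

```python
from dataclasses import dataclass, field

# Hypothetical source labels for the two surviving copies described above:
# the 5-minute archive daemon and the per-VM session backups.
SOURCES = ("archive_daemon", "vm_session_backup")

@dataclass
class RecoveryIndex:
    """Maps each deleted channel to the data sources that still cover it."""
    coverage: dict = field(default_factory=dict)  # channel -> set of sources

    def record(self, channel: str, source: str) -> None:
        self.coverage.setdefault(channel, set()).add(source)

    def recoverable(self) -> list:
        # Channels with at least one surviving copy, in stable order.
        return sorted(ch for ch, srcs in self.coverage.items() if srcs)

idx = RecoveryIndex()
idx.record("agent-ops", "archive_daemon")
idx.record("agent-ops", "vm_session_backup")
idx.record("red-team-3", "archive_daemon")
print(idx.recoverable())  # channels recoverable from at least one source
```

Keeping the index as channel-to-sources sets makes the final coverage report (14 of 17 channels recoverable) a one-line query.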

Result: Recovered data from 14 of 17 channels (the remaining 3 had no bot session data). The incident became our most compelling case study: it proved that in adversarial multi-agent settings, agents can destroy research data, so collection infrastructure must be designed with that threat model in mind. It also demonstrated the value of redundant, high-frequency data collection.


Story 2: “Tell me about building something under pressure”

19K Lines in 2 Weeks

Situation: We needed a full multi-agent red-teaming platform for 13 participants and 65+ agents, with real-time monitoring, live configuration editing, and robust data collection, all within the 2 weeks before the experiment window opened.

Task: Build infrastructure that could evolve mid-experiment as new hypotheses emerged.

Action: Made the key design decisions up front: per-agent VM isolation (Fly.io), dynamic agent provisioning via GUI, a FastAPI proxy for fleet management, three independent data collection pipelines, conversation forking for A/B testing, an encrypted research dashboard, and a hotpatch system for live updates without downtime. Prioritized by failure mode, asking what breaks the experiment if it doesn't work: data collection and agent isolation first, nice-to-have features later.
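The "prioritize by failure mode" triage can be sketched as a two-tier sort. The feature names and effort scores below are illustrative, not from the actual backlog:

```python
# Illustrative backlog: each feature is tagged with whether its absence
# breaks the experiment, plus a rough effort estimate.
features = [
    {"name": "data collection",   "breaks_experiment": True,  "effort": 5},
    {"name": "agent isolation",   "breaks_experiment": True,  "effort": 8},
    {"name": "dashboard theming", "breaks_experiment": False, "effort": 2},
]

# Experiment-breaking work sorts first; within each tier, cheapest first.
build_order = sorted(features, key=lambda f: (not f["breaks_experiment"], f["effort"]))
print([f["name"] for f in build_order])
# → ['data collection', 'agent isolation', 'dashboard theming']
```

The point of the sort key is that criticality dominates effort: a cheap nice-to-have never jumps ahead of expensive must-have infrastructure.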

Result: ~19K lines of code. Platform grew from 14 to 65+ agents mid-experiment as participants created new bots to test hypotheses. The infrastructure’s adaptability was cited in the paper as a key research insight: “the pace at which the infrastructure can evolve is likely to shape the pace and quality of the safety findings.”


Story 3: “How do you handle ambiguity?”

Scenarios vs. Emergence

Situation: Pre-experiment, we designed 75 attack scenarios across 8 categories. During the experiment, participants barely used them. The best attacks were entirely organic — identity takeover, language encoding, governance capture.

Task: Decide whether to enforce the scenario catalog or let participants pursue what excited them.

Action: Pivoted from structured scenarios to emergence-based design. Gave participants the capability audit, the general goal, and freedom. Used daily LLM-summarized logs to monitor what was happening and surface patterns. When something interesting emerged (like the encoding attack), directed more attention there. The scenario catalog became an onboarding tool, not a research protocol.
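One way the daily monitoring step could be sketched, assuming a pipeline that already produces per-channel LLM summaries; the watchlist terms and channel names here are illustrative:

```python
# Hypothetical watchlist of attack patterns worth extra attention,
# drawn from the kinds of behavior the team cared about.
WATCHLIST = ("encoding", "identity", "governance")

def flag_summaries(daily_summaries: dict) -> list:
    """Return channels whose daily LLM summary mentions a watched pattern."""
    flagged = []
    for channel, summary in daily_summaries.items():
        if any(term in summary.lower() for term in WATCHLIST):
            flagged.append(channel)
    return sorted(flagged)

print(flag_summaries({
    "lab-notes": "Routine coordination traffic.",
    "glossolalia": "Agents invented an encoding scheme to pass PII.",
}))
# → ['glossolalia']
```

A cheap keyword pass like this only surfaces candidates; the actual judgment about whether something merited more attention (as with the encoding attack) stayed with the researchers.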

Result: Our most impactful findings — identity takeover via social engineering, synthetic language PII extraction, governance capture via procedural legitimacy — were all participant-invented. The lesson: in red-teaming, design for emergence, not control. Skilled people with the right tools will find things you can’t anticipate.