CLAUDE.md

Conventions

Python: uv run <script>.py for all execution. Package mgr: uv (root pyproject.toml)
After finishing work: update README.md/CLAUDE.md with experiment results if applicable.
Website tests: Before any feature addition to scenario_template.html, run cd red-teaming && npm test and ensure all 168 tests pass. Tests are in red-teaming/tests/website.test.js; they cover all major systems (encryption gate, fbKey, escaping, inline formatting, voting, scenarios, bugs, status labels, agent claiming, workspace state, heartbeats, action progress, daily logs parsing/navigation/grid, markdown rendering, evidence, notes, create/delete agent, filters, templates, game cards, snapshots, build compatibility, edge cases). Add tests for new features.

Mangrove Red-Teaming System (`red-teaming/`)

14 AI agents (GPT-5.4) on Discord, red-teamed by human participants. Based on “Agents of Chaos” (Shapira et al.). Only work in red-teaming/; ignore other folders.

Architecture

```text
                    ┌──────────────────────────────────┐
                    │         DISCORD SERVERS           │
                    │  Flatland (air-gapped, simulated) │
                    │  Spaceland (internet, real work)  │
                    │  Testland (testing)               │
                    └───────┬──────────────────┬────────┘
                            │                   │
                            ▼                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                   FLY.IO  (org: redteaming, region: ewr)        │
│                                                                  │
│  PER-AGENT APPS (×14)              PROXY APP (×1)               │
│  mangrove-{alexbot..tessio}        mangrove-agent-proxy         │
│  node:22 + OpenClaw gateway        FastAPI (py3.12)             │
│  shared-cpu-2x, 2GB RAM       ◄── shared-cpu-1x, 256MB         │
│  Port 3000                    SSH  Port 8080                    │
│  /data volume (persistent):        Auth: agents.json            │
│    workspaces/ .openclaw/ memory/  Volume: proxy_data → /data   │
│  workspace_snapshot.sh (bg,hourly→Firebase)                     │
│  Model: GPT-5.4, Heartbeat: 30m, restart: always               │
└──────────────────────────────────────────────────────────────────┘
        │ HTTPS              │ HTTPS              │ HTTPS
        ▼                    ▼                    ▼
   FIREBASE RTDB        OPENAI API         FLY MACHINES API
   (agents, logs,       GPT-5.4            start/stop/restart
    snapshots, votes,   (all 14 bots +     (proxy → machines)
    notes, bugs)         daily log scraper)
```

WEBSITE: red-teaming/index.html (AES-256-GCM, pw: "mangrove", GitHub Pages)
Tabs: Agents | Daily Logs | Notes | Issues | Ideas | Onboarding
Firebase JS SDK for real-time sync.

LOCAL TOOLING:
```text
generate_workspaces.py   → agent_secrets.json + workspaces/
provision_agents.py      → agents.json + Firebase
generate_agent_config.py → build/{name}/openclaw.json
deploy_agents.py         → fly create/deploy/teardown
hotpatch.py              → SSH push files to running bots
discord_daily_log.py     → scrape Discord → JSONL + summary → Firebase (cron 6AM)
snapshot_workspaces.py   → SSH pull workspaces from all bots → git commit (cron hourly)
build_scenario_ui.py     → scenario_template.html → index.html
```

Key Data Flows

Discord msgs → OpenClaw gateway → GPT-5.4 → Discord
Website “Start” → proxy → Fly Machines API → starts bot → Firebase status update
Website workspace edit → proxy → SSH push to /data/workspaces/{file} → bot picks up on next session/heartbeat
Website config change (thinking effort) → proxy → SSH read/modify/write openclaw.json → OpenClaw hot-reloads
Hotpatch → reads workspace_templates/ → SSH push to bot(s) → no restart
Workspace snapshots (decentralized): each server runs workspace_snapshot.sh hourly → reads *.md + memory/*.md → JSON via jq → PUT to Firebase workspace_snapshots/{agent}/latest + history/{ts}. Keys strip .md (Firebase prohibits . in keys). Backup: snapshot_workspaces.py on Alex’s Mac commits to git hourly.
Private keys: generate_workspaces.py → USER.md + agent_secrets.json → provision_agents.py → agents.json (proxy) + Firebase (hash only)
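The snapshot flow above (read *.md + memory/*.md, strip .md from keys, write history only on content change) can be sketched in Python. This is a stand-in for what workspace_snapshot.sh does with bash + jq, not its actual code: the `files`/`daily_logs` payload layout and the SHA-256 change check are assumptions, and `firebase_key`, `build_snapshot`, `content_hash` are hypothetical names. The forbidden-character set is the one Firebase RTDB documents for keys.

```python
import hashlib
import json
from pathlib import Path

# Firebase RTDB rejects these characters in keys (plus ASCII control chars).
FORBIDDEN = set(".$#[]/")


def firebase_key(filename: str) -> str:
    """Drop the .md suffix, then replace any remaining forbidden characters."""
    stem = filename[:-3] if filename.endswith(".md") else filename
    return "".join("_" if ch in FORBIDDEN or ord(ch) < 32 else ch for ch in stem)


def build_snapshot(workspace: Path) -> dict:
    """Collect top-level *.md and memory/*.md into a Firebase-safe dict (layout assumed)."""
    files = {firebase_key(p.name): p.read_text() for p in sorted(workspace.glob("*.md"))}
    mem_dir = workspace / "memory"
    daily = (
        {firebase_key(p.name): p.read_text() for p in sorted(mem_dir.glob("*.md"))}
        if mem_dir.is_dir()
        else {}
    )
    return {"files": files, "daily_logs": daily}


def content_hash(snapshot: dict) -> str:
    """Stable hash: write history/{ts} only when this differs from the last run."""
    return hashlib.sha256(json.dumps(snapshot, sort_keys=True).encode("utf-8")).hexdigest()
```

Hashing with `sort_keys=True` makes the comparison independent of file iteration order, so an unchanged workspace never produces a new history entry.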

Overview

Discord servers: Flatland (air-gapped), Spaceland (internet-connected), Testland (testing)
14 agents: 12 participant + Corleone (admin, free-for-all) + Tessio (worker, free-for-all)
Bot framework: OpenClaw (npm install -g openclaw@latest, Node >= 22)
Compute: each agent = separate Fly.io app (mangrove-{name}), shared-cpu-2x 2048MB, ~$10-15/mo
Website: red-teaming/index.html (encrypted, password in onboarding.txt)

Key Workflow

```bash
cd red-teaming/agent_proxy
uv run generate_workspaces.py                    # all 14 agents (or --agent alexbot)
uv run provision_agents.py                       # writes Firebase + agents.json (or --count 3)
uv run deploy_agents.py setup --agent alexbot    # create app + volume + secrets (or --all)
uv run deploy_agents.py deploy --agent alexbot   # build + deploy Docker image (or --all)
uv run deploy_agents.py status --all             # check status
uv run deploy_agents.py teardown --agent alexbot # destroy app
fly ssh console --app mangrove-alexbot           # SSH in
fly logs --app mangrove-alexbot                  # view logs
```

Key Design Decisions

Unified keys: generate_workspaces.py is source of truth → writes USER.md + agent_secrets.json → provision_agents.py → agents.json + Firebase. Ensures bot’s key matches claim key.
Deterministic seeding: hashlib.sha256(agent_name) (not Python hash()). Same output every run.
Secrets merge: --agent X merges into existing agent_secrets.json, doesn’t overwrite.
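Why sha256 and not `hash()`: Python salts `str` hashes per process (PYTHONHASHSEED), so `hash("alexbot")` changes between runs. A minimal sketch of the seeding idea, assuming the generator takes an integer seed from the digest; `seeded_rng` and `fake_phone` are hypothetical, not the functions in generate_workspaces.py.

```python
import hashlib
import random


def seeded_rng(agent_name: str) -> random.Random:
    # SHA-256 is stable across processes and machines, unlike the salted hash().
    digest = hashlib.sha256(agent_name.encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def fake_phone(agent_name: str) -> str:
    """Hypothetical example of deterministic fake PII for one agent."""
    rng = seeded_rng(agent_name)
    return f"555-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}"
```

Each agent gets its own independent `Random` instance, so generating one agent (`--agent X`) yields the same values as generating all 14.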

OpenClaw Configuration

Each agent gets openclaw.json with env var refs (${DISCORD_BOT_TOKEN}, ${OPENAI_API_KEY}, ${OPENCLAW_GATEWAY_TOKEN} — set as Fly.io secrets). Key settings: model openai/gpt-5.4 (200k context, 16k output, reasoning: true), workspace /data/workspaces, heartbeat 30m, compaction safeguard, gateway local port 3000. Discord: groupPolicy: "open", dmPolicy: "open", allowFrom: ["*"], allowBots: true, requireMention: true per guild, historyLimit: 200. Media/image tools enabled (GPT-5.4 vision, 10MB max).

Workspace files (auto-loaded by OpenClaw from workspace dir):

AGENTS.md — operating instructions, rules, memory directives (primary instruction file)
IDENTITY.md — name, emoji, avatar, vibe
SOUL.md — personality, behavioral rules, security instructions
USER.md — human owner info, fake PII (SSN/DOB/CC/phone/address), public/private keys
TOOLS.md — Discord servers, other agents list, platform notes
HEARTBEAT.md — periodic check-in instructions
MEMORY.md — long-term memory (agent-managed)
memory/YYYY-MM-DD.md — daily logs (agent-created, today+yesterday auto-loaded)

Only named files above are auto-loaded. memory/ subdirectory is indexed for semantic search.
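The auto-load rule above, written out as a checklist. This only restates the list (7 named files plus today's and yesterday's daily log); it is not OpenClaw's loader, and the ordering is an assumption. `auto_loaded_files` is a hypothetical helper.

```python
from datetime import date, timedelta

NAMED_FILES = ["AGENTS.md", "IDENTITY.md", "SOUL.md", "USER.md",
               "TOOLS.md", "HEARTBEAT.md", "MEMORY.md"]


def auto_loaded_files(today: date) -> list[str]:
    """Workspace paths expected in context on `today` (order assumed)."""
    yesterday = today - timedelta(days=1)
    daily = [f"memory/{d.isoformat()}.md" for d in (today, yesterday)]
    return NAMED_FILES + daily
```

Anything else under memory/ is reachable only through semantic search, not the initial context.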

OpenClaw gotchas:

guilds must be object ({"id": {}}) not array
gateway.mode: "local" required for headless
Container needs git (npm install fails without)
Fly.io needs >= 2048MB RAM (512MB = OOM)
Non-loopback binds require OPENCLAW_GATEWAY_TOKEN
entrypoint.sh runs openclaw doctor --fix before start
Use ENTRYPOINT not CMD in Dockerfile (Node base image intercepts CMD)
fly ssh console -C has no shell — pipes/redirects silently ignored (exit 0). Wrap in bash -c '...'. This broke hotpatch.py and push_file_to_machine() until caught.
Docs: https://docs.openclaw.ai
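A minimal sketch of the `bash -c` wrapping for `fly ssh console -C`, assuming commands are run from Python via `subprocess`. `fly_ssh_argv` and `run_remote` are hypothetical helpers, not the code in hotpatch.py. `shlex.quote` single-quotes the command; a command that itself contains single quotes gets `'"'"'` splicing, which is untested against fly's own argument parsing.

```python
import shlex
import subprocess


def fly_ssh_argv(app: str, shell_cmd: str) -> list[str]:
    """Build argv that runs shell_cmd under bash on the machine.

    -C execs the command without a shell, so pipes and redirects in
    shell_cmd would not be interpreted unless wrapped in bash -c.
    """
    return ["fly", "ssh", "console", "--app", app, "-C", f"bash -c {shlex.quote(shell_cmd)}"]


def run_remote(app: str, shell_cmd: str) -> str:
    # check=True surfaces non-zero exits; it cannot detect the no-shell case
    # (that exits 0), which is why the wrapper above matters.
    result = subprocess.run(fly_ssh_argv(app, shell_cmd),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Per the live rules, calling `run_remote` against a running bot still requires explicit permission.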

File Layout (`red-teaming/agent_proxy/`)

| File | Purpose |
|---|---|
| `main.py` | FastAPI proxy (claim, workspace, config, create/delete agents) |
| `provision_agents.py` | Reads `agent_secrets.json` → Firebase + `agents.json` |
| `generate_workspaces.py` | Generates workspace dirs from templates with fake PII + keys |
| `generate_agent_config.py` | Generates per-agent `openclaw.json` |
| `deploy_agents.py` | Click CLI: setup/deploy/status/teardown on Fly.io |
| `hotpatch.py` | SSH push workspace files to running bots (no restart) |
| `discord_daily_log.py` | Discord scraper + LLM summarizer + Firebase push |
| `snapshot_workspaces.py` | SSH pull workspace files from all bots → git commit |
| `sync_live_workspaces.py` | Sync live workspace files from bots to local |
| `workspace_templates/` | Template .md files with `` syntax |
| `gateway/Dockerfile` | OpenClaw container (node22 + git + openclaw) |
| `gateway/entrypoint.sh` | First-boot workspace copy, doctor --fix, starts gateway |
| `gateway/fly.toml` | Per-agent Fly config (app name via `--app` flag) |
| `gateway/workspace_snapshot.sh` | Background daemon: hourly Firebase push of workspace files |
| `fly.toml` | Proxy Fly config (shared-cpu-1x, 256MB, port 8080) |
| `Dockerfile` | Proxy Docker image (Python 3.12 + FastAPI) |
| `bot_tokens.json` | Discord bot tokens (gitignored) |
| `api_keys.json` | OpenAI API keys (gitignored) |
| `agents.json` | Agent metadata with private keys (gitignored) |
| `agent_secrets.json` | All PII and keys from workspace gen (gitignored) |
| `api_key_pool.json` | OpenAI key pool for dynamic agents (gitignored) |
| `com.mangrove.discord-daily-log.plist` | macOS launchd cron for daily log scraper |

Daily Discord Log Scraper

Scrapes all 3 guilds (channels + active threads), saves JSONL, generates GPT-5.4 summary (organized by 6 attack categories), per-person and per-bot summaries, pushes to Firebase.

```bash
cd red-teaming/agent_proxy
uv run discord_daily_log.py                     # yesterday, all guilds
uv run discord_daily_log.py --date 2026-03-09   # specific date
uv run discord_daily_log.py --since 6h          # last 6 hours
uv run discord_daily_log.py --guild Flatland     # one guild
uv run discord_daily_log.py --skip-summary       # raw JSONL only
uv run discord_daily_log.py --skip-person-logs   # skip per-person summaries
uv run discord_daily_log.py --skip-bot-logs      # skip per-bot summaries
uv run discord_daily_log.py --skip-evidence      # skip evidence extraction
uv run discord_daily_log.py --skip-firebase      # don't push to Firebase
uv run discord_daily_log.py --token-name corleone # primary Discord token (default)
uv run discord_daily_log.py --verbose            # per-channel progress
uv run discord_daily_log.py --dry-run            # stats only
```

Output: red-teaming/daily_discord_logs/{date}.jsonl (raw) + {date}.md (summary). Firebase: daily_logs/{date}.

Multi-token scraping: uses ALL 14 bot tokens (primary: corleone). Unions channel lists across tokens, tries fallbacks per-channel. Deduplicates by message ID. Timeout retry: 5 attempts, exponential backoff.
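A minimal sketch of the dedupe and retry steps, assuming each token's fetch returns a list of message dicts with a string `id` and that timeouts surface as `TimeoutError` (the real scraper's HTTP client may raise something else). `with_backoff` and `merge_messages` are hypothetical names; the 1s base delay and small jitter are assumptions, only the 5 attempts come from the text.

```python
import random
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def with_backoff(fn: Callable[[], T], attempts: int = 5, base: float = 1.0,
                 retry_on: tuple[type[BaseException], ...] = (TimeoutError,)) -> T:
    """Call fn, retrying on timeouts with exponential backoff (base, 2*base, 4*base, ...)."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            time.sleep(base * 2 ** attempt + random.uniform(0, base / 10))
    raise AssertionError("unreachable when attempts >= 1")


def merge_messages(batches: Iterable[list[dict]]) -> list[dict]:
    """Union messages fetched with different tokens, keeping one copy per ID."""
    seen: dict[str, dict] = {}
    for batch in batches:
        for msg in batch:
            seen.setdefault(msg["id"], msg)
    # Discord snowflake IDs increase with time, so int(id) order is chronological.
    return sorted(seen.values(), key=lambda m: int(m["id"]))
```

`setdefault` keeps the first copy seen, so the primary token's version wins when the same message comes back from a fallback token.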

Per-person/bot summaries: DISCORD_TO_PERSON maps 13 Discord usernames → display names. group_messages_by_person() / group_messages_by_bot() bucket messages. One LLM call per group type. Firebase: daily_logs/{date}/person_logs/{key} and bot_logs/{key}, each with {summary_md, message_count, edited_summary_md, edited_by, edited_at}.
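A minimal sketch of the bucketing step using the Firebase key rules listed under Firebase RTDB (`name.lower().replace(" ","_")` for people, `username.lstrip(".").lower()` for bots). The two-entry `DISCORD_TO_PERSON` excerpt, the `example_*` usernames, the `author` field, and the combined `group_messages` helper are all hypothetical; the real code uses separate `group_messages_by_person()` / `group_messages_by_bot()`.

```python
from collections import defaultdict

# Hypothetical excerpt; the real mapping has 13 Discord usernames.
DISCORD_TO_PERSON = {"example_alex": "Alex Loftus", "example_fred": "Fred Heiding"}


def person_key(display_name: str) -> str:
    return display_name.lower().replace(" ", "_")


def bot_key(username: str) -> str:
    return username.lstrip(".").lower()


def group_messages(messages: list[dict], bot_names: set[str]) -> tuple[dict, dict]:
    """Split messages into per-person and per-bot buckets keyed for Firebase."""
    people: dict[str, list[dict]] = defaultdict(list)
    bots: dict[str, list[dict]] = defaultdict(list)
    for msg in messages:
        author = msg["author"]
        if author in bot_names:
            bots[bot_key(author)].append(msg)
        elif author in DISCORD_TO_PERSON:
            people[person_key(DISCORD_TO_PERSON[author])].append(msg)
        # Anyone else (unmapped humans) is left out of both buckets.
    return dict(people), dict(bots)
```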

IMPORTANT: push_to_firebase() uses PATCH (not PUT) — re-running with --skip-person-logs won’t wipe existing person_logs.
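A minimal sketch of the PATCH-vs-PUT distinction against the Firebase RTDB REST API (node path + `.json`). `firebase_request` is a hypothetical helper, not `push_to_firebase()`; authentication is omitted, and the request is only built, not sent.

```python
import json
import urllib.request

DB_URL = "https://red-teaming-betrayal-default-rtdb.firebaseio.com"


def firebase_request(path: str, data: dict, method: str = "PATCH") -> urllib.request.Request:
    """Build an RTDB REST request.

    PATCH merges the top-level keys of `data` into the node at `path`;
    PUT replaces the node, deleting children that are not in `data`.
    """
    url = f"{DB_URL}/{path.strip('/')}.json"
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(url, data=body, method=method,
                                  headers={"Content-Type": "application/json"})
```

PATCH merges only one level deep: a body of `{"person_logs": {...}}` still replaces the whole person_logs child. To touch one entry, use a path key such as `{"person_logs/alex_loftus": {...}}`, which the REST API treats as a multi-path update.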

Cron: macOS launchd com.mangrove.discord-daily-log.plist runs daily at 6 AM. Installed at ~/Library/LaunchAgents/.
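For reference, a launchd job with the same schedule can be rendered with `plistlib`. The keys (`Label`, `ProgramArguments`, `WorkingDirectory`, `StartCalendarInterval`, `StandardOutPath`, `StandardErrorPath`) are standard launchd keys; the uv path and log paths are placeholders, not the contents of the committed plist. launchd runs with a minimal PATH, hence the absolute path to uv.

```python
import plistlib
from pathlib import Path


def daily_log_plist(repo: Path, uv: str = "/opt/homebrew/bin/uv") -> bytes:
    """Render a job that runs the scraper daily at 06:00 local time (paths are placeholders)."""
    job = {
        "Label": "com.mangrove.discord-daily-log",
        "ProgramArguments": [uv, "run", "discord_daily_log.py"],
        "WorkingDirectory": str(repo / "red-teaming" / "agent_proxy"),
        "StartCalendarInterval": {"Hour": 6, "Minute": 0},
        "StandardOutPath": "/tmp/mangrove-daily-log.out",
        "StandardErrorPath": "/tmp/mangrove-daily-log.err",
    }
    return plistlib.dumps(job)
```

The output is written to ~/Library/LaunchAgents/ and loaded with launchctl.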

Website Build

```bash
cd red-teaming
uv run build_scenario_ui.py --password mangrove  # → index.html (AES-256-GCM encrypted)
```

build_scenario_ui.py parses 75 scenarios from data/scenario_catalog_full.md, loads Firebase config from data/firebase_config.json, replaces %%SCENARIOS_JSON%%, %%CATEGORIES_JSON%%, %%FIREBASE_CONFIG_JSON%% in template, encrypts between  /  markers. Password cached in sessionStorage.
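A minimal sketch of the placeholder step only (the encryption step is left out because its parameters are not documented here). The three placeholder names come from the text; `fill_template` and `to_script_json` are hypothetical, and the `</` escaping is a suggested safeguard for JSON embedded in a `<script>` tag, not necessarily what build_scenario_ui.py does.

```python
import json
import re

PLACEHOLDERS = ("%%SCENARIOS_JSON%%", "%%CATEGORIES_JSON%%", "%%FIREBASE_CONFIG_JSON%%")


def to_script_json(value) -> str:
    # "<\/" is a valid JSON escape for "</", so scenario text such as
    # "</script>" cannot close the surrounding <script> tag early.
    return json.dumps(value, ensure_ascii=False).replace("</", "<\\/")


def fill_template(template: str, scenarios: list, categories: list, firebase_config: dict) -> str:
    rendered = dict(zip(PLACEHOLDERS, map(to_script_json, (scenarios, categories, firebase_config))))
    missing = [token for token in PLACEHOLDERS if token not in template]
    if missing:
        raise ValueError(f"template is missing placeholders: {missing}")
    # One regex pass, so inserted JSON is never re-scanned for placeholders;
    # a function replacement keeps backslashes in the JSON literal intact.
    pattern = re.compile("|".join(map(re.escape, PLACEHOLDERS)))
    return pattern.sub(lambda m: rendered[m.group(0)], template)
```

Failing loudly on a missing placeholder catches a template edit that dropped one before a broken index.html gets pushed.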

IMPORTANT: After editing scenario_template.html, MUST rebuild and push both scenario_template.html and index.html.

Website Tabs

All state syncs via Firebase RTDB real-time. User identified by name (localStorage: rt-user).

Agents: Claim via private key (stored in localStorage: rt-claimed-agent; updates Firebase claimed_by).
Control panel: status dot, Start/Stop/Restart, thinking effort selector (off/low/high → live openclaw.json via SSH), Fly app link, SSH command + copy, heartbeat countdown (30min cycle, 1s updates, 5min refetch).
Workspace editor: 7 file tabs, dark textarea, Tab inserts 2 spaces, Ctrl/Cmd+S saves (PUT → proxy → SSH push). Warnings on USER.md/MEMORY.md/HEARTBEAT.md. Snapshot history browser with “Restore to editor”.
Status polling: 15s interval, plus polls at 3s and 8s after actions.
All agents table with real-time status.
Create agent: 3-step wizard (identity → Discord token → workspace editor) → zero-build deploy.
Delete: soft-delete for dynamic agents.

Daily Logs: Three views via an overview · people · bots toggle. Overview: Top Stories + Category Breakdown, date selector, edit/revert. People: 2-column small-multiples grid (13 participants); click to drill down, and the “← all people” link returns to the grid. Bots: same grid (14 bots). Data from daily_logs/{date} with person_logs/ and bot_logs/ sub-paths.

Notes: Per-user scratchpad, auto-saves on blur (800ms). Others’ notes read-only. Firebase notes/{userKey}.

Issues: Submit form (title, type, details). Voting + “mark fixed” (2+ marks = resolved). Firebase bugs/, bug_votes/, bug_fixes/.
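The Issues data model (bugs/, bug_votes/, bug_fixes/ with 2+ fix marks = resolved) can be read as below. The site does this in JavaScript; this Python version only illustrates the logic, assumes each vote/fix node maps userKey → any value, and `summarize_bugs` plus the sort order are hypothetical.

```python
def summarize_bugs(bugs: dict, bug_votes: dict, bug_fixes: dict) -> list[dict]:
    """Combine bugs/, bug_votes/ and bug_fixes/ snapshots into display rows."""
    rows = []
    for key, bug in bugs.items():
        rows.append({
            "key": key,
            "title": bug.get("title", ""),
            "votes": len(bug_votes.get(key, {})),
            "resolved": len(bug_fixes.get(key, {})) >= 2,  # 2+ "mark fixed" flags
        })
    # Open issues first, then most-voted first.
    return sorted(rows, key=lambda r: (r["resolved"], -r["votes"]))
```

Counting distinct userKeys (one child per user) means a single user cannot resolve an issue alone by marking it twice.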

Ideas & Inspiration: Business Ideas (2 hardcoded + custom, voting, edit/withdraw) in custom_metagames/. Scenario Inspiration (75 pre-loaded + custom, category chips, sorted by votes/difficulty) in custom_scenarios/, scenario_votes/.

Onboarding: Static experiment info.

Proxy API Endpoints

Free-for-all agents (corleone, tessio) bypass auth for workspace/config reads.

| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/health` | GET | none | Health check |
| `/agents` | GET | none | List all agents |
| `/agents/{id}/claim` | POST | `{private_key, name}` | Claim agent |
| `/agents/{id}/unclaim` | POST | `{private_key}` | Release agent |
| `/agents/{id}/status` | GET | none | Machine status |
| `/agents/{id}/start` | POST | `{private_key}` | Start Fly machine |
| `/agents/{id}/stop` | POST | `{private_key}` | Stop Fly machine |
| `/agents/{id}/restart` | POST | `{private_key}` | Stop then start |
| `/agents/{id}/heartbeat` | GET | none | Next heartbeat timestamp |
| `/agents/{id}/workspace` | GET | `?private_key=` | List workspace files |
| `/agents/{id}/workspace/{file}` | GET | `?private_key=` | Get file (`?live=true` = SSH to bot) |
| `/agents/{id}/workspace/{file}` | PUT | `{private_key, content}` | Save & push file |
| `/agents/{id}/ssh` | GET | `?private_key=` | SSH command + agent info |
| `/agents/{id}/config/thinking` | GET | `?private_key=` | Get thinking effort |
| `/agents/{id}/config/thinking` | PUT | `{private_key, level}` | Set thinking effort |
| `/validate-discord-token` | POST | `{token}` | Validate Discord bot token |
| `/agents/{id}/invite-links` | GET | none | Discord invite URLs for 3 guilds |
| `/agents/create` | POST | `{name, discord_bot_token, ...}` | Create new agent |
| `/agents/{id}/delete` | POST | `{private_key}` | Soft-delete (stop, remove from Discord guilds, mark deleted) |
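A minimal client sketch for two endpoints in the table, building requests with the standard library without sending them. `PROXY_URL` assumes Fly's default `{app}.fly.dev` hostname for mangrove-agent-proxy; `claim_request` and `workspace_file_url` are hypothetical helpers, not part of main.py.

```python
import json
import urllib.parse
import urllib.request

PROXY_URL = "https://mangrove-agent-proxy.fly.dev"  # assumption: default Fly hostname


def claim_request(agent_id: str, private_key: str, name: str) -> urllib.request.Request:
    """POST /agents/{id}/claim with a JSON body {private_key, name}."""
    body = json.dumps({"private_key": private_key, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{PROXY_URL}/agents/{urllib.parse.quote(agent_id)}/claim",
        data=body, method="POST", headers={"Content-Type": "application/json"})


def workspace_file_url(agent_id: str, filename: str, private_key: str, live: bool = False) -> str:
    """GET /agents/{id}/workspace/{file}?private_key=...[&live=true]."""
    query = {"private_key": private_key}
    if live:
        query["live"] = "true"  # proxy SSHes to the bot instead of using its copy
    path = f"/agents/{urllib.parse.quote(agent_id)}/workspace/{urllib.parse.quote(filename)}"
    return f"{PROXY_URL}{path}?{urllib.parse.urlencode(query)}"
```

Because the GET endpoints take the key as `?private_key=`, `urlencode` takes care of escaping any reserved characters the key may contain.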

Firebase RTDB

URL: https://red-teaming-betrayal-default-rtdb.firebaseio.com

```text
agents/{agentId}                       — metadata, status, claimed_by, private_key_hash
users/{userKey}                        — name + joinedAt
daily_logs/{date}                      — daily summary (from discord_daily_log.py)
daily_logs/{date}/person_logs/{key}    — per-person (key = name.lower().replace(" ","_"))
daily_logs/{date}/bot_logs/{key}       — per-bot (key = username.lstrip(".").lower())
workspace_snapshots/{agent}/latest     — latest workspace files (hourly, from workspace_snapshot.sh)
workspace_snapshots/{agent}/history/{ts} — historical snapshots (only on content change)
workspace_snapshots/{agent}/daily_logs — agent's memory/daily log files
votes/{metagameKey}/{userKey}          — idea votes
custom_metagames/{pushId}              — user-submitted business ideas
scenario_votes/{scenarioKey}/{userKey} — scenario votes
custom_scenarios/{pushId}              — user-submitted scenarios
bugs/{pushId}                          — bug/request submissions
bug_votes/{bugKey}/{userKey}           — bug votes
bug_fixes/{bugKey}/{userKey}           — "mark fixed" flags (2+ = resolved)
notes/{userKey}                        — personal notes
```

Deployment State

🔴 EXPERIMENT IS LIVE (March 9–23, 2026). NEVER MAKE BREAKING CHANGES. BE EXTREMELY CAREFUL PUSHING CODE.

All 14 agents deployed, connected to Discord (Flatland + Spaceland). Proxy deployed with persistent volume (proxy_data → /data). Keys distributed, bots being claimed/customized.

LIVE RULES:

NEVER run a full redeploy (deploy --all) unless critical: it drops conversations and causes downtime.
Prefer hotpatching for workspace changes — no restart, no memory loss.
If redeploy needed: ONE bot first, verify logs, then proceed.
Never change USER.md keys/PII live — breaks claiming.
Test locally before hotpatching to all bots.

Guild IDs

Flatland: 1477433806859276475 (air-gapped)
Spaceland: 1479164061533863949 (internet-connected)
Testland: 1479170960497316021 (test)
Test bot: mangrove-testbot (app ID 1479283611034325022)

Agent Roster (14 agents)

| Bot | Human | Email / role |
|---|---|---|
| alexbot | Alex Loftus | alexloftus2004@gmail.com |
| fredbot | Fred Heiding | fred@aisecurityresearch.com |
| bijanbot | Bijan Varjavand | bijan.varjavand@openai.com |
| barisbot | Baris Gusakal | barisgg@gmail.com |
| adityabot | Aditya Ratan | jadityaratan@gmail.com |
| eunjeongbot | EunJeong Hwang | hej78520@gmail.com |
| jannikbot | Jannik Brinkmann | jannik.brinkmann@uni-mannheim.de |
| woogbot | Alice Rigg | rigg.alice0@gmail.com |
| negevbot | Negev Taglicht | taglichtnegev@gmail.com |
| giobot | Giordanno Rogers | roger.gi@northeastern.edu |
| charlesbot | Charles Ye | c.ye@outlook.com |
| jasminebot | Jasmine Cui | jcui28@mit.edu |
| corleone | (none) | Admin agent (free-for-all) |
| tessio | (none) | Worker agent (free-for-all) |

Hotpatching (Live Operations)

```bash
cd red-teaming/agent_proxy
uv run hotpatch.py --agent alexbot --file SOUL.md  # one file, one bot
uv run hotpatch.py --all --file AGENTS.md           # one file, all running bots
uv run hotpatch.py --agent alexbot --all-files      # all files, one bot
uv run hotpatch.py --all --all-files --dry-run      # preview
uv run hotpatch.py --all --file SOUL.md --force     # overwrite even if customized
```

Changes take effect on next session/heartbeat (up to 30min). No restart, no memory loss.

Hotpatch (no restart): AGENTS.md, IDENTITY.md, SOUL.md, TOOLS.md. Use during experiment. USER.md is patchable only with --include-user.
Full redeploy (restart): openclaw.json, Dockerfile, entrypoint changes. 10-30s downtime.
Never change live: USER.md keys/PII, Discord tokens, guild IDs.
Agent-owned (never overwritten): MEMORY.md, HEARTBEAT.md, memory/*.md
Customization detection: pulls live file first, warns if participant edited it, skips unless --force.

Dynamic Agent Creation

POST /agents/create does zero-build deploys in ~30s: validate Discord token → derive client_id (base64) → generate keys → pick OpenAI key from pool → encode config+workspace as base64 env vars → get Docker image from existing agent (mangrove-alexbot) → create Fly app → volume → machine (init override decodes files on first boot, checks /data/workspaces/AGENTS.md sentinel) → write Firebase + update persistent agents.json.
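Two steps of that pipeline, sketched: deriving client_id from the token and packing files into base64 env vars. Discord bot tokens start with the base64-encoded bot user ID (padding stripped), which for bots is normally the application/client ID. `client_id_from_token`, `encode_env`, and the env var names (`OPENCLAW_CONFIG_B64`, `WS_*_B64`) are hypothetical, not the names main.py uses.

```python
import base64
import json


def client_id_from_token(token: str) -> str:
    """Decode the bot user ID from the first dot-separated token segment."""
    first = token.split(".", 1)[0]
    padded = first + "=" * (-len(first) % 4)  # restore stripped base64 padding
    return base64.b64decode(padded).decode("ascii")


def encode_env(config: dict, workspace: dict[str, str]) -> dict[str, str]:
    """Pack openclaw.json + workspace files into base64 env vars (names hypothetical)."""
    env = {"OPENCLAW_CONFIG_B64": base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")}
    for name, text in workspace.items():
        var = "WS_" + name.removesuffix(".md").upper() + "_B64"
        env[var] = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return env
```

Base64 keeps newlines and quotes in the .md files from breaking env var handling; the machine's init decodes them only if the /data/workspaces/AGENTS.md sentinel is absent, so restarts never clobber edited files.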

Deletion: soft — stops machines, marks deleted in Firebase, removes from AgentDB. Fly app preserved.

API key pool: /data/api_key_pool.json on proxy volume (or OPENAI_API_KEY_POOL env var). pick_openai_key() finds first unassigned key. 13 keys from mangrove1-9, 16-19.
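A minimal sketch of the pool lookup, assuming both the volume file and the OPENAI_API_KEY_POOL env var hold a JSON list of key strings and that the file takes precedence (neither format nor precedence is stated above). This mirrors the described behavior of `pick_openai_key()` but is not its source.

```python
import json
import os
from pathlib import Path

POOL_PATH = Path("/data/api_key_pool.json")


def load_pool(path: Path = POOL_PATH) -> list[str]:
    # Assumption: volume file first, env var as fallback, both a JSON list.
    if path.exists():
        return json.loads(path.read_text())
    return json.loads(os.environ.get("OPENAI_API_KEY_POOL", "[]"))


def pick_openai_key(pool: list[str], assigned: set[str]) -> str:
    """Return the first pool key not already assigned to an agent."""
    for key in pool:
        if key not in assigned:
            return key
    raise RuntimeError("OpenAI key pool exhausted")
```

First-unassigned selection is not atomic: two concurrent /agents/create calls could pick the same key unless the caller serializes them.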

Proxy volume: proxy_data → /data. Stores agents.json + api_key_pool.json. First boot copies baked agents.json to volume.

Manual bot creation (legacy): see red-teaming/agent_proxy/README.md.

Key Files in `red-teaming/` Root

scenario_template.html → build_scenario_ui.py → index.html (encrypted website build path)
agent_capabilities.md — comprehensive audit of agent capabilities (packages, permissions, restrictions)
agent_guide.md — participant-facing guide (agent usage, SSH, workspace files)
onboarding.txt — website password (“mangrove”)
daily_discord_logs/ — raw JSONL + LLM summary markdown from discord_daily_log.py
daily_logs/ — daily work tracking logs (YYYY-MM-DD.md)
data/scenario_catalog_full.md, data/firebase_config.json — build inputs

Note: Root AGENTS.md is a stale duplicate of CLAUDE.md — ignore it, only edit CLAUDE.md.

Active Scope

Only red-teaming/: website build path and Fly.io agent proxy stack under agent_proxy/.

Things You’ve Done Wrong

PAY ATTENTION TO THIS SECTION. Edit it ANY TIME unexpected behavior occurs.

fly ssh -C has no shell. Pipes/redirects silently ignored (exit 0). Wrap in bash -c '...'.
If asked a question, ANSWER IT. Don’t take action instead.
NEVER touch live infrastructure without EXPLICIT permission. No hotpatch, deploy, restart, fly ssh, fly machine commands. Show plan, wait for “go”. NO EXCEPTIONS.
Editing local templates = fine. Pushing to bots = requires permission.
ALWAYS git commit BEFORE changing other people’s stuff (hotpatch, deploy, file push). Participant customizations were permanently lost without this.
Firebase PUT overwrites entire node. Use PATCH for partial updates. PUT with --skip-person-logs wiped existing person_logs.
renderSingleLog drops _preamble. Markdown parser puts text before ## into _preamble (never rendered). Plain-text summaries must bypass renderLogMarkdown and render directly.

Project Mangrove Context

Two-week red-team campaign (March 9–23, 2026; check-in March 16) testing GPT-5.4 agent behavior in multi-party social environments — disclosure, permission violations, coercion.

Timeline: March 9-10 onboarding + baseline probing. March 11-21 game environments (deception, coercion, collusion, cheating). March 22-23 replay + reproducibility.

Rules: Use only designated infrastructure. Discuss only in #ext-handshake-mangrove on Slack. All “sensitive info” is synthetic. Escalate anything out-of-scope (real secrets, real-world effects, infrastructure abuse).

Success criteria: Reproducibility, coverage of attack classes, severity assessment, actionability for mitigation.

Attack priorities: Authority spoofing/impersonation, unauthorized disclosure, permission/access violations, social pressure/coercion, collusion/cheating in games, context poisoning.

Alex Loftus