CLAUDE.md
Conventions
- Python: `uv run <script>.py` for all execution. Package manager: `uv` (root `pyproject.toml`).
- After finishing work: update README.md/CLAUDE.md with experiment results if applicable.
- Website tests: Before any feature addition to `scenario_template.html`, run `cd red-teaming && npm test` and ensure all 168 tests pass. Tests are in `red-teaming/tests/website.test.js`; they cover all major systems (encryption gate, fbKey, escaping, inline formatting, voting, scenarios, bugs, status labels, agent claiming, workspace state, heartbeats, action progress, daily logs parsing/navigation/grid, markdown rendering, evidence, notes, create/delete agent, filters, templates, game cards, snapshots, build compatibility, edge cases). Add tests for new features.
Mangrove Red-Teaming System (red-teaming/)
14 AI agents (GPT-5.4) on Discord, red-teamed by human participants. Based on “Agents of Chaos” (Shapira et al.). Only work in red-teaming/; ignore other folders.
Architecture
┌──────────────────────────────────┐
│ DISCORD SERVERS │
│ Flatland (air-gapped, simulated) │
│ Spaceland (internet, real work) │
│ Testland (testing) │
└───────┬──────────────────┬────────┘
│ │
▼ ▼
┌──────────────────────────────────────────────────────────────────┐
│ FLY.IO (org: redteaming, region: ewr) │
│ │
│ PER-AGENT APPS (×14) PROXY APP (×1) │
│ mangrove-{alexbot..tessio} mangrove-agent-proxy │
│ node:22 + OpenClaw gateway FastAPI (py3.12) │
│ shared-cpu-2x, 2GB RAM ◄── shared-cpu-1x, 256MB │
│ Port 3000 SSH Port 8080 │
│ /data volume (persistent): Auth: agents.json │
│ workspaces/ .openclaw/ memory/ Volume: proxy_data → /data │
│ workspace_snapshot.sh (bg,hourly→Firebase) │
│ Model: GPT-5.4, Heartbeat: 30m, restart: always │
└──────────────────────────────────────────────────────────────────┘
│ HTTPS │ HTTPS │ HTTPS
▼ ▼ ▼
FIREBASE RTDB OPENAI API FLY MACHINES API
(agents, logs, GPT-5.4 start/stop/restart
snapshots, votes, (all 14 bots + (proxy → machines)
notes, bugs) daily log scraper)
WEBSITE: red-teaming/index.html (AES-256-GCM, pw: "mangrove", GitHub Pages)
Tabs: Agents | Daily Logs | Notes | Issues | Ideas | Onboarding
Firebase JS SDK for real-time sync.
LOCAL TOOLING:
generate_workspaces.py → agent_secrets.json + workspaces/
provision_agents.py → agents.json + Firebase
generate_agent_config.py → build/{name}/openclaw.json
deploy_agents.py → fly create/deploy/teardown
hotpatch.py → SSH push files to running bots
discord_daily_log.py → scrape Discord → JSONL + summary → Firebase (cron 6AM)
snapshot_workspaces.py → SSH pull workspaces from all bots → git commit (cron hourly)
build_scenario_ui.py → scenario_template.html → index.html
Key Data Flows
- Discord msgs → OpenClaw gateway → GPT-5.4 → Discord
- Website “Start” → proxy → Fly Machines API → starts bot → Firebase status update
- Website workspace edit → proxy → SSH push to `/data/workspaces/{file}` → bot picks up on next session/heartbeat
- Website config change (thinking effort) → proxy → SSH read/modify/write `openclaw.json` → OpenClaw hot-reloads
- Hotpatch → reads `workspace_templates/` → SSH push to bot(s) → no restart
- Workspace snapshots (decentralized): each server runs `workspace_snapshot.sh` hourly → reads `*.md` + `memory/*.md` → JSON via `jq` → PUT to Firebase `workspace_snapshots/{agent}/latest` + `history/{ts}`. Keys strip `.md` (Firebase prohibits `.` in keys). Backup: `snapshot_workspaces.py` on Alex’s Mac commits to git hourly.
- Private keys: `generate_workspaces.py` → `USER.md` + `agent_secrets.json` → `provision_agents.py` → `agents.json` (proxy) + Firebase (hash only)
Overview
- Discord servers: Flatland (air-gapped), Spaceland (internet-connected), Testland (testing)
- 14 agents: 12 participant + Corleone (admin, free-for-all) + Tessio (worker, free-for-all)
- Bot framework: OpenClaw (`npm install -g openclaw@latest`, Node >= 22)
- Compute: each agent = separate Fly.io app (`mangrove-{name}`), shared-cpu-2x 2048MB, ~$10-15/mo
- Website: `red-teaming/index.html` (encrypted, password in `onboarding.txt`)
Key Workflow
cd red-teaming/agent_proxy
uv run generate_workspaces.py # all 14 agents (or --agent alexbot)
uv run provision_agents.py # writes Firebase + agents.json (or --count 3)
uv run deploy_agents.py setup --agent alexbot # create app + volume + secrets (or --all)
uv run deploy_agents.py deploy --agent alexbot # build + deploy Docker image (or --all)
uv run deploy_agents.py status --all # check status
uv run deploy_agents.py teardown --agent alexbot # destroy app
fly ssh console --app mangrove-alexbot # SSH in
fly logs --app mangrove-alexbot # view logs
Key Design Decisions
- Unified keys: `generate_workspaces.py` is source of truth → writes `USER.md` + `agent_secrets.json` → `provision_agents.py` → `agents.json` + Firebase. Ensures bot’s key matches claim key.
- Deterministic seeding: `hashlib.sha256(agent_name)` (not Python `hash()`). Same output every run.
- Secrets merge: `--agent X` merges into existing `agent_secrets.json`, doesn’t overwrite.
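The seeding and merge decisions above can be illustrated with a minimal sketch. It assumes the seed is taken from the first 8 bytes of the SHA-256 digest and that secrets are stored as a dict keyed by agent name; `seed_for_agent`, `fake_phone`, and `merge_secrets` are hypothetical names, not necessarily what `generate_workspaces.py` uses.

```python
import hashlib
import random


def seed_for_agent(agent_name: str) -> int:
    """Derive a stable integer seed from the agent name.

    Python's built-in hash() is salted per process for strings, so it is
    not reproducible across runs; a SHA-256 digest is.
    """
    digest = hashlib.sha256(agent_name.encode("utf-8")).digest()
    # Assumption: the first 8 bytes of the digest are used as the seed.
    return int.from_bytes(digest[:8], "big")


def fake_phone(agent_name: str) -> str:
    """Hypothetical example of seeded synthetic-PII generation."""
    rng = random.Random(seed_for_agent(agent_name))
    return f"555-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}"


def merge_secrets(existing: dict, regenerated: dict) -> dict:
    """Merge regenerated agents into the existing secrets without
    dropping agents that were not regenerated in this run."""
    merged = dict(existing)
    merged.update(regenerated)
    return merged
```

Calling `fake_phone("alexbot")` twice, even in separate processes, should yield the same string, which is the property the design decision is after.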
OpenClaw Configuration
Each agent gets openclaw.json with env var refs (${DISCORD_BOT_TOKEN}, ${OPENAI_API_KEY}, ${OPENCLAW_GATEWAY_TOKEN} — set as Fly.io secrets). Key settings: model openai/gpt-5.4 (200k context, 16k output, reasoning: true), workspace /data/workspaces, heartbeat 30m, compaction safeguard, gateway local port 3000. Discord: groupPolicy: "open", dmPolicy: "open", allowFrom: ["*"], allowBots: true, requireMention: true per guild, historyLimit: 200. Media/image tools enabled (GPT-5.4 vision, 10MB max).
Workspace files (auto-loaded by OpenClaw from workspace dir):
- `AGENTS.md`: operating instructions, rules, memory directives (primary instruction file)
- `IDENTITY.md`: name, emoji, avatar, vibe
- `SOUL.md`: personality, behavioral rules, security instructions
- `USER.md`: human owner info, fake PII (SSN/DOB/CC/phone/address), public/private keys
- `TOOLS.md`: Discord servers, other agents list, platform notes
- `HEARTBEAT.md`: periodic check-in instructions
- `MEMORY.md`: long-term memory (agent-managed)
- `memory/YYYY-MM-DD.md`: daily logs (agent-created, today+yesterday auto-loaded)
Only named files above are auto-loaded. memory/ subdirectory is indexed for semantic search.
OpenClaw gotchas:
- `guilds` must be object (`{"id": {}}`) not array
- `gateway.mode: "local"` required for headless
- Container needs `git` (npm install fails without)
- Fly.io needs >= 2048MB RAM (512MB = OOM)
- Non-loopback binds require `OPENCLAW_GATEWAY_TOKEN`
- `entrypoint.sh` runs `openclaw doctor --fix` before start
- Use `ENTRYPOINT` not `CMD` in Dockerfile (Node base image intercepts CMD)
- `fly ssh console -C` has no shell: pipes/redirects are silently ignored (exit 0). Wrap in `bash -c '...'`. This broke `hotpatch.py` and `push_file_to_machine()` until caught.
- Docs: https://docs.openclaw.ai
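The `fly ssh console -C` gotcha can be guarded against by always building the command through one helper. This is a minimal sketch; `fly_ssh_argv` is a hypothetical helper name, and the real `hotpatch.py` may structure this differently.

```python
import shlex


def fly_ssh_argv(app: str, command: str) -> list[str]:
    """Build an argv for `fly ssh console -C` that runs `command` inside bash.

    Without the `bash -c` wrapper there is no shell on the remote side, so
    pipes and redirects in `command` would be silently ignored.
    """
    wrapped = f"bash -c {shlex.quote(command)}"
    return ["fly", "ssh", "console", "--app", app, "-C", wrapped]


if __name__ == "__main__":
    # Print only; actually running this touches live infrastructure.
    print(fly_ssh_argv("mangrove-alexbot", "cat /data/workspaces/SOUL.md | wc -l"))
```

The resulting list can be passed to `subprocess.run` once permission to touch the bot has been given.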
File Layout (red-teaming/agent_proxy/)
| File | Purpose |
|---|---|
| `main.py` | FastAPI proxy (claim, workspace, config, create/delete agents) |
| `provision_agents.py` | Reads agent_secrets.json → Firebase + agents.json |
| `generate_workspaces.py` | Generates workspace dirs from templates with fake PII + keys |
| `generate_agent_config.py` | Generates per-agent openclaw.json |
| `deploy_agents.py` | Click CLI: setup/deploy/status/teardown on Fly.io |
| `hotpatch.py` | SSH push workspace files to running bots (no restart) |
| `discord_daily_log.py` | Discord scraper + LLM summarizer + Firebase push |
| `snapshot_workspaces.py` | SSH pull workspace files from all bots → git commit |
| `sync_live_workspaces.py` | Sync live workspace files from bots to local |
| `workspace_templates/` | Template `.md` files with placeholder syntax |
| `gateway/Dockerfile` | OpenClaw container (node22 + git + openclaw) |
| `gateway/entrypoint.sh` | First-boot workspace copy, `doctor --fix`, starts gateway |
| `gateway/fly.toml` | Per-agent Fly config (app name via `--app` flag) |
| `gateway/workspace_snapshot.sh` | Background daemon: hourly Firebase push of workspace files |
| `fly.toml` | Proxy Fly config (shared-cpu-1x, 256MB, port 8080) |
| `Dockerfile` | Proxy Docker image (Python 3.12 + FastAPI) |
| `bot_tokens.json` | Discord bot tokens (gitignored) |
| `api_keys.json` | OpenAI API keys (gitignored) |
| `agents.json` | Agent metadata with private keys (gitignored) |
| `agent_secrets.json` | All PII and keys from workspace gen (gitignored) |
| `api_key_pool.json` | OpenAI key pool for dynamic agents (gitignored) |
| `com.mangrove.discord-daily-log.plist` | macOS launchd cron for daily log scraper |
Daily Discord Log Scraper
Scrapes all 3 guilds (channels + active threads), saves JSONL, generates GPT-5.4 summary (organized by 6 attack categories), per-person and per-bot summaries, pushes to Firebase.
cd red-teaming/agent_proxy
uv run discord_daily_log.py # yesterday, all guilds
uv run discord_daily_log.py --date 2026-03-09 # specific date
uv run discord_daily_log.py --since 6h # last 6 hours
uv run discord_daily_log.py --guild Flatland # one guild
uv run discord_daily_log.py --skip-summary # raw JSONL only
uv run discord_daily_log.py --skip-person-logs # skip per-person summaries
uv run discord_daily_log.py --skip-bot-logs # skip per-bot summaries
uv run discord_daily_log.py --skip-evidence # skip evidence extraction
uv run discord_daily_log.py --skip-firebase # don't push to Firebase
uv run discord_daily_log.py --token-name corleone # primary Discord token (default)
uv run discord_daily_log.py --verbose # per-channel progress
uv run discord_daily_log.py --dry-run # stats only
Output: red-teaming/daily_discord_logs/{date}.jsonl (raw) + {date}.md (summary). Firebase: daily_logs/{date}.
Multi-token scraping: uses ALL 14 bot tokens (primary: corleone). Unions channel lists across tokens, tries fallbacks per-channel. Deduplicates by message ID. Timeout retry: 5 attempts, exponential backoff.
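The deduplication and retry behavior described above can be sketched as follows. This is an illustration under assumptions: the 1-second base delay, the `TimeoutError` exception type, and the helper names `dedupe_by_id` and `fetch_with_retry` are hypothetical; only the "5 attempts, exponential backoff, dedupe by message ID" behavior comes from this document.

```python
import time
from typing import Callable, Iterable


def dedupe_by_id(batches: Iterable[list[dict]]) -> list[dict]:
    """Union message batches fetched with different tokens.

    The first copy of each message ID wins; order of first appearance
    is preserved.
    """
    seen: dict[str, dict] = {}
    for batch in batches:
        for msg in batch:
            seen.setdefault(msg["id"], msg)
    return list(seen.values())


def fetch_with_retry(
    fetch: Callable[[], object],
    attempts: int = 5,
    base_delay: float = 1.0,  # assumption: the real base delay is not documented here
    sleep: Callable[[float], None] = time.sleep,
):
    """Call `fetch`, retrying on timeout with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the timeout
            sleep(base_delay * 2 ** attempt)
```

Injecting `sleep` keeps the retry loop testable without real waiting.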
Per-person/bot summaries: DISCORD_TO_PERSON maps 13 Discord usernames → display names. group_messages_by_person() / group_messages_by_bot() bucket messages. One LLM call per group type. Firebase: daily_logs/{date}/person_logs/{key} and bot_logs/{key}, each with {summary_md, message_count, edited_summary_md, edited_by, edited_at}.
IMPORTANT: push_to_firebase() uses PATCH (not PUT) — re-running with --skip-person-logs won’t wipe existing person_logs.
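The difference matters because a REST PUT replaces the whole node while PATCH only updates the named children. The sketch below models that with an in-memory dict standing in for the database; `rtdb_put` and `rtdb_patch` are hypothetical stand-ins, not the real `push_to_firebase()` code.

```python
def rtdb_put(tree: dict, path: str, value: dict) -> None:
    """PUT semantics: the node at `path` is replaced entirely."""
    tree[path] = dict(value)


def rtdb_patch(tree: dict, path: str, value: dict) -> None:
    """PATCH semantics: only the children named in `value` are updated."""
    tree.setdefault(path, {}).update(value)


if __name__ == "__main__":
    node = "daily_logs/2026-03-09"
    db = {node: {"summary_md": "old", "person_logs": {"alex_loftus": {}}}}

    rtdb_patch(db, node, {"summary_md": "new"})
    print(sorted(db[node]))  # person_logs is still present

    rtdb_put(db, node, {"summary_md": "newer"})
    print(sorted(db[node]))  # person_logs has been wiped
```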
Cron: macOS launchd com.mangrove.discord-daily-log.plist runs daily at 6 AM. Installed at ~/Library/LaunchAgents/.
Website Build
cd red-teaming
uv run build_scenario_ui.py --password mangrove # → index.html (AES-256-GCM encrypted)
build_scenario_ui.py parses 75 scenarios from data/scenario_catalog_full.md, loads Firebase config from data/firebase_config.json, replaces %%SCENARIOS_JSON%%, %%CATEGORIES_JSON%%, %%FIREBASE_CONFIG_JSON%% in template, encrypts between <!--%%ENCRYPTED_START%%--> / <!--%%ENCRYPTED_END%%--> markers. Password cached in sessionStorage.
IMPORTANT: After editing scenario_template.html, MUST rebuild and push both scenario_template.html and index.html.
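The placeholder substitution and marker handling in the build can be sketched roughly as below. The encryption step itself is omitted, and `fill_template` and `split_encrypted_region` are hypothetical helper names; the actual structure of `build_scenario_ui.py` is not shown in this document.

```python
import json


def fill_template(template: str, scenarios: list, categories: list, firebase_config: dict) -> str:
    """Replace the three %%...%% placeholders with JSON literals."""
    replacements = {
        "%%SCENARIOS_JSON%%": json.dumps(scenarios),
        "%%CATEGORIES_JSON%%": json.dumps(categories),
        "%%FIREBASE_CONFIG_JSON%%": json.dumps(firebase_config),
    }
    for placeholder, value in replacements.items():
        if placeholder not in template:
            # Fail loudly rather than ship a page with missing data.
            raise ValueError(f"template is missing {placeholder}")
        template = template.replace(placeholder, value)
    return template


def split_encrypted_region(html: str) -> tuple[str, str, str]:
    """Return (before, body, after) around the encryption markers.

    Only `body` would be passed to the encryption step.
    """
    start = "<!--%%ENCRYPTED_START%%-->"
    end = "<!--%%ENCRYPTED_END%%-->"
    if start not in html or end not in html:
        raise ValueError("encryption markers not found")
    before, _, rest = html.partition(start)
    body, _, after = rest.partition(end)
    return before, body, after
```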
Website Tabs
All state syncs via Firebase RTDB real-time. User identified by name (localStorage: rt-user).
Agents: Claim via private key (stored in localStorage: rt-claimed-agent, updates Firebase claimed_by). Control panel: status dot, Start/Stop/Restart, thinking effort selector (off/low/high → live openclaw.json via SSH), Fly app link, SSH command + copy, heartbeat countdown (30min cycle, 1s updates, 5min refetch). Workspace editor: 7 file tabs, dark textarea, Tab=2 spaces, Ctrl/Cmd+S saves, PUT→proxy→SSH push. Warnings on USER.md/MEMORY.md/HEARTBEAT.md. Snapshot history browser with “Restore to editor”. Status polling: 15s interval + 3s/8s after actions. All agents table with real-time status. Create agent: 3-step wizard (identity→Discord token→workspace editor) → zero-build deploy. Delete: soft-delete for dynamic agents.
Daily Logs: Three views via overview · people · bots toggle. Overview: Top Stories + Category Breakdown, date selector, edit/revert. People: 2-column small-multiples grid (13 participants), click to drill down, ← all people back. Bots: same grid (14 bots). Data from daily_logs/{date} with person_logs/ and bot_logs/ sub-paths.
Notes: Per-user scratchpad, auto-saves on blur (800ms). Others’ notes read-only. Firebase notes/{userKey}.
Issues: Submit form (title, type, details). Voting + “mark fixed” (2+ marks = resolved). Firebase bugs/, bug_votes/, bug_fixes/.
Ideas & Inspiration: Business Ideas (2 hardcoded + custom, voting, edit/withdraw) in custom_metagames/. Scenario Inspiration (75 pre-loaded + custom, category chips, sorted by votes/difficulty) in custom_scenarios/, scenario_votes/.
Onboarding: Static experiment info.
Proxy API Endpoints
Free-for-all agents (corleone, tessio) bypass auth for workspace/config reads.
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/health` | GET | none | Health check |
| `/agents` | GET | none | List all agents |
| `/agents/{id}/claim` | POST | {private_key, name} | Claim agent |
| `/agents/{id}/unclaim` | POST | {private_key} | Release agent |
| `/agents/{id}/status` | GET | none | Machine status |
| `/agents/{id}/start` | POST | {private_key} | Start Fly machine |
| `/agents/{id}/stop` | POST | {private_key} | Stop Fly machine |
| `/agents/{id}/restart` | POST | {private_key} | Stop then start |
| `/agents/{id}/heartbeat` | GET | none | Next heartbeat timestamp |
| `/agents/{id}/workspace` | GET | ?private_key= | List workspace files |
| `/agents/{id}/workspace/{file}` | GET | ?private_key= | Get file (?live=true = SSH to bot) |
| `/agents/{id}/workspace/{file}` | PUT | {private_key, content} | Save & push file |
| `/agents/{id}/ssh` | GET | ?private_key= | SSH command + agent info |
| `/agents/{id}/config/thinking` | GET | ?private_key= | Get thinking effort |
| `/agents/{id}/config/thinking` | PUT | {private_key, level} | Set thinking effort |
| `/validate-discord-token` | POST | {token} | Validate Discord bot token |
| `/agents/{id}/invite-links` | GET | none | Discord invite URLs for 3 guilds |
| `/agents/create` | POST | {name, discord_bot_token, ...} | Create new agent |
| `/agents/{id}/delete` | POST | {private_key} | Soft-delete (stop, remove from Discord guilds, mark deleted) |
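As a rough illustration of the two auth styles in the table (JSON body for writes, query parameter for reads), the sketch below only builds request objects and never sends them. The base URL is a placeholder assumption, and `claim_request` and `workspace_file_request` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: placeholder host; the real proxy URL is not given in this document.
BASE_URL = "https://proxy.example.invalid"


def claim_request(agent_id: str, private_key: str, name: str,
                  base_url: str = BASE_URL) -> urllib.request.Request:
    """POST /agents/{id}/claim with the key and claimant name in a JSON body."""
    body = json.dumps({"private_key": private_key, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/agents/{quote(agent_id)}/claim",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def workspace_file_request(agent_id: str, filename: str, private_key: str,
                           live: bool = False,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """GET /agents/{id}/workspace/{file}, passing the key as a query parameter."""
    params = {"private_key": private_key}
    if live:
        params["live"] = "true"  # ask the proxy to read the file from the bot over SSH
    url = f"{base_url}/agents/{quote(agent_id)}/workspace/{quote(filename)}?{urlencode(params)}"
    return urllib.request.Request(url, method="GET")
```

Passing either object to `urllib.request.urlopen` would perform the call; during the live experiment that should only happen with explicit permission.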
Firebase RTDB
URL: https://red-teaming-betrayal-default-rtdb.firebaseio.com
agents/{agentId} — metadata, status, claimed_by, private_key_hash
users/{userKey} — name + joinedAt
daily_logs/{date} — daily summary (from discord_daily_log.py)
daily_logs/{date}/person_logs/{key} — per-person (key = name.lower().replace(" ","_"))
daily_logs/{date}/bot_logs/{key} — per-bot (key = username.lstrip(".").lower())
workspace_snapshots/{agent}/latest — latest workspace files (hourly, from workspace_snapshot.sh)
workspace_snapshots/{agent}/history/{ts} — historical snapshots (only on content change)
workspace_snapshots/{agent}/daily_logs — agent's memory/daily log files
votes/{metagameKey}/{userKey} — idea votes
custom_metagames/{pushId} — user-submitted business ideas
scenario_votes/{scenarioKey}/{userKey} — scenario votes
custom_scenarios/{pushId} — user-submitted scenarios
bugs/{pushId} — bug/request submissions
bug_votes/{bugKey}/{userKey} — bug votes
bug_fixes/{bugKey}/{userKey} — "mark fixed" flags (2+ = resolved)
notes/{userKey} — personal notes
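The key derivations listed above can be written out as small helpers. The two log-key expressions come straight from this section; the function names and the `is_valid_rtdb_key` check are illustrative assumptions, based on the Realtime Database rule that keys may not contain `.`, `$`, `#`, `[`, `]`, `/`, or ASCII control characters.

```python
FORBIDDEN_KEY_CHARS = set(".$#[]/")


def person_log_key(display_name: str) -> str:
    """Key under daily_logs/{date}/person_logs/."""
    return display_name.lower().replace(" ", "_")


def bot_log_key(username: str) -> str:
    """Key under daily_logs/{date}/bot_logs/."""
    return username.lstrip(".").lower()


def snapshot_file_key(filename: str) -> str:
    """Strip the .md suffix, since '.' is not allowed in keys."""
    return filename[:-3] if filename.endswith(".md") else filename


def is_valid_rtdb_key(key: str) -> bool:
    """Check a candidate key against the forbidden character set."""
    if not key:
        return False
    return not any(
        ch in FORBIDDEN_KEY_CHARS or ord(ch) < 32 or ord(ch) == 127
        for ch in key
    )
```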
Deployment State
🔴 EXPERIMENT IS LIVE (March 9–23, 2026). NEVER MAKE BREAKING CHANGES. BE EXTREMELY CAREFUL PUSHING CODE.
All 14 agents deployed, connected to Discord (Flatland + Spaceland). Proxy deployed with persistent volume (proxy_data → /data). Keys distributed, bots being claimed/customized.
LIVE RULES:
- NEVER full redeploy (`deploy --all`) unless critical. Drops conversations, causes downtime.
- Prefer hotpatching for workspace changes: no restart, no memory loss.
- If redeploy needed: ONE bot first, verify logs, then proceed.
- Never change USER.md keys/PII live — breaks claiming.
- Test locally before hotpatching to all bots.
Guild IDs
- Flatland: `1477433806859276475` (air-gapped)
- Spaceland: `1479164061533863949` (internet-connected)
- Testland: `1479170960497316021` (test)
- Test bot: `mangrove-testbot` (app ID `1479283611034325022`)
Agent Roster (14 agents)
| Bot | Human | Email / role |
|---|---|---|
| alexbot | Alex Loftus | alexloftus2004@gmail.com |
| fredbot | Fred Heiding | fred@aisecurityresearch.com |
| bijanbot | Bijan Varjavand | bijan.varjavand@openai.com |
| barisbot | Baris Gusakal | barisgg@gmail.com |
| adityabot | Aditya Ratan | jadityaratan@gmail.com |
| eunjeongbot | EunJeong Hwang | hej78520@gmail.com |
| jannikbot | Jannik Brinkmann | jannik.brinkmann@uni-mannheim.de |
| woogbot | Alice Rigg | rigg.alice0@gmail.com |
| negevbot | Negev Taglicht | taglichtnegev@gmail.com |
| giobot | Giordanno Rogers | roger.gi@northeastern.edu |
| charlesbot | Charles Ye | c.ye@outlook.com |
| jasminebot | Jasmine Cui | jcui28@mit.edu |
| corleone | (none) | Admin agent (free-for-all) |
| tessio | (none) | Worker agent (free-for-all) |
Hotpatching (Live Operations)
cd red-teaming/agent_proxy
uv run hotpatch.py --agent alexbot --file SOUL.md # one file, one bot
uv run hotpatch.py --all --file AGENTS.md # one file, all running bots
uv run hotpatch.py --agent alexbot --all-files # all files, one bot
uv run hotpatch.py --all --all-files --dry-run # preview
uv run hotpatch.py --all --file SOUL.md --force # overwrite even if customized
Changes take effect on next session/heartbeat (up to 30min). No restart, no memory loss.
- Hotpatch (no restart): SOUL.md, AGENTS.md, IDENTITY.md, TOOLS.md. Use during experiment.
- Full redeploy (restart): openclaw.json, Dockerfile, entrypoint changes. 10-30s downtime.
- Never change live: USER.md keys/PII, Discord tokens, guild IDs.
- Patchable: AGENTS.md, IDENTITY.md, SOUL.md, TOOLS.md. USER.md requires `--include-user`.
- Agent-owned (never overwritten): MEMORY.md, HEARTBEAT.md, memory/*.md
- Customization detection: pulls live file first, warns if participant edited it, skips unless `--force`.
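The file-ownership rules above can be condensed into one decision function. This is a sketch under an assumption: customization is detected by comparing the live file with a baseline copy (for example, the template content last pushed). How `hotpatch.py` actually detects participant edits is not specified here, and `hotpatch_decision` is a hypothetical name.

```python
PATCHABLE = {"AGENTS.md", "IDENTITY.md", "SOUL.md", "TOOLS.md"}
AGENT_OWNED = {"MEMORY.md", "HEARTBEAT.md"}


def hotpatch_decision(
    filename: str,
    live_content: str,
    baseline_content: str,
    *,
    force: bool = False,
    include_user: bool = False,
) -> str:
    """Return "push" or a "skip: ..." reason for one workspace file."""
    if filename in AGENT_OWNED or filename.startswith("memory/"):
        return "skip: agent-owned"  # never overwritten, even with force
    if filename == "USER.md":
        if not include_user:
            return "skip: needs --include-user"
    elif filename not in PATCHABLE:
        return "skip: not patchable"
    # Assumption: a file counts as customized when it differs from the baseline.
    if live_content != baseline_content and not force:
        return "skip: customized"
    return "push"
```

Keeping the decision pure (no SSH, no file I/O) makes it easy to check with `--dry-run` style output before anything is pushed.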
Dynamic Agent Creation
POST /agents/create does zero-build deploys in ~30s: validate Discord token → derive client_id (base64) → generate keys → pick OpenAI key from pool → encode config+workspace as base64 env vars → get Docker image from existing agent (mangrove-alexbot) → create Fly app → volume → machine (init override decodes files on first boot, checks /data/workspaces/AGENTS.md sentinel) → write Firebase + update persistent agents.json.
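The "derive client_id (base64)" step can be sketched as below. It relies on the commonly observed Discord bot token layout, in which the first dot-separated segment is the base64-encoded numeric bot user ID; whether `main.py` derives the ID exactly this way is an assumption, and `client_id_from_token` is a hypothetical name.

```python
import base64


def client_id_from_token(token: str) -> str:
    """Decode the numeric ID from the first segment of a bot token."""
    first_segment = token.split(".", 1)[0]
    # Tokens usually omit base64 padding, so restore it before decoding.
    padded = first_segment + "=" * (-len(first_segment) % 4)
    decoded = base64.b64decode(padded).decode("ascii")
    if not decoded.isdigit():
        raise ValueError("token prefix does not decode to a numeric ID")
    return decoded
```

A token that fails this check is malformed and can be rejected before any call to Discord is made.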
Deletion: soft — stops machines, marks deleted in Firebase, removes from AgentDB. Fly app preserved.
API key pool: /data/api_key_pool.json on proxy volume (or OPENAI_API_KEY_POOL env var). pick_openai_key() finds first unassigned key. 13 keys from mangrove1-9, 16-19.
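A first-unassigned selection like the one described could look roughly like this. The pool schema (a list of entries with `name`, `key`, and `assigned_to` fields) and the function name `pick_first_unassigned_key` are assumptions for illustration; the real `pick_openai_key()` signature and the layout of `api_key_pool.json` are not shown in this document.

```python
from typing import Optional


def pick_first_unassigned_key(pool: list[dict], agent_id: str) -> Optional[dict]:
    """Assign the first free key in the pool to `agent_id` and return its entry.

    Returns None when every key is already assigned.
    """
    for entry in pool:
        if entry.get("assigned_to") is None:
            entry["assigned_to"] = agent_id  # mutate in place; caller persists the pool
            return entry
    return None
```

Because the pool is mutated in place, the caller would need to write it back to the proxy volume so the assignment survives restarts.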
Proxy volume: proxy_data → /data. Stores agents.json + api_key_pool.json. First boot copies baked agents.json to volume.
Manual bot creation (legacy): see red-teaming/agent_proxy/README.md.
Key Files in red-teaming/ Root
- `scenario_template.html` → `build_scenario_ui.py` → `index.html` (encrypted website build path)
- `agent_capabilities.md`: comprehensive audit of agent capabilities (packages, permissions, restrictions)
- `agent_guide.md`: participant-facing guide (agent usage, SSH, workspace files)
- `onboarding.txt`: website password (“mangrove”)
- `daily_discord_logs/`: raw JSONL + LLM summary markdown from `discord_daily_log.py`
- `daily_logs/`: daily work tracking logs (`YYYY-MM-DD.md`)
- `data/scenario_catalog_full.md`, `data/firebase_config.json`: build inputs
Note: Root AGENTS.md is a stale duplicate of CLAUDE.md — ignore it, only edit CLAUDE.md.
Active Scope
Only red-teaming/: website build path and Fly.io agent proxy stack under agent_proxy/.
Things You’ve Done Wrong
PAY ATTENTION TO THIS SECTION. Edit it ANY TIME unexpected behavior occurs.
- `fly ssh -C` has no shell. Pipes/redirects silently ignored (exit 0). Wrap in `bash -c '...'`.
- If asked a question, ANSWER IT. Don’t do something.
- NEVER touch live infrastructure without EXPLICIT permission. No hotpatch, deploy, restart, fly ssh, fly machine commands. Show plan, wait for “go”. NO EXCEPTIONS.
- Editing local templates = fine. Pushing to bots = requires permission.
- ALWAYS git commit BEFORE changing other people’s stuff (hotpatch, deploy, file push). Participant customizations were permanently lost without this.
- Firebase PUT overwrites entire node. Use PATCH for partial updates. PUT with `--skip-person-logs` wiped existing person_logs.
- `renderSingleLog` drops `_preamble`. Markdown parser puts text before `##` into `_preamble` (never rendered). Plain-text summaries must bypass `renderLogMarkdown` and render directly.
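The `_preamble` pitfall is easier to see in a small model. The real parser lives in the website's JavaScript; the Python below is only an analogy with hypothetical function names (`split_sections`, `render_summary`), showing why a summary with no `##` headings has to bypass section rendering.

```python
def split_sections(markdown: str) -> dict[str, str]:
    """Split text on '## ' headings; text before the first heading
    lands in the '_preamble' bucket."""
    sections = {"_preamble": ""}
    current = "_preamble"
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = ""
        else:
            sections[current] += line + "\n"
    return sections


def render_summary(text: str) -> str:
    """Render headed sections, but pass heading-free text through untouched.

    Without the bypass, a plain-text summary would sit entirely in
    '_preamble' and nothing would be shown.
    """
    sections = split_sections(text)
    if len(sections) == 1:  # only '_preamble': plain text, render directly
        return text
    return "\n".join(
        f"[{title}]\n{body}"
        for title, body in sections.items()
        if title != "_preamble"  # mirrors the renderer dropping the preamble
    )
```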
Project Mangrove Context
Two-week red-team campaign (March 9–23, 2026; check-in March 16) testing GPT-5.4 agent behavior in multi-party social environments — disclosure, permission violations, coercion.
Timeline: March 9-10 onboarding + baseline probing. March 11-21 game environments (deception, coercion, collusion, cheating). March 22-23 replay + reproducibility.
Rules: Use only designated infrastructure. Discuss only in #ext-handshake-mangrove on Slack. All “sensitive info” is synthetic. Escalate anything out-of-scope (real secrets, real-world effects, infrastructure abuse).
Success criteria: Reproducibility, coverage of attack classes, severity assessment, actionability for mitigation.
Attack priorities: Authority spoofing/impersonation, unauthorized disclosure, permission/access violations, social pressure/coercion, collusion/cheating in games, context poisoning.
