CLAUDE.md

Conventions

  • Python: uv run <script>.py for all execution. Package mgr: uv (root pyproject.toml)
  • After finishing work: update README.md/CLAUDE.md with experiment results if applicable.
  • Website tests: Before any feature addition to scenario_template.html, run cd red-teaming && npm test and ensure all 168 tests pass. Tests are in red-teaming/tests/website.test.js; they cover all major systems (encryption gate, fbKey, escaping, inline formatting, voting, scenarios, bugs, status labels, agent claiming, workspace state, heartbeats, action progress, daily logs parsing/navigation/grid, markdown rendering, evidence, notes, create/delete agent, filters, templates, game cards, snapshots, build compatibility, edge cases). Add tests for new features.

Mangrove Red-Teaming System (red-teaming/)

14 AI agents (GPT-5.4) on Discord, red-teamed by human participants. Based on “Agents of Chaos” (Shapira et al.). Only work in red-teaming/; ignore other folders.

Architecture

                    ┌──────────────────────────────────┐
                    │         DISCORD SERVERS           │
                    │  Flatland (air-gapped, simulated) │
                    │  Spaceland (internet, real work)  │
                    │  Testland (testing)               │
                    └───────┬──────────────────┬────────┘
                            │                   │
                            ▼                   ▼
┌──────────────────────────────────────────────────────────────────┐
│                   FLY.IO  (org: redteaming, region: ewr)        │
│                                                                  │
│  PER-AGENT APPS (×14)              PROXY APP (×1)               │
│  mangrove-{alexbot..tessio}        mangrove-agent-proxy         │
│  node:22 + OpenClaw gateway        FastAPI (py3.12)             │
│  shared-cpu-2x, 2GB RAM       ◄── shared-cpu-1x, 256MB         │
│  Port 3000                    SSH  Port 8080                    │
│  /data volume (persistent):        Auth: agents.json            │
│    workspaces/ .openclaw/ memory/  Volume: proxy_data → /data   │
│  workspace_snapshot.sh (bg,hourly→Firebase)                     │
│  Model: GPT-5.4, Heartbeat: 30m, restart: always               │
└──────────────────────────────────────────────────────────────────┘
        │ HTTPS              │ HTTPS              │ HTTPS
        ▼                    ▼                    ▼
   FIREBASE RTDB        OPENAI API         FLY MACHINES API
   (agents, logs,       GPT-5.4            start/stop/restart
    snapshots, votes,   (all 14 bots +     (proxy → machines)
    notes, bugs)         daily log scraper)

WEBSITE: red-teaming/index.html (AES-256-GCM, pw: "mangrove", GitHub Pages)
Tabs: Agents | Daily Logs | Notes | Issues | Ideas | Onboarding
Firebase JS SDK for real-time sync.

LOCAL TOOLING:
  generate_workspaces.py  → agent_secrets.json + workspaces/
  provision_agents.py     → agents.json + Firebase
  generate_agent_config.py → build/{name}/openclaw.json
  deploy_agents.py        → fly create/deploy/teardown
  hotpatch.py             → SSH push files to running bots
  discord_daily_log.py    → scrape Discord → JSONL + summary → Firebase (cron 6AM)
  snapshot_workspaces.py  → SSH pull workspaces from all bots → git commit (cron hourly)
  build_scenario_ui.py    → scenario_template.html → index.html

Key Data Flows

  • Discord msgs → OpenClaw gateway → GPT-5.4 → Discord
  • Website “Start” → proxy → Fly Machines API → starts bot → Firebase status update
  • Website workspace edit → proxy → SSH push to /data/workspaces/{file} → bot picks up on next session/heartbeat
  • Website config change (thinking effort) → proxy → SSH read/modify/write openclaw.json → OpenClaw hot-reloads
  • Hotpatch → reads workspace_templates/ → SSH push to bot(s) → no restart
  • Workspace snapshots (decentralized): each server runs workspace_snapshot.sh hourly → reads *.md + memory/*.md → JSON via jq → PUT to Firebase workspace_snapshots/{agent}/latest + history/{ts}. Keys strip .md (Firebase prohibits . in keys). Backup: snapshot_workspaces.py on Alex’s Mac commits to git hourly.
  • Private keys: generate_workspaces.py → USER.md + agent_secrets.json → provision_agents.py → agents.json (proxy) + Firebase (hash only)
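
The key-stripping step in the snapshot flow can be sketched in Python. The live system does this in workspace_snapshot.sh with jq, so the helper names below (firebase_key, snapshot_payload) and the flat name-to-content payload shape are illustrative assumptions. The sketch covers top-level *.md files only and leaves memory/*.md handling out.

```python
# Characters Firebase RTDB rejects inside a key.
FORBIDDEN = set(".$#[]/")


def firebase_key(filename: str) -> str:
    """Turn a workspace file name such as "SOUL.md" into a Firebase-safe key."""
    key = filename.removesuffix(".md")
    bad = FORBIDDEN.intersection(key)
    if bad:
        raise ValueError(f"{filename!r} still contains forbidden characters: {sorted(bad)}")
    return key


def snapshot_payload(files: dict[str, str]) -> dict[str, str]:
    """Build a JSON-ready body from top-level *.md files (assumed flat shape)."""
    return {
        firebase_key(name): text
        for name, text in files.items()
        if name.endswith(".md")  # anything that is not markdown is skipped
    }


print(snapshot_payload({"SOUL.md": "be kind", "AGENTS.md": "rules"}))
```

A name that still contains a dot after the suffix is removed raises instead of producing a key Firebase would reject.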

Overview

  • Discord servers: Flatland (air-gapped), Spaceland (internet-connected), Testland (testing)
  • 14 agents: 12 participant + Corleone (admin, free-for-all) + Tessio (worker, free-for-all)
  • Bot framework: OpenClaw (npm install -g openclaw@latest, Node >= 22)
  • Compute: each agent = separate Fly.io app (mangrove-{name}), shared-cpu-2x 2048MB, ~$10-15/mo
  • Website: red-teaming/index.html (encrypted, password in onboarding.txt)

Key Workflow

cd red-teaming/agent_proxy
uv run generate_workspaces.py                    # all 14 agents (or --agent alexbot)
uv run provision_agents.py                       # writes Firebase + agents.json (or --count 3)
uv run deploy_agents.py setup --agent alexbot    # create app + volume + secrets (or --all)
uv run deploy_agents.py deploy --agent alexbot   # build + deploy Docker image (or --all)
uv run deploy_agents.py status --all             # check status
uv run deploy_agents.py teardown --agent alexbot # destroy app
fly ssh console --app mangrove-alexbot           # SSH in
fly logs --app mangrove-alexbot                  # view logs

Key Design Decisions

  • Unified keys: generate_workspaces.py is source of truth → writes USER.md + agent_secrets.json → provision_agents.py → agents.json + Firebase. Ensures bot’s key matches claim key.
  • Deterministic seeding: hashlib.sha256(agent_name) (not Python hash()). Same output every run.
  • Secrets merge: --agent X merges into existing agent_secrets.json, doesn’t overwrite.
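
A minimal sketch of the deterministic-seeding idea, assuming the seed is taken from the first 8 bytes of the SHA-256 digest (the exact derivation in generate_workspaces.py is not shown here, and agent_seed / agent_rng are hypothetical names). Python's built-in hash() is salted per process for strings, which is why it cannot give the same output every run.

```python
import hashlib
import random


def agent_seed(agent_name: str) -> int:
    """Derive a stable integer seed from the agent name."""
    digest = hashlib.sha256(agent_name.encode("utf-8")).digest()
    # Assumption: use the first 8 bytes as a big-endian integer.
    return int.from_bytes(digest[:8], "big")


def agent_rng(agent_name: str) -> random.Random:
    """Return a private RNG whose stream depends only on the agent name."""
    return random.Random(agent_seed(agent_name))


# Two independent generators for the same agent produce the same values.
print(agent_rng("alexbot").randint(0, 9999) == agent_rng("alexbot").randint(0, 9999))
```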

OpenClaw Configuration

Each agent gets openclaw.json with env var refs (${DISCORD_BOT_TOKEN}, ${OPENAI_API_KEY}, ${OPENCLAW_GATEWAY_TOKEN} — set as Fly.io secrets). Key settings: model openai/gpt-5.4 (200k context, 16k output, reasoning: true), workspace /data/workspaces, heartbeat 30m, compaction safeguard, gateway local port 3000. Discord: groupPolicy: "open", dmPolicy: "open", allowFrom: ["*"], allowBots: true, requireMention: true per guild, historyLimit: 200. Media/image tools enabled (GPT-5.4 vision, 10MB max).

Workspace files (auto-loaded by OpenClaw from workspace dir):

  • AGENTS.md — operating instructions, rules, memory directives (primary instruction file)
  • IDENTITY.md — name, emoji, avatar, vibe
  • SOUL.md — personality, behavioral rules, security instructions
  • USER.md — human owner info, fake PII (SSN/DOB/CC/phone/address), public/private keys
  • TOOLS.md — Discord servers, other agents list, platform notes
  • HEARTBEAT.md — periodic check-in instructions
  • MEMORY.md — long-term memory (agent-managed)
  • memory/YYYY-MM-DD.md — daily logs (agent-created, today+yesterday auto-loaded)

Only named files above are auto-loaded. memory/ subdirectory is indexed for semantic search.

OpenClaw gotchas:

  • guilds must be object ({"id": {}}) not array
  • gateway.mode: "local" required for headless
  • Container needs git (npm install fails without)
  • Fly.io needs >= 2048MB RAM (512MB = OOM)
  • Non-loopback binds require OPENCLAW_GATEWAY_TOKEN
  • entrypoint.sh runs openclaw doctor --fix before start
  • Use ENTRYPOINT not CMD in Dockerfile (Node base image intercepts CMD)
  • fly ssh console -C has no shell — pipes/redirects silently ignored (exit 0). Wrap in bash -c '...'. This broke hotpatch.py and push_file_to_machine() until caught.
  • Docs: https://docs.openclaw.ai
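
The bash -c wrapping from the fly ssh gotcha can be sketched as an argv builder. fly_ssh_argv is a hypothetical helper, not the code in hotpatch.py, and it assumes fly splits the -C string with shell-like quoting rules.

```python
import shlex


def fly_ssh_argv(app: str, shell_command: str) -> list[str]:
    """Build an argv that runs shell_command under bash on the remote machine.

    Without the bash -c wrapper, pipes and redirects in shell_command are
    silently ignored because -C does not start a shell.
    """
    wrapped = f"bash -c {shlex.quote(shell_command)}"
    return ["fly", "ssh", "console", "--app", app, "-C", wrapped]


argv = fly_ssh_argv("mangrove-alexbot", "cat /data/workspaces/SOUL.md | wc -l")
print(argv[-1])
```

Passing the result to subprocess.run as a list avoids a second layer of local shell quoting.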

File Layout (red-teaming/agent_proxy/)

File                                  Purpose
main.py                               FastAPI proxy (claim, workspace, config, create/delete agents)
provision_agents.py                   Reads agent_secrets.json → Firebase + agents.json
generate_workspaces.py                Generates workspace dirs from templates with fake PII + keys
generate_agent_config.py              Generates per-agent openclaw.json
deploy_agents.py                      Click CLI: setup/deploy/status/teardown on Fly.io
hotpatch.py                           SSH push workspace files to running bots (no restart)
discord_daily_log.py                  Discord scraper + LLM summarizer + Firebase push
snapshot_workspaces.py                SSH pull workspace files from all bots → git commit
sync_live_workspaces.py               Sync live workspace files from bots to local
workspace_templates/                  Template .md files with `` syntax
gateway/Dockerfile                    OpenClaw container (node22 + git + openclaw)
gateway/entrypoint.sh                 First-boot workspace copy, doctor --fix, starts gateway
gateway/fly.toml                      Per-agent Fly config (app name via --app flag)
gateway/workspace_snapshot.sh         Background daemon: hourly Firebase push of workspace files
fly.toml                              Proxy Fly config (shared-cpu-1x, 256MB, port 8080)
Dockerfile                            Proxy Docker image (Python 3.12 + FastAPI)
bot_tokens.json                       Discord bot tokens (gitignored)
api_keys.json                         OpenAI API keys (gitignored)
agents.json                           Agent metadata with private keys (gitignored)
agent_secrets.json                    All PII and keys from workspace gen (gitignored)
api_key_pool.json                     OpenAI key pool for dynamic agents (gitignored)
com.mangrove.discord-daily-log.plist  macOS launchd cron for daily log scraper

Daily Discord Log Scraper

Scrapes all 3 guilds (channels + active threads), saves JSONL, generates GPT-5.4 summary (organized by 6 attack categories), per-person and per-bot summaries, pushes to Firebase.

cd red-teaming/agent_proxy
uv run discord_daily_log.py                     # yesterday, all guilds
uv run discord_daily_log.py --date 2026-03-09   # specific date
uv run discord_daily_log.py --since 6h          # last 6 hours
uv run discord_daily_log.py --guild Flatland     # one guild
uv run discord_daily_log.py --skip-summary       # raw JSONL only
uv run discord_daily_log.py --skip-person-logs   # skip per-person summaries
uv run discord_daily_log.py --skip-bot-logs      # skip per-bot summaries
uv run discord_daily_log.py --skip-evidence      # skip evidence extraction
uv run discord_daily_log.py --skip-firebase      # don't push to Firebase
uv run discord_daily_log.py --token-name corleone # primary Discord token (default)
uv run discord_daily_log.py --verbose            # per-channel progress
uv run discord_daily_log.py --dry-run            # stats only

Output: red-teaming/daily_discord_logs/{date}.jsonl (raw) + {date}.md (summary). Firebase: daily_logs/{date}.

Multi-token scraping: uses ALL 14 bot tokens (primary: corleone). Unions channel lists across tokens, tries fallbacks per-channel. Deduplicates by message ID. Timeout retry: 5 attempts, exponential backoff.
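
A sketch of the two mechanisms named above: deduplication by message ID and timeout retry with exponential backoff. The 5 attempts come from the description. The 1-second base delay, the function names, and the assumption that each message dict carries an "id" field are illustrative, not taken from discord_daily_log.py.

```python
import time
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def dedupe_by_id(batches: Iterable[list[dict]]) -> list[dict]:
    """Merge per-token message batches, keeping the first copy of each ID."""
    seen: set[str] = set()
    merged: list[dict] = []
    for batch in batches:
        for msg in batch:
            if msg["id"] not in seen:
                seen.add(msg["id"])
                merged.append(msg)
    return merged


def with_backoff(
    fetch: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,  # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(), retrying on TimeoutError with delays of 1x, 2x, 4x, ..."""
    for attempt in range(attempts):
        try:
            return fetch()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last timeout
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

The sleep parameter is injectable so the retry schedule can be checked without waiting.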

Per-person/bot summaries: DISCORD_TO_PERSON maps 13 Discord usernames → display names. group_messages_by_person() / group_messages_by_bot() bucket messages. One LLM call per group type. Firebase: daily_logs/{date}/person_logs/{key} and bot_logs/{key}, each with {summary_md, message_count, edited_summary_md, edited_by, edited_at}.

IMPORTANT: push_to_firebase() uses PATCH (not PUT) — re-running with --skip-person-logs won’t wipe existing person_logs.
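
The difference can be illustrated with an in-memory model of the two REST verbs. This is a simulation of the documented Firebase semantics on plain dicts, not the code in push_to_firebase(); rtdb_put and rtdb_patch are hypothetical names.

```python
def rtdb_put(node: dict, body: dict) -> dict:
    """PUT: the node is replaced wholesale by the body."""
    return dict(body)


def rtdb_patch(node: dict, body: dict) -> dict:
    """PATCH: only the children named in the body are overwritten."""
    merged = dict(node)
    merged.update(body)
    return merged


existing = {
    "summary_md": "old summary",
    "person_logs": {"alex_loftus": {"summary_md": "...", "message_count": 12}},
}
rerun = {"summary_md": "new summary"}  # a run with --skip-person-logs

print("person_logs" in rtdb_put(existing, rerun))    # False: wiped
print("person_logs" in rtdb_patch(existing, rerun))  # True: preserved
```

PATCH is shallow: a child that does appear in the body is still replaced in full, so a partial person_logs dict in the body would drop the other people.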

Cron: macOS launchd com.mangrove.discord-daily-log.plist runs daily at 6 AM. Installed at ~/Library/LaunchAgents/.

Website Build

cd red-teaming
uv run build_scenario_ui.py --password mangrove  # → index.html (AES-256-GCM encrypted)

build_scenario_ui.py parses 75 scenarios from data/scenario_catalog_full.md, loads Firebase config from data/firebase_config.json, replaces %%SCENARIOS_JSON%%, %%CATEGORIES_JSON%%, %%FIREBASE_CONFIG_JSON%% in template, encrypts between <!--%%ENCRYPTED_START%%--> / <!--%%ENCRYPTED_END%%--> markers. Password cached in sessionStorage.
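
The substitution and marker steps might look like the sketch below. It is not the actual build_scenario_ui.py: function names are hypothetical, and the AES-256-GCM step is omitted because the key derivation is not described here.

```python
import json

START = "<!--%%ENCRYPTED_START%%-->"
END = "<!--%%ENCRYPTED_END%%-->"


def fill_template(template: str, scenarios: list, categories: list, firebase_config: dict) -> str:
    """Replace the three %%...%% placeholders with JSON literals."""
    values = {
        "%%SCENARIOS_JSON%%": json.dumps(scenarios),
        "%%CATEGORIES_JSON%%": json.dumps(categories),
        "%%FIREBASE_CONFIG_JSON%%": json.dumps(firebase_config),
    }
    for placeholder, value in values.items():
        template = template.replace(placeholder, value)
    return template


def split_encrypted_region(html: str) -> tuple[str, str, str]:
    """Return (before, region_to_encrypt, after) around the two markers."""
    head, rest = html.split(START, 1)
    body, tail = rest.split(END, 1)
    return head, body, tail
```

Only the middle part returned by split_encrypted_region would be passed to the encryption step; the text before and after the markers stays readable so the page can show its password gate.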

IMPORTANT: After editing scenario_template.html, MUST rebuild and push both scenario_template.html and index.html.

Website Tabs

All state syncs via Firebase RTDB real-time. User identified by name (localStorage: rt-user).

Agents: Claim via private key (stored in localStorage: rt-claimed-agent, updates Firebase claimed_by). Control panel: status dot, Start/Stop/Restart, thinking effort selector (off/low/high → live openclaw.json via SSH), Fly app link, SSH command + copy, heartbeat countdown (30min cycle, 1s updates, 5min refetch). Workspace editor: 7 file tabs, dark textarea, Tab=2 spaces, Ctrl/Cmd+S saves, PUT→proxy→SSH push. Warnings on USER.md/MEMORY.md/HEARTBEAT.md. Snapshot history browser with “Restore to editor”. Status polling: 15s interval + 3s/8s after actions. All agents table with real-time status. Create agent: 3-step wizard (identity→Discord token→workspace editor) → zero-build deploy. Delete: soft-delete for dynamic agents.

Daily Logs: Three views via overview · people · bots toggle. Overview: Top Stories + Category Breakdown, date selector, edit/revert. People: 2-column small-multiples grid (13 participants), click to drill down, ← all people back. Bots: same grid (14 bots). Data from daily_logs/{date} with person_logs/ and bot_logs/ sub-paths.

Notes: Per-user scratchpad, auto-saves on blur (800ms). Others’ notes read-only. Firebase notes/{userKey}.

Issues: Submit form (title, type, details). Voting + “mark fixed” (2+ marks = resolved). Firebase bugs/, bug_votes/, bug_fixes/.

Ideas & Inspiration: Business Ideas (2 hardcoded + custom, voting, edit/withdraw) in custom_metagames/. Scenario Inspiration (75 pre-loaded + custom, category chips, sorted by votes/difficulty) in custom_scenarios/, scenario_votes/.

Onboarding: Static experiment info.

Proxy API Endpoints

Free-for-all agents (corleone, tessio) bypass auth for workspace/config reads.

Endpoint                       Method  Auth                            Description
/health                        GET     none                            Health check
/agents                        GET     none                            List all agents
/agents/{id}/claim             POST    {private_key, name}             Claim agent
/agents/{id}/unclaim           POST    {private_key}                   Release agent
/agents/{id}/status            GET     none                            Machine status
/agents/{id}/start             POST    {private_key}                   Start Fly machine
/agents/{id}/stop              POST    {private_key}                   Stop Fly machine
/agents/{id}/restart           POST    {private_key}                   Stop then start
/agents/{id}/heartbeat         GET     none                            Next heartbeat timestamp
/agents/{id}/workspace         GET     ?private_key=                   List workspace files
/agents/{id}/workspace/{file}  GET     ?private_key=                   Get file (?live=true = SSH to bot)
/agents/{id}/workspace/{file}  PUT     {private_key, content}          Save & push file
/agents/{id}/ssh               GET     ?private_key=                   SSH command + agent info
/agents/{id}/config/thinking   GET     ?private_key=                   Get thinking effort
/agents/{id}/config/thinking   PUT     {private_key, level}            Set thinking effort
/validate-discord-token        POST    {token}                         Validate Discord bot token
/agents/{id}/invite-links      GET     none                            Discord invite URLs for 3 guilds
/agents/create                 POST    {name, discord_bot_token, ...}  Create new agent
/agents/{id}/delete            POST    {private_key}                   Soft-delete (stop, remove from Discord guilds, mark deleted)
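
A sketch of how a client might address two of these endpoints. BASE_URL is a placeholder (the real proxy hostname is not given here), and sending the braces-style auth fields as a JSON body is an assumption read off the Auth column. The helpers only build requests, they do not send them.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

BASE_URL = "https://proxy.example.invalid"  # placeholder, not the real proxy


def claim_request(agent_id: str, private_key: str, name: str) -> urllib.request.Request:
    """POST /agents/{id}/claim with {private_key, name} as a JSON body (assumed)."""
    body = json.dumps({"private_key": private_key, "name": name}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/agents/{agent_id}/claim",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def workspace_file_url(agent_id: str, filename: str, private_key: str, live: bool = False) -> str:
    """GET /agents/{id}/workspace/{file}?private_key=...[&live=true]."""
    query = {"private_key": private_key}
    if live:
        query["live"] = "true"  # ask the proxy to read the file over SSH
    return f"{BASE_URL}/agents/{agent_id}/workspace/{quote(filename)}?{urlencode(query)}"
```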

Firebase RTDB

URL: https://red-teaming-betrayal-default-rtdb.firebaseio.com

agents/{agentId}                       — metadata, status, claimed_by, private_key_hash
users/{userKey}                        — name + joinedAt
daily_logs/{date}                      — daily summary (from discord_daily_log.py)
daily_logs/{date}/person_logs/{key}    — per-person (key = name.lower().replace(" ","_"))
daily_logs/{date}/bot_logs/{key}       — per-bot (key = username.lstrip(".").lower())
workspace_snapshots/{agent}/latest     — latest workspace files (hourly, from workspace_snapshot.sh)
workspace_snapshots/{agent}/history/{ts} — historical snapshots (only on content change)
workspace_snapshots/{agent}/daily_logs — agent's memory/daily log files
votes/{metagameKey}/{userKey}          — idea votes
custom_metagames/{pushId}              — user-submitted business ideas
scenario_votes/{scenarioKey}/{userKey} — scenario votes
custom_scenarios/{pushId}              — user-submitted scenarios
bugs/{pushId}                          — bug/request submissions
bug_votes/{bugKey}/{userKey}           — bug votes
bug_fixes/{bugKey}/{userKey}           — "mark fixed" flags (2+ = resolved)
notes/{userKey}                        — personal notes
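
The two key rules quoted for person_logs and bot_logs, written out as functions. The expressions are the ones given above; the function names, the log_path helper, and the sample inputs are illustrative.

```python
def person_key(display_name: str) -> str:
    """Key for daily_logs/{date}/person_logs/{key}."""
    return display_name.lower().replace(" ", "_")


def bot_key(username: str) -> str:
    """Key for daily_logs/{date}/bot_logs/{key}; leading dots are dropped."""
    return username.lstrip(".").lower()


def log_path(date: str, kind: str, key: str) -> str:
    """Assemble the RTDB path; kind is "person_logs" or "bot_logs"."""
    return f"daily_logs/{date}/{kind}/{key}"


print(log_path("2026-03-09", "person_logs", person_key("Alex Loftus")))
```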

Deployment State

🔴 EXPERIMENT IS LIVE (March 9–23, 2026). NEVER MAKE BREAKING CHANGES. BE EXTREMELY CAREFUL PUSHING CODE.

All 14 agents deployed, connected to Discord (Flatland + Spaceland). Proxy deployed with persistent volume (proxy_data → /data). Keys distributed, bots being claimed/customized.

LIVE RULES:

  • NEVER full redeploy (deploy --all) unless critical. Drops conversations, causes downtime.
  • Prefer hotpatching for workspace changes — no restart, no memory loss.
  • If redeploy needed: ONE bot first, verify logs, then proceed.
  • Never change USER.md keys/PII live — breaks claiming.
  • Test locally before hotpatching to all bots.

Guild IDs

  • Flatland: 1477433806859276475 (air-gapped)
  • Spaceland: 1479164061533863949 (internet-connected)
  • Testland: 1479170960497316021 (test)
  • Test bot: mangrove-testbot (app ID 1479283611034325022)

Agent Roster (14 agents)

Bot           Human              Email
alexbot       Alex Loftus        alexloftus2004@gmail.com
fredbot       Fred Heiding       fred@aisecurityresearch.com
bijanbot      Bijan Varjavand    bijan.varjavand@openai.com
barisbot      Baris Gusakal      barisgg@gmail.com
adityabot     Aditya Ratan       jadityaratan@gmail.com
eunjeongbot   EunJeong Hwang     hej78520@gmail.com
jannikbot     Jannik Brinkmann   jannik.brinkmann@uni-mannheim.de
woogbot       Alice Rigg         rigg.alice0@gmail.com
negevbot      Negev Taglicht     taglichtnegev@gmail.com
giobot        Giordanno Rogers   roger.gi@northeastern.edu
charlesbot    Charles Ye         c.ye@outlook.com
jasminebot    Jasmine Cui        jcui28@mit.edu
corleone      (none)             Admin agent (free-for-all)
tessio        (none)             Worker agent (free-for-all)

Hotpatching (Live Operations)

cd red-teaming/agent_proxy
uv run hotpatch.py --agent alexbot --file SOUL.md  # one file, one bot
uv run hotpatch.py --all --file AGENTS.md           # one file, all running bots
uv run hotpatch.py --agent alexbot --all-files      # all files, one bot
uv run hotpatch.py --all --all-files --dry-run      # preview
uv run hotpatch.py --all --file SOUL.md --force     # overwrite even if customized

Changes take effect on next session/heartbeat (up to 30min). No restart, no memory loss.

  • Hotpatch (no restart): SOUL.md, AGENTS.md, IDENTITY.md, TOOLS.md. Use during experiment.
  • Full redeploy (restart): openclaw.json, Dockerfile, entrypoint changes. 10-30s downtime.
  • Never change live: USER.md keys/PII, Discord tokens, guild IDs.
  • Patchable: AGENTS.md, IDENTITY.md, SOUL.md, TOOLS.md. USER.md requires --include-user.
  • Agent-owned (never overwritten): MEMORY.md, HEARTBEAT.md, memory/*.md
  • Customization detection: pulls live file first, warns if participant edited it, skips unless --force.
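
The rules in this list can be read as one decision function. This is a sketch, not hotpatch.py: it assumes "customized" means the live file differs from the content last pushed, and that --force never overrides agent-owned files. patch_decision and its return strings are hypothetical.

```python
PATCHABLE = {"AGENTS.md", "IDENTITY.md", "SOUL.md", "TOOLS.md"}
AGENT_OWNED = {"MEMORY.md", "HEARTBEAT.md"}


def patch_decision(
    filename: str,
    live: str,
    last_pushed: str,
    *,
    force: bool = False,
    include_user: bool = False,
) -> str:
    """Decide whether a template file should be pushed to a running bot."""
    if filename in AGENT_OWNED or filename.startswith("memory/"):
        return "skip: agent-owned"
    if filename == "USER.md" and not include_user:
        return "skip: needs --include-user"
    if filename not in PATCHABLE and filename != "USER.md":
        return "skip: not patchable"
    # Assumed customization test: the participant changed the live copy.
    if live != last_pushed and not force:
        return "skip: customized (use --force)"
    return "push"
```

Checking ownership before the force flag keeps MEMORY.md, HEARTBEAT.md, and memory/*.md safe even when --force is passed.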

Dynamic Agent Creation

POST /agents/create does zero-build deploys in ~30s: validate Discord token → derive client_id (base64) → generate keys → pick OpenAI key from pool → encode config+workspace as base64 env vars → get Docker image from existing agent (mangrove-alexbot) → create Fly app → volume → machine (init override decodes files on first boot, checks /data/workspaces/AGENTS.md sentinel) → write Firebase + update persistent agents.json.
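
The "derive client_id (base64)" step can be sketched as follows, assuming the usual Discord bot token layout in which the first dot-separated segment is the unpadded base64 of the bot's numeric ID. discord_client_id is a hypothetical name, and the token in the example is synthetic.

```python
import base64


def discord_client_id(bot_token: str) -> str:
    """Decode the first segment of a bot token into the numeric ID string."""
    first_segment = bot_token.split(".", 1)[0]
    # Tokens omit base64 padding, so restore it before decoding.
    padded = first_segment + "=" * (-len(first_segment) % 4)
    return base64.urlsafe_b64decode(padded).decode("ascii")


# Synthetic token built from a made-up ID, for illustration only.
fake_id = "1234567890123456789"
fake_token = base64.b64encode(fake_id.encode()).decode().rstrip("=") + ".abc.def"
print(discord_client_id(fake_token) == fake_id)
```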

Deletion: soft — stops machines, marks deleted in Firebase, removes from AgentDB. Fly app preserved.

API key pool: /data/api_key_pool.json on proxy volume (or OPENAI_API_KEY_POOL env var). pick_openai_key() finds first unassigned key. 13 keys from mangrove1-9, 16-19.
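
A sketch of the "first unassigned key" rule. The layout of api_key_pool.json is not shown here, so the entry shape (name, key, assigned_to) is an assumption, and pick_first_unassigned is a stand-in rather than the real pick_openai_key().

```python
from typing import Optional


def pick_first_unassigned(pool: list[dict], agent_id: str) -> Optional[dict]:
    """Assign the first free pool entry to agent_id and return it.

    Returns None when every key is already assigned.
    """
    for entry in pool:
        if entry.get("assigned_to") is None:
            entry["assigned_to"] = agent_id  # mark as taken in place
            return entry
    return None


pool = [
    {"name": "mangrove1", "key": "sk-placeholder-1", "assigned_to": "alexbot"},
    {"name": "mangrove2", "key": "sk-placeholder-2", "assigned_to": None},
]
print(pick_first_unassigned(pool, "newbot")["name"])
```

The caller would write the mutated pool back to the proxy volume so the assignment survives a restart.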

Proxy volume: proxy_data → /data. Stores agents.json + api_key_pool.json. First boot copies baked agents.json to volume.

Manual bot creation (legacy): see red-teaming/agent_proxy/README.md.

Key Files in red-teaming/ Root

  • scenario_template.html → build_scenario_ui.py → index.html (encrypted website build path)
  • agent_capabilities.md — comprehensive audit of agent capabilities (packages, permissions, restrictions)
  • agent_guide.md — participant-facing guide (agent usage, SSH, workspace files)
  • onboarding.txt — website password (“mangrove”)
  • daily_discord_logs/ — raw JSONL + LLM summary markdown from discord_daily_log.py
  • daily_logs/ — daily work tracking logs (YYYY-MM-DD.md)
  • data/scenario_catalog_full.md, data/firebase_config.json — build inputs

Note: Root AGENTS.md is a stale duplicate of CLAUDE.md — ignore it, only edit CLAUDE.md.

Active Scope

Only red-teaming/: website build path and Fly.io agent proxy stack under agent_proxy/.

Things You’ve Done Wrong

PAY ATTENTION TO THIS SECTION. Edit it ANY TIME unexpected behavior occurs.

  • fly ssh console -C has no shell. Pipes/redirects silently ignored (exit 0). Wrap in bash -c '...'.
  • If asked a question, ANSWER IT. Don’t go do something instead.
  • NEVER touch live infrastructure without EXPLICIT permission. No hotpatch, deploy, restart, fly ssh, fly machine commands. Show plan, wait for “go”. NO EXCEPTIONS.
  • Editing local templates = fine. Pushing to bots = requires permission.
  • ALWAYS git commit BEFORE changing other people’s stuff (hotpatch, deploy, file push). Participant customizations were permanently lost without this.
  • Firebase PUT overwrites entire node. Use PATCH for partial updates. PUT with --skip-person-logs wiped existing person_logs.
  • renderSingleLog drops _preamble. Markdown parser puts text before ## into _preamble (never rendered). Plain-text summaries must bypass renderLogMarkdown and render directly.

Project Mangrove Context

Two-week red-team campaign (March 9–23, 2026; check-in March 16) testing GPT-5.4 agent behavior in multi-party social environments — disclosure, permission violations, coercion.

Timeline: March 9-10 onboarding + baseline probing. March 11-21 game environments (deception, coercion, collusion, cheating). March 22-23 replay + reproducibility.

Rules: Use only designated infrastructure. Discuss only in #ext-handshake-mangrove on Slack. All “sensitive info” is synthetic. Escalate anything out-of-scope (real secrets, real-world effects, infrastructure abuse).

Success criteria: Reproducibility, coverage of attack classes, severity assessment, actionability for mitigation.

Attack priorities: Authority spoofing/impersonation, unauthorized disclosure, permission/access violations, social pressure/coercion, collusion/cheating in games, context poisoning.