Interview Practice Protocol

You have ~6 hours of prep across 4 sessions. Everything is designed around retrieval — reading doesn’t count, only recall under pressure counts.

Red-teaming is your weakest/most uncertain round, so it goes first and gets the most reps.


Session 1: Red-Teaming Focus (~1.5 hr)

Step 1: Read the playbook (15 min)

  • Read redteam_playbook.md once, slowly
  • Pay attention to the mechanism of each attack and the specific Mangrove example
  • Note the meta-tactics at the top (thinking off, memory as trust anchor, refusals don’t transfer)
  • Close the file

Step 2: Recall the playbook (10 min)

  • Open a blank piece of paper (physical paper, not a screen)
  • Write down: the 5 attack names, the mechanism of each, and one ChatGPT prompt for each
  • Also write the 3 meta-tactics
  • Don’t peek. If you’re stuck, leave a blank and move on.
  • When done, open the playbook and compare. Circle what you missed.

Step 3: First mock red-team session (30 min)

  • Open ChatGPT
  • Set a 30-minute timer
  • Narrate aloud as you probe — this is the skill they’re evaluating
  • Start with Attack 1 (cross-context encoding / refusal gap). If it works, explore depth. If blocked, pivot to Attack 2 (identity claims).
  • For EACH attempt, say out loud:
    • What boundary you’re testing and why
    • What you expect to happen
    • What actually happened
    • What that tells you about the model’s safety
  • After 30 min: STOP. Write down:
    • What worked
    • What didn’t
    • What you forgot to try
    • What you wish you’d said better

Step 4: Debrief + re-read gaps (15 min)

  • Open the playbook. Read ONLY the attacks you forgot or didn’t try.
  • For attacks that failed: think about why. Was it the prompt? The framing? Would a different approach to the same boundary work?
  • Write down 2-3 adjusted prompts you want to try next time.

Step 5: Absorb the other materials (20 min)

  • Read research_talk_outline.md once — focus on transitions between sections
  • Read ops_framework.md and behavioral_stories.md once each
  • Close all files
  • On paper, write from memory:
    • Talk sections in order (Setup → Frame → F1 → F2 → F3 → Systems → Implications)
    • 4 ops phases (Scope → Build → Run → Learn) + one example each
    • 3 story punchlines
  • Compare. Circle gaps.

End of Session 1. Put the paper somewhere visible. Don’t re-read docs.


Session 2: Second Red-Team Rep + Code (~1.5 hr)

Step 6: Second mock red-team session (30 min)

  • Open ChatGPT again (fresh conversation)
  • Set a 30-minute timer
  • This time: use your adjusted prompts from Step 4
  • Try at least one attack you DIDN’T try in Session 1
  • Focus on narration quality — practice sounding like you’re thinking, not reading a script
  • Try toggling reasoning off if the model has that option (Negev’s finding)
  • After 30 min: write down what improved vs. Session 1

Step 7: Timed coding exercise (45 min)

  • Delete practice_analysis.py (yes, actually delete it)
  • Set a 45-minute timer
  • Your task: pull Mangrove data from Firebase, produce 2-3 Tufte-style visualizations, and narrate your findings aloud as you code
  • You CAN use coding_toolkit.py (it’s a library, not the answer)
  • You CANNOT look at the deleted script or any notes
  • Firebase URL: https://red-teaming-betrayal-default-rtdb.firebaseio.com
    • daily_logs/{date}/stats.json has total_messages, human_messages, bot_messages, most_active_channels, most_mentioned
    • Dates: 2026-03-09 through 2026-03-23
  • Talk out loud the whole time: “I’m loading the data… I see there are 15 days… let me group by date and look at volume…”
  • When the timer goes off, STOP. Note where you are.
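
  The data pull above can be sketched in Python. This is a minimal sketch that assumes the standard Firebase REST convention of appending .json to a node path; the base URL, date range, and field names come from the bullets above, but the helper names (daily_dates, stats_url, fetch_stats, message_volume) are hypothetical and not part of coding_toolkit.py:

```python
# Sketch of the Step 7 data pull (stdlib only). Assumes the Firebase REST
# convention that any node can be fetched as JSON by appending ".json".
# Helper names are hypothetical; field names are from the stats.json bullet.
import json
import urllib.request
from datetime import date, timedelta

BASE = "https://red-teaming-betrayal-default-rtdb.firebaseio.com"

def daily_dates(start=date(2026, 3, 9), end=date(2026, 3, 23)):
    """All campaign dates, inclusive, as YYYY-MM-DD strings."""
    return [(start + timedelta(days=i)).isoformat()
            for i in range((end - start).days + 1)]

def stats_url(day: str) -> str:
    """REST URL for one day's stats node."""
    return f"{BASE}/daily_logs/{day}/stats.json"

def fetch_stats(day: str) -> dict:
    """Fetch one day's stats: total_messages, human_messages,
    bot_messages, most_active_channels, most_mentioned."""
    with urllib.request.urlopen(stats_url(day)) as resp:
        return json.load(resp)

def message_volume(stats_by_day: dict) -> dict:
    """Collapse per-day stats to a date -> total_messages series for plotting."""
    return {d: s["total_messages"] for d, s in sorted(stats_by_day.items())}
```

  Something like `message_volume({d: fetch_stats(d) for d in daily_dates()})` then gives you a date-to-volume series to chart first, which is usually the fastest way to have something on screen to narrate over.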

Step 8: Coding debrief (10 min)

  • What took too long? What patterns weren’t automatic?
  • If you got stuck on pandas syntax, practice that specific thing 3 times right now
  • If plotting took too long, simplify: bar chart + direct labels + remove spines. That’s the whole recipe.
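
  That three-move recipe can be drilled as one helper so it's automatic under the timer. A minimal sketch, assuming matplotlib is available; tufte_bar is a hypothetical name, not something in coding_toolkit.py:

```python
# Sketch of the "bar chart + direct labels + remove spines" recipe.
# Assumes matplotlib; tufte_bar is a hypothetical helper name.
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def tufte_bar(labels, values, title=""):
    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(labels, values, color="0.45")
    # Direct labels: the value sits on each bar, so the y-axis can go.
    for bar, v in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, v, f"{v:,}",
                ha="center", va="bottom", fontsize=9)
    # Remove spines and y-ticks: keep only the data.
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    ax.set_yticks([])
    ax.set_title(title, loc="left")
    fig.tight_layout()
    return fig, ax
```

  Called as, say, `tufte_bar(dates, volumes, "Messages per day")`, that's one muscle-memory function covering most of what the exercise asks for visually.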

End of Session 2. You’ve now done 2 red-team reps and 1 coding rep.


Session 3: Simulate All Rounds (~1.5 hr)

Step 9: Third mock red-team session (20 min)

  • This should be getting smoother now
  • Set a 20-minute timer (shorter — practice being efficient)
  • Focus on the narration: can you smoothly explain what you’re doing and why?
  • Try the attack you’re least confident in

Step 10: Research talk (20 min)

  • Record yourself on your phone (voice memo or video)
  • Set a 15-minute timer
  • Deliver the talk from memory. No notes. No peeking.
  • When you get stuck, DON’T stop — say “let me come back to that” and keep going
  • Listen back. Note:
    • Where did you ramble? (Cut those parts)
    • Where did you get stuck? (Re-read that section only)
    • Where did you sound most confident? (That’s your anchor — build outward from there)

Step 11: Ops Q&A (20 min)

  • Set your phone to record
  • Answer these questions out loud, 2-3 min each:
    1. “Walk me through planning a red-team campaign from scratch.”
    2. “How did you handle it when participants went off-script?”
    3. “Tell me about a time things went wrong in the campaign.”
    4. “How would you scale this to 100+ agents?”
    5. “How do you communicate findings to product/safety teams?”
  • Listen back. Note where you were vague vs. concrete.

Step 12: Retrieval pass (15 min)

  • On a NEW piece of paper, from memory:
    • 5 attack names + mechanisms + one prompt each + 3 meta-tactics
    • Talk structure: 7 sections in order, with one example and the transition for each
    • 4 ops phases + example each
    • 3 story punchlines
  • Compare to your Session 1 paper. What improved? What’s still shaky?
  • ONLY NOW open docs to fill remaining gaps.

Step 13: Debrief (10 min)

  • Rank the 5 interview rounds by how ready you feel (1 = most ready, 5 = least ready)
  • For your bottom 2: identify the specific thing that felt weakest
  • Write down 2-3 things to sharpen in Session 4

End of Session 3. You’ve done 3 red-team reps and should feel rough-but-functional across all rounds.


Session 4: Sharpen (day before, ~1 hr)

Step 14: Final red-team rep (20 min)

  • One more ChatGPT session
  • By now you should have a personal “opening sequence”: your go-to first 3 moves
  • Practice that opening sequence until it feels natural
  • If you discover a new technique that works, great — add it to your mental toolkit

Step 15: Re-simulate next-weakest round (20 min)

  • Whatever ranked second-lowest in Step 13, do it again
  • If it was the talk: deliver again, record, compare to Session 3 recording
  • If it was ops: answer the questions again, but have someone ELSE ask them (a friend, a partner, anyone)

Step 16: Night-before review (15 min)

  • Read your circled gaps from all recall sessions
  • Read the playbook one last time (just the attack names + mechanisms + meta-tactics)
  • Read the talk section headers + transitions one last time
  • Go to sleep. Memory consolidation happens during sleep.

Day of: Pre-Interview (15 min)

  • Do NOT re-read the docs. You either know it or you don’t.
  • On a blank piece of paper, write from memory:
    • “Agents fail at boundaries: memory, identity, context”
    • 5 attack names
    • 3 meta-tactics
    • Talk arc: Setup → Frame → F1 → F2 → F3 → Systems → Implications
    • “Scope → Build → Run → Learn”
  • If this comes easily, you’re ready.
  • Walk in with the frame “agents fail at boundaries” as your anchor. Everything else hangs off it.

Key Numbers to Internalize

These come up across multiple rounds — drill them:

  • 65+ agents on Discord
  • 13 participants
  • 14 days (March 9-23, 2026; note the Firebase daily logs span 15 calendar dates, both endpoints inclusive)
  • ~61,000 messages (report number) / 83,364 (from Firebase stats — daily log pipeline captures more)
  • 84% bot-generated messages
  • GPT-5.4 model with reasoning, 200K context
  • ~19,000 lines of infrastructure code, built in 2 weeks
  • 17 channels deleted in the Corleone incident
  • 5-minute archive daemon cadence (saved us)
  • 3 independent data collection pipelines
  • 12 feather reports
  • ~100% success rate for memory poisoning
  • The six-word system prompt change that flipped a bot from protecting to harming