Interview Practice Protocol
You have ~6 hours of prep across 4 sessions. Everything is designed around retrieval: reading doesn’t count; only recall under pressure does.
Red-teaming is your weakest/most uncertain round, so it goes first and gets the most reps.
Session 1: Red-Teaming Focus (~1.5 hr)
Step 1: Read the playbook (15 min)
- Read redteam_playbook.md once, slowly
- Pay attention to the mechanism of each attack and the specific Mangrove example
- Note the meta-tactics at the top (thinking off, memory as trust anchor, refusals don’t transfer)
- Close the file
Step 2: Recall the playbook (10 min)
- Open a blank piece of paper (physical paper, not a screen)
- Write down: the 5 attack names, the mechanism of each, and one ChatGPT prompt for each
- Also write the 3 meta-tactics
- Don’t peek. If you’re stuck, leave a blank and move on.
- When done, open the playbook and compare. Circle what you missed.
Step 3: First mock red-team session (30 min)
- Open ChatGPT
- Set a 30-minute timer
- Narrate aloud as you probe — this is the skill they’re evaluating
- Start with Attack 1 (cross-context encoding / refusal gap). If it works, explore depth. If blocked, pivot to Attack 2 (identity claims).
- For EACH attempt, say out loud:
  - What boundary you’re testing and why
  - What you expect to happen
  - What actually happened
  - What that tells you about the model’s safety
- After 30 min: STOP. Write down:
  - What worked
  - What didn’t
  - What you forgot to try
  - What you wish you’d said better
Step 4: Debrief + re-read gaps (15 min)
- Open the playbook. Read ONLY the attacks you forgot or didn’t try.
- For attacks that failed: think about why. Was it the prompt? The framing? Would a different approach to the same boundary work?
- Write down 2-3 adjusted prompts you want to try next time.
Step 5: Absorb the other materials (20 min)
- Read research_talk_outline.md once, focusing on transitions between sections
- Read ops_framework.md and behavioral_stories.md once each
- Close all files
- On paper, write from memory:
  - Talk sections in order (Setup → Frame → F1 → F2 → F3 → Systems → Implications)
  - 4 ops phases (Scope → Build → Run → Learn) + one example each
  - 3 story punchlines
- Compare. Circle gaps.
End of Session 1. Put the paper somewhere visible. Don’t re-read docs.
Session 2: Second Red-Team Rep + Code (~1.5 hr)
Step 6: Second mock red-team session (30 min)
- Open ChatGPT again (fresh conversation)
- Set a 30-minute timer
- This time: use your adjusted prompts from Step 4
- Try at least one attack you DIDN’T try in Session 1
- Focus on narration quality — practice sounding like you’re thinking, not reading a script
- Try toggling reasoning off if the model has that option (Negev’s finding)
- After 30 min: write down what improved vs. Session 1
Step 7: Timed coding exercise (45 min)
- Delete practice_analysis.py (yes, actually delete it)
- Set a 45-minute timer
- Your task: pull Mangrove data from Firebase, produce 2-3 Tufte-style visualizations, and narrate your findings aloud as you code
- You CAN use coding_toolkit.py (it’s a library, not the answer)
- You CANNOT look at the deleted script or any notes
- Firebase URL: https://red-teaming-betrayal-default-rtdb.firebaseio.com
- daily_logs/{date}/stats.json has total_messages, human_messages, bot_messages, most_active_channels, most_mentioned
- Dates: 2026-03-09 through 2026-03-23
- Talk out loud the whole time: “I’m loading the data… I see there are 15 days… let me group by date and look at volume…”
- When the timer goes off, STOP. Note where you are.
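For checking yourself after the timer (not during the rep), here is a minimal sketch of the loading step using only the standard library. It assumes the database is publicly readable over plain HTTPS and that total_messages and bot_messages are integers; the helper names (date_range, stats_url, fetch_stats, bot_share) are hypothetical and are not taken from coding_toolkit.py.

```python
import json
from datetime import date, timedelta
from urllib.request import urlopen

BASE_URL = "https://red-teaming-betrayal-default-rtdb.firebaseio.com"


def date_range(start: date, end: date) -> list[str]:
    """Return ISO date strings from start to end, inclusive."""
    days = (end - start).days
    return [(start + timedelta(days=i)).isoformat() for i in range(days + 1)]


def stats_url(day: str) -> str:
    """Build the URL for one day's stats node."""
    return f"{BASE_URL}/daily_logs/{day}/stats.json"


def fetch_stats(day: str) -> dict:
    """Download one day's stats (network call; assumes public read access)."""
    with urlopen(stats_url(day), timeout=10) as resp:
        # A missing node comes back as JSON null, so fall back to an empty dict.
        return json.load(resp) or {}


def bot_share(rows: list[dict]) -> float:
    """Fraction of all messages that were bot-generated across the given days."""
    total = sum(r.get("total_messages", 0) for r in rows)
    bots = sum(r.get("bot_messages", 0) for r in rows)
    return bots / total if total else 0.0


# The exercise window: 2026-03-09 through 2026-03-23, both endpoints included.
DATES = date_range(date(2026, 3, 9), date(2026, 3, 23))
```

A typical use would be rows = [fetch_stats(d) for d in DATES] followed by bot_share(rows). The fetch is kept in its own function so the date and aggregation logic can be checked without touching the network.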
Step 8: Coding debrief (10 min)
- What took too long? What patterns weren’t automatic?
- If you got stuck on pandas syntax, practice that specific thing 3 times right now
- If plotting took too long, simplify: bar chart + direct labels + remove spines. That’s the whole recipe.
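The three-part plotting recipe above (bar chart, direct labels, no spines) can be sketched roughly as follows. This is one possible reading of it, assuming matplotlib 3.4 or newer is installed (for Axes.bar_label); the function name minimal_bar_chart is hypothetical, and any values you pass in while drilling are placeholders, not Mangrove data.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def minimal_bar_chart(labels, values, title):
    """Bar chart with direct value labels, no spines, and no y-axis ticks."""
    fig, ax = plt.subplots(figsize=(8, 3))
    bars = ax.bar(labels, values, color="0.4")

    # Direct labels: print each value above its bar instead of relying on a y-axis.
    ax.bar_label(bars, labels=[f"{v:,}" for v in values], padding=2, fontsize=8)

    # Remove spines: drop the box around the plot area.
    for side in ("top", "right", "left", "bottom"):
        ax.spines[side].set_visible(False)

    # With direct labels in place, the y ticks and x tick marks are redundant.
    ax.set_yticks([])
    ax.tick_params(axis="x", length=0)

    ax.set_title(title, loc="left")
    fig.tight_layout()
    return fig, ax
```

Call it with a list of date strings and the matching daily totals, then fig.savefig("volume.png") to inspect the result. Returning fig and ax leaves room for one annotation on the day you want to talk about.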
End of Session 2. You’ve now done 2 red-team reps and 1 coding rep.
Session 3: Simulate All Rounds (~1.5 hr)
Step 9: Third mock red-team session (20 min)
- This should be getting smoother now
- Set a 20-minute timer (shorter — practice being efficient)
- Focus on the narration: can you smoothly explain what you’re doing and why?
- Try the attack you’re least confident in
Step 10: Research talk (20 min)
- Record yourself on your phone (voice memo or video)
- Set a 15-minute timer
- Deliver the talk from memory. No notes. No peeking.
- When you get stuck, DON’T stop — say “let me come back to that” and keep going
- Listen back. Note:
  - Where did you ramble? (Cut those parts)
  - Where did you get stuck? (Re-read that section only)
  - Where did you sound most confident? (That’s your anchor; build outward from there)
Step 11: Ops Q&A (20 min)
- Set your phone to record
- Answer these questions out loud, 2-3 min each:
  - “Walk me through planning a red-team campaign from scratch.”
  - “How did you handle it when participants went off-script?”
  - “Tell me about a time things went wrong in the campaign.”
  - “How would you scale this to 100+ agents?”
  - “How do you communicate findings to product/safety teams?”
- Listen back. Note where you were vague vs. concrete.
Step 12: Retrieval pass (15 min)
- On a NEW piece of paper, from memory:
  - 5 attack names + mechanisms + one prompt each + 3 meta-tactics
  - Talk structure: all 7 sections, each with its example and its transition to the next
  - 4 ops phases + example each
  - 3 story punchlines
- Compare to your Session 1 paper. What improved? What’s still shaky?
- ONLY NOW open docs to fill remaining gaps.
Step 13: Debrief (10 min)
- Rank the 5 interview rounds by how ready you feel (1 = most ready, 5 = least ready)
- For your bottom 2: identify the specific thing that felt weakest
- Write down 2-3 things to sharpen in Session 4
End of Session 3. You’ve done 3 red-team reps and should feel rough-but-functional across all rounds.
Session 4: Sharpen (day before, ~1 hr)
Step 14: Final red-team rep (20 min)
- One more ChatGPT session
- By now you should have a personal “opening sequence”: your go-to first 3 moves
- Practice that opening sequence until it feels natural
- If you discover a new technique that works, great — add it to your mental toolkit
Step 15: Re-simulate next-weakest round (20 min)
- Whatever you ranked 4th in Step 13 (the second-least-ready round), do it again
- If it was the talk: deliver again, record, compare to Session 3 recording
- If it was ops: answer the questions again, but have someone ELSE ask them (a friend, a partner, anyone)
Step 16: Night-before review (15 min)
- Read your circled gaps from all recall sessions
- Read the playbook one last time (just the attack names + mechanisms + meta-tactics)
- Read the talk section headers + transitions one last time
- Go to sleep. Retrieval consolidation happens during sleep.
Day of: Pre-Interview (15 min)
- Do NOT re-read the docs. You either know it or you don’t.
- On a blank piece of paper, write from memory:
  - “Agents fail at boundaries: memory, identity, context”
  - 5 attack names
  - 3 meta-tactics
  - Talk arc: Setup → Frame → F1 → F2 → F3 → Systems → Implications
  - “Scope → Build → Run → Learn”
- If this comes easily, you’re ready.
- Walk in with the frame “agents fail at boundaries” as your anchor. Everything else hangs off it.
Key Numbers to Internalize
These come up across multiple rounds — drill them:
- 65+ agents on Discord
- 13 participants
- 14 days (March 9-23, 2026; counting both endpoints, that span covers 15 calendar dates)
- ~61,000 messages (the report number) vs. 83,364 (from Firebase stats; the daily log pipeline captures more)
- 84% bot-generated messages
- GPT-5.4 model with reasoning, 200K context
- ~19,000 lines of infrastructure code, built in 2 weeks
- 17 channels deleted in the Corleone incident
- 5-minute archive daemon cadence (saved us)
- 3 independent data collection pipelines
- 12 feather reports
- ~100% success rate for memory poisoning
- The six-word system prompt change that flipped a bot from protecting to harming
