Interview Practice Protocol

You have ~6 hours of prep across 4 sessions. Everything is designed around retrieval — reading doesn’t count, only recall under pressure counts.

Red-teaming is your weakest/most uncertain round, so it goes first and gets the most reps.


Session 1: Red-Teaming Focus (~1.5 hr)

Step 1: Read the playbook (15 min)

  • Read redteam_playbook.md once, slowly
  • Pay attention to the mechanism of each attack and the specific Mangrove example
  • Note the meta-tactics at the top (thinking off, memory as trust anchor, refusals don’t transfer)
  • Close the file

Step 2: Recall the playbook (10 min)

  • Open a blank piece of paper (physical paper, not a screen)
  • Write down: the 5 attack names, the mechanism of each, and one ChatGPT prompt for each
  • Also write the 3 meta-tactics
  • Don’t peek. If you’re stuck, leave a blank and move on.
  • When done, open the playbook and compare. Circle what you missed.

Step 3: First mock red-team session (30 min)

  • Open ChatGPT
  • Set a 30-minute timer
  • Narrate aloud as you probe — this is the skill they’re evaluating
  • Start with Attack 1 (cross-context encoding / refusal gap). If it works, explore depth. If blocked, pivot to Attack 2 (identity claims).
  • For EACH attempt, say out loud:
    • What boundary you’re testing and why
    • What you expect to happen
    • What actually happened
    • What that tells you about the model’s safety
  • After 30 min: STOP. Write down:
    • What worked
    • What didn’t
    • What you forgot to try
    • What you wish you’d said better

Step 4: Debrief + re-read gaps (15 min)

  • Open the playbook. Read ONLY the attacks you forgot or didn’t try.
  • For attacks that failed: think about why. Was it the prompt? The framing? Would a different approach to the same boundary work?
  • Write down 2-3 adjusted prompts you want to try next time.

Step 5: Absorb the other materials (20 min)

  • Read research_talk_outline.md once — focus on transitions between sections
  • Read ops_framework.md and behavioral_stories.md once each
  • Close all files
  • On paper, write from memory:
    • Talk sections in order (Setup → Frame → F1 → F2 → F3 → Systems → Implications)
    • 4 ops phases (Scope → Build → Run → Learn) + one example each
    • 3 story punchlines
  • Compare. Circle gaps.

End of Session 1. Put the paper somewhere visible. Don’t re-read docs.


Session 2: Second Red-Team Rep + Code (~1.5 hr)

Step 6: Second mock red-team session (30 min)

  • Open ChatGPT again (fresh conversation)
  • Set a 30-minute timer
  • This time: use your adjusted prompts from Step 4
  • Try at least one attack you DIDN’T try in Session 1
  • Focus on narration quality — practice sounding like you’re thinking, not reading a script
  • Try toggling reasoning off if the model has that option (Negev’s finding)
  • After 30 min: write down what improved vs. Session 1

Step 7: Timed coding exercise (45 min)

  • Delete practice_analysis.py (yes, actually delete it)
  • Set a 45-minute timer
  • Your task: pull Mangrove data from Firebase, produce 2-3 Tufte-style visualizations, and narrate your findings aloud as you code
  • You CAN use coding_toolkit.py (it’s a library, not the answer)
  • You CANNOT look at the deleted script or any notes
  • Firebase URL: https://red-teaming-betrayal-default-rtdb.firebaseio.com
    • daily_logs/{date}/stats.json has total_messages, human_messages, bot_messages, most_active_channels, most_mentioned
    • Dates: 2026-03-09 through 2026-03-23
  • Talk out loud the whole time: “I’m loading the data… I see there are 15 days… let me group by date and look at volume…”
  • When the timer goes off, STOP. Note where you are.
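
  The data pull above can be sketched in Python. This is a minimal sketch that assumes the standard Firebase REST convention of appending .json to a node path; the base URL, date range, and field names come from the bullets above, but the helper names (daily_dates, stats_url, fetch_stats, message_volume) are hypothetical and not part of coding_toolkit.py:

```python
# Sketch of the Step 7 data pull (stdlib only). Assumes the Firebase REST
# convention that any node can be fetched as JSON by appending ".json".
# Helper names are hypothetical; field names are from the stats.json bullet.
import json
import urllib.request
from datetime import date, timedelta

BASE = "https://red-teaming-betrayal-default-rtdb.firebaseio.com"

def daily_dates(start=date(2026, 3, 9), end=date(2026, 3, 23)):
    """All campaign dates, inclusive, as YYYY-MM-DD strings."""
    return [(start + timedelta(days=i)).isoformat()
            for i in range((end - start).days + 1)]

def stats_url(day: str) -> str:
    """REST URL for one day's stats node."""
    return f"{BASE}/daily_logs/{day}/stats.json"

def fetch_stats(day: str) -> dict:
    """Fetch one day's stats: total_messages, human_messages,
    bot_messages, most_active_channels, most_mentioned."""
    with urllib.request.urlopen(stats_url(day)) as resp:
        return json.load(resp)

def message_volume(stats_by_day: dict) -> dict:
    """Collapse per-day stats to a date -> total_messages series for plotting."""
    return {d: s["total_messages"] for d, s in sorted(stats_by_day.items())}
```

  Something like `message_volume({d: fetch_stats(d) for d in daily_dates()})` then gives you a date-to-volume series to chart first, which is usually the fastest way to have something on screen to narrate over.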

Step 8: Coding debrief (10 min)

  • What took too long? What patterns weren’t automatic?
  • If you got stuck on pandas syntax, practice that specific thing 3 times right now
  • If plotting took too long, simplify: bar chart + direct labels + remove spines. That’s the whole recipe.
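
  That three-move recipe can be drilled as one helper so it's automatic under the timer. A minimal sketch, assuming matplotlib is available; tufte_bar is a hypothetical name, not something in coding_toolkit.py:

```python
# Sketch of the "bar chart + direct labels + remove spines" recipe.
# Assumes matplotlib; tufte_bar is a hypothetical helper name.
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import matplotlib.pyplot as plt

def tufte_bar(labels, values, title=""):
    fig, ax = plt.subplots(figsize=(8, 4))
    bars = ax.bar(labels, values, color="0.45")
    # Direct labels: the value sits on each bar, so the y-axis can go.
    for bar, v in zip(bars, values):
        ax.text(bar.get_x() + bar.get_width() / 2, v, f"{v:,}",
                ha="center", va="bottom", fontsize=9)
    # Remove spines and y-ticks: keep only the data.
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    ax.set_yticks([])
    ax.set_title(title, loc="left")
    fig.tight_layout()
    return fig, ax
```

  Called as, say, `tufte_bar(dates, volumes, "Messages per day")`, that's one muscle-memory function covering most of what the exercise asks for visually.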

End of Session 2. You’ve now done 2 red-team reps and 1 coding rep.


Session 3: Simulate All Rounds (~1.5 hr)

Step 9: Third mock red-team session (20 min)

  • This should be getting smoother now
  • Set a 20-minute timer (shorter — practice being efficient)
  • Focus on the narration: can you smoothly explain what you’re doing and why?
  • Try the attack you’re least confident in

Step 10: Research talk (20 min)

  • Record yourself on your phone (voice memo or video)
  • Set a 15-minute timer
  • Deliver the talk from memory. No notes. No peeking.
  • When you get stuck, DON’T stop — say “let me come back to that” and keep going
  • Listen back. Note:
    • Where did you ramble? (Cut those parts)
    • Where did you get stuck? (Re-read that section only)
    • Where did you sound most confident? (That’s your anchor — build outward from there)

Step 11: Ops Q&A (20 min)

  • Set your phone to record
  • Answer these questions out loud, 2-3 min each:
    1. “Walk me through planning a red-team campaign from scratch.”
    2. “How did you handle it when participants went off-script?”
    3. “Tell me about a time things went wrong in the campaign.”
    4. “How would you scale this to 100+ agents?”
    5. “How do you communicate findings to product/safety teams?”
  • Listen back. Note where you were vague vs. concrete.

Step 12: Retrieval pass (15 min)

  • On a NEW piece of paper, from memory:
    • 5 attack names + mechanisms + one prompt each + 3 meta-tactics
    • Talk structure: 7 sections in order, with one example and the transition for each
    • 4 ops phases + example each
    • 3 story punchlines
  • Compare to your Session 1 paper. What improved? What’s still shaky?
  • ONLY NOW open docs to fill remaining gaps.

Step 13: Debrief (10 min)

  • Rank the 5 interview rounds by how ready you feel (1 = most ready, 5 = least ready)
  • For your bottom 2: identify the specific thing that felt weakest
  • Write down 2-3 things to sharpen in Session 4

End of Session 3. You’ve done 3 red-team reps and should feel rough-but-functional across all rounds.


Session 4: Sharpen (day before, ~1 hr)

Step 14: Final red-team rep (20 min)

  • One more ChatGPT session
  • By now you should have a personal “opening sequence”: your go-to first 3 moves
  • Practice that opening sequence until it feels natural
  • If you discover a new technique that works, great — add it to your mental toolkit

Step 15: Re-simulate next-weakest round (20 min)

  • Whatever ranked second-lowest in Step 13, do it again
  • If it was the talk: deliver again, record, compare to Session 3 recording
  • If it was ops: answer the questions again, but have someone ELSE ask them (a friend, a partner, anyone)

Step 16: Night-before review (15 min)

  • Read your circled gaps from all recall sessions
  • Read the playbook one last time (just the attack names + mechanisms + meta-tactics)
  • Read the talk section headers + transitions one last time
  • Go to sleep. Memory consolidation happens during sleep.

Day of: Pre-Interview (15 min)

  • Do NOT re-read the docs. You either know it or you don’t.
  • On a blank piece of paper, write from memory:
    • “Agents fail at boundaries: memory, identity, context”
    • 5 attack names
    • 3 meta-tactics
    • Talk arc: Setup → Frame → F1 → F2 → F3 → Systems → Implications
    • “Scope → Build → Run → Learn”
  • If this comes easily, you’re ready.
  • Walk in with the frame “agents fail at boundaries” as your anchor. Everything else hangs off it.

Key Numbers to Internalize

These come up across multiple rounds — drill them:

  • 65+ agents on Discord
  • 13 participants
  • 14 days (March 9-23, 2026; note the Firebase daily logs span 15 calendar dates, both endpoints inclusive)
  • ~61,000 messages (report number) / 83,364 (from Firebase stats — daily log pipeline captures more)
  • 84% bot-generated messages
  • GPT-5.4 model with reasoning, 200K context
  • ~19,000 lines of infrastructure code, built in 2 weeks
  • 17 channels deleted in the Corleone incident
  • 5-minute archive daemon cadence (saved us)
  • 3 independent data collection pipelines
  • 12 feather reports
  • ~100% success rate for memory poisoning
  • The six-word system prompt change that flipped a bot from protecting to harming