# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
This is a Jekyll-based academic website (loftusa.github.io) with integrated Python ML experiments, a FastAPI chat backend, and a Next.js stocks page. The repository serves multiple purposes:
- Static academic website (Jekyll/GitHub Pages)
- ML research experiments (PyTorch/Transformers fine-tuning)
- Production chat API (FastAPI + Cerebras LLM)
- Experimental stocks dashboard (Next.js)
## Development Commands

### Jekyll Site (Academic Website)
```bash
# Install dependencies
bundle install

# Run local development server
bundle exec jekyll liveserve
# Visit http://localhost:4000

# Development with draft posts
bundle exec jekyll serve --config _config.yml,_config.dev.yml
```
### Python Experiments
```bash
# Run any Python script (ALWAYS use this syntax)
uv run <script>.py

# Examples:
uv run experiments/chat_api.py
uv run experiments/run_eval.py
uv run experiments/finetune/dataset.py

# Do NOT use:
# python <script>.py
# uv run python <script>.py
```
### Chat API (FastAPI Backend)
```bash
# Run locally
cd experiments/
uv run uvicorn chat_api:app --reload --port 8000

# Download chat logs (requires LOG_ACCESS_TOKEN in .env)
curl -H "Authorization: Bearer $LOG_ACCESS_TOKEN" http://localhost:8000/logs/download

# Health check
curl http://localhost:8000/health
```
### Stocks Page (Next.js)
```bash
npm run dev
# Visit http://localhost:3000/stocks

# API endpoint
curl http://localhost:3000/api/stocks
```
## Architecture

### Chat System (experiments/)
**Flow:** User → Jekyll frontend (`_pages/chat.md`) → FastAPI backend (`chat_api.py`) → Cerebras API → Response stream
Key files:

- `chat_api.py`: FastAPI server with `/chat`, `/health`, `/logs/download` endpoints
- `system_prompt.txt`: System instructions for the chat model
- `resume.txt`: Resume context injected into every chat
- `logs/chat_logs.jsonl`: User interaction logs (user_message, bot_response, latencies)
- `run_eval.py`: Evaluation harness for chat accuracy (fact-checking against `eval_dataset.json`)
- `analyze_logs.py`: Log analysis and statistics
- `sync_logs.py`: Script to download logs from production
Environment variables (`.env` in `experiments/`):

- `CEREBRAS_API_KEY`: API key for the Cerebras LLM
- `LOG_PATH`: Path to the chat logs JSONL file
- `LOG_ACCESS_TOKEN`: Bearer token for accessing the logs endpoint
- `EVAL_DATASET_PATH`: Path to the evaluation dataset JSON
- `EVAL_OUTPUT_PATH`: Path to the evaluation output JSON
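A minimal example `.env` with placeholder values (the output filename for `EVAL_OUTPUT_PATH` is illustrative):

```
# experiments/.env — placeholder values, not real credentials
CEREBRAS_API_KEY=your-api-key-here
LOG_PATH=logs/chat_logs.jsonl
LOG_ACCESS_TOKEN=change-me
EVAL_DATASET_PATH=eval_dataset.json
EVAL_OUTPUT_PATH=eval_results.json
```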
### Fine-tuning System (experiments/finetune/)
Two training approaches exist: SFT (inverse-length weighted) and DPO (direct preference optimization).
#### SFT Training (Length-Weighted)

**Goal:** Fine-tune with an inverse-length weighted loss to produce shorter responses.
Architecture (see `finetune_plan.md` for the detailed design):

- `dataset.py`: `ChatDataset` class that tokenizes chat logs, masks prompt tokens in labels with `-100`, and returns `{input_ids, labels, response_length}`. Uses Qwen2-0.5B-Instruct as the base model.
- `collator.py`: `DataCollatorWithLengths` for left-padding batches and stacking response lengths
- `trainer.py`: `LengthWeightedTrainer` (subclass of the HuggingFace `Trainer`) - overrides `compute_loss` to weight the loss inversely by response length
- `train.py`: Main SFT training script using DoRA (LoRA with weight decomposition)
- `inference.py`: Compares base vs. LoRA model responses and shows the token-count difference
Key implementation detail: Loss is computed per token with `reduction="none"`, averaged per example, then weighted by `1.0 / response_length`. Weights are normalized to preserve the learning-rate scale.
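That weighting scheme can be sketched in plain PyTorch (a minimal illustration, not the actual `LengthWeightedTrainer` code; tensor names are made up for the example):

```python
import torch

def length_weighted_loss(per_token_loss, labels, response_lengths):
    """Average the per-token loss per example, then weight by inverse response length.

    per_token_loss:   (batch, seq_len) cross-entropy computed with reduction="none"
    labels:           (batch, seq_len) with prompt tokens masked to -100
    response_lengths: (batch,) number of response tokens per example
    """
    mask = (labels != -100).float()
    # Mean loss over unmasked (response) tokens for each example
    per_example = (per_token_loss * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    # Inverse-length weights, normalized to sum to the batch size so the
    # overall learning-rate scale is preserved
    weights = 1.0 / response_lengths.float()
    weights = weights * (len(weights) / weights.sum())
    return (per_example * weights).mean()
```

With equal per-example losses the normalized weights leave the mean unchanged; when losses differ, shorter responses pull the average toward their loss.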
#### DPO Training (Refusal Training)

**Goal:** Train the model to refuse off-topic/inappropriate requests instead of complying.
DPO data generation pipeline:

- `dpo_data_gen.py`: Simple approach - uses Claude to rewrite responses more concisely and creates preference pairs where shorter = chosen
- `dpo_data_gen_agents.py`: Agent-based approach using the OpenAI Agents SDK with three agents:
  - `RequestClassifier`: Categorizes requests as legitimate/off_topic/jailbreak/inappropriate
  - `ComplianceChecker`: Checks whether the model incorrectly complied with bad requests
  - `RefusalAgent`: Generates brief, professional refusals
- DPO pairs are generated only when the model incorrectly complied (chosen = refusal, rejected = bad_response)
- `run_dpo_variance.sh`: Runs data generation 10 times to measure variance
- `analyze_dpo_variance.py`: Analyzes agreement across runs using sentence embeddings (all-MiniLM-L6-v2) and generates visualizations in `variance_analysis/`
- `combine_dpo_runs.py`: Combines multiple runs into a consensus dataset based on an agreement threshold (default 50%), using the shortest chosen response and the modal rejected response
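The consensus rule (agreement threshold, shortest chosen, modal rejected) can be sketched as follows; the in-memory shapes here are hypothetical, not the actual on-disk format of `combine_dpo_runs.py`:

```python
from collections import Counter

def combine_runs(runs, agreement_threshold=0.5):
    """Combine DPO pairs from several generation runs into a consensus set.

    runs: list of dicts mapping prompt -> (chosen, rejected)  [illustrative shape]
    """
    all_prompts = set().union(*(r.keys() for r in runs))
    combined = {}
    for prompt in all_prompts:
        pairs = [r[prompt] for r in runs if prompt in r]
        # Keep only prompts that appear in at least `agreement_threshold` of runs
        if len(pairs) / len(runs) < agreement_threshold:
            continue
        chosen = min((c for c, _ in pairs), key=len)                    # shortest chosen
        rejected = Counter(rj for _, rj in pairs).most_common(1)[0][0]  # modal rejected
        combined[prompt] = (chosen, rejected)
    return combined
```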
DPO training & inference:

- `dpo_train.py`: DPO training using TRL's `DPOTrainer` with LoRA on Qwen2-0.5B-Instruct. Trains on `local_datasets/dpo_combined.jsonl`
- `dpo_chat.py`: Interactive chat to test DPO checkpoints. Commands: `quit`, `base` (toggle adapter), `ckpt#` (load checkpoint N)
- `view_dpo.py`: Pretty-prints the DPO dataset with rich formatting
Data sources:

- Real chat logs from `chat_logs.jsonl`
- Synthetic Q&A pairs generated from `resume.txt`
### Jekyll Site Structure
- `_pages/`: Main pages (about, cv, chat, publications, etc.)
- `_posts/`: Blog posts (research notes, Bayesian analysis, etc.)
- `_drafts/`: Unpublished drafts
- `_publications/`: Academic publications metadata
- `_talks/`: Conference talks metadata
- `_includes/`, `_layouts/`, `_sass/`: Jekyll theme customization
- `_config.yml`: Site configuration (title, author, social links, etc.)
- `markdown_generator/`: Python scripts to generate markdown from TSV/BibTeX
### Next.js App (`app/`, `lib/`, `components/`)
- `app/stocks/`: Stocks dashboard page
- `app/api/stocks/`: API endpoint returning weekly stock changes
- `lib/holdings.ts`: Hard-coded holdings list (symbol, quantity)
- `lib/schwabSync.ts`: Stub for Schwab API sync (no-op, protected by `SYNC_TOKEN`)
- `middleware.ts`: Auth protection for admin endpoints
## Key Conventions

### Python Code Organization
- Experiments go in the `experiments/` directory - not in the root
- Test scripts go in `tests/scratchpad/` - don't clutter the root
- Use `uv run <script>.py` for all Python execution
- When working with image models during tests, save outputs to `tests/scratchpad/` and open them to verify correctness
### Package Management
- Python: `uv` (see `pyproject.toml`)
- Jekyll: `bundle` (see `Gemfile`)
- Next.js: `npm` (see `package.json`)
### Environment Files

- `experiments/.env`: Chat API keys, log paths, eval paths
- Root `.env` (if it exists): Fallback for backward compatibility
### Deployment

- Jekyll site: Automatically deployed via GitHub Pages
- Chat API: Deployed on Fly.io (see `fly.toml` and `Dockerfile` in `experiments/`)
- Stocks page: Next.js app directory (experimental)
## Important Notes

### Data Privacy
- Chat logs contain user messages - handle with care
- `LOG_ACCESS_TOKEN` is required to download logs from the API
- The holdings list in `lib/holdings.ts` is hard-coded (no PII intended for storage)
### Model Context

- The chat API injects both `system_prompt.txt` and `resume.txt` as system messages before every conversation
- Conversation history is passed in every request (not stored server-side)
- Model: Cerebras `gpt-oss-120b`
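Since the server keeps no state, each request's message list can be assembled along these lines (a sketch; the function and variable names are hypothetical, not from `chat_api.py`):

```python
def build_messages(system_prompt, resume, history, user_message):
    """Assemble the per-request message list: the system prompt and resume go
    in as system messages, followed by the full conversation history and the
    new user turn."""
    return (
        [{"role": "system", "content": system_prompt},
         {"role": "system", "content": resume}]
        + history
        + [{"role": "user", "content": user_message}]
    )
```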
### Fine-tuning

- Uses Qwen2-0.5B-Instruct as the base model (upgraded from TinyLlama)
- LoRA/PEFT for memory efficiency (DoRA for SFT, standard LoRA for DPO)
- Labels are masked with `-100` for prompt tokens (standard causal-LM practice)
- SFT: Response-length weighting (`weight = 1.0 / response_length`, normalized)
- DPO: Direct preference optimization for refusal training - the model learns to refuse off-topic/inappropriate requests
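The `-100` masking convention can be sketched with a tiny helper (illustrative only, not repository code):

```python
def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the prompt positions with -100 so
    the causal-LM loss is computed only on response tokens."""
    labels = list(input_ids)
    labels[:prompt_len] = [-100] * prompt_len
    return labels
```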
### Testing
- Write tests in the `tests/` directory (use `pytest` and `hypothesis`)
- Run tests regularly to ensure code correctness
- Update tests when editing existing code
- Fail fast: add plenty of assertions before expensive operations
### When Finishing Work
- After writing/updating experiments: update README.md and CLAUDE.md with what the experiment does
- After getting results: update README.md and CLAUDE.md with the results
- Update README.md if code changes warrant it (avoid unnecessary info)
