Chat feature TODOs:

Chat feature TODOs:

Immediate

get initial chat box HTML and JS frontend working (no LLM call yet) set up fastAPI backend set up cerebras LLM API call to gpt-oss-120B deploy on production w/ Fly.io ~~ ~~switch to streaming API remove references to non-streaming chat add conversation history support add health check endpoint and CI/CD - keep Fly.io machine warm - add chat logging - stats on chat logging

  • build an evaluation harness for accuracy - Q&A questions about my resume - call backend on each question - log raw responses, latency, errors - output summary report
  • Add RAG with vector embeddings. Should contain
    • all links in the resume so that the chat model has access to them if it needs
    • the code used to create the model chat window so that the model can figure out how it is set up
  • add reranker to RAG setup

Future (major)

  • classifier head that predicts whether a user is asking a question about the resume or not
  • reranker into RAG system
  • add tool-use: turn the setup into an agent with the sdk, tool-use should read relevant documents as needed with bash commands.
  • add live monitoring (tokens/sec, first response latency)
  • make faster
    • dont keep passing conversation history back and forth (store on Fly.io server)
    • Preload KV cache w/ background context whenever a new user accesses the website
      • make sure not to leak info to the HTML frontend
  • make more cost-efficient
    • have fly.io machine start up only when at least one user is on the site (upon entering the site)
    • sliding window on conversation history — full history + system prompt re-sent every turn, unbounded token cost
    • $2/conversation cost cap with per-turn token tracking
  • make UI prettier (anthropic-style, rendered as a box inline, scrolling possible)
  • add guardrails with the Agents SDK

Done

  • merge two system messages into one in build_messages
  • add rate limiting — $2/conversation cost cap with token tracking

future (minor)

  • make “send” button and input box lower w.r.t the conversation
  • switch to typescript over javascript
  • log user id by IP address rather than making a new hash every time
  • speed up evaluation by switching to httpx with asyncio