Chat feature TODOs:

Immediate

  • get initial chat box HTML and JS frontend working (no LLM call yet)
  • set up FastAPI backend
  • set up Cerebras LLM API call to gpt-oss-120B
  • deploy on production w/ Fly.io
  • switch to streaming API
  • remove references to non-streaming chat
  • add conversation history support
  • add health check endpoint and CI/CD
  • keep Fly.io machine warm
  • add chat logging
  • stats on chat logging
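The "switch to streaming API" item could be served as server-sent events. A minimal sketch of the SSE framing, assuming the backend yields tokens one at a time (the function name and payload shape are my own, not from the existing code):

```python
import json
from typing import Iterable, Iterator


def sse_frames(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap a stream of LLM tokens in server-sent-event frames.

    Each token becomes one `data:` frame; a final [DONE] frame tells
    the frontend's reader to stop listening.
    """
    for token in tokens:
        yield f"data: {json.dumps({'token': token})}\n\n"
    yield "data: [DONE]\n\n"
```

In FastAPI this generator could be returned as `StreamingResponse(sse_frames(...), media_type="text/event-stream")`, with the JS frontend reading frames off a `fetch` body stream.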

Future (major)

  • classifier head that predicts whether a user is asking a question about the resume or not
  • build an evaluation harness for accuracy
    • Q&A questions about my resume
    • call backend on each question
    • log raw responses, latency, errors
    • output summary report
  • add live monitoring (tokens/sec, first response latency)
  • make faster
    • don't keep passing conversation history back and forth (store it on the Fly.io server)
    • preload the KV cache w/ background context whenever a new user accesses the website
      • make sure not to leak info to the HTML frontend
  • make more cost-efficient
    • have the Fly.io machine start up only when at least one user is on the site (upon entering the site)
  • make UI prettier (Anthropic-style, rendered as an inline box, scrolling possible)
  • Add RAG with vector embeddings. Should contain
    • all links in the resume so that the chat model has access to them if it needs
    • the code used to create the model chat window so that the model can figure out how it is set up
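The evaluation-harness bullets above could be sketched roughly like this; the Q&A pairs and the `call_backend` hook are placeholders, not the real backend client:

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical Q&A set: (question, keyword the answer should contain).
QA_PAIRS: List[Tuple[str, str]] = [
    ("What language is the backend written in?", "Python"),
    ("How is the site deployed?", "Fly.io"),
]


def run_eval(call_backend: Callable[[str], str]) -> Dict[str, object]:
    """Call the backend on each question, log raw responses, latency,
    and errors, and return a summary report."""
    rows = []
    for question, keyword in QA_PAIRS:
        start = time.perf_counter()
        try:
            answer = call_backend(question)
            error = None
        except Exception as exc:  # record errors instead of aborting the run
            answer, error = "", repr(exc)
        rows.append({
            "question": question,
            "answer": answer,
            "error": error,
            "latency_s": time.perf_counter() - start,
            "correct": keyword.lower() in answer.lower(),
        })
    return {
        "accuracy": sum(r["correct"] for r in rows) / len(rows),
        "mean_latency_s": mean(r["latency_s"] for r in rows),
        "errors": sum(r["error"] is not None for r in rows),
        "rows": rows,  # raw per-question log
    }
```

Keyword matching is a crude stand-in for real accuracy grading (an LLM judge or exact-answer check could replace it), but it keeps the harness deterministic.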

Future (minor)

  • move the “send” button and input box lower w.r.t. the conversation
  • switch to typescript over javascript
  • log user id by IP address rather than making a new hash every time
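For the "log user id by IP" item, one option is a keyed hash: logs get a stable pseudonymous id per IP without storing the raw address. A minimal sketch, where the secret name is a placeholder:

```python
import hashlib
import hmac

# Placeholder secret; in production this would come from an env var or
# Fly.io secret so the IP-to-id mapping can't be reversed from logs alone.
LOG_SECRET = b"change-me"


def user_id_from_ip(ip: str) -> str:
    """Derive a stable, pseudonymous user id from a client IP."""
    digest = hmac.new(LOG_SECRET, ip.encode(), hashlib.sha256).hexdigest()
    return digest[:16]  # a short prefix is enough for log grouping
```

Using HMAC rather than a bare hash means someone with the logs but not the secret can't confirm which IP produced a given id.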