groq-docs
- try out different search strategies
- add kapa.ai as a baseline
- create a test suite -> a doc with all the comparisons
- make strategies flexible / selectable again (instead of commenting code out) for easier benchmarking
- markdown endpoint
- pre-calculate metadata around each markdown page
- pre-calculate complex-question Q&A pairs that map to various pages and categories
- (Ben) can we have a blind eval ranker - i.e. show the two result sets side by side, you pick A or B ~20 times, and it tells you which strategy ranked better (sketch below)
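
A minimal sketch of that blind ranker, assuming two search functions (`searchA`/`searchB` are hypothetical stand-ins for whatever strategies are being compared) that each return ranked page titles; sides are shuffled per query so the grader can't tell which strategy is which:

```ts
// blind-eval.ts - sketch: pit two search strategies against each other
// without revealing which is which, then tally the wins.
import * as readline from "node:readline/promises";

type SearchFn = (query: string) => Promise<string[]>;

export async function blindEval(queries: string[], searchA: SearchFn, searchB: SearchFn) {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const wins = { A: 0, B: 0 };

  for (const query of queries) {
    const [resA, resB] = await Promise.all([searchA(query), searchB(query)]);
    const flipped = Math.random() < 0.5; // randomize which side each strategy lands on

    console.log(`\nQuery: ${query}`);
    console.log(`1) ${(flipped ? resB : resA).join(" | ")}`);
    console.log(`2) ${(flipped ? resA : resB).join(" | ")}`);
    const pick = await rl.question("Which is better, 1 or 2? ");

    // Map the picked side back to the underlying strategy.
    const pickedA = (pick.trim() === "1") !== flipped;
    wins[pickedA ? "A" : "B"]++;
  }

  rl.close();
  console.log(`\nA: ${wins.A} wins, B: ${wins.B} wins`);
}
```
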
- for TINY data sets you do very different things than for massive ones; Vectorize, Mixedbread, and Turbopuffer are all built for MASSIVE data sets (brute-force sketch after this list)
- 2-sec responses are EASY; sub-1-sec probably requires a VPS
- network hops + isolate warmups kill latency - the cosine distance and embedding calculations themselves matter much less
- small/cheap embedding models don't seem to be a problem (quality still unverified??)
- generating embeddings locally w/ a small model requires downloading the model (large + computationally expensive), while generating them via an API costs ~600ms per call
- loading the ~80MB Xenova/all-MiniLM model is really good/fast once loaded, but it sucks for serverless and for mobile users to download
- possibly the best way is to host this on a long-lived box w/ the ONNX weights saved locally so it can just run (sketch after this list)
- Cloudflare AI embeddings are very fast, but the Worker needs to be warmed up; cold, expect ~800-2000ms
- if you ping the CF AI embeddings with a fake warmup request (e.g. while a user is typing), the real call is fast (warmup sketch after this list)
- for testing at least, loading a massive JSON file into memory takes a very long time (10+ seconds)
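
Since this is a TINY data set, search can be a flat in-memory scan - no vector DB needed. A sketch, assuming page embeddings are precomputed (the `Doc` shape here is hypothetical):

```ts
// tiny-search.ts - brute-force cosine similarity over a small doc set.
// For a few hundred pages a linear scan takes microseconds; the latency
// problems above are all network/warmup, never this math.
type Doc = { slug: string; embedding: number[] };

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function topK(queryEmbedding: number[], docs: Doc[], k = 5): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSim(queryEmbedding, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ doc }) => doc);
}
```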
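
For the hosted-ONNX approach, a sketch using transformers.js with the Xenova/all-MiniLM-L6-v2 model mentioned above; the model downloads once and is cached on disk, which is why this fits a long-lived server and not serverless or mobile:

```ts
// local-embed.ts - local embeddings via transformers.js (@xenova/transformers).
// First call downloads the ~80MB ONNX model; subsequent calls reuse the cache.
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

export async function embed(text: string): Promise<number[]> {
  // Mean-pool + normalize so cosine similarity reduces to a dot product.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}
```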
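
And a hedged sketch of the CF warmup trick, assuming a Worker with a Workers AI binding (`@cf/baai/bge-small-en-v1.5` is one of Workers AI's embedding models; the `/warmup` route and the simplified `Env` typing here are assumptions, not Cloudflare's API surface):

```ts
// worker.ts - sketch: a dummy embedding call warms the isolate + model so
// the real query skips the ~800-2000ms cold path.
export interface Env {
  // Simplified stand-in for the Workers AI binding type.
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
}

const MODEL = "@cf/baai/bge-small-en-v1.5";

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url);

    if (url.pathname === "/warmup") {
      // Fired from the client while the user is still typing.
      await env.AI.run(MODEL, { text: ["warmup"] });
      return new Response("ok");
    }

    const q = url.searchParams.get("q") ?? "";
    const { data } = await env.AI.run(MODEL, { text: [q] });
    return Response.json({ embedding: data[0] });
  },
};
```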
