groq-docs
- try out different search strategies
- add kapa.ai as a baseline
- create a test suite -> a doc with all the comparisons
- make strategies flexible / selectable again (instead of commenting code out) for easier benchmarking
- markdown endpoint
- pre-calculate metadata around each markdown page
- pre-calculate complex-question Q&A pairs that map to various pages and categories
- (Ben) can we have a blind eval ranker - i.e. show the two result sets side by side, you pick A or B ~20 times, and it tells you which strategy ranked better (sketch below)
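
A minimal sketch of that blind ranker, assuming two search functions (`searchA`/`searchB` are hypothetical stand-ins for whatever strategies are being compared) that each return ranked page titles; sides are shuffled per query so the grader can't tell which strategy is which:

```ts
// blind-eval.ts - sketch: pit two search strategies against each other
// without revealing which is which, then tally the wins.
import * as readline from "node:readline/promises";

type SearchFn = (query: string) => Promise<string[]>;

export async function blindEval(queries: string[], searchA: SearchFn, searchB: SearchFn) {
  const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
  const wins = { A: 0, B: 0 };

  for (const query of queries) {
    const [resA, resB] = await Promise.all([searchA(query), searchB(query)]);
    const flipped = Math.random() < 0.5; // randomize which side each strategy lands on

    console.log(`\nQuery: ${query}`);
    console.log(`1) ${(flipped ? resB : resA).join(" | ")}`);
    console.log(`2) ${(flipped ? resA : resB).join(" | ")}`);
    const pick = await rl.question("Which is better, 1 or 2? ");

    // Map the picked side back to the underlying strategy.
    const pickedA = (pick.trim() === "1") !== flipped;
    wins[pickedA ? "A" : "B"]++;
  }

  rl.close();
  console.log(`\nA: ${wins.A} wins, B: ${wins.B} wins`);
}
```
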
- for TINY data sets you do very different things than for massive ones; Vectorize, Mixedbread, and Turbopuffer are all built for MASSIVE data sets (brute-force sketch after this list)
- 2-sec responses are EASY; sub-1-sec probably requires a VPS
- network hops + isolate warmups kill latency - the cosine distance and embedding calculations themselves matter much less
- small/cheap embedding models don't seem to be a problem (quality still unverified??)
- generating embeddings locally w/ a small model requires downloading the model (large + computationally expensive), while generating them via an API costs ~600ms per call
- loading the ~80MB Xenova/all-MiniLM model is really good/fast once loaded, but it sucks for serverless and for mobile users to download
- possibly the best way is to host this on a long-lived box w/ the ONNX weights saved locally so it can just run (sketch after this list)
- Cloudflare AI embeddings are very fast, but the Worker needs to be warmed up; cold, expect ~800-2000ms
- if you ping the CF AI embeddings with a fake warmup request (e.g. while a user is typing), the real call is fast (warmup sketch after this list)
- for testing at least, loading a massive JSON file into memory takes a very long time (10+ seconds)
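
Since this is a TINY data set, search can be a flat in-memory scan - no vector DB needed. A sketch, assuming page embeddings are precomputed (the `Doc` shape here is hypothetical):

```ts
// tiny-search.ts - brute-force cosine similarity over a small doc set.
// For a few hundred pages a linear scan takes microseconds; the latency
// problems above are all network/warmup, never this math.
type Doc = { slug: string; embedding: number[] };

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

export function topK(queryEmbedding: number[], docs: Doc[], k = 5): Doc[] {
  return docs
    .map((doc) => ({ doc, score: cosineSim(queryEmbedding, doc.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map(({ doc }) => doc);
}
```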
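
For the hosted-ONNX approach, a sketch using transformers.js with the Xenova/all-MiniLM-L6-v2 model mentioned above; the model downloads once and is cached on disk, which is why this fits a long-lived server and not serverless or mobile:

```ts
// local-embed.ts - local embeddings via transformers.js (@xenova/transformers).
// First call downloads the ~80MB ONNX model; subsequent calls reuse the cache.
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

export async function embed(text: string): Promise<number[]> {
  // Mean-pool + normalize so cosine similarity reduces to a dot product.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}
```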
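
And a hedged sketch of the CF warmup trick, assuming a Worker with a Workers AI binding (`@cf/baai/bge-small-en-v1.5` is one of Workers AI's embedding models; the `/warmup` route and the simplified `Env` typing here are assumptions, not Cloudflare's API surface):

```ts
// worker.ts - sketch: a dummy embedding call warms the isolate + model so
// the real query skips the ~800-2000ms cold path.
export interface Env {
  // Simplified stand-in for the Workers AI binding type.
  AI: { run(model: string, input: { text: string[] }): Promise<{ data: number[][] }> };
}

const MODEL = "@cf/baai/bge-small-en-v1.5";

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    const url = new URL(req.url);

    if (url.pathname === "/warmup") {
      // Fired from the client while the user is still typing.
      await env.AI.run(MODEL, { text: ["warmup"] });
      return new Response("ok");
    }

    const q = url.searchParams.get("q") ?? "";
    const { data } = await env.AI.run(MODEL, { text: [q] });
    return Response.json({ embedding: data[0] });
  },
};
```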
