
Absolutely. Embeddings have been around for a while, and most people don't realize that it wasn't until Microsoft's E5 series of models that they even matched BM25 on retrieval benchmarks, while being significantly more costly to compute.

I think sparse retrieval with cross-encoder reranking is still significantly better than embeddings. Embedding indexes are also difficult to scale: HNSW consumes too much memory above a few million vectors, and IVF-PQ has issues with recall.
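For anyone who hasn't seen the retrieve-then-rerank pattern, here is a minimal sketch of it in plain Python. It is an illustration under stated assumptions, not a production setup: the BM25 scorer is a small from-scratch Okapi variant with whitespace tokenization, and `toy_rerank_score` is a hypothetical stand-in (Jaccard overlap) for where a real cross-encoder would score each (query, document) pair. All names and defaults (`k1`, `b`, `k_candidates`, `k_final`) are my own choices.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase + whitespace split; real systems use a proper analyzer.
    return text.lower().split()


class BM25:
    """Small Okapi BM25 scorer kept entirely in memory."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1, self.b = k1, b
        doc_tokens = [tokenize(d) for d in docs]
        self.doc_len = [len(t) for t in doc_tokens]
        self.avgdl = sum(self.doc_len) / len(docs)
        self.tf = [Counter(t) for t in doc_tokens]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for tokens in doc_tokens:
            df.update(set(tokens))
        n = len(docs)
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k):
        # Keep only documents that share at least one term with the query.
        scored = [(self.score(query, i), i) for i in range(len(self.tf))]
        scored = [(s, i) for s, i in scored if s > 0]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [i for _, i in scored[:k]]


def toy_rerank_score(query, doc):
    # Hypothetical placeholder for a cross-encoder: Jaccard overlap of token sets.
    q, d = set(tokenize(query)), set(tokenize(doc))
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def retrieve_and_rerank(query, docs, bm25, rerank_score, k_candidates=100, k_final=10):
    # Stage 1: cheap sparse retrieval narrows the corpus to a candidate set.
    candidates = bm25.top_k(query, k_candidates)
    # Stage 2: the expensive scorer only runs on those candidates.
    reranked = sorted(candidates, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return reranked[:k_final]


docs = [
    "bm25 is a sparse retrieval function",
    "dense embeddings need a vector index",
    "cats sleep all day",
]
bm25 = BM25(docs)
results = retrieve_and_rerank("sparse retrieval", docs, bm25, toy_rerank_score)
print(results)
```

The point of the split is cost: the reranker only ever sees `k_candidates` documents, so the expensive model's work stays bounded no matter how large the corpus gets.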


