
We (at Weaviate) support separate BM25, vector, and combined hybrid search, so I can speak to the performance of all three. The tl;dr: it's complicated. What you say may be true in some cases, but not in others. One does not always scale better than the other.

For example, ANN search with an index like HNSW scales pretty much logarithmically. A bit oversimplified, but roughly we see search times double with every 10x growth of the search space. In addition, calculations like Euclidean distance and dot product (and therefore cosine similarity) can be parallelized very well at the hardware level, e.g. via AVX2/AVX512/NEON and similar SIMD techniques.
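To illustrate why distance calculations vectorize so well (a toy sketch, not Weaviate code): with normalized vectors, cosine similarity collapses into a single matrix-vector dot product, which NumPy dispatches to SIMD-accelerated BLAS routines.

```python
import numpy as np

def cosine_sim(query, vectors):
    # Normalize once; cosine similarity then becomes a plain dot product.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return v @ q  # one matrix-vector product scores every candidate

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 128))   # 10k candidate vectors, 128 dims
query = rng.normal(size=128)
scores = cosine_sim(query, docs)
top_k = np.argsort(scores)[-10:][::-1]  # indices of the 10 nearest vectors
```

An ANN index like HNSW avoids even this brute-force scan by only visiting a logarithmically-growing neighborhood of the graph, but the per-comparison cost is the same kind of SIMD-friendly arithmetic.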

With BM25, this isn't quite as simple. We have algorithms such as WAND and Block-Max WAND, which help skip _scoring_ documents that cannot reach a top-k spot. However, the distribution of terms plays a big role here. As a rule of thumb, the rarer a word is, the fewer documents require scoring. Say you have 1B documents and a 2-term query, where each term matches only 100k documents. If you AND-combine those terms, you have at most 100k matches; if you OR-combine them, at most 200k. The fact that 1B documents were indexed played no role. But now think of two terms that each match 500M objects. Even with the aforementioned algorithms – which rely on the relative impact of each term – there is a risk we would now have to score every single document in the database. This is, again, over-simplified, but my point is the following:
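The arithmetic above can be sketched as follows (a toy illustration, not Weaviate internals): the number of documents that may need BM25 scoring is bounded by the posting-list sizes of the query terms, not by the corpus size.

```python
def candidates_upper_bound(posting_sizes, mode):
    """Upper bound on documents needing BM25 scoring.

    posting_sizes: number of documents matching each query term.
    """
    if mode == "AND":
        return min(posting_sizes)  # intersection can't exceed the smallest list
    if mode == "OR":
        return sum(posting_sizes)  # union can't exceed the combined lists
    raise ValueError(mode)

# 1B-document corpus, two rare terms matching 100k docs each:
print(candidates_upper_bound([100_000, 100_000], "AND"))  # 100000
print(candidates_upper_bound([100_000, 100_000], "OR"))   # 200000

# Two common terms matching 500M docs each: the OR query may have to
# touch every single document in the 1B corpus.
print(candidates_upper_bound([500_000_000, 500_000_000], "OR"))  # 1000000000
```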

ANN latency is fairly predictable. BM25 latency depends a lot on the input query. Our monitoring shows that in production cases, when running a hybrid search, sometimes the vector query is the bottleneck, sometimes the BM25 query is.

> Hybrid search can be used to first filter a smaller number of candidates and then rerank them using vector search

My post is already getting quite long, but I want to quickly comment on this two-step approach. For Weaviate, that's not the case: BM25 and vector searches happen independently, then the scores are aggregated into a single result set.
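A hedged sketch of what such score aggregation can look like: both searches run independently, their scores are normalized onto a common scale, and a weighting parameter (here called `alpha`, a naming assumption) blends them. The min-max normalization below is one common fusion scheme; exact Weaviate behavior may differ.

```python
def normalize(scores):
    # Min-max normalize a {doc_id: score} mapping into [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_fuse(bm25_scores, vector_scores, alpha=0.5):
    # alpha=1.0 -> pure vector ranking, alpha=0.0 -> pure BM25 ranking.
    bm25 = normalize(bm25_scores)
    vec = normalize(vector_scores)
    docs = set(bm25) | set(vec)
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * bm25.get(d, 0.0)
             for d in docs}
    return sorted(fused, key=fused.get, reverse=True)

# Each search returned its own candidates; fusion merges both result sets.
ranking = hybrid_fuse(
    bm25_scores={"doc1": 12.0, "doc2": 3.0},
    vector_scores={"doc2": 0.91, "doc3": 0.85},
)
```

Note that because both searches contribute their full scores, a document that only appears in one result set can still rank highly, which is exactly what a filter-then-rerank pipeline would lose.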

Yes, you could also use BM25 as a filter set first and then re-rank the results with embeddings, but you would lose the BM25 scores. If you want to use keywords just as filters, you can do that much, much cheaper than through BM25 scoring. In Weaviate, matching keywords to create a filter set would only require AND-ing or OR-ing a few roaring bitmaps, which is orders of magnitude more efficient than BM25 scoring.
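To show why bitmap filtering is so cheap (a toy sketch using plain Python ints in place of real roaring bitmaps): combining per-keyword document-ID sets is a handful of bitwise word operations, with no per-document scoring at all.

```python
def bitmap(doc_ids):
    # Encode a set of document IDs as bits in an integer.
    b = 0
    for d in doc_ids:
        b |= 1 << d
    return b

def to_ids(b):
    # Decode set bits back into a sorted list of document IDs.
    ids, i = [], 0
    while b:
        if b & 1:
            ids.append(i)
        b >>= 1
        i += 1
    return ids

docs_with_vector = bitmap([1, 4, 7, 9])   # docs containing keyword "vector"
docs_with_search = bitmap([4, 5, 9, 12])  # docs containing keyword "search"

and_filter = to_ids(docs_with_vector & docs_with_search)  # [4, 9]
or_filter = to_ids(docs_with_vector | docs_with_search)   # [1, 4, 5, 7, 9, 12]
```

Real roaring bitmaps add compressed containers so this stays fast and memory-efficient even with billions of document IDs, but the AND/OR principle is the same.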

Disclaimer: associated with Weaviate.



Thanks, that's very interesting!



