I'm a big fan of ann-benchmarks and will be the first to tell you that the research community needs way more benchmarks like this. But I do want to add a couple caveats about it for people looking into this area:
1) Most of these datasets have extremely correlated dimensions. If you plot the correlation matrices, you'll see dense blobs of entries close to 1 all over the place. This makes the ANN task much easier than it would be with, say, high-quality DNN features. As an example, I've compressed MNIST digits down to 1-byte representations with vector quantization and still gotten nearly perfect retrieval accuracy.
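To make both halves of that concrete, here's a toy numpy sketch (synthetic correlated data, not real MNIST, and the codebook "training" is just random sampling, far cruder than real vector quantization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a correlated dataset: every dimension is a noisy
# copy of one latent variable, so the correlation matrix is full of
# entries near 1.
latent = rng.normal(size=(1000, 1))
X = latent + 0.1 * rng.normal(size=(1000, 16))
corr = np.corrcoef(X, rowvar=False)
print(np.abs(corr).mean())  # close to 1 -> highly correlated dims

# 1-byte vector quantization: a codebook of 256 centroids, and each
# vector is stored as the index of its nearest centroid.
codebook = X[rng.choice(len(X), 256, replace=False)]  # crude "training"
codes = np.argmin(((X[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
codes = codes.astype(np.uint8)   # one byte per vector
X_hat = codebook[codes]          # lossy reconstruction
```

With dimensions this correlated, 256 centroids cover the data well, which is why such an aggressive compression can still retrieve the right neighbors.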
2) 1M vectors is not that many. You can easily get 1k queries per second in a single thread at a decent precision/recall just brute-force scanning through them with a SIMD approximate distance function like Bolt or Quicker ADC [1]. Also worth noting that the FAISS paper (along with a lot of other work since then) focuses mostly on 100M to billions of vectors.
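For a sense of why that's feasible: the core trick in these scans is replacing each distance computation with a handful of table lookups. A rough numpy sketch of product-quantization-style asymmetric distance computation (sizes and the random-sample "training" are my own toy choices, not from either paper; the real speed comes from doing the lookups with SIMD shuffles):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 10_000, 32, 8, 256   # m subspaces, k centroids each
sub = d // m

X = rng.normal(size=(n, d)).astype(np.float32)

# "Train" one codebook per subspace (random sample as centroids;
# real systems use k-means).
codebooks = np.stack([X[rng.choice(n, k, replace=False), i*sub:(i+1)*sub]
                      for i in range(m)])            # (m, k, sub)

# Encode: each vector becomes m one-byte codes.
codes = np.empty((n, m), dtype=np.uint8)
for i in range(m):
    diffs = X[:, None, i*sub:(i+1)*sub] - codebooks[i][None]
    codes[:, i] = np.argmin((diffs ** 2).sum(-1), axis=1)

# Query time: precompute an (m, k) lookup table of partial squared
# distances; the scan is then just m table lookups + adds per vector.
q = rng.normal(size=d).astype(np.float32)
lut = np.stack([((codebooks[i] - q[i*sub:(i+1)*sub]) ** 2).sum(-1)
                for i in range(m)])                  # (m, k)
approx = lut[np.arange(m), codes].sum(axis=1)        # one dist per vector
top10 = np.argsort(approx)[:10]
```

The inner loop touches one byte per subspace per vector, which is why a single thread can chew through millions of approximate distances per second.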
3) Related to (2), I think most of these methods aren't incorporating state-of-the-art approximate distance functions yet (though I haven't dug into all of their source code). AFAICT FAISS+Quicker ADC [1] is the actual leader on x86 CPUs. Can't comment on the production-readiness of their code though.
[1] The latter is a bit faster for ANN search, though the code is more complex IIRC.
I think ann-benchmarks should pay more attention to:
1. Index build time, which is very important in some production scenarios. At the moment the only constraint is a 5-hour limit for building the index on that 1 million vectors.
2. Memory footprint. 1M vectors is not that many; we will have to deal with billions of vectors for chemical molecules, images, and word vectors, and memory consumption directly determines how many servers you need.
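To put rough numbers on the memory point (my own illustrative figures, not from the benchmark): at a billion vectors the gap between raw floats and compressed codes is the difference between one server and a cluster.

```python
# Back-of-envelope memory math for raw vs quantized storage.
n, d = 1_000_000_000, 128
raw = n * d * 4                  # float32 vectors
pq = n * 16                      # e.g. 16-byte PQ codes per vector
print(raw / 2**30, pq / 2**30)   # roughly 477 GiB vs 15 GiB
```

That's before any index overhead (graph edges, inverted lists, etc.), which can itself dominate for some methods.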