Many of the top-performing models that you see on the MTEB retrieval leaderboard for English and Chinese tend to overfit to the benchmark nowadays. voyage-3 and voyage-3-lite are also quite small compared to many of the 7B models that take the top spots, and we don't want to hurt performance on other real-world tasks just to do well on MTEB.
We provide retrieval metrics for a variety of datasets and languages: https://blog.voyageai.com/2024/09/18/voyage-3/. I also personally encourage folks to either test on their own data or find an open-source dataset that closely resembles the documents they are trying to search (we provide a ton of free tokens for evaluating our models).
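If you do test on your own data, a simple starting point is recall@k: for each query, check whether its known-relevant document lands in the top k results. Below is a minimal sketch, assuming you have already embedded your queries and documents with whichever model you're evaluating (that step isn't shown), and that each query has exactly one labeled relevant document. The vectors and IDs are toy placeholders, and recall@k is just one common metric, not necessarily the one used in the blog post.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def recall_at_k(query_vecs, doc_vecs, relevant, k):
    """Fraction of queries whose labeled relevant doc appears in the top-k by cosine."""
    hits = 0
    for qid, qv in query_vecs.items():
        # Rank every document ID by similarity to this query, best first.
        ranked = sorted(doc_vecs, key=lambda d: cosine(qv, doc_vecs[d]), reverse=True)
        if relevant[qid] in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)

# Toy placeholder embeddings; in practice these come from the model under test.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
queries = {"q1": [0.9, 0.1], "q2": [0.1, 0.9]}
relevant = {"q1": "d1", "q2": "d3"}

print(recall_at_k(queries, docs, relevant, 1))  # 0.5
print(recall_at_k(queries, docs, relevant, 2))  # 1.0
```

Running the same script over embeddings from a few different models gives a quick side-by-side on documents that actually look like yours.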