The repo mentions approximate nearest-neighbor search, but the article implies this is mainly brute force. Is there any indexing at all, then? If not, is the approximate part handled in application space, e.g. by storing binary vectors alongside the real ones?
In addition, if things are brute-forced, wouldn't a columnar DB perform better than a row-based one, e.g. DuckDB?
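To make the "binary vectors alongside the real ones" idea concrete, here is a minimal sketch of what such an app-space scheme could look like. It is not taken from the repo; the function names, the sign-based quantization, and the `shortlist` parameter are all assumptions made for illustration. Each vector gets a 1-bit-per-dimension code, a cheap Hamming-distance scan picks a shortlist, and the shortlist is reranked with exact distances.

```python
import random


def to_bits(vec):
    """Pack the sign of each component into an int (1 bit per dimension)."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def l2_sq(a, b):
    """Exact squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_search(query, vectors, k=5):
    """Brute force: score every vector and return the indices of the k nearest."""
    return sorted(range(len(vectors)), key=lambda i: l2_sq(query, vectors[i]))[:k]


def approx_search(query, vectors, codes, k=5, shortlist=50):
    """Two-stage search (hypothetical scheme): Hamming prefilter, exact rerank."""
    q_bits = to_bits(query)
    # Stage 1: full pass over the small binary codes only.
    by_hamming = sorted(
        range(len(codes)),
        key=lambda i: bin(q_bits ^ codes[i]).count("1"),
    )
    candidates = by_hamming[:shortlist]
    # Stage 2: exact distances, but only for the shortlisted candidates.
    return sorted(candidates, key=lambda i: l2_sq(query, vectors[i]))[:k]


# Hypothetical demo data: 200 random 32-dimensional vectors.
rng = random.Random(0)
vectors = [[rng.uniform(-1, 1) for _ in range(32)] for _ in range(200)]
codes = [to_bits(v) for v in vectors]  # stored alongside the real vectors
query = [rng.uniform(-1, 1) for _ in range(32)]
top5 = approx_search(query, vectors, codes, k=5, shortlist=50)
```

Both stages are still full scans or sorts, so there is no index here. The result is approximate only because a true neighbor can miss the shortlist.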
A columnar database is completely irrelevant to vector search. Vectors aren't stored in columns. Traditional indexing is irrelevant too, because brute force means a full pass through the data. Specialized indexes can be relevant, but then the search is generally approximate, not exact.
How is a database being columnar irrelevant to vector search? This very vector search extension shows that brute-force search can work surprisingly well up to a certain dataset size. At that scale, columnar storage is great because it gives a perfect memory access pattern for the scan: the vectors are read sequentially, instead of iterating over every row of a table only to access that row's vector.
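A rough sketch of the layout difference, assuming a made-up four-dimensional table (the field names and data are hypothetical, and this does not reflect how DuckDB or the extension store anything internally). Pure Python will not show the cache effect itself. The point is only that the columnar scan touches one contiguous buffer of floats, while the row scan has to step through whole records to reach each embedding.

```python
from array import array

DIM = 4  # hypothetical embedding size

# Row-oriented: every record carries unrelated fields next to its vector.
rows = [
    {"id": 10, "title": "a", "embedding": [1.0, 0.0, 0.0, 0.0]},
    {"id": 11, "title": "b", "embedding": [0.0, 1.0, 0.0, 0.0]},
    {"id": 12, "title": "c", "embedding": [0.5, 0.5, 0.0, 0.0]},
    {"id": 13, "title": "d", "embedding": [0.0, 0.0, 1.0, 0.0]},
]

# Columnar: the embedding column is one contiguous float32 buffer.
ids = [r["id"] for r in rows]
flat = array("f", (x for r in rows for x in r["embedding"]))


def scan_rows(query, rows, k):
    """Brute force over records: visits each row just to read its vector."""
    scored = sorted(
        (sum((a - b) ** 2 for a, b in zip(r["embedding"], query)), r["id"])
        for r in rows
    )
    return [rid for _, rid in scored[:k]]


def scan_columnar(query, flat, ids, dim, k):
    """Brute force over the flat buffer: one sequential pass, nothing else read."""
    scored = []
    for row in range(len(flat) // dim):
        base = row * dim
        dist = sum((flat[base + j] - query[j]) ** 2 for j in range(dim))
        scored.append((dist, ids[row]))
    scored.sort()
    return [rid for _, rid in scored[:k]]


nearest = scan_columnar([1.0, 0.0, 0.0, 0.0], flat, ids, DIM, k=2)
```

Both functions return the same exact result. They differ only in how much unrelated data the scan has to walk past.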