
> From there you can use pgvector's cosine distance operator to search for related documents

How does this scale with the number of rows in the database? My first thought is that it's O(n) per query. Does pgvector have a smarter implementation that allows performing k-nearest-neighbor searches efficiently?



Locality-sensitive hashing is the typical method here. You generate a bunch of random vectors; each one defines a hyperplane (the one it's normal to), and each hyperplane is associated with one bit of the final hash. An input vector is then hashed by setting or clearing each bit depending on which side of the corresponding hyperplane it falls on. The Hamming distance between two hashes is correlated with the angle between the original vectors: the expected fraction of differing bits is the angle divided by pi, so a small Hamming distance means high cosine similarity.
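A minimal sketch of the idea in NumPy (dimensions, bit count, and the perturbation size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 64, 256

# Random hyperplane normals; each row contributes one bit of the hash.
planes = rng.normal(size=(n_bits, dim))

def lsh_hash(v):
    # Bit i is set iff v lies on the positive side of hyperplane i.
    return planes @ v > 0

def hamming(a, b):
    return int(np.count_nonzero(a != b))

v = rng.normal(size=dim)
near = v + 0.1 * rng.normal(size=dim)   # small perturbation of v
far = rng.normal(size=dim)              # unrelated random vector

# Nearby vectors share most bits; an unrelated one differs in ~half.
print(hamming(lsh_hash(v), lsh_hash(near)))
print(hamming(lsh_hash(v), lsh_hash(far)))
```

Comparing 256-bit signatures like this is much cheaper than computing exact cosine distances against every row, which is the whole point.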


That's the reason I come to HN every day. Thanks for the explanation, which probably could not be more compact.


That’s so smart! I read stuff like that and I’m just in awe of the cleverness.


You can add indexes per distance function to perform fast approximate nearest neighbor search: https://github.com/pgvector/pgvector#indexing
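Concretely, the index and query SQL look roughly like this (following the pgvector README; the table name "items" and column "embedding" are assumed for illustration, and you'd run these through any Postgres client):

```python
# IVFFlat index for the cosine distance operator <=>.
# "lists" trades recall for speed; the pgvector README suggests
# starting around rows / 1000 for smaller tables.
create_index = """
CREATE INDEX ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
"""

# Approximate nearest-neighbor query: smallest cosine distance first.
knn_query = """
SELECT id FROM items
ORDER BY embedding <=> '[0.1, 0.2, 0.3]'
LIMIT 5;
"""

print(create_index)
print(knn_query)
```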

They mention this paper in the references: https://dl.acm.org/doi/pdf/10.1145/3318464.3386131


They describe optimizations here as well: https://simonwillison.net/2023/Jan/13/semantic-search-answer...



