> From there you can use pgvector's cosine distance operator for searching for related documents

How does this scale with the number of rows in the database? My first thought is that this is O(n). Does pgvector have a smarter implementation that allows performing k-nearest-neighbor searches efficiently?
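For concreteness, here's a minimal sketch of that kind of query from Python with psycopg; the `documents` table and `embedding` column are assumptions, and without an index this is a sequential scan over all n rows:

```python
# Sketch of a brute-force pgvector similarity query (table/column names assumed).
import psycopg

query_embedding = [0.1, 0.2, 0.3]  # stand-in for a real embedding
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"

with psycopg.connect("dbname=docs") as conn:
    rows = conn.execute(
        # <=> is pgvector's cosine distance operator; with no index,
        # Postgres evaluates it against every row (a full sequential scan).
        "SELECT id, body FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
        (vector_literal,),
    ).fetchall()
```

(pgvector can also build approximate indexes, e.g. IVFFlat, so a lookup doesn't have to visit every row, at the cost of exact results.)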
Locality-sensitive hashing is the typical method here. You generate a ton of random vectors; each one gives rise to an infinite hyperplane (the one it's normal to). Each vector/hyperplane is associated with a bit in the final hash. Then each input vector is hashed by setting the bit if it lands on one side of the hyperplane and unsetting it if it lands on the other. The Hamming distance between two hashes is now correlated with the cosine similarity of the underlying vectors. Or something like that.
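A rough numpy sketch of that random-hyperplane scheme (dimensions, bit counts, and names here are purely illustrative, not anything pgvector actually does internally):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_bits = 128, 64  # embedding dimension, bits in each hash

# One random vector per bit; each is normal to a hyperplane through the origin.
hyperplanes = rng.standard_normal((n_bits, dim))

def lsh_hash(v):
    # Bit i is set iff v falls on the positive side of hyperplane i.
    return hyperplanes @ v > 0

def hamming(a, b):
    return int(np.count_nonzero(a != b))

u = rng.standard_normal(dim)
v = u + 0.1 * rng.standard_normal(dim)  # nearly parallel to u
w = rng.standard_normal(dim)            # unrelated to u

# For random hyperplanes, P(bit differs) = angle(u, v) / pi, so the
# Hamming distance tracks the angle (and hence the cosine similarity).
print(hamming(lsh_hash(u), lsh_hash(v)))  # small
print(hamming(lsh_hash(u), lsh_hash(w)))  # near n_bits / 2
```

The win is that hashes can be bucketed or compared with cheap bit operations, so you no longer pay a full float-vector comparison against every row.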