Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Put the questions in some semantic embedding space. Now you’ll have a vector representing each question. Then for each question, you can sort all the questions by how far the Euclidean distance is between their vectors. Or use some clustering algorithm like k-means to find clusters.

By web search I found this to tutorial to put sentences in an embedding space: https://github.com/BramVanroy/bert-for-inference/blob/master...

I did not read this and am not endorsing it, but it looks like it’s doing roughly what I’m suggesting.



Yep, exactly this. Check out sentence-transformers https://pypi.org/project/sentence-transformers/0.3.0/, they have some great pre-trained models. Once you have the embeddings you can just compute the cosine similarity.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: