Put the questions in some semantic embedding space. Now you’ll have a vector rep... | Hacker News

Hacker Newsnew | past | comments | ask | show | jobs | submit

		fshbbdssbbgdd on Aug 29, 2021 \| parent \| context \| favorite \| on: Ask HN: What problem are you close to solving and ... Put the questions in some semantic embedding space. Now you’ll have a vector representing each question. Then for each question, you can sort all the questions by how far the Euclidean distance is between their vectors. Or use some clustering algorithm like k-means to find clusters. By web search I found this to tutorial to put sentences in an embedding space: https://github.com/BramVanroy/bert-for-inference/blob/master... I did not read this and am not endorsing it, but it looks like it’s doing roughly what I’m suggesting.

johnmoberg on Aug 29, 2021 [–]

Yep, exactly this. Check out sentence-transformers https://pypi.org/project/sentence-transformers/0.3.0/, they have some great pre-trained models. Once you have the embeddings you can just compute the cosine similarity.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact