Embed the questions in some semantic embedding space, so each question is represented by a vector. Then for each question you can sort all the others by the Euclidean distance between their vectors, or run a clustering algorithm like k-means to group similar questions together.
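A minimal sketch of both ideas with NumPy and scikit-learn; the tiny four-dimensional "embeddings" here are made up for illustration, real ones would come from an embedding model:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy embeddings, one vector per question (in practice these would
# come from an embedding model, not be hand-written like this).
embeddings = np.array([
    [0.9, 0.1, 0.0, 0.0],    # "How do I reset my password?"
    [0.85, 0.15, 0.0, 0.0],  # "I forgot my password, what now?"
    [0.0, 0.0, 0.9, 0.1],    # "What payment methods do you accept?"
    [0.0, 0.05, 0.85, 0.1],  # "Can I pay with PayPal?"
])

# Option 1: for one question, rank all questions by Euclidean distance.
query = embeddings[0]
dists = np.linalg.norm(embeddings - query, axis=1)
ranked = np.argsort(dists)  # question 0 itself first, then its near-duplicate

# Option 2: cluster all questions at once with k-means.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
print(ranked, labels)
```

With embeddings this cleanly separated, the two password questions end up in one cluster and the two payment questions in the other.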
Yep, exactly this. Check out sentence-transformers https://pypi.org/project/sentence-transformers/0.3.0/, they have some great pre-trained models. Once you have the embeddings you can just compute the cosine similarity.
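Once you have the embeddings, cosine similarity is just a normalized dot product. A sketch with NumPy (the vectors below are stand-ins for real model output, and the model name in the comment is only a common choice, not something prescribed above):

```python
import numpy as np

# Stand-in embeddings; in practice these would come from something like
#   SentenceTransformer("all-MiniLM-L6-v2").encode(questions)
emb = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
])

# Normalize each row, then the full pairwise cosine-similarity
# matrix is a single matrix product.
normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
sims = normed @ normed.T

# Most similar question to question 0, excluding itself
# (the last entry of the ascending sort is question 0 itself).
best = int(np.argsort(sims[0])[-2])
print(best)
```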
A web search turned up this tutorial on putting sentences in an embedding space: https://github.com/BramVanroy/bert-for-inference/blob/master...
I did not read this and am not endorsing it, but it looks like it’s doing roughly what I’m suggesting.