Reminds me of https://xkcd.com/936/ I think "correct horse battery staple" has a...

josephg · on June 5, 2024

A quick Google search suggests English has about 10 bits of entropy per word. A long passphrase like that can still have high total entropy, I suppose, but its entropy density is low.
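
To put rough numbers on that, here is a minimal Python sketch. It assumes the 10 bits per word figure above (an estimate, not a measurement) and uses the xkcd passphrase as the example; the variable names are only illustrative. For comparison it also computes the per-character ceiling for a string drawn uniformly from the 95 printable ASCII characters.

```python
import math

passphrase = "correct horse battery staple"
bits_per_word = 10  # assumed figure from the comment above, not a measured value

words = passphrase.split()
total_bits = bits_per_word * len(words)

# Entropy density: bits per typed character, spaces included.
density = total_bits / len(passphrase)

# Ceiling for one character chosen uniformly from 95 printable ASCII symbols.
random_char_bits = math.log2(95)

print(f"total entropy:   {total_bits} bits over {len(passphrase)} characters")
print(f"density:         {density:.2f} bits/char")
print(f"random ASCII:    {random_char_bits:.2f} bits/char")
```

Under these assumptions the passphrase carries well under a quarter of the per-character entropy of a random ASCII string, which is the "low density" point; the total still grows linearly with the number of words.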

kqr · on June 5, 2024

Maybe 10 bits is the average over the dictionary, which is what matters here, but over normal text it is significantly less. Our best current estimate for relatively high-level text (texts published by the EU) is 6 bits per word[1].

However, as our methods of predicting text improve, this number is revised down. LLMs ought to have made a serious dent in it, but I haven't looked up any newer results.

Anyway, all of this is to say that which words are chosen matters, but how they are put together matters perhaps more.

[1]: http://arxiv.org/pdf/1606.06996

soraminazuki · on June 7, 2024

The diceware method is supposed to generate totally random words, so it should fundamentally be unpredictable unless there's a flaw in the source of randomness.
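
A rough Python sketch of the idea, for illustration only: each word is drawn independently and uniformly, so the entropy is simply the number of words times log2 of the list size. The function names and the placeholder word list are hypothetical; a real diceware list has 7776 (6^5) entries, one per outcome of five dice rolls, and `secrets` stands in for physical dice as the randomness source.

```python
import math
import secrets


def diceware_passphrase(wordlist, n_words=6):
    """Pick n_words independently and uniformly, using the OS CSPRNG."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))


def passphrase_entropy_bits(wordlist_size, n_words):
    """Entropy of n_words uniform, independent draws from the list."""
    return n_words * math.log2(wordlist_size)


# Placeholder list (hypothetical); substitute a real 7776-word diceware list.
demo_wordlist = [f"word{i}" for i in range(7776)]

phrase = diceware_passphrase(demo_wordlist, n_words=6)
bits = passphrase_entropy_bits(len(demo_wordlist), n_words=6)
print(phrase)
print(f"{bits:.1f} bits")
```

Because the draws are independent, no language model can predict the next word better than chance, so the per-word figure here depends only on the list size and not on how predictable ordinary English text is.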