
Reminds me of https://xkcd.com/936/. I think "correct horse battery staple" has low entropy, since it is just ordinary-looking words (strings).


A quick Google search suggests English has about 10 bits of entropy per word. A long password like that can still have high total entropy, I suppose, but it has low entropy density.
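To put rough numbers on "total entropy" versus "entropy density", here is a minimal sketch. It takes the 10 bits per word figure above at face value and, as an assumption for comparison only, treats a fully random password as drawn uniformly from the 95 printable ASCII characters.

```python
import math

passphrase = "correct horse battery staple"
words = passphrase.split()

# Assumption: ~10 bits per word, the figure quoted in the comment above.
bits_per_word = 10
total_bits = bits_per_word * len(words)

# Entropy density: bits per typed character, spaces included.
density = total_bits / len(passphrase)

# Assumption for comparison: each character of a random password is
# chosen uniformly from the 95 printable ASCII characters.
random_char_bits = math.log2(95)

# How many such random characters carry the same total entropy.
equivalent_random_chars = math.ceil(total_bits / random_char_bits)

print(f"total: {total_bits} bits over {len(passphrase)} characters")
print(f"density: {density:.2f} bits/char vs {random_char_bits:.2f} for random ASCII")
print(f"same total entropy as about {equivalent_random_chars} random characters")
```

So under these assumptions the passphrase is several times less dense per character, but total entropy is what an attacker has to search through.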


Maybe 10 bits is the average over the dictionary, which is what matters here, but over normal text it is significantly less. Our best current estimate for relatively high-level text (texts published by the EU) is 6 bits per word[1].

However, as our methods of predicting text improve, this number is revised down. LLMs ought to have made a serious dent in it, but I haven't looked up any newer results.

Anyway, all of this is to say that which words are chosen matters, but how they are put together matters perhaps more.

[1]: http://arxiv.org/pdf/1606.06996


The diceware method is supposed to generate totally random words, so it should fundamentally be unpredictable unless there's a flaw in the source of randomness.
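A minimal sketch of that idea, assuming each word is drawn independently and uniformly from a fixed list. The word list here is a hypothetical placeholder (`word0`, `word1`, ...), sized at 7776 entries to match five six-sided dice rolls per word (6^5); `secrets` stands in for physical dice as the randomness source. With uniform, independent draws, the entropy depends only on the list size and the word count, not on how ordinary the words look.

```python
import math
import secrets

def generate_passphrase(wordlist, n_words=4):
    """Pick n_words independently and uniformly from wordlist."""
    # secrets.choice uses the OS's cryptographically secure RNG.
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

def passphrase_entropy_bits(wordlist_size, n_words):
    """Entropy of n_words uniform, independent draws from a list."""
    return n_words * math.log2(wordlist_size)

# Hypothetical placeholder list; a real list would contain actual words.
wordlist = [f"word{i}" for i in range(6 ** 5)]  # 7776 entries

print(generate_passphrase(wordlist))
print(f"{passphrase_entropy_bits(len(wordlist), 4):.1f} bits for 4 words")
```

With a 2048-word list, the same formula gives 11 bits per word, which is where the 44 bits in the xkcd comic comes from.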



