That is mind-blowing. Intuitively, a hash digest is completely uncorrelated with its input, but this shows that you can make inferences about a collection of data based on its hashes.
In retrospect, it makes sense stochastically, but still a Damn Cool Algorithm indeed.
> Other hash applications (like hash tables) don't need that property
I'm unclear on what you guys mean by correlation, but if it means what I think it does, I disagree with this. A hash table ideally has a uniform distribution regardless of the input, so any structured correlation with the input will harm this goal in real-world applications.
That's not what I said. Generating a predictable distribution by hashing is old hat.
What's awesome is you can then make inferences about the original data from those hashes -- something that good hash functions are supposed to be resistant to, in isolation.
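To make that concrete, here's a rough sketch of one way to infer something about a collection from nothing but its hashes. It's a k-minimum-values style distinct-count estimate, which I'm picking for illustration and which is not necessarily the algorithm from the article. The names `unit_hash` and `estimate_distinct` and the choice of `k=256` are all my own assumptions. The idea: if hashes are uniform on [0, 1), then the k-th smallest hash of n distinct items sits around k/n, so (k - 1) / (k-th smallest) gives a guess at n.

```python
import hashlib
import heapq


def unit_hash(item: str) -> float:
    """Map an item to a pseudo-random float in [0, 1) using SHA-256."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    # Use the first 8 bytes as a 64-bit integer and scale it into [0, 1).
    return int.from_bytes(digest[:8], "big") / 2**64


def estimate_distinct(items, k=256):
    """Estimate the number of distinct items from their hashes alone.

    Illustrative sketch: for clarity this collects every distinct hash in a
    set, whereas a space-efficient version would keep only the k smallest.
    """
    hashes = {unit_hash(x) for x in items}
    if len(hashes) < k:
        # Too few distinct hashes to estimate from; the count is exact here.
        return float(len(hashes))
    kth_smallest = heapq.nsmallest(k, hashes)[-1]
    return (k - 1) / kth_smallest


if __name__ == "__main__":
    data = [f"user-{i}" for i in range(10000)] * 3  # 10,000 distinct, repeated
    print(round(estimate_distinct(data)))
```

Duplicates hash to the same value, so repeating the data doesn't move the estimate. With `k=256` I'd expect the error to be on the order of a few percent, but that's a ballpark figure and not a guarantee.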