Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

That is mind-blowing. Intuitively, a hash digest is completely uncorrelated with its input, but this shows that you can make inferences about a collection of data based on its hashes.

In retrospect, it makes sense stochastically, but still a Damn Cool Algorithm indeed.



[deleted]


> Other hash applications (like hash tables) don't need that property

I'm unclear on what you guys mean by correlation, but if it means what I think it does I disagree with this. A hash table ideally has an uniform distribution regardless of the input, so any structured correlation with the input will harm this goal in real-world applications.


Python's hashes for small integers never collide.

Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)

>>> map(hash, range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


> ... you can make inferences about a collection of data based on its hashes.

Quite a few cryptographers have learned this the hard way!


Isn't that the point? That hashing maps an arbitrary set of numbers to an approximately uniform scale-invariant distribution.


That's not what I said. Generating a predictable distribution by hashing is old hat.

What's awesome is you can then make inferences about the original data from those hashes -- something that good hash functions are supposed to be resistant to, in isolation.


It works here because we only care whether two bits of data are equal -- and a hash function had better preserve that relationship!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: