This is what SipHash was made for: it's a cryptographic MAC that's competitive in speed with other hash functions meant for hash table use. The paper describes its design, and the weaknesses of competitors like Python's randomized hashing.
Interesting, though at 140 cycles to hash 16 bytes, it's much slower than state-of-the-art hash functions, which can hash 16 bytes in just over 16 cycles. If you seed your non-cryptographic hash function with a cryptographically random number, you can get the best of both worlds: fast hashing and collision resistance against an adversary.
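To make the idea concrete, here is a minimal sketch of seeding a non-cryptographic hash from a cryptographically secure random source. FNV-1a is used only as a stand-in for a fast non-cryptographic hash (it is not CityHash), and the names `SEED` and `seeded_fnv1a` are hypothetical, chosen for illustration.

```python
import secrets

# Standard 64-bit FNV-1a constants.
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1

# Assumption: one secret seed per process, drawn from the OS CSPRNG.
SEED = secrets.randbits(64)


def seeded_fnv1a(data: bytes, seed: int = SEED) -> int:
    """FNV-1a over `data`, with the secret seed mixed into the initial state."""
    h = (FNV_OFFSET ^ seed) & MASK64
    for b in data:
        h ^= b
        h = (h * FNV_PRIME) & MASK64
    return h


if __name__ == "__main__":
    # The bucket index changes from run to run because SEED does.
    print(seeded_fnv1a(b"example key") % 1024)
```

Whether this kind of seeding actually holds up against an adversary depends on the structure of the underlying hash, not just on how the seed was generated.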
Not really. The SipHash page contains proof-of-concept code that breaks common good non-cryptographic randomized hashes, i.e., it generates collisions regardless of the chosen seed.
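To illustrate what "collisions regardless of the chosen seed" means, here is a toy sketch. It is not the proof-of-concept code from the SipHash page, and `toy_hash` is a hypothetical multiply-by-31 hash, far simpler than City or Murmur. The point is only that when the seed enters as the initial state of a simple mixing step, two blocks that map every state to the same next state collide under every seed.

```python
from itertools import product

MASK32 = (1 << 32) - 1


def toy_hash(data: bytes, seed: int) -> int:
    """Toy seeded hash: the seed is the initial state, each byte is mixed
    in with h = h * 31 + byte (mod 2**32)."""
    h = seed & MASK32
    for b in data:
        h = (h * 31 + b) & MASK32
    return h


def colliding_keys(n: int):
    """Build 2**n distinct keys from the blocks b"Aa" and b"BB".

    Both blocks send any state h to 961*h + 2112, so every key built
    from n such blocks ends in the same state, whatever the seed is.
    """
    return [b"".join(blocks) for blocks in product((b"Aa", b"BB"), repeat=n)]


keys = colliding_keys(4)

if __name__ == "__main__":
    for seed in (0, 1, 0xDEADBEEF):
        distinct = {toy_hash(k, seed) for k in keys}
        print(f"seed={seed:#x}: {len(keys)} keys, {len(distinct)} distinct hash value(s)")
```

An attacker who only knows the algorithm, not the seed, can precompute such keys offline and send them all to one bucket.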
That code appears to rely on having direct access to the hash function; this seems very unlikely to be leaked to an attacker. More interesting would be code that gets the secret with only access to ordered iteration on the table, which is more likely to leak.
I don't know anything about Python's hash function, but all of those benchmarks are still an order of magnitude slower than CityHash (for example), which peaks at 0.16 cycles/byte.
I would be interested to see SipHash put through SMHasher, which is the most comprehensive hash function test suite I know of: http://code.google.com/p/smhasher/
> For CityHash, MurmurHash (2 and 3) there are seed-independent collisions.
Are you referring to the "Advanced hash flooding" section of the paper, or am I missing this in another place? That analysis appears to rely on the hash function being highly linear/periodic (i.e., you can always add "l" to get back the same hash value). I don't think that City, Spooky, or Murmur are nearly this linear. For example, City is based on CRC32 (i.e., polynomial division).
Don't get me wrong, I'm impressed that SipHash is so fast for a cryptographic hash. But modern non-cryptographic hash functions are so fast (especially when they're CPU-accelerated, like City is on Intel thanks to the CRC32 instruction) that I don't think SipHash can claim to be the obvious choice for all hash tables.
I am interested to know, though, whether the random seed + City/Spooky/Murmur approach really does have practical hash flooding attacks.
Wow, collisions against the best non-cryptographic hash functions that are independent of seed... color me impressed! This does then seem to strongly suggest that a cryptographic hash function is the only way to defend against hash flooding attacks. Props to the SipHash authors, this is some ground-breaking work.
I think they are, actually! The code just uses "seed" in a slightly confusing way -- it's seeding its own algorithm with "seed" but using "key" as the seed for the hash function.
Isn't Python open source? Along with a large quantity of server-side software? I'd assume everyone already knows my hash functions; I didn't make any new ones.
The strategy you propose gives immunity to the more straightforward attacks on hash tables, but in practice tends to leak information about your secret key through differences in the time taken by hash table operations. If you check out pages 14--16 of the SipHash paper, it goes into a bit more detail about this. You may also find Appendix B entertaining.
https://www.131002.net/siphash/siphash.pdf
Some code for it here:
https://www.131002.net/siphash/