Hello, this is the author. (btw Nathan is a commenter, not me) It was exactly my...

barrkel · on June 10, 2009

Except for the last bit - a prime modulus ameliorates weaknesses in the hash function.

Since most hash tables are written to support user-supplied hash functions, and users need to write hash functions to support value-based equality when storing values of custom types in hash tables, it makes a lot of sense for the hash table implementer to take this into account.

Finally, there's a trade-off: sometimes a fast hash function with some weaknesses, but using a prime modulus, can be a better choice than a slower, more thorough hash function with e.g. a power of two modulus (bitwise and).

calambrac · on June 10, 2009

If you have repetitive data in, and repetitive data out, no modulus is going to help you.

Um, no. A prime modulus will often help in this case. That's the whole point.

codexon · on June 10, 2009

Do you have any statistical evidence?

calambrac · on June 10, 2009

In your test code, change your loop to:

    for r in [3, 5, 17, 257]:
        for i in xrange(n):
            num = r * i
            table_p[num%prime] += 1
            table_c[num%composite] += 1

On here and on your blog, you keep falling back to the observation that modulo prime makes no difference when you have a uniform distribution of inputs, like you get after you apply a good hash function.

Nobody who is telling you that you're wrong would dispute that. The fundamental point that you keep ignoring is that doesn't matter, because as a library writer providing a hashtable, you don't control the hash function. Having a simple way to mitigate the ill effects of users choosing bad hash functions is a good thing. Why do you keep ignoring this point?