
I'm skeptical that "hashing is much faster on an array the size of a power of two".

Also, if memory is actually an issue, he should store hashes instead of full objects (assuming the byte arrays are more than 16 bytes).
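One way to sketch that suggestion, assuming occasional 64-bit collisions are acceptable for the use case, and using FNV-1a purely as a stand-in for whatever hash the code actually uses:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: dedup by storing 8-byte hashes instead of full byte arrays.
// Caveat: distinct inputs can collide on a 64-bit hash, so this is approximate.
public class HashOnlyDedup {
    // FNV-1a 64-bit hash; any well-mixed 64-bit hash would do.
    static long fnv1a64(byte[] data) {
        long h = 0xcbf29ce484222325L;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    public static void main(String[] args) {
        Set<Long> seen = new HashSet<>();   // 8 bytes per entry instead of a full array
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        System.out.println(seen.add(fnv1a64(a)));  // true: first time this content is seen
        System.out.println(seen.add(fnv1a64(b)));  // false: duplicate content
    }
}
```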



IIRC, the real reason for this "power of 2" is fast rehashing when the table grows. Calculate a 32-bit hash and store all of it, but if your table currently has 1024 entries, only use 10 bits of it. Later, when the table grows to 2048 entries, you can just mask off 11 bits' worth instead of recomputing every hash.
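A minimal sketch of that masking trick (the hash value and class name are hypothetical):

```java
// Hypothetical sketch: growing a power-of-two table without rehashing.
public class MaskDemo {
    public static void main(String[] args) {
        int hash = 0xBEEF1234;               // stored 32-bit hash, computed once

        int oldIndex = hash & (1024 - 1);    // table of 2^10 entries: low 10 bits
        int newIndex = hash & (2048 - 1);    // after growth to 2^11: low 11 bits

        // Bit 10 is the only new bit, so each entry either stays put or
        // moves up by exactly 1024 -- no hash recomputation needed.
        System.out.println(oldIndex + " -> " + newIndex);  // 564 -> 564
    }
}
```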

But, since the hash table is not going to grow in this particular case, any perf gain from using a power of 2 will probably be far less important.


He's using a power of 2 so he can use a bitwise AND to reduce his hash to a valid array index.


It's fairly common to keep array sizes bounded at a power of two, because you can then use a bitwise mask instead of a modulus to determine the hash bucket. Depending on the performance of your hash function it can have a noticeable impact, although you're right that it's probably not a matter of 'much' faster.
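The two reduction strategies can be sketched like this (class and values are hypothetical):

```java
// Hypothetical sketch: two ways to reduce a 32-bit hash to a bucket index.
public class BucketIndex {
    // General case: works for any table size, but costs an integer division.
    static int indexByMod(int hash, int tableSize) {
        return (hash & 0x7fffffff) % tableSize;  // clear sign bit so result is non-negative
    }

    // Power-of-two case: (size - 1) is an all-ones mask for the low bits,
    // so a single AND picks the bucket.
    static int indexByMask(int hash, int powerOfTwoSize) {
        return hash & (powerOfTwoSize - 1);
    }

    public static void main(String[] args) {
        int h = 123456789;
        System.out.println(indexByMod(h, 1024));   // 277
        System.out.println(indexByMask(h, 1024));  // 277 -- same answer, cheaper operation
    }
}
```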


Using modulo with a non-power-of-two table size is going to introduce modulo bias: 2^32 isn't evenly divisible by the size, so some buckets receive one more possible hash value than others.
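A toy illustration of the bias, assuming uniform 8-bit hashes so the effect is visible (with 32-bit hashes the mechanism is the same but the skew is far smaller):

```java
// Toy demonstration of modulo bias with uniform 8-bit "hash" values.
public class ModuloBias {
    public static void main(String[] args) {
        int[] counts = new int[3];
        for (int h = 0; h < 256; h++) counts[h % 3]++;
        // 256 = 3 * 85 + 1, so bucket 0 receives one extra value.
        System.out.println(counts[0] + " " + counts[1] + " " + counts[2]);  // 86 85 85
    }
}
```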


I bet that there's no measurable difference in this code. It just bugs me to see so much unnecessary complication based on blindly imitating an optimization that maybe made sense 20 years ago.


I've noticed the difference between modulo and bitwise AND in Java code written in the last five years. If it's getting 99.9+% cache hits, is being used in an innermost loop, and jdk7 hasn't gotten around to JIT-compiling special cases of modulo, this could absolutely matter.


It was 10% faster to switch from modulo to & on a recent real world problem for me.


I believe you. There are two reasons I think paul's right in the context of hash tables.

1. If you're using a non-trivial hash function, one more modulo calculation at the end to get the actual memory address is not a big difference.

2. If you're using a trivial hash function, I bet for a lot of data sets you'll get fewer collisions with a hash table whose size is a prime number, canceling out any benefit from calculating the address slightly faster.
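Point 2 can be sketched with a deliberately trivial identity hash and strided keys (both hypothetical): keys that share a common factor with a power-of-two table size pile into a few buckets, while a prime size spreads them out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: trivial identity hash, keys that are all multiples of 8.
public class PrimeVsPow2 {
    static int distinctBuckets(int tableSize) {
        Set<Integer> buckets = new HashSet<>();
        for (int key = 0; key < 1024; key += 8) {  // 128 strided keys
            buckets.add(key % tableSize);          // identity "hash"
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(16));  // power of two: only 2 buckets used
        System.out.println(distinctBuckets(17));  // prime: all 17 buckets used
    }
}
```

Real hash tables (e.g. java.util.HashMap) keep the power-of-two size but mix the high bits of the hash into the low bits first to avoid exactly this pattern.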




