How much slower was the hash table when mapped across two physical memory domain...

dekhn · on May 9, 2017

It was about 50% slower. I would have expected more: 50% of the pages were on the opposite processor, and I expect the cost of a cross-processor access (on a busy server) would be more than 2x that of a local memory lookup.
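
To make that arithmetic explicit, here is a rough sketch of the model behind the expectation. It assumes runtime is dominated by memory lookups and that a fixed fraction of them go to the remote node; the function names are made up for illustration and nothing here comes from the original program.

```python
def expected_slowdown(remote_fraction: float, remote_cost_ratio: float) -> float:
    """Slowdown relative to an all-local run, under a simple two-cost model.

    A local lookup costs 1 unit and a remote lookup costs
    `remote_cost_ratio` units (both are modelling assumptions).
    """
    average_cost = (1.0 - remote_fraction) * 1.0 + remote_fraction * remote_cost_ratio
    return average_cost - 1.0


def implied_remote_ratio(remote_fraction: float, slowdown: float) -> float:
    """Invert the model: which remote/local cost ratio explains a slowdown?"""
    if remote_fraction <= 0.0:
        raise ValueError("remote_fraction must be positive")
    return 1.0 + slowdown / remote_fraction


# Half the pages remote and a 50% slowdown imply a remote cost of 2x local.
ratio = implied_remote_ratio(remote_fraction=0.5, slowdown=0.5)  # 2.0
# A 3x remote cost with the same page split would predict a 100% slowdown.
predicted = expected_slowdown(remote_fraction=0.5, remote_cost_ratio=3.0)  # 1.0
```

Under this model, the observed 50% slowdown corresponds to remote lookups costing exactly 2x local ones, which is why anything above 2x would have shown up as a larger slowdown.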

We didn't even know there was a performance problem there. We just wanted to make the program faster, so we ran perf, visualized IPC (instructions per cycle), and sorted the routines so the ones with the lowest IPC came first (actually, we used the reciprocal, CPI, cycles per instruction, which I find a bit more intuitive). It sort of just provided a bright, blinking sign pointing right at the location causing a huge problem. Once that was solved, you could just select the next item on the list as the next thing to optimize :)
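
A minimal sketch of that ranking step, assuming you already have per-routine cycle and instruction counts from a profiler. The routine names and numbers below are invented purely for illustration, and this is not the tooling we actually used.

```python
from typing import List, NamedTuple


class RoutineSample(NamedTuple):
    name: str
    cycles: int
    instructions: int


def cpi(sample: RoutineSample) -> float:
    """Cycles per instruction; higher means more stalling per unit of work."""
    if sample.instructions == 0:
        return float("inf")  # no retired instructions: treat as worst case
    return sample.cycles / sample.instructions


def rank_by_cpi(samples: List[RoutineSample]) -> List[RoutineSample]:
    """Worst (highest CPI, i.e. lowest IPC) routines first."""
    return sorted(samples, key=cpi, reverse=True)


# Hypothetical counts, not real measurements.
samples = [
    RoutineSample("parse_request", cycles=1_000, instructions=2_000),
    RoutineSample("hash_lookup", cycles=9_000, instructions=1_500),
    RoutineSample("format_reply", cycles=1_200, instructions=1_200),
]

for s in rank_by_cpi(samples):
    print(f"{s.name:15s} CPI={cpi(s):.2f}")
```

The top entry is the first candidate to look at; after fixing it, re-profile and take the next one from the list.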

electrum · on May 10, 2017

The Facebook server design is discussed here: https://code.facebook.com/posts/1711485769063510/facebook-s-...