
How much slower was the hash table when mapped across two physical memory domains vs on the same one?

I recall that some time back Facebook specifically went with the Xeon D processor for exactly this reason. Since Xeon D is single-socket, it avoids NUMA-type issues.



It was about 50% slower. I would have expected more: 50% of the pages were on the opposite processor, and I would expect a cross-processor access (on a busy server) to cost more than 2X a local memory lookup.
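To spell out the arithmetic behind that expectation, here is a rough sketch. It assumes run time is dominated by the memory lookups and that every remote lookup costs a fixed multiple of a local one; the function name and both parameters are made up for illustration, not something we measured.

```python
def expected_slowdown(remote_fraction: float, remote_cost_ratio: float) -> float:
    """Average lookup cost relative to a table that is entirely local.

    remote_fraction:   share of lookups that hit the other socket's memory
    remote_cost_ratio: assumed cost of a remote lookup, in units of a local one
    """
    local_fraction = 1.0 - remote_fraction
    # Local lookups cost 1 unit each; remote lookups cost remote_cost_ratio units.
    return local_fraction * 1.0 + remote_fraction * remote_cost_ratio


# Half the pages remote at exactly 2X the local cost gives 1.5X, i.e. 50% slower.
print(expected_slowdown(0.5, 2.0))
# Anything above 2X per remote lookup would push the total past 50% slower.
print(expected_slowdown(0.5, 3.0))
```

Under this simple model, the observed 50% corresponds to a remote lookup costing about 2X a local one, which is why a larger penalty would have shown up as a larger slowdown.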

We didn't even know there was a performance problem there. We just wanted to make the program faster, so we ran perf, visualized IPC, and sorted the routines so that the ones with the lowest IPC came first (actually, we used the reciprocal, CPI, which I find a bit more intuitive). It was like a bright, blinking sign pointing right at the location causing a huge problem. Once that was solved, you could just select the next item on the list as the next thing to optimize :)
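The ranking step is simple enough to sketch. This is not our actual tooling: the routine names and counter values below are invented for illustration, and it assumes you have already pulled per-routine cycle and instruction counts out of your profiler and that every routine retired at least one instruction.

```python
from typing import NamedTuple


class RoutineCounters(NamedTuple):
    name: str
    cycles: int
    instructions: int


def cpi(routine: RoutineCounters) -> float:
    """Cycles per instruction; the reciprocal of IPC."""
    return routine.cycles / routine.instructions


def rank_by_cpi(routines: list[RoutineCounters]) -> list[RoutineCounters]:
    """Worst offenders (highest CPI, i.e. lowest IPC) first."""
    return sorted(routines, key=cpi, reverse=True)


# Hypothetical counters, made up for this example.
counters = [
    RoutineCounters("parse_input", cycles=1_000_000, instructions=1_250_000),
    RoutineCounters("hash_lookup", cycles=4_000_000, instructions=1_000_000),
    RoutineCounters("write_output", cycles=900_000, instructions=900_000),
]

for routine in rank_by_cpi(counters):
    print(f"{routine.name}: CPI={cpi(routine):.2f}")
```

A routine that stalls on memory retires few instructions per cycle, so it floats to the top of a list like this even if nobody suspected it.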


The Facebook server design is discussed here: https://code.facebook.com/posts/1711485769063510/facebook-s-...



