
Wow, thanks for pointing this out. I just tried it myself, and "rep movsb" is indeed consistently (and sometimes significantly) faster than the standard C memcpy for aligned copies larger than about 16KB on my Intel Core i5; for unaligned copies it seems to be on par. For smaller sizes it is slightly slower or on par. There is no noticeable difference between rep movsb and rep movsq.
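
For anyone who wants to try the same comparison, here is a rough sketch of the kind of micro-benchmark I mean. It is not the exact code I ran: the helper names (copy_rep_movsb, seconds_per_copy) are made up for illustration, the inline asm assumes GCC or Clang on x86-64 (it falls back to memcpy elsewhere), and timing with clock() is coarse, so treat the numbers as indicative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Copy n bytes with "rep movsb". Assumes GCC/Clang inline asm on x86-64 and
   a clear direction flag (as the ABI requires on function entry); on other
   targets it falls back to memcpy so the sketch still compiles. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    void *d = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Average CPU seconds per copy of n bytes between 64-byte-aligned buffers.
   Returns -1.0 if allocation fails. */
static double seconds_per_copy(copy_fn fn, size_t n, int reps)
{
    assert(n > 0 && reps > 0);

    /* aligned_alloc wants a size that is a multiple of the alignment. */
    size_t padded = (n + 63) / 64 * 64;
    unsigned char *src = aligned_alloc(64, padded);
    unsigned char *dst = aligned_alloc(64, padded);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0xA5, n);
    memset(dst, 0, n);

    volatile unsigned char sink = 0; /* keeps the copies from being elided */
    clock_t start = clock();
    for (int i = 0; i < reps; i++) {
        fn(dst, src, n);
        sink ^= dst[n - 1];
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}
```

Calling seconds_per_copy(memcpy, n, reps) and seconds_per_copy(copy_rep_movsb, n, reps) over a range of sizes around 16KB should show whether the crossover exists on a given machine. Offsetting the pointers by a few bytes would cover the unaligned case.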

Apparently, libc hasn't caught up to those micro-architecture changes yet :/



Depends on whose libc you're using. Good implementations definitely take advantage of rep movsb where it's fast.
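
To illustrate what "where it's fast" can look like in practice, here is a minimal sketch of a dispatching copy. It is my own illustration, not code from any particular libc. It assumes GCC or Clang on x86-64, uses the CPUID "Enhanced REP MOVSB/STOSB" feature bit (leaf 7, subleaf 0, EBX bit 9) as the signal, and borrows the roughly 16KB threshold from the comment above as a placeholder; a real implementation would tune that per microarchitecture. The names has_erms, copy_dispatch and REP_MOVSB_THRESHOLD are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#define SKETCH_X86_64 1
#else
#define SKETCH_X86_64 0
#endif

/* Placeholder threshold taken from the observation in this thread. */
#define REP_MOVSB_THRESHOLD ((size_t)16 * 1024)

/* Returns 1 if the CPU reports Enhanced REP MOVSB/STOSB, else 0. */
static int has_erms(void)
{
#if SKETCH_X86_64
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid_max(0, NULL) < 7)
        return 0;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ecx; (void)edx;
    return (int)((ebx >> 9) & 1u);
#else
    return 0;
#endif
}

/* Uses "rep movsb" for large copies on CPUs that advertise ERMS,
   and plain memcpy for everything else. */
static void *copy_dispatch(void *dst, const void *src, size_t n)
{
#if SKETCH_X86_64
    static int erms = -1; /* -1 means "not probed yet" */
    if (erms < 0)
        erms = has_erms();
    if (erms && n >= REP_MOVSB_THRESHOLD) {
        void *d = dst;
        __asm__ volatile("rep movsb"
                         : "+D"(d), "+S"(src), "+c"(n)
                         :
                         : "memory");
        return dst;
    }
#endif
    return memcpy(dst, src, n);
}
```

The probe result is cached in a plain static, which is fine for a single-threaded test but would need care in a library.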


I guess the libc that comes with Ubuntu doesn't count as a good implementation.


glibc is generally pretty good, but does lag commercial libc implementations somewhat when it comes to microarchitectural optimization. It's also not unheard of for Linux distros to include rather old versions of glibc. I have no idea if that's the problem in your case, but it's worth checking that you have the latest.
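
If you want to check which glibc you are actually running against, something like the following sketch should do it. It assumes a glibc system: gnu_get_libc_version() is a glibc extension declared in <gnu/libc-version.h>, and __GLIBC__ / __GLIBC_MINOR__ give the version of the headers you compiled with. The function names runtime_libc_version and print_libc_versions are just illustrative.

```c
#include <assert.h>
#include <stdio.h>   /* on glibc this also pulls in the __GLIBC__ macros */

#if defined(__GLIBC__)
#include <gnu/libc-version.h>
#endif

/* Version string of the libc loaded at run time, or a placeholder
   when not building against glibc. */
static const char *runtime_libc_version(void)
{
#if defined(__GLIBC__)
    return gnu_get_libc_version();
#else
    return "unknown (not glibc)";
#endif
}

/* Prints the run-time version and, on glibc, the compile-time one too. */
static void print_libc_versions(void)
{
    const char *v = runtime_libc_version();
    assert(v != NULL);
    printf("libc at run time:     %s\n", v);
#if defined(__GLIBC__)
    printf("glibc at compile time: %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
#endif
}
```

The two versions can differ when a binary is built on one machine and run on another, so the run-time one is the one that matters for memcpy performance.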



