
It's always a trade-off, but in this case the cost is minimal. If your program is limited by the speed of memmove/memcpy(), and you absolutely must copy (rather than alias, or whatever), you probably want to use a 128-bit aligned, widely unrolled SSE copy or something like that. That is, take advantage of the constraints of your precise situation. You can't do that with an API as generic as memcpy().


In newer versions of glibc, memcpy() does take advantage of SIMD instructions if they are available. It works something like this:

The memcpy() function is marked specially in the ELF file (as a GNU indirect function, symbol type STT_GNU_IFUNC). When the symbol is first resolved, which with lazy binding happens on the first invocation, the dynamic linker actually calls the function and then takes the return value and treats it as a function pointer. This function pointer is then used for linking, so that subsequent invocations will call that pointer instead.

The memcpy() function in glibc then simply checks which SIMD extensions are available and returns a pointer to the appropriate real memcpy to use.
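To make the resolve-once idea concrete, here is a rough sketch that imitates it in plain C with an ordinary function pointer. This is not how glibc implements it (there the dynamic linker does the patching via the indirect-function symbol); the names my_copy, pick_copy, copy_bytewise and copy_blockwise are made up for illustration, and copy_blockwise is only a stand-in for a real SIMD variant.

```c
#include <stddef.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical baseline implementation: one byte at a time. */
static void *copy_bytewise(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Hypothetical stand-in for a SIMD variant: 16-byte blocks, then the tail. */
static void *copy_blockwise(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        for (size_t j = 0; j < 16; j++)
            d[i + j] = s[i + j];
    for (; i < n; i++)
        d[i] = s[i];
    return dst;
}

static void *copy_resolve_and_call(void *dst, const void *src, size_t n);

/* Starts out pointing at the resolver stub; the first call replaces it. */
static copy_fn my_copy = copy_resolve_and_call;

/* Plays the role of the resolver: inspect the CPU, return an implementation. */
static copy_fn pick_copy(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse2"))
        return copy_blockwise;
#endif
    return copy_bytewise;
}

/* First call lands here: resolve once, patch the pointer, forward the call. */
static void *copy_resolve_and_call(void *dst, const void *src, size_t n) {
    my_copy = pick_copy();
    return my_copy(dst, src, n);
}
```

Callers just write my_copy(dst, src, n). Only the first call pays for the CPU check; every later call goes straight to whichever implementation was picked.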


Yep, except it still needs to check the alignment of the pointers passed into the function to see if it can use the aligned mov instructions. If you're that stuck for speed, you'll want to make sure all your buffers are already aligned, and then call a function that doesn't do any checking.
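For what it's worth, a no-checks copy along those lines might look roughly like the sketch below. The function name copy_aligned64 and its contract (both pointers 16-byte aligned, length a multiple of 64) are assumptions made up for this example, not anything glibc provides. It uses the SSE2 intrinsics from <emmintrin.h> when the compiler targets SSE2 and falls back to a plain byte loop otherwise.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper. Preconditions, deliberately NOT checked:
 *   - dst and src are both 16-byte aligned
 *   - n is a multiple of 64
 *   - the buffers do not overlap
 * Breaking the alignment rule on the SSE2 path faults at runtime.
 */
static void copy_aligned64(void *dst, const void *src, size_t n) {
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    size_t vecs = n / 16;                 /* number of 16-byte vectors */
    for (size_t i = 0; i < vecs; i += 4) {
        /* Unrolled 4x: four aligned loads, then four aligned stores. */
        __m128i a = _mm_load_si128(s + i);
        __m128i b = _mm_load_si128(s + i + 1);
        __m128i c = _mm_load_si128(s + i + 2);
        __m128i e = _mm_load_si128(s + i + 3);
        _mm_store_si128(d + i, a);
        _mm_store_si128(d + i + 1, b);
        _mm_store_si128(d + i + 2, c);
        _mm_store_si128(d + i + 3, e);
    }
#else
    /* Portable fallback for targets without SSE2. */
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
#endif
}
```

The buffers have to come from somewhere that guarantees the alignment, e.g. C11 aligned_alloc(16, size) or a declaration with _Alignas(16). Whether this actually beats the library memcpy() on a given machine is something you'd have to measure.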



