
So it's otherwise automatic, except that I have to write a selector routine that decides at runtime which implementation performs best, for each individual case and each level of hardware support.
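For anyone wondering what such a selector routine looks like, here is a minimal sketch, not a tuned implementation. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` attribute are available; on anything else it falls back to the baseline. The names `add_arrays`, `add_scalar`, `add_avx2` and `select_add` are made up for illustration.

```cpp
#include <cstddef>

// Portable baseline: runs on any CPU.
static void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same source, but compiled with AVX2 enabled for this one function,
// so the compiler is free to vectorize it with AVX2 instructions.
__attribute__((target("avx2")))
static void add_avx2(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
#endif

using add_fn = void (*)(const float*, const float*, float*, std::size_t);

// The "selector routine": inspect the CPU once and pick an implementation.
static add_fn select_add() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return add_avx2;
#endif
    return add_scalar;
}

// Public entry point: the selection happens on the first call only.
void add_arrays(const float* a, const float* b, float* out, std::size_t n) {
    static const add_fn impl = select_add();
    impl(a, b, out, n);
}
```

Every extra instruction set means one more variant and one more branch in the selector, and whether the AVX2 variant is actually faster for a given case is something you still have to measure yourself.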


You only need to go to all that trouble if you want high performance across a variety of machines. If you are merely after bragging rights or trying to satisfy someone else's requirement, the theory is that you can compile the exact same piece of high-level code for different optimization targets, and the compiler will do all the work for you, providing maximum performance for each instruction set practically for free...

Even more practically, Agner has a typically excellent description of the strengths and weaknesses of the dispatch strategies used by different compilers in Section 13 (p. 122) here: http://www.agner.org/optimize/optimizing_cpp.pdf


Well, you can write the code that tries to decide, or you can let GCC emit that code for you: https://gcc.gnu.org/wiki/FunctionMultiVersioning

All you need to do is figure out the best version of the function to write for each target, and let GCC do the rest of the heavy lifting.
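A rough sketch of what that looks like in C++, assuming x86-64 Linux with glibc (the dispatch relies on ifunc support) and a compiler that implements the `target` attribute this way. The function name `which_version` is hypothetical, and the bodies just report which version was picked; a real use would put target-specific code in each one. The preprocessor guard is only there so the snippet still builds elsewhere.

```cpp
#include <cstdio>  // also pulls in the libc feature macros checked below

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__) && defined(__GLIBC__)

// Fallback version: required, and used when no better match exists.
__attribute__((target("default")))
const char* which_version() { return "default"; }

// Chosen on CPUs that support SSE4.2 (unless a better version matches).
__attribute__((target("sse4.2")))
const char* which_version() { return "sse4.2"; }

// Chosen on CPUs that support AVX2.
__attribute__((target("avx2")))
const char* which_version() { return "avx2"; }

#else

// No multiversioning here: a single plain definition.
const char* which_version() { return "default"; }

#endif

// Callers use the function normally; the compiler-generated dispatcher
// resolves the call to one of the versions above.
void report_version() {
    std::printf("running the %s version\n", which_version());
}
```

There is no hand-written selector here, but you still have to decide which versions are worth writing and check that each one is actually faster on its target.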



