As a developer who's micro-optimized some genetic software, I can confirm that I...

eyegor · on Jan 18, 2020

If you really care about performance, you could always compile on the target machine directly via -xHost [0] or whatever the equivalent flag is on your compiler.

[0] https://software.intel.com/en-us/cpp-compiler-developer-guid...

inetknght · on Jan 18, 2020

In my case, it's GCC. The option is `-march=native -mtune=native`.

The trick though is _describing_ the scalar operations in the language and getting the compiler to understand how to efficiently vectorize them. I couldn't get GCC to do it at the time (GCC-5 if I recall, though we deployed with GCC-6); maybe it was just inexperience on my part. But I ended up writing the intrinsics by hand. To be quite honest it was my first dive into SIMD and I thought it was rather fun to do.
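For readers who haven't done this before, here is a minimal sketch of what a hand-written kernel can look like next to its scalar form. Everything in it is an assumption for illustration: the kernel (counting positions where two byte sequences agree) and the names `count_equal_scalar`, `count_equal_avx2`, and `count_equal` are hypothetical, not the code described above, and it assumes x86-64 with GCC or Clang.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference (hypothetical kernel): count positions where a[i] == b[i]. */
static size_t count_equal_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        count += (a[i] == b[i]);
    return count;
}

/* Hand-written AVX2 version: compares 32 bytes per iteration.
 * The target attribute enables AVX2 for this function only. */
__attribute__((target("avx2")))
static size_t count_equal_avx2(const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    for (; i + 32 <= n; i += 32) {
        /* Unaligned loads, so the caller needs no special alignment. */
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        /* Each equal byte becomes 0xFF, each unequal byte 0x00. */
        __m256i eq = _mm256_cmpeq_epi8(va, vb);
        /* Collapse the top bit of every byte into a 32-bit mask. */
        unsigned mask = (unsigned)_mm256_movemask_epi8(eq);
        count += (size_t)__builtin_popcount(mask);
    }

    /* Scalar tail for the last n % 32 bytes. */
    for (; i < n; i++)
        count += (a[i] == b[i]);
    return count;
}

/* Runtime dispatch: use AVX2 only if the running CPU reports it. */
size_t count_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    if (__builtin_cpu_supports("avx2"))
        return count_equal_avx2(a, b, n);
    return count_equal_scalar(a, b, n);
}
```

The `target("avx2")` attribute lets this one function use AVX2 without building the whole file with `-mavx2`, and the `__builtin_cpu_supports` check keeps the binary usable on CPUs that lack it.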

ncmncm · on Jan 18, 2020

-march=native implies -mtune=native.

You can say -march=native -mtune=sandybridge, but there would be no point.

You can say -march=sandybridge -mtune=native, usefully. It might go slower on a real Sandy Bridge than if tuned for it, but it would still work, and it would go as fast as the smaller instruction set allows on your build machine.
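One way to see the split between the two options is to look at the feature macros GCC and Clang predefine: they follow `-march` (which instructions may be emitted), while `-mtune` only changes scheduling and cost decisions. A minimal sketch, where `compiled_isa` is a hypothetical helper name:

```c
#include <stdio.h>

/* Returns the highest SIMD level the compiler was allowed to emit for this
 * build, based on the macros GCC/Clang predefine from -march.
 * (compiled_isa is a hypothetical helper, not part of any library.) */
const char *compiled_isa(void)
{
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline";
#endif
}

/* Prints the result; call this from main() in a small test program. */
void print_compiled_isa(void)
{
    printf("compiled for: %s\n", compiled_isa());
}
```

Built with `-march=sandybridge -mtune=native`, this should report AVX regardless of what the build machine supports, since Sandy Bridge has AVX but not AVX2.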

inetknght · on Jan 18, 2020

I know this. I don't care. I use `-march=native -mtune=native` specifically to point other developers on the team to the two relevant compiler options. And if they don't look, nothing's lost.

BeeOnRope · on Jan 18, 2020

Which ISA did it have?

Even the minimal AVX-512 ISA on any mainstream CPU (SKX) is pretty much a strict superset of AVX2.

inetknght · on Jan 18, 2020

> Which ISA did it have?

Business side was considering whether to buy Skylake or Broadwell.