If I could choose, I would like everything to run at the max turbo frequency all the time, yeah.
Still, and despite writing this post, which will make a lot of people express something similar to what you wrote, I consider myself an AVX-512 fan, not the other way around. It's the most important ISA extension in, well, I'm not sure how long: a long time (probably AVX and AVX2 combined would have a similar impact).
It introduces a whole ton of very powerful stuff: full-width shuffles down to byte granularity with awesome performance, masking of nearly every operation (often for free), compress and expand operations, and a longer list at [1]. And that's only from an integer angle (what I care about).
Yeah, it's taken AVX-512 a while to get traction (the fact that generation after generation of new chips have just been Skylake client derivatives with no AVX-512 hasn't helped), but I hope we are reaching a turning point.
These transitions are something you have to deal with if you want max performance, and I think we'll come up with better models for how to make the "global" decision of whether you should be using AVX-512.
The never-ending Skylake is/was a real problem. Intel was slowly adding features in a manner where it made sense to target the last n generations, but then all that came to a perpetual stop, and suddenly we have this new extension that you can only really use on the very latest and most expensive chips, with virtually no backwards compatibility.
The instructions are sufficiently different from AVX2 that any appropriate use is not as simple as sticking it behind a gate and falling back to a smaller block size; it basically requires a completely separate (re)write to take proper advantage of.
> The instructions are sufficiently different from AVX2 that any appropriate use is not as simple as sticking it behind a gate and falling back to a smaller block size; it basically requires a completely separate (re)write to take proper advantage of.
I'd say yeah, you often need a rewrite of the core loop to take full advantage, but you can still more or less write AVX-style code in AVX-512 if you want, and take advantage of the width increase.
The main difference, I think, for most code is the way the comparison operators now compare into a mask register. It would have been nice if they had also extended the existing compare-into-SIMD-register (0/-1 result) instructions, to ease porting.
> it basically requires a completely separate (re)write to properly take advantage of.
Why? At a higher level of abstraction, you can dispatch SIMD instructions at the max width available. At least, that's how I work with vectorized code. Still see gains on AVX-512.
---
[1] https://branchfree.org/2019/05/29/why-ice-lake-is-important-...