> The instructions are sufficiently different from AVX2 that any appropriate use is not as simple as sticking it behind a gate and using a smaller block size, it basically requires a completely separate (re)write to properly take advantage of.
I'd say yeah, you often need a rewrite of the core loop to take full advantage, but you can still more or less write AVX-style code in AVX-512 if you want, and take advantage of the width increase.
The main difference I think for most code is the way the comparison operators compare into a mask register. It would have been nice if they had just extended the existing compare into SIMD reg (0/-1 result) instructions too, to ease porting.
I'd say yeah, you often need a rewrite of the core loop to take full advantage, but you can still more or less write AVX-style code in AVX-512 if you want, and take advantage of the width increase.
The main difference I think for most code is the way the comparison operators compare into a mask register. It would have been nice if they had just extended the existing compare into SIMD reg (0/-1 result) instructions too, to ease porting.