> The instructions are sufficiently different from AVX2 that any appropriate use is not as simple as sticking it behind a gate and using a smaller block size, it basically requires a completely separate (re)write to properly take advantage of.

I'd say yes: you often need to rewrite the core loop to take full advantage, but you can still write more or less AVX-style code in AVX-512 if you want, and just benefit from the wider vectors.

The main difference for most code, I think, is that the comparison instructions write their result into a mask register. It would have been nice if they had also extended the existing compare-into-SIMD-register instructions (0/-1 result per lane) to 512 bits, to ease porting.
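To make the compare difference concrete, here is a rough sketch of the same loop written both ways. The operation (clamping negative 32-bit ints to zero) and all the function names are just something I made up for illustration, and it assumes GCC or Clang on x86 for the `target` attribute and `__builtin_cpu_supports`. The loop structure is identical; only the compare and the way the result is applied change.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86_SIMD 1
#else
#define HAVE_X86_SIMD 0
#endif

/* Scalar reference, also used for the tail: out[i] = in[i] > 0 ? in[i] : 0. */
static void relu_scalar(const int32_t *in, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] > 0 ? in[i] : 0;
}

#if HAVE_X86_SIMD
/* AVX2: the compare produces a vector of 0 / -1 lanes that can be
 * ANDed directly with the data. */
__attribute__((target("avx2")))
static void relu_avx2(const int32_t *in, int32_t *out, size_t n) {
    const __m256i zero = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i v = _mm256_loadu_si256((const __m256i *)(in + i));
        __m256i m = _mm256_cmpgt_epi32(v, zero);      /* vector mask */
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_and_si256(v, m));
    }
    relu_scalar(in + i, out + i, n - i);
}

/* AVX-512: the compare produces a 16-bit k-mask instead, so the AND
 * becomes a zero-masked move. With AVX512DQ, _mm512_movm_epi32(k)
 * would expand the mask back to 0 / -1 lanes if you wanted to keep
 * the AND form. */
__attribute__((target("avx512f")))
static void relu_avx512(const int32_t *in, int32_t *out, size_t n) {
    const __m512i zero = _mm512_setzero_si512();
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512i v = _mm512_loadu_si512((const void *)(in + i));
        __mmask16 k = _mm512_cmpgt_epi32_mask(v, zero); /* mask register */
        _mm512_storeu_si512((void *)(out + i), _mm512_maskz_mov_epi32(k, v));
    }
    relu_scalar(in + i, out + i, n - i);
}
#endif

/* Hypothetical dispatcher: picks the widest path the CPU reports. */
void relu(const int32_t *in, int32_t *out, size_t n) {
#if HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx512f")) { relu_avx512(in, out, n); return; }
    if (__builtin_cpu_supports("avx2"))    { relu_avx2(in, out, n);   return; }
#endif
    relu_scalar(in, out, n);
}
```

So the port is mostly mechanical here, but every place the old code treated the compare result as an ordinary vector (AND, blend, movemask) has to be touched.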