Sort out the AVX-512 débâcle and maybe adopt some form of the insanely cool vect...

dragontamer · on May 2, 2022

> insanely cool vector instructions arm64

Which ones would those be?

SVE is somewhat interesting, but I've generally found the AVX512 instructions more innovative. I really like AVX512's "compress" and "expand" instructions, for example, as well as the classic "vpermb" (though vector permutation has been around since the SSE era and is an old trick: see SSSE3's pshufb instruction).
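For readers who haven't used them, here is a rough scalar model of what compress and expand do, written in Python purely for illustration. The function names and the list-of-booleans mask are assumptions for readability; the real instructions take a k-mask register, and this sketch models only the zeroing-masking form, where lanes that aren't written are set to zero.

```python
def compress(src, mask):
    """Pack the elements of src whose mask bit is set into the low lanes.
    Models the zeroing form: unused upper lanes become 0."""
    packed = [x for x, keep in zip(src, mask) if keep]
    return packed + [0] * (len(src) - len(packed))


def expand(src, mask):
    """Inverse of compress: place consecutive low elements of src into
    the lanes where mask is set; all other lanes become 0."""
    it = iter(src)
    return [next(it) if keep else 0 for keep in mask]


v = [10, 11, 12, 13, 14, 15, 16, 17]
m = [1, 0, 1, 1, 0, 0, 0, 1]
print(compress(v, m))                       # [10, 12, 13, 17, 0, 0, 0, 0]
print(expand([1, 2, 3, 4, 0, 0, 0, 0], m))  # [1, 0, 2, 3, 0, 0, 0, 4]
```

Note that expand(compress(v, m), m) gives back v with the unselected lanes zeroed.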

Since SVE doesn't want to "set" its SIMD width, it seems like these permute instructions (vpermb, or even compress/expand) wouldn't be possible?

-------

I've always enjoyed Intel's innovative new instructions: PEXT, PDEP, and now AVX512 compress and AVX512 expand.
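PEXT and PDEP (from BMI2) are the bit-level analogue of compress and expand within a single integer. A minimal scalar sketch, assuming Python's unbounded integers stand in for a 32- or 64-bit register and that inputs are non-negative; `pext` and `pdep` here are illustrative helpers, not a library API.

```python
def pext(src, mask):
    """Parallel bit extract: take the bits of src selected by mask and
    pack them, in order, into the low bits of the result."""
    result, out_bit = 0, 0
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of mask
        if src & lowest:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1           # clear that bit and move on
    return result


def pdep(src, mask):
    """Parallel bit deposit: take the low bits of src in order and place
    them at the positions of the set bits of mask."""
    result, in_bit = 0, 0
    while mask:
        lowest = mask & -mask
        if (src >> in_bit) & 1:
            result |= lowest
        in_bit += 1
        mask &= mask - 1
    return result


print(pext(0b10110110, 0b11110000))  # 11  (0b1011)
print(pdep(0b1011, 0b11110000))      # 176 (0b10110000)
```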

AVX512 also includes gather/scatter (that's not innovative, since it has been around for a long time, but it's still nice to see in prosumer systems).
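Gather and scatter are per-lane indexed loads and stores. A rough Python model, with a plain list standing in for memory and hypothetical `gather`/`scatter` helpers. Inactive gather lanes are set to zero here as a simplification (the real instructions leave those destination lanes unchanged). The scatter loop runs from the lowest lane up, intended to mirror the lowest-to-highest ordering Intel documents for overlapping scatter writes.

```python
def gather(memory, indices, mask):
    """Masked gather: lane i loads memory[indices[i]] if mask[i] is set;
    inactive lanes are 0 in this simplified model."""
    return [memory[idx] if keep else 0 for idx, keep in zip(indices, mask)]


def scatter(memory, indices, values, mask):
    """Masked scatter: lane i stores values[i] to memory[indices[i]].
    Lanes are applied lowest first, so when two active lanes hit the
    same index, the higher lane's value is the one that remains."""
    for idx, val, keep in zip(indices, values, mask):
        if keep:
            memory[idx] = val


memory = [100, 101, 102, 103, 104]
print(gather(memory, [4, 0, 2, 2], [1, 1, 0, 1]))  # [104, 100, 0, 102]
scatter(memory, [1, 3, 1, 0], [7, 8, 9, 10], [1, 1, 1, 0])
print(memory)                                      # [100, 9, 102, 8, 104]
```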

atq2119 · on May 2, 2022

Compress/expand seems like a natural fit for something like SVE, since it can still be phrased rather generically, and I can easily see it fitting into loops that are written generically over the vector length.
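A sketch of what such a length-agnostic loop might look like, modelled in Python: `vl` is a hypothetical stand-in for whatever lane count the hardware reports, and the loop body never depends on its particular value. The per-chunk "active" predicate plays the role of SVE's WHILELT, and the packing step plays the part of a compress (SVE does provide a COMPACT instruction for 32- and 64-bit elements). `filter_positive` is an illustrative name only.

```python
def filter_positive(data, vl):
    """Keep the positive elements of data, processing vl lanes per step.
    The same code is correct for any vl >= 1."""
    out = []
    for base in range(0, len(data), vl):
        # Loop predicate: lanes past the end of the array are inactive.
        active = [base + i < len(data) for i in range(vl)]
        chunk = [data[base + i] if active[i] else 0 for i in range(vl)]
        # Compare result ANDed with the loop predicate.
        keep = [a and x > 0 for a, x in zip(active, chunk)]
        # Compress the kept lanes to the front and store only those.
        out.extend(x for x, k in zip(chunk, keep) if k)
    return out


print(filter_positive([3, -1, 4, -1, 5, -9, 2, 6], vl=3))  # [3, 4, 5, 2, 6]
```

Because the result is identical for every `vl`, the same loop would serve 128-bit and 512-bit hardware alike.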

Free-form permutation does indeed seem like less of a fit. Still, it would make sense to define a minimum vector length of N for the ISA and support permutation ops that apply the same permutation to each group of N lanes.
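One way to picture the grouped-permutation idea: fix a group size N (here simply `len(perm)`) and apply the same permutation inside every N-lane group, so the code is correct for any vector length that is a multiple of N. `grouped_permute` is a hypothetical helper for illustration.

```python
def grouped_permute(vec, perm):
    """Apply the same N-lane permutation to every group of N lanes,
    where N = len(perm) acts as the ISA's minimum vector length."""
    n = len(perm)
    if len(vec) % n:
        raise ValueError("vector length must be a multiple of the group size")
    out = []
    for base in range(0, len(vec), n):
        group = vec[base:base + n]
        out.extend(group[p] for p in perm)  # out[base + i] = group[perm[i]]
    return out


print(grouped_permute(list(range(8)), [3, 2, 1, 0]))  # [3, 2, 1, 0, 7, 6, 5, 4]
```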

mochomocha · on May 2, 2022

Can you expand on why you find AVX512 instructions more innovative? I haven't had a chance to try SVE yet, but on paper it sounds very innovative and offers a wide range of new capabilities.

Gather/scatter have been around for a while, but only on more recent Intel microarchitectures has their cost made them worth using in practice. Zen3 still lags quite a bit.

dragontamer · on May 2, 2022

I've seen real-life situations in the past 5 years (albeit in my personal hobby code, nothing professional) where VCOMPRESSPS or VEXPANDPS would quickly and simply solve my problem.

I personally would have never thought of making such an instruction, despite having written multiple sets of code that use a SIMD-compress or SIMD-expand pattern.

-------

Case in point: vpcompressb (byte-wise compress) is the most blatantly obvious way to "remove redundant XML whitespace" that I've ever seen.
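A scalar sketch of the chunked byte-compress idea, assuming 64-byte chunks to match a 512-bit register. Per chunk it builds a keep mask (the job of a vector compare) and then packs the kept bytes to the front (the job of vpcompressb). It strips every ASCII whitespace byte, which is a simplification: deciding which whitespace is actually redundant in XML is a separate question. `strip_whitespace_chunks` is a hypothetical name.

```python
def strip_whitespace_chunks(text: bytes, vl: int = 64) -> bytes:
    """Remove ASCII whitespace, one vl-byte chunk at a time."""
    ws = b" \t\r\n"
    out = bytearray()
    for base in range(0, len(text), vl):
        chunk = text[base:base + vl]
        keep = [b not in ws for b in chunk]               # compare -> mask
        out += bytes(b for b, k in zip(chunk, keep) if k)  # compress + store
    return bytes(out)


print(strip_whitespace_chunks(b"<a>\n  <b>1</b>\n</a>", vl=4))  # b'<a><b>1</b></a>'
```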

It's just a thing that has obvious, widespread applicability to many algorithms I've seen, and it keeps coming up again and again. For example, determining which rays in a raytracer are "dead" vs "alive" (separating hits from misses). Or implementing quicksort: compress all items "less than pivot" into an X array, compress all items "greater than pivot" into a Y array, and the partition step is done.
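The quicksort idea, sketched out-of-place in Python with a hypothetical `compress` helper: one compare produces a mask, and two compresses with complementary masks split the input. Ties go to the upper side so every element lands somewhere; the same two-mask split is what separates hit rays from missed rays. This recursive version only illustrates the partition step and is not a tuned sort.

```python
def compress(src, keep):
    """Pack the selected elements to the front (no padding needed here)."""
    return [x for x, k in zip(src, keep) if k]


def quicksort(items):
    """Quicksort whose partition step is two compresses with
    complementary masks."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    pivot = items[mid]
    rest = items[:mid] + items[mid + 1:]
    less = [x < pivot for x in rest]            # one vector compare -> mask
    lo = compress(rest, less)                   # "less than pivot"
    hi = compress(rest, [not m for m in less])  # everything else (>= pivot)
    return quicksort(lo) + [pivot] + quicksort(hi)


print(quicksort([5, 3, 8, 1, 9, 2, 7, 3]))  # [1, 2, 3, 3, 5, 7, 8, 9]
```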

astrange · on May 3, 2022

SVE is interesting, but I'm surprised it actually works. A vector instruction set where you can change the vector width sounds like the classic CISC instruction that ends up unusably slow because it's microcoded. And yet ARM has it and x86 doesn't?

Also, it's an implementation choice to let you set SVE widths that aren't a power of two?