
Sort out the AVX-512 débâcle and maybe adopt some form of the insanely cool vector instructions arm64 uses for a start ...


> insanely cool vector instructions arm64

Which ones would those be?

SVE is somewhat interesting, but I've generally found the AVX512 instructions more innovative. I really like AVX512's "compress" and "expand" instructions, for example, as well as the classic "vpermb" (though byte-wise vector permutation is an old trick that has been around since SSSE3: the old pshufb instruction).
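For anyone who hasn't run into compress/expand, here is a minimal scalar sketch of what they do, written in Python rather than with intrinsics. The function names and the `fill` parameter are made up for illustration; `fill=0` is meant to mimic the zero-masking form, and real hardware works on fixed-width registers rather than lists.

```python
def compress(src, mask, fill=0):
    """Pack the lanes of src whose mask bit is set into the low lanes.

    The remaining high lanes are set to `fill` (models zero-masking).
    """
    kept = [x for x, m in zip(src, mask) if m]
    return kept + [fill] * (len(src) - len(kept))


def expand(src, mask, fill=0):
    """Inverse of compress: deal consecutive low lanes of src out to the
    positions where the mask bit is set; other lanes get `fill`."""
    it = iter(src)
    return [next(it) if m else fill for m in mask]


lanes = [10, 11, 12, 13, 14, 15, 16, 17]
mask = [1, 0, 0, 1, 1, 0, 1, 0]

packed = compress(lanes, mask)
print(packed)                # [10, 13, 14, 16, 0, 0, 0, 0]
print(expand(packed, mask))  # [10, 0, 0, 13, 14, 0, 16, 0]
```

Note that neither operation needs to know the vector width up front: the mask alone decides where each lane goes.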

Since SVE doesn't want to "set" its SIMD-width, it seems like these permute instructions (vpermb, or even compress/expand) aren't possible?

-------

I've always enjoyed Intel's innovative new instructions: PEXT, PDEP, and now AVX512 compress and AVX512 expand.
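PEXT and PDEP are the bit-level cousins of compress and expand. A rough scalar model in Python (the function names mirror the mnemonics, but this is only a sketch of the semantics on non-negative integers, not how the hardware does it):

```python
def pext(src, mask):
    """Parallel bit extract: gather the bits of src selected by mask
    into the low bits of the result."""
    result = 0
    out_bit = 0
    while mask:
        low = mask & -mask       # isolate the lowest set bit of mask
        if src & low:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1         # clear that bit and move on
    return result


def pdep(src, mask):
    """Parallel bit deposit: scatter the low bits of src to the
    positions selected by mask."""
    result = 0
    in_bit = 0
    while mask:
        low = mask & -mask
        if (src >> in_bit) & 1:
            result |= low
        in_bit += 1
        mask &= mask - 1
    return result


print(bin(pext(0b10110110, 0b11110000)))  # 0b1011
print(bin(pdep(0b1011, 0b11110000)))      # 0b10110000
```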

AVX512 also includes gather/scatter (not innovative, since it has been around for a long time, but still nice to see in prosumer systems).


Compress/Expand seems like a natural fit for something like SVE since it can still be phrased rather generically and I can easily see it fitting into loops that are written generically over vector length.

Free-form permutation does indeed seem like less of a fit. Though it still makes sense to define a minimum vector length of N for the ISA and support permutation ops that apply the same permutation on groups of N lanes.
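To make the "same permutation on groups of N lanes" idea concrete, here is a small Python sketch. `permute_groups` is a hypothetical name, not an SVE instruction; the point is only that the result is well defined for any vector length that is a multiple of N.

```python
def permute_groups(lanes, perm):
    """Apply the permutation `perm` (length N) independently to every
    consecutive group of N lanes."""
    n = len(perm)
    if len(lanes) % n != 0:
        raise ValueError("vector length must be a multiple of the group size")
    out = []
    for base in range(0, len(lanes), n):
        out.extend(lanes[base + i] for i in perm)
    return out


reverse4 = [3, 2, 1, 0]  # reverse each group of 4 lanes

# The same "instruction" behaves consistently on an 8-lane
# and a 16-lane implementation.
print(permute_groups(list(range(8)), reverse4))
# [3, 2, 1, 0, 7, 6, 5, 4]
print(permute_groups(list(range(16)), reverse4))
# [3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12]
```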


Can you expand on why you find AVX512 instructions more innovative? I haven't had a chance to try SVE yet, but on paper it sounds very innovative and offers a wide range of new capabilities.

Gather/scatter have been around for a while, but it wasn't until more recent Intel microarchitectures that their cost made them worth using in practice. Zen 3 still lags quite a bit.


I've seen real-life situations in the past 5 years (albeit with my personal hobby code, nothing professional), where VCOMPRESSPS or VEXPANDPS would quickly and simply solve my problem.

I personally would have never thought of making such an instruction, despite having written multiple sets of code that use a SIMD-compress or SIMD-expand pattern.

-------

Case in point, vpcompressb (byte-wise compress) is the most blatantly obvious way to "remove redundant XML whitespace" that I've ever seen.
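A sketch of the whitespace case, as a scalar Python model rather than real intrinsics. I'm assuming "redundant" means "a whitespace byte that directly follows another whitespace byte"; `squeeze_whitespace`, `compress_bytes`, and the `vl` parameter are made-up names, and a real XML tool would also have to leave text and attribute content alone.

```python
WHITESPACE = b" \t\r\n"


def compress_bytes(chunk, mask):
    """Model of a byte-wise compress: keep only the bytes whose mask bit is set."""
    return bytes(b for b, m in zip(chunk, mask) if m)


def squeeze_whitespace(data, vl=64):
    """Collapse each run of whitespace down to its first byte,
    processing `vl` bytes (one "vector") at a time."""
    out = bytearray()
    prev_ws = False  # carried across chunk boundaries
    for start in range(0, len(data), vl):
        chunk = data[start:start + vl]
        mask = []
        for b in chunk:
            ws = b in WHITESPACE
            mask.append(not (ws and prev_ws))  # drop whitespace after whitespace
            prev_ws = ws
        out += compress_bytes(chunk, mask)     # one compress per chunk
    return bytes(out)


print(squeeze_whitespace(b"<a>  \n  <b/>   </a>"))  # b'<a> <b/> </a>'
```

All the real work is in building the mask; once you have it, the compress does the compaction in one step per chunk.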

It's just a thing with obvious widespread applicability to many algorithms I've seen, and it keeps coming up again and again: determining which rays (in a raytracer) are "dead" vs "alive" (separating out hits vs misses), or implementing quicksort (compress all items "less than pivot" to an X array, compress all items "greater than pivot" to a Y array, recurse on both, and quicksort is done).
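Roughly what the quicksort idea looks like, again as a scalar Python model. The names and the `vl` parameter are illustrative only, and counting the items equal to the pivot separately is my own addition so that duplicates are handled and the recursion always shrinks.

```python
def compress(chunk, mask):
    """Model of a SIMD compress: keep the lanes whose mask bit is set."""
    return [x for x, m in zip(chunk, mask) if m]


def partition(items, pivot, vl=8):
    """Split items into (less, count_equal, greater), one vl-lane chunk at a time."""
    less, greater = [], []
    equal = 0
    for start in range(0, len(items), vl):
        chunk = items[start:start + vl]
        less += compress(chunk, [x < pivot for x in chunk])     # the "X array"
        greater += compress(chunk, [x > pivot for x in chunk])  # the "Y array"
        equal += sum(x == pivot for x in chunk)
    return less, equal, greater


def quicksort(items):
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less, equal, greater = partition(items, pivot)
    # equal >= 1 because the pivot came from items, so both halves are smaller
    return quicksort(less) + [pivot] * equal + quicksort(greater)


print(quicksort([5, 3, 8, 1, 9, 2, 5, 7]))  # [1, 2, 3, 5, 5, 7, 8, 9]
```

The ray-tracing case is the same partition with a hit/miss mask in place of the pivot comparison.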


SVE is interesting but I'm surprised it actually works. A vector instruction set where you can change the vector width sounds like the classic CISC instruction that ends up unusably slow because it's microcoded. And yet ARM has it and x86 doesn't?

Also, it's an implementation choice to let you set SVE widths that aren't a power of two?



