Bounds checking gets in the way for simple cases, but anything even slightly more complicated needs to be written extremely carefully in any language for autovectorisation to work. It is essentially writing explicit SIMD without using intrinsics, and without any guarantee that it will work as desired.
And that is assuming the autovectoriser is able to synthesise the desired instructions at all. For example, I believe packssdw (SSE2) and packusdw (SSE4.1) ("pack with signed/unsigned saturation") and pmaddwd ("multiply and add packed integers") are useful in a JPEG codec, but I find it extremely unlikely that any compiler will autovectorise to them.
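For concreteness, here is a minimal sketch of what reaching for those instructions by hand looks like with Rust's std::arch intrinsics; the function and its names are invented, purely to illustrate the multiply-add-then-saturating-pack pattern:

    #[cfg(target_arch = "x86_64")]
    fn madd_then_pack(a: &[i16; 16], b: &[i16; 16]) -> [i16; 8] {
        use std::arch::x86_64::*;
        // SSE2 is part of the x86_64 baseline, so no runtime feature check is needed here.
        unsafe {
            let a_lo = _mm_loadu_si128(a.as_ptr() as *const __m128i);        // lanes 0..8
            let a_hi = _mm_loadu_si128(a.as_ptr().add(8) as *const __m128i); // lanes 8..16
            let b_lo = _mm_loadu_si128(b.as_ptr() as *const __m128i);
            let b_hi = _mm_loadu_si128(b.as_ptr().add(8) as *const __m128i);
            // pmaddwd: multiply adjacent i16 pairs and sum them, giving 4 x i32 per register
            let sums_lo = _mm_madd_epi16(a_lo, b_lo);
            let sums_hi = _mm_madd_epi16(a_hi, b_hi);
            // packssdw: narrow the eight i32 sums back down to i16 with signed saturation
            let packed = _mm_packs_epi32(sums_lo, sums_hi);
            let mut out = [0i16; 8];
            _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, packed);
            out
        }
    }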
There are thousands of vendor intrinsics and no compiler that I'm aware of is able to just automatically use all of them in a reliable way. The idea that "Rust needs explicit SIMD due to bounds checking" is very wrong.
Because SIMD instruction throughput is highly processor-specific? Rust will also not "automatically use all of them"; there is no magic abstraction that would make any compiler use some of the really fancy and useful SIMD instructions.
I don't know what you're talking about, unfortunately. My statement about compilers and SIMD isn't Rust-specific. My point was that "Rust needs explicit SIMD due to bounds checking" is factually wrong.
No it isn't; it is one of the reasons that Rust is getting SIMD. If it cannot elide the bounds checking, then obviously it will not vectorize the code in question.
I'm one of the people working on adding SIMD to Rust, so I'm telling you, you're wrong. If you want better vectorization and bounds checking is standing in your way, then you can elide the bounds checks explicitly. That doesn't require explicit SIMD.
How do you safely elide bounds checks for something the compiler cannot reason about? How would Rust handle SIMD differences when trying to generate specific code as you would in C?
> How do you safely elide bounds checks for something the compiler cannot reason about?
Who said anything about doing it safely? You can elide the bounds checks explicitly with calls to get_unchecked (or whatever) using unsafe.
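A minimal sketch of what that looks like in practice; the function and its names are invented, but get_unchecked and get_unchecked_mut are the real slice methods:

    fn add_assign(dst: &mut [f32], src: &[f32]) {
        // Hoist a single length check out of the loop; the compiler cannot always
        // prove this relationship on its own across function boundaries.
        assert_eq!(dst.len(), src.len());
        for i in 0..dst.len() {
            // SAFETY: i < dst.len() == src.len(), so both unchecked accesses are in range.
            unsafe {
                *dst.get_unchecked_mut(i) += *src.get_unchecked(i);
            }
        }
    }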
> How would Rust handle SIMD differences when trying to generate specific code as you would in C?
Please be more specific. This question is so broad that it's impossible to answer. At some levels, this is the responsibility of the code generator (i.e., LLVM). At other levels, it's the responsibility of the programmer to write code that checks what the current CPU supports, and then call the correct code. Both Clang and gcc have support for the former using conditional compilation, and both Clang and gcc have support for the latter by annotating specific function definitions with specific target features. In the case of the latter, it can be UB to call those functions on CPUs that don't support those features. (Most often the worst that will happen is a SIGILL, but if you somehow muck up the ABIs between functions, then you're in for some pain.) The plan for Rust is to basically do what Clang does.
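Roughly, that approach looks like the sketch below; the function names are invented, and it assumes the #[target_feature] attribute and the is_x86_feature_detected! macro that std::arch exposes for this:

    #[cfg(target_arch = "x86_64")]
    #[target_feature(enable = "avx2")]
    unsafe fn dot_avx2(a: &[f32], b: &[f32]) -> f32 {
        // Within this function LLVM may freely emit AVX2 instructions; calling it
        // on a CPU without AVX2 is UB (usually a SIGILL), hence the `unsafe`.
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }

    fn dot(a: &[f32], b: &[f32]) -> f32 {
        #[cfg(target_arch = "x86_64")]
        {
            // Runtime dispatch: only take the AVX2 path if this CPU reports support for it.
            if is_x86_feature_detected!("avx2") {
                return unsafe { dot_avx2(a, b) };
            }
        }
        dot_scalar(a, b)
    }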
The question of safety in Rust and SIMD is a completely different story from auto-vectorization. Figuring out how to make calling arbitrary vendor intrinsics safe is an open question that we probably won't be able to solve in the immediate future, so we'll make it unsafe to call them.
And even that is all completely orthogonal to a nice platform-independent SIMD API (like you might find in JavaScript's support for SIMD[1]), since most of that surface area is handled by LLVM and we should be able to enable using SIMD at that level in safe Rust.
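As an illustration of that level of API, here is a sketch using the (still unstable, nightly-only) std::simd module; the function and its names are invented:

    // Nightly-only crate attribute: the portable SIMD API is still unstable.
    #![feature(portable_simd)]

    use std::simd::f32x4;

    fn scale(values: &mut [f32], factor: f32) {
        let vfactor = f32x4::splat(factor);
        let mut chunks = values.chunks_exact_mut(4);
        for chunk in &mut chunks {
            // Safe code, no intrinsics: the lane-wise multiply lowers to whatever
            // the target offers (SSE, NEON, ...) or to scalar code if it has nothing.
            let v = f32x4::from_slice(chunk) * vfactor;
            chunk.copy_from_slice(&v.to_array());
        }
        // Handle the last len % 4 elements with plain scalar code.
        for x in chunks.into_remainder() {
            *x *= factor;
        }
    }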
And all of that is still completely and utterly orthogonal to whether bounds checks are elided. Even with the cross platform abstractions, you still might want to write unsafe code to elide bounds checks when copying data from a slice into a vector in a tight loop.
I want it turned into SIMD instructions. What I don't want is to write classes, functions, or loops that have to be automatically converted to SIMD. I want a simple built-in type for these three size vectors. I also mentioned that they should be passed by value and returned by value in a (SIMD) register. This is the most efficient way to write and execute vector math.
Can't a library just add that? Make some types, implement some functions and/or overload some ops. If you're defining the special type anyway, I'm not sure why it has to be built in.
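A minimal sketch of that library approach, with an invented Vec4 newtype over the target's 128-bit float vector and an overloaded + operator:

    #[cfg(target_arch = "x86_64")]
    mod vec4 {
        use std::arch::x86_64::*;
        use std::ops::Add;

        // A Copy newtype over the hardware 128-bit float vector type.
        #[derive(Clone, Copy)]
        pub struct Vec4(__m128);

        impl Vec4 {
            pub fn new(x: f32, y: f32, z: f32, w: f32) -> Vec4 {
                // SSE is part of the x86_64 baseline, so these intrinsics are always usable here.
                Vec4(unsafe { _mm_set_ps(w, z, y, x) })
            }
        }

        impl Add for Vec4 {
            type Output = Vec4;
            // Takes and returns by value; once calls are inlined, values of a Copy
            // wrapper around __m128 can stay in XMM registers.
            fn add(self, rhs: Vec4) -> Vec4 {
                Vec4(unsafe { _mm_add_ps(self.0, rhs.0) })
            }
        }
    }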