Your issue with intrinsics is that different ISAs have different specs. Fine. But if that's your only issue, then your use case is manually vectorising hot loops, correct?
Assuming that's true, you want to maximise performance by exploiting as much of the parallelism vectors give you as possible. So if you're dealing with [4 x int32], on a 128-bit vector ISA you would be fully utilising your vector registers, but moving to, say, AVX-512 you're now only using 1/4 of your potential parallelism.
Your architecture-independent vector types would have to target the lowest common denominator, which completely defeats the purpose of vectorisation.
I want to do math with 2,3, or 4-element vectors. These will typically represent 2d or 3d coordinates or velocities. 4-element vectors may be used for homogeneous coordinates or similar. My point is that these are very common mathematical entities and should be explicitly supported by the language.
How these map to any particular processor's resources is not my problem, though the three major vector extensions today all have 4-element vectors. Some support more, but that's not terribly relevant to the math I want to do. A smart compiler could pack multiple small vectors into a wide vector register, just as compilers try to pack multiple scalars in there today.
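For purely component-wise operations that packing is plausible today: small vectors stored back-to-back are just a unit-stride stream of doubles, so an optimiser can fuse several of them into one wide register. A sketch (the `Vec2`/`integrate` names are illustrative, not from any real library):

```cpp
// Four Vec2s stored contiguously are eight doubles back-to-back.
// A component-wise update like this touches memory with unit stride,
// so an auto-vectoriser can use one 512-bit op (or two 256-bit ops)
// instead of eight scalar multiply-adds.
struct Vec2 {
    double x, y;
};

void integrate(Vec2* pos, const Vec2* vel, int n, double dt) {
    for (int i = 0; i < n; ++i) {
        pos[i].x += dt * vel[i].x;
        pos[i].y += dt * vel[i].y;
    }
}
```

This only works cleanly when every lane gets the same operation; anything that mixes components (dot products, cross products) reintroduces shuffles.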
I am not interested in vectorizing loops. I want to write code like this:
    Vec3double position = {5.8, 3.9, 2.1};
    Vec3double velocity = {1.0, 0.0, 0.0};
    double timestep = 0.01;
    position += timestep * velocity;
and so on. Yes, I also want the common case of multiplying a vector by a scalar to be that easy.
Any paint program or graphics library (including font rendering) does a ton of this stuff. So does every 2d or 3d physics engine. Ray tracing. FEA software. CAD. The list of uses for these vector sizes is long and has nothing to do with auto-vectorizing loops. Of course there are plenty of applications where loop vectorization is valuable and I don't want to take anything away from that. I just want built-in support for these common mathematical entities in the base language.
In another comment you say that you want these to turn into SIMD instructions. FYI, vectorising elements laid out array-of-structs style is usually less performant than a struct-of-arrays layout.
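To make the layout distinction concrete, here is a sketch of the two (the `Points`/`advance` names are made up for illustration): in AoS the x, y, z of each point are interleaved, while in SoA each component is its own contiguous array, which is what maps directly onto wide registers.

```cpp
#include <cstddef>
#include <vector>

// Array of structs (AoS): x, y, z interleaved in memory.
struct Vec3 {
    double x, y, z;
};

// Struct of arrays (SoA): each component is contiguous.
struct Points {
    std::vector<double> x, y, z;
};

void advance(Points& p, const Points& v, double dt) {
    // Three independent unit-stride loops; these vectorise cleanly
    // at any register width, whereas the equivalent AoS loop may
    // need de-interleaving shuffles first.
    for (std::size_t i = 0; i < p.x.size(); ++i) p.x[i] += dt * v.x[i];
    for (std::size_t i = 0; i < p.y.size(); ++i) p.y[i] += dt * v.y[i];
    for (std::size_t i = 0; i < p.z.size(); ++i) p.z[i] += dt * v.z[i];
}
```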
On ARM you have specialised load and store instructions (the LD2/LD3/LD4 and ST2/ST3/ST4 structured forms) that de-interleave into vector registers, so that register a holds every element's x, register b every element's y, and so on, but they are a bit slower than plain loads and stores.
If you don't care about SIMD performance then fair enough, but if you care enough about this issue to want the compiler to generate SIMD instructions, you'd better be willing to change your code to be performant on your particular target, because even small changes can affect whether vectorising is worth it versus leaving the code scalar.