
Seems like a step backward. If the constraint was the number of bits in the mask then each bit could have represented a register pair.


The constraint absolutely was not that. It was that they don't want an instruction that can load more than two cache lines' worth of data, or that can write to more than two registers.


Limiting the registers to be specific pairs is awful if you're implementing a register allocator.


Is it? These are stack management instructions; you know in advance from the platform ABI which registers are "scratch" (caller-saved) and which are "save" (callee-saved), so if you allocate any of the "save" registers in your function you emit the corresponding push/pop pairs at the start and end of the function. At worst you push/pop a register you don't need to - but in ARM64, the stack pointer has to be 16-byte (128-bit) aligned, so you have to push/pop pairs anyway.
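
A minimal sketch of that scheme, in Python for brevity. The callee-saved set x19..x28 is from AAPCS64; the fixed pairing (x19 with x20, x21 with x22, and so on) and the helper names are assumptions for illustration, not any particular compiler's behaviour. Using either half of a pair causes the whole pair to be saved, which is the "push/pop a register you don't need to" case.

```python
# Hypothetical fixed pairing of the AAPCS64 callee-saved registers x19..x28.
FIXED_PAIRS = [(f"x{n}", f"x{n + 1}") for n in range(19, 29, 2)]


def pairs_to_save(used):
    """Return every fixed pair with at least one register in `used`."""
    return [p for p in FIXED_PAIRS if p[0] in used or p[1] in used]


def prologue(used):
    # Each stp pre-decrements sp by 16 bytes, so sp stays 16-byte aligned.
    return [f"stp {a}, {b}, [sp, #-16]!" for a, b in pairs_to_save(used)]


def epilogue(used):
    # Restore in reverse order, post-incrementing sp by 16 bytes each time.
    return [f"ldp {a}, {b}, [sp], #16" for a, b in reversed(pairs_to_save(used))]


if __name__ == "__main__":
    used = {"x19", "x22"}  # x20 and x21 get saved too, although unused
    print("\n".join(prologue(used) + ["..."] + epilogue(used)))
```

Here a function that only touches x19 and x22 still saves x20 and x21, but it costs the same two store-pair instructions it would need for alignment regardless.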


Then your allocator would need to know that once it has decided to use one register of a pair, the other half of the pair gets its save/restore for 'free' and is now a better choice than a different callee-saved register. I suspect that unless your allocator was designed from the start to deal with that kind of "my choice of register here affects costs, and thus my allocation decisions, for a completely different value over there", it's not going to do a great job under that kind of constraint.
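
To make the interaction concrete, here is a rough sketch of the cost query such an allocator would need. It assumes the same hypothetical fixed pairing of x19..x28 (odd register paired with the next even one); `partner`, `marginal_save_cost` and `pick_register` are made-up names, not part of any real allocator.

```python
def partner(reg):
    """Other half of the assumed fixed pair: x19<->x20, x21<->x22, ..."""
    n = int(reg[1:])
    return f"x{n + 1}" if n % 2 else f"x{n - 1}"


def marginal_save_cost(reg, used):
    """Extra save/restore pairs needed if `reg` is allocated next.

    Zero if this register or its partner is already being saved,
    otherwise one additional store-pair/load-pair.
    """
    return 0 if reg in used or partner(reg) in used else 1


def pick_register(candidates, used):
    """Pick the candidate with the lowest marginal cost (first one on ties)."""
    return min(candidates, key=lambda r: marginal_save_cost(r, used))


if __name__ == "__main__":
    # With x19 already in use, x20 is free to save while x21 costs a new pair.
    print(pick_register(["x21", "x20"], used={"x19"}))
```

The point is that the cost of a candidate depends on `used`, i.e. on decisions made for unrelated values, which is exactly the coupling a simple allocator does not model.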


Callee-saved registers are saved on function entry, all at once. There is no interaction beyond the register allocation step deciding how many values need to be preserved across function calls.


OK, but that only works for the prologue/epilogue; it's better if you can use the instructions elsewhere too.



