Cool. Brings back memories of implementing various jump instructions and carry-select adders (do both adds speculatively, then pick the right result once the actual carry arrives later). I recall I had the smallest microprogram in the class, and it used the fewest microcycles by far... everyone hated me. :)
Wouldn't it be nice to be able to access the carry-in and carry-out as "variables", and to get full-precision imul/idiv results (e.g. 32-bit operands -> 64-bit result) without dropping down into assembly? Perhaps it's unreasonable, but the per-instruction differences across processors seem limited enough that treating each one as a "special unique snowflake" is obviously untrue FUD.
Understanding architecture / assembly comes in handy for replacing high-level branching code with branch-free code, to avoid pipeline stalls when the branch predictor guesses incorrectly. Also, being able to go down the stack is really helpful, because there are most definitely bugs all the way down.
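As a concrete sketch of that branch-to-branchless rewrite, here's the classic max() example (the sign-shift trick assumes arithmetic right shift on signed ints, which is implementation-defined in C but true on mainstream compilers, and the subtraction can overflow for extreme inputs):

```c
#include <stdint.h>

/* Branchy version: the CPU has to predict the comparison. */
int32_t max_branchy(int32_t a, int32_t b) {
    if (a > b) return a;
    return b;
}

/* Branch-free version: derive an all-ones/all-zeros mask from the
   sign of (a - b) and blend, so there is no conditional jump.
   Assumes arithmetic right shift; beware overflow of a - b. */
int32_t max_branchless(int32_t a, int32_t b) {
    int32_t diff = a - b;
    int32_t mask = diff >> 31;   /* all 1s if a < b, else all 0s */
    return a - (diff & mask);    /* a if a >= b, else a - (a-b) = b */
}
```

Whether this wins depends on the data: on unpredictable inputs it avoids mispredict stalls, but on well-predicted branches the branchy version is often faster (and compilers frequently emit a conditional move for it anyway), so measure before committing.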