
Flags don't have to add an extra implicit input/output everywhere. Both ARM and PowerPC avoid updating the flags unless explicitly requested.

Besides fixed-size instructions and the traditional variable-size instructions, one can do variable-size instructions in bundles. An example would be 25-bit and 50-bit instructions packed into 128-bit bundles, with the remaining 3 bits used to specify all the sizes (eight patterns: nnnnn, nnnw, nnwn, nwnn, wnnn, nww, wnw, wwn). Extending that out to a typical 512-bit cache line might be better. Another option is to use 1 of every 16 bits to indicate where instructions start.
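The arithmetic checks out: 128 - 3 leaves 125 bits, which is five 25-bit units, and there are exactly eight ways to fill five units with 1-unit (n) and 2-unit (w) instructions. Here is a minimal Python sketch of packing and decoding such a bundle. It assumes the template sits in the low 3 bits and that instructions are laid out from low to high bits. Both choices are arbitrary and made only for illustration, and `TEMPLATES`, `pack_bundle` and `split_bundle` are hypothetical names, not part of any real ISA.

```python
# Hypothetical 128-bit bundle layout (an assumption, not a real ISA):
#   bits 0..2   : 3-bit template selecting one of 8 size patterns
#   bits 3..127 : 125 payload bits = five 25-bit units
# 'n' = narrow 25-bit instruction (1 unit), 'w' = wide 50-bit instruction (2 units)
TEMPLATES = ["nnnnn", "nnnw", "nnwn", "nwnn", "wnnn", "nww", "wnw", "wwn"]
UNIT_BITS = 25
TEMPLATE_BITS = 3

# Sanity check: every pattern fills exactly five units (125 bits).
assert all(sum(1 if c == "n" else 2 for c in p) == 5 for p in TEMPLATES)


def width_of(kind: str) -> int:
    """Width in bits of a narrow ('n') or wide ('w') instruction."""
    return UNIT_BITS if kind == "n" else 2 * UNIT_BITS


def pack_bundle(template: int, insns: list[int]) -> int:
    """Pack instructions into a bundle according to the chosen template."""
    pattern = TEMPLATES[template]
    assert len(pattern) == len(insns)
    payload, shift = 0, 0
    for kind, insn in zip(pattern, insns):
        width = width_of(kind)
        assert 0 <= insn < (1 << width)
        payload |= insn << shift
        shift += width
    return (payload << TEMPLATE_BITS) | template


def split_bundle(bundle: int) -> list[int]:
    """Return the instructions in a 128-bit bundle, in order."""
    assert 0 <= bundle < (1 << 128)
    pattern = TEMPLATES[bundle & 0b111]
    payload = bundle >> TEMPLATE_BITS
    insns = []
    for kind in pattern:
        width = width_of(kind)
        insns.append(payload & ((1 << width) - 1))
        payload >>= width
    return insns


b = pack_bundle(5, [1, (1 << 50) - 1, 7])   # template 5 = "nww"
print(split_bundle(b))
```

Because the template alone fixes every boundary, a decoder can find all instructions in a bundle in parallel without scanning byte by byte.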

Where RISC-V got wasteful was the registers. Compilers are seldom able to use anywhere near 32 registers. On normal code, normal compilers seem to need about 8 to 10 free registers after deducting the ones reserved by the ABI, and the ABI might need 3 to 5 registers (stack, PLT, GOT, TLS, etc.). That means roughly 11 to 15 registers are needed, so 4 bits (16 registers) is clearly enough. Moving some of those ABI-reserved registers out of the general-purpose set wouldn't be a bad idea; most of them are only used for addressing.
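The arithmetic behind "4 bits is enough" can be sketched in Python. It uses the comment's own estimates: 8 to 10 free registers plus 3 to 5 reserved by the ABI. The three-operand encoding cost at the end is an added illustration of where the "waste" would show up, not a figure from the comment. `specifier_bits` is a hypothetical helper name.

```python
# Estimates taken from the comment above.
FREE_REGS = range(8, 11)   # 8..10 registers the compiler wants free
ABI_REGS = range(3, 6)     # 3..5 registers reserved by the ABI (sp, GOT, TLS, ...)


def specifier_bits(num_regs: int) -> int:
    """Bits needed to encode a register number in [0, num_regs)."""
    return (num_regs - 1).bit_length()


needed = [f + a for f in FREE_REGS for a in ABI_REGS]
lo, hi = min(needed), max(needed)
print(f"registers needed: {lo}..{hi}, bits: {specifier_bits(hi)}")

# Bits spent naming three registers (rd, rs1, rs2) in one instruction.
for regs in (16, 32):
    print(f"{regs} registers -> {3 * specifier_bits(regs)} bits of specifiers")
```

With 32 registers, each three-register instruction spends 3 more bits on register specifiers than it would with 16.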



> Flags don't have to add an extra implicit input/output everywhere. Both ARM and PowerPC avoid updating the flags unless explicitly requested.

Well, ultimately they do. Not updating the flags gives the compiler more options (things can be scheduled between the compare and the jump), but with cmp+jmp fusion that is now a bad idea, which makes the concept dated.

Ultimately, each instruction pending execution in an OOO core needs to sit somewhere waiting for its inputs to become available. If you are x86 and you support cmov, then you potentially need to wait for 3 registers and the flags, meaning every slot in this structure must be capable of waiting for 4 things to happen before becoming ready. With RISC-V's integer instructions, you only ever need to wait for at most 2 things.
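The point about slot width can be illustrated with a toy Python model of an issue-queue entry that waits for source tags to be broadcast. This is only a sketch, not how any real core is built. The tag numbers are hypothetical. The x86 `cmovz` with a memory operand is modelled as a single entry, as the comment describes it, although real cores may split it into micro-ops. It counts as 4 sources: base, index, the old destination value (kept if the condition is false), and the flags.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulerEntry:
    """One slot in a simplified out-of-order issue queue (toy model).

    Each source is identified by the tag of whatever will produce it
    (a physical register or the flags). The entry may issue once every
    source tag has been broadcast by a completing instruction.
    """
    op: str
    src_tags: list[int]
    ready: set[int] = field(default_factory=set)

    def wakeup(self, tag: int) -> None:
        # The broadcast tag is compared against every source of the entry;
        # in hardware, each tracked source needs its own comparator.
        if tag in self.src_tags:
            self.ready.add(tag)

    def can_issue(self) -> bool:
        return all(t in self.ready for t in self.src_tags)


# Hypothetical tags: 1=rbx, 2=rcx, 3=rax (old value), 4=flags, 5=x11, 6=x12
riscv_add = SchedulerEntry("add x10, x11, x12", [5, 6])
x86_cmov = SchedulerEntry("cmovz rax, [rbx+rcx]", [1, 2, 3, 4])
queue = [riscv_add, x86_cmov]

# Every slot must be sized for the worst-case instruction.
print("sources each slot must track:", max(len(e.src_tags) for e in queue))

for tag in (1, 2, 3, 5, 6):   # everything except the flags completes
    for e in queue:
        e.wakeup(tag)
print([e.can_issue() for e in queue])   # add can issue; cmov still waits on flags
```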


> Flags don't have to add an extra implicit input/output everywhere. Both ARM and PowerPC avoid updating the flags unless explicitly requested.

You mean things like having variants of common arithmetic instructions that update or don't update flags?

> Besides fixed-size instructions and the traditional variable-size instructions, one can do variable-size instructions in bundles. An example would be 25-bit and 50-bit instructions packed into 128-bit bundles, with the remaining 3 bits used to specify all the sizes (eight patterns: nnnnn, nnnw, nnwn, nwnn, wnnn, nww, wnw, wwn). Extending that out to a typical 512-bit cache line might be better. Another option is to use 1 of every 16 bits to indicate where instructions start.

Yeah, something like that could be nice. Though how would jump instructions be encoded? Bundle + offset within bundle?

> Where RISC-V got wasteful was the registers. Compilers are seldom able to use anywhere near 32 registers. On normal code, normal compilers seem to need about 8 to 10 registers free after deducting the ones reserved by the ABI. The ABI might need 3 to 5 registers. (stack, PLT, GOT, TLS, etc.) That means that roughly 11 to 15 registers are needed. Clearly, 4 bits (16 registers) is enough. Shoving some of those ABI-reserved registers out of the general-purpose set wouldn't be a bad idea; most of those are just used for addressing.

Nah, I think 32 registers was a good choice. (Relatively) common loop optimizations like unrolling or software pipelining need more registers. Also, some of those registers are callee-saved and some are call-clobbered; by using this information, the compiler can avoid spilling and reloading registers around function calls.
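Here is a small sketch of why unrolling consumes registers. A loop unrolled 4x with four independent accumulators breaks the single add-dependency chain, but each accumulator must stay live in its own register. Python has no registers, so this only mirrors the shape of the transformation a compiler would apply. `sum_unrolled4` is a hypothetical name.

```python
def sum_unrolled4(xs: list[int]) -> int:
    """Sum with the loop unrolled 4x, using four independent accumulators.

    In compiled code, a0..a3 would each occupy a register for the whole
    loop, in addition to the pointer, the bound and the loaded values, so
    register demand grows with the unroll factor.
    """
    a0 = a1 = a2 = a3 = 0
    n = len(xs) - len(xs) % 4          # largest multiple of 4 <= len(xs)
    for i in range(0, n, 4):
        a0 += xs[i]                    # four independent dependency chains
        a1 += xs[i + 1]
        a2 += xs[i + 2]
        a3 += xs[i + 3]
    total = a0 + a1 + a2 + a3
    for x in xs[n:]:                   # leftover elements (remainder loop)
        total += x
    return total


print(sum_unrolled4(list(range(10))))
```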

For x86-64, 16 registers is fine, partly because in many cases one can operate directly on memory without explicitly loading into or storing from architectural registers, and partly because the target was and is OoO cores that aren't as dependent on those register-hungry compiler optimizations.


It is common to have a bit which causes an instruction to update flag bits. PowerPC arithmetic instructions have an "Rc" field, usually the LSB, indicated by a trailing "." in the assembly syntax. ARM arithmetic instructions have an "S" field, usually bit 20, indicated by a trailing "S" in the assembly syntax.
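The two encodings can be checked on concrete instruction words. The sketch below flips the flag bit on `add r3,r4,r5` (PowerPC XO-form, 0x7C642A14) and `add r0, r1, r2` (ARM A32 data-processing, 0xE0810002). It only applies to those formats; other instruction formats place or omit these bits differently. The helper names are hypothetical.

```python
# Flag-update bits as described above (32-bit instruction words).
PPC_RC_BIT = 1 << 0    # PowerPC Rc: least-significant bit ("." suffix)
ARM_S_BIT = 1 << 20    # ARM A32 data-processing S bit ("S" suffix)


def sets_flags(word: int, flag_bit: int) -> bool:
    """True if the given flag-update bit is set in the instruction word."""
    return bool(word & flag_bit)


def with_flags(word: int, flag_bit: int) -> int:
    """Return the flag-setting variant of the instruction."""
    return word | flag_bit


ppc_add = 0x7C642A14   # add  r3,r4,r5
arm_add = 0xE0810002   # add  r0, r1, r2

print(hex(with_flags(ppc_add, PPC_RC_BIT)))   # 0x7c642a15 -> add. r3,r4,r5
print(hex(with_flags(arm_add, ARM_S_BIT)))    # 0xe0910002 -> adds r0, r1, r2
```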

Bundle + offset is fine. The offsets don't need to be real byte offsets. In the example given with 25-bit and 50-bit instructions, the allowable low nibbles of instruction addresses might be 0, 1, 2, 3, and 4 (so it goes 0x77777773, 0x77777774, 0x77777780, 0x77777781, etc.)
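Here is a minimal sketch of that addressing scheme. It assumes 16-byte (128-bit) aligned bundles whose low address nibble holds a slot index rather than a byte offset. The number of slots per bundle (3, 4 or 5) would come from the bundle's template. `make_pc` and `next_pc` are hypothetical names.

```python
BUNDLE_BYTES = 16               # one 128-bit bundle
SLOT_MASK = BUNDLE_BYTES - 1    # low nibble: slot index, not a byte offset


def make_pc(bundle_addr: int, slot: int) -> int:
    """Combine an aligned bundle address with a slot index (0..4)."""
    assert bundle_addr % BUNDLE_BYTES == 0 and 0 <= slot < 5
    return bundle_addr | slot


def next_pc(pc: int, slots_in_bundle: int) -> int:
    """Sequential successor: next slot, or slot 0 of the next bundle."""
    bundle, slot = pc & ~SLOT_MASK, pc & SLOT_MASK
    if slot + 1 < slots_in_bundle:
        return pc + 1
    return bundle + BUNDLE_BYTES


pc = make_pc(0x77777770, 3)
trace = []
for _ in range(3):
    pc = next_pc(pc, 5)   # assume every bundle uses the 5-slot nnnnn template
    trace.append(hex(pc))
print(trace)  # ['0x77777774', '0x77777780', '0x77777781']
```

A jump target is then just an ordinary address whose low nibble names a slot, so relative branches still work as plain additions on addresses.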

I disassemble binary executables as my full-time job. I've dealt with over a dozen different architectures. I commonly deal with PowerPC, ARM, MIPS, x86-64, and ColdFire. The extra registers of PowerPC and MIPS are always wasted. Even with ARM and x86-64, unused registers are the norm. It simply isn't normal for a compiler to be able to make effective use of lots of registers. Surely there is an example somewhere that I haven't yet seen, but that would be highly abnormal code.

If more registers could be used by compilers, the Itanium would have been a success.


I'm not an expert, so pardon any ignorance, but couldn't compilers be acting conservatively about registers due to the "long shadow" of x86? Perhaps the modest increase of 8 registers for x64 never led compiler developers to treat registers as a generally abundant resource, which constrained their designs.

> If more registers could be used by compilers, the Itanium would have been a success.

I kind of feel the Itanic never made it far enough for its register count to have mattered to anyone. I wonder if SPARC would be a better comparison... it was somewhat popular in the '90s and '00s, and was a RISC chip with oodles of registers, wasn't it?



