Even for regular in-order cores, it makes branch prediction a massive pain becau...

Even for regular in-order cores, it makes branch prediction a massive pain because now your fast frontend predictors need to essentially fully decode the instruction in order to determine if it can be considered a branch. Most other ISAs make this simple because there are only a few opcodes that change control flow and so you can very easily just stuff that in your early frontend decoder.

RISCV unfortunately didn't quite do this well since return uses the same opcode for call, return, and indirect branch and so you have to fully decode the instruction in order to determine whether you should use the RAS or your other predictors. This isn't a problem that can't be overcome (next line predictors help a lot for these early predictions) but it makes something very performance critical just that much harder.