That's an important detail, not all macro-ops are more complex than micro-ops, a...

Const-me · on March 2, 2023

> complexity ceiling is much higher on macro-ops than micro-ops, right?

Other examples are crc32, sha1rnds4, aesdec, aeskeygenassist: the math they do is rather complicated, yet on modern CPUs each of them is a single micro-op.
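To give a sense of how much math hides behind one of these, here is a rough software sketch of what the crc32 instruction computes for a single byte operand. It assumes the CRC-32C (Castagnoli) polynomial in its bit-reflected form, 0x82F63B78; the function names `crc32c_step` and `crc32c` are made up for this illustration, and the initial and final inversion are the usual software convention rather than something the instruction does on its own.

```python
# Bit-reflected CRC-32C (Castagnoli) polynomial, assumed to match the crc32 instruction.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c_step(crc: int, byte: int) -> int:
    """Hypothetical model of one `crc32 r32, r/m8`: fold one byte into the running CRC."""
    crc ^= byte
    for _ in range(8):
        # Shift out one bit; if it was set, subtract (XOR) the polynomial.
        crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc


def crc32c(data: bytes) -> int:
    """Full CRC-32C over a buffer, with the conventional initial and final inversion."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = crc32c_step(crc, b)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))
```

That is eight dependent shift-and-conditional-XOR rounds per byte in software, while the hardware version also accepts wider operands in one go.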

> one (vector) FMA operation on two (vector) registers and stores the result to RAM.

It loads from RAM; it doesn't store there.

> Those are undoubtedly implemented in terms of lots of micro-ops.

Indeed, but I don't think the reason is complexity. I think they use microcode for two things: instructions which load or store more than one value (a value is up to 32 bytes on AVX, 64 bytes on AVX512 processors), and rarely used instructions.