Tragically, LOOP is alive and well in x86-64.

monocasa · on Oct 15, 2020

EDIT: Blergh, confused LOOP with REP, but keeping the below comment so the rest of the thread still makes sense.

FWIW, LOOP isn't the worst thing in the world once you have dedicated silicon for it anyway, generating micro-ops in the instruction decode pathway. It's just a pretty cute run-length encoding scheme for the instruction stream.

Tuna-Fish · on Oct 15, 2020

It's slow as sin, though. Just emulating it with more common instructions is like 4x faster on most modern Intel CPUs. For some insane reason, it emits 8 uops on Skylake.
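For anyone wondering what "emulating it" looks like, here is a rough C model, not real machine code. It assumes a 16-bit counter (CX, as in 16-bit code; in 64-bit mode the counter is RCX), and the function names `run_loop` and `run_dec_jnz` are made up for illustration. LOOP decrements the counter and branches while it is nonzero without writing any flags; the usual replacement, `dec` followed by `jnz`, does the same counting but goes through the zero flag.

```c
#include <stdint.h>

/* Model of `loop label`: decrement CX, branch back while CX != 0.
   The real instruction leaves the flags untouched. */
static uint32_t run_loop(uint16_t cx) {
    uint32_t iterations = 0;
    do {
        iterations++;        /* stands in for the loop body */
        cx--;                /* LOOP: CX = CX - 1, wraps modulo 65536 */
    } while (cx != 0);       /* LOOP: jump if CX != 0 */
    return iterations;
}

/* Model of `dec cx` / `jnz label`: same trip count, but DEC writes ZF
   and JNZ reads it. */
static uint32_t run_dec_jnz(uint16_t cx) {
    uint32_t iterations = 0;
    int zf;
    do {
        iterations++;        /* stands in for the loop body */
        cx--;                /* DEC: CX = CX - 1 */
        zf = (cx == 0);      /* DEC sets ZF when the result is zero */
    } while (!zf);           /* JNZ: jump if ZF is clear */
    return iterations;
}
```

Both forms run the body `cx` times for a nonzero starting count. Starting from zero, the decrement wraps first, so in this 16-bit model the body runs 65536 times.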

wolfgke · on Oct 15, 2020

There is a reason why LOOP is slow: in the 90s it was deliberately made slow because it was used for timing loops. Making it faster would have broken existing software.

Source: https://stackoverflow.com/a/35743699

See also https://stackoverflow.com/questions/35742570/why-is-the-loop...
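In case "timing loop" is unfamiliar: it is a busy-wait that spins a fixed number of iterations and relies on each iteration taking a known time. A back-of-the-envelope sketch of the arithmetic follows; the function name `delay_us` and every number in the usage note are hypothetical, chosen only to show the effect, not measured on any CPU.

```c
#include <stdint.h>

/* Wall-clock duration, in microseconds, of a delay loop that spins
   `iterations` times, costing `cycles_per_iteration` cycles each, on a
   CPU clocked at `clock_mhz`. At 1 MHz one cycle lasts one microsecond,
   so microseconds = total cycles / MHz. All inputs are illustrative. */
static double delay_us(uint32_t iterations,
                       double cycles_per_iteration,
                       double clock_mhz) {
    return (double)iterations * cycles_per_iteration / clock_mhz;
}
```

With made-up figures, `delay_us(10000, 10.0, 25.0)` gives 4000 microseconds, while the same hard-coded count at `delay_us(10000, 2.0, 100.0)` gives 200. Software that baked in the iteration count would then wait 20x less than it intended.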

CJefferson · on Oct 15, 2020

The things you link to don't seem to say that at all; they seem to say it got slow because it was hard to implement and no one cared about it.

wolfgke · on Oct 15, 2020

"IIRC LOOP was used in some software for timing loops; there was (important) software that did not work on CPUs where LOOP was too fast (this was in the early 90s or so). So CPU makers learned to make LOOP slow."

"(My opinion: Intel is probably still making it slow on purpose, and hasn't bothered to rewrite their microcode for it for a long time. Modern CPUs are probably too fast for anything using loop in a naive way to work correctly.)"

userbinator · on Oct 15, 2020

It's also very fast on AMD (not any slower than the equivalent dec/jnz), so use it if you want your software to run faster on AMD and slower on Intel...

monocasa · on Oct 15, 2020

Sure, it doesn't matter anymore because anyone who cares is going through the vector unit to do bulk transfers. But there were issues with memory transfers that had an unaligned base and length for the longest time, well through x86_64's original design.

gpderetta · on Oct 15, 2020

Are you confusing LOOP with the REP prefix?

monocasa · on Oct 15, 2020

I absolutely am, thanks!
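For anyone else who mixes them up, a rough C model of what the REP prefix does, using `rep movsb` with the direction flag clear as the example. This is a sketch of the architectural behavior only; the function name `rep_movsb` and its return value are made up for illustration, and real hardware may move more than one byte per step internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `rep movsb` (direction flag clear): copy RCX bytes from
   [RSI] to [RDI]. Returns the number of repetitions performed. */
static size_t rep_movsb(uint8_t *rdi, const uint8_t *rsi, size_t rcx) {
    size_t repetitions = 0;
    while (rcx != 0) {       /* REP tests the count before each repetition */
        *rdi++ = *rsi++;     /* MOVSB: copy one byte, advance both pointers */
        rcx--;               /* REP: decrement the count */
        repetitions++;
    }
    return repetitions;
}
```

Note the difference from LOOP: REP repeats a single string instruction and does nothing when the count starts at zero, whereas LOOP is a branch that decrements first and then tests.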

roca · on Oct 16, 2020

My gripe with LOOP and other crappy instructions is that they use up valuable space in the instruction encoding.