Shouldn't a superscalar CPU figure out concurrent execution chains by itself wit...

jdsully · on April 25, 2020

It will do a bit of that, but remember it has to work in real time and can only look ahead so far. Give it a fighting chance by helping out whenever you can.

manaskarekar · on April 25, 2020

Tangent, but this reminds me of this great talk by Matt Godbolt from CppCon 2017: “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid”

https://youtu.be/bSkpMdDe4g4

And the follow-up: https://youtu.be/nAbCKa0FzjQ

battery_cowboy · on April 25, 2020

It's comments like these that make me realize I have a lot to learn about computers. I understand those words separately, but together they sound like a line from Star Trek.

celrod · on April 26, 2020

CPUs can execute multiple instructions per clock cycle (recent x86_64 can do 4-5). However, an instruction can take one or (commonly) several clock cycles to complete, and any instruction that depends on its result cannot start executing until that result is available. Such dependencies are the bane of superscalar parallelism.
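To put rough numbers on that, here is a minimal sketch of a toy issue model. Everything in it is an assumption for illustration, not a description of any real core: the function name `schedule`, a uniform latency of 4 cycles, an issue width of 4, and an unlimited look-ahead window.

```python
def schedule(deps, latency, width):
    """Return the cycle at which the last result becomes available.

    deps[i] lists the EARLIER instructions whose results instruction i needs.
    Toy model (an assumption, not real hardware): every instruction has the
    same latency, at most `width` instructions issue per cycle, and the
    look-ahead window is unlimited.
    """
    done = [None] * len(deps)          # cycle when each result is ready
    waiting = list(range(len(deps)))   # not yet issued, in program order
    cycle = 0
    while waiting:
        issued = 0
        still_waiting = []
        for i in waiting:
            inputs_ready = all(
                done[d] is not None and done[d] <= cycle for d in deps[i]
            )
            if issued < width and inputs_ready:
                done[i] = cycle + latency
                issued += 1
            else:
                still_waiting.append(i)
        waiting = still_waiting
        cycle += 1
    return max(done, default=0)


# Eight instructions arranged three different ways.
one_chain = [[]] + [[i - 1] for i in range(1, 8)]           # each needs the previous one
two_chains = [[] if i < 2 else [i - 2] for i in range(8)]   # two interleaved chains
independent = [[] for _ in range(8)]                        # no dependencies at all

if __name__ == "__main__":
    for name, deps in [("one chain", one_chain),
                       ("two chains", two_chains),
                       ("independent", independent)]:
        print(name, schedule(deps, latency=4, width=4))
```

In this model the single chain needs 32 cycles, two interleaved chains need 16, and fully independent work needs 5, even though the instruction count is the same in all three cases.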

But sometimes things that look like dependencies are fake.

    XMM0 = XMM1 + XMM2
    XMM3 = XMM0 + XMM4
    XMM0 = XMM5 + XMM6
    XMM7 = XMM0 + XMM8

At first glance it may look like you have to calculate these instructions serially. But by renaming the `XMM0` in the last two instructions to a different register, you eliminate the false dependency on that specific register and can calculate instructions 1 and 3 in parallel, followed by 2 and 4 in parallel.
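A rough sketch of that idea, using the four instructions from this comment. The helper names `rename` and `levels` and the `P0`, `P1`, ... physical register names are made up for illustration, and the model is deliberately conservative: it treats every hazard (read-after-write, write-after-read, write-after-write) as forcing a later step, which real hardware does not necessarily do.

```python
# (destination, source1, source2) for the four instructions in the example.
PROGRAM = [
    ("XMM0", "XMM1", "XMM2"),
    ("XMM3", "XMM0", "XMM4"),
    ("XMM0", "XMM5", "XMM6"),
    ("XMM7", "XMM0", "XMM8"),
]


def rename(program):
    """Give every write a fresh (hypothetical) physical register P0, P1, ...

    Sources are rewritten to the most recent physical name of the register
    they read, so only true read-after-write dependencies remain.
    """
    latest = {}   # architectural register -> current physical name
    renamed = []
    for n, (dest, *srcs) in enumerate(program):
        new_srcs = [latest.get(s, s) for s in srcs]
        phys = f"P{n}"
        latest[dest] = phys
        renamed.append((phys, *new_srcs))
    return renamed


def levels(program):
    """Earliest step for each instruction if any register-name hazard
    against an earlier instruction forces it into a later step."""
    result = []
    for i, (dest, *srcs) in enumerate(program):
        level = 0
        for j in range(i):
            prev_dest, *prev_srcs = program[j]
            raw = prev_dest in srcs      # we read what j wrote
            war = dest in prev_srcs      # we overwrite what j read
            waw = dest == prev_dest      # we overwrite what j wrote
            if raw or war or waw:
                level = max(level, result[j] + 1)
        result.append(level)
    return result


if __name__ == "__main__":
    print(levels(PROGRAM))          # [0, 1, 2, 3]: fully serial
    print(levels(rename(PROGRAM)))  # [0, 1, 0, 1]: 1 and 3 together, then 2 and 4
```

Without renaming, the reuse of `XMM0` chains all four instructions together. After renaming, only the two genuine producer-consumer pairs remain.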

bluGill · on April 26, 2020

CPUs are designed with compiler optimizer experts in the loop. If the compiler can do it, the CPU won't try. Instead, the CPU does the things the optimizer can't do. Note that this goes both ways: if an optimizer won't use something, the CPU won't do it; if the optimizer wants something, the CPU will do it (within the limits of what is possible, and all the other trade-offs).

Register renaming is designed assuming the optimizer

vardump · on April 25, 2020

Except all those times it fails miserably. Like doing this to AVX SIMD.