It's comments like these that make me realize I have a lot to learn about computers. I understand those words separately, but together they sound like a line from Star Trek.
CPUs can execute multiple instructions per clock cycle (recent x86_64 can do 4-5).
However, instructions can take 1 or (commonly) several clock cycles to complete, and only then are their results available to any instructions that depend on them. Such dependencies are the bane of superscalar parallelism.
But sometimes things that look like dependencies are fake.
At first glance it may look like you have to execute these instructions serially. But by renaming `XMM0` in the last two instructions, you eliminate the dependency on that specific register and can execute instructions 1 and 3 in parallel, followed by 2 and 4 in parallel.
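To make that concrete, here is a toy Python sketch of the idea. The four-instruction sequence is hypothetical (made up for illustration, not taken from any real program), and the model is deliberately crude: every instruction takes one cycle, issue width is unlimited, and a later instruction that shares a register with an earlier one simply waits for it to finish. Real CPUs are far more subtle, so treat this as an illustration of why renaming helps, not as a description of actual hardware.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str     # assembly-like text, for display only
    dest: str     # register written
    srcs: tuple   # registers read


# Hypothetical sequence: two independent load+add pairs that both reuse xmm0.
PROGRAM = [
    Instr("movss xmm0, [a]", "xmm0", ()),
    Instr("addss xmm1, xmm0", "xmm1", ("xmm1", "xmm0")),
    Instr("movss xmm0, [c]", "xmm0", ()),
    Instr("addss xmm2, xmm0", "xmm2", ("xmm2", "xmm0")),
]


def rename(program):
    """Give every write a fresh physical register; reads use the latest mapping."""
    mapping = {}
    renamed = []
    for n, ins in enumerate(program):
        srcs = tuple(mapping.get(r, r) for r in ins.srcs)
        dest = f"p{n}"            # fresh name, so no two writes ever collide
        mapping[ins.dest] = dest
        renamed.append(Instr(ins.text, dest, srcs))
    return renamed


def schedule(program, latency=1):
    """Earliest start cycle per instruction under the simplified hazard model."""
    starts = []
    for i, ins in enumerate(program):
        start = 0
        for j in range(i):
            prev = program[j]
            raw = prev.dest in ins.srcs    # true dependency: we read its result
            waw = prev.dest == ins.dest    # fake: both write the same name
            war = ins.dest in prev.srcs    # fake: we overwrite what it reads
            if raw or waw or war:
                start = max(start, starts[j] + latency)
        starts.append(start)
    return starts


if __name__ == "__main__":
    print("without renaming:", schedule(PROGRAM))
    print("with renaming:   ", schedule(rename(PROGRAM)))
```

Without renaming, the model runs the four instructions one after another (start cycles 0, 1, 2, 3). After renaming, only the true read-after-write dependencies remain, so instructions 1 and 3 start in cycle 0 and instructions 2 and 4 start in cycle 1.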