
It's comments like these that make me realize I have a lot to learn about computers. I understand those words separately, but together they sound like a line from Star Trek.


CPUs can execute multiple instructions per clock cycle (recent x86_64 can do 4-5). However, an instruction can take one or (commonly) several clock cycles to complete, and only then is its result available to any instruction that depends on it. Such dependencies are the bane of superscalar parallelism.

But sometimes things that look like dependencies are fake.

  XMM0 = XMM1 + XMM2
  XMM3 = XMM0 + XMM4
  XMM0 = XMM5 + XMM6
  XMM7 = XMM0 + XMM8

At first glance it may look like these instructions have to execute serially. But instruction 3 only reuses the name `XMM0`; it doesn't need the value instruction 1 put there. By renaming the last two `XMM0` to a different register you eliminate that false dependency on the specific register, so instructions 1 and 3 can execute in parallel, followed by 2 and 4 in parallel.
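If it helps to see it mechanically, here is a toy Python sketch of the four instructions above. It is only a model under loud assumptions: every instruction takes one step, the machine can issue any number of instructions per step, and any register hazard forces a one-step wait. The names `rename`, `schedule`, and the `P0`..`P3` registers are made up for illustration and are not how any real CPU labels things.

```python
# Each instruction is (destination, source_a, source_b), matching the example.
PROGRAM = [
    ("XMM0", "XMM1", "XMM2"),
    ("XMM3", "XMM0", "XMM4"),
    ("XMM0", "XMM5", "XMM6"),
    ("XMM7", "XMM0", "XMM8"),
]


def rename(program):
    """Give every write a fresh (hypothetical) register P0, P1, ...

    Reads are redirected to whichever fresh register holds the latest
    value of the name they refer to.
    """
    mapping = {}
    renamed = []
    for i, (dst, a, b) in enumerate(program):
        src_a = mapping.get(a, a)
        src_b = mapping.get(b, b)
        fresh = f"P{i}"
        mapping[dst] = fresh
        renamed.append((fresh, src_a, src_b))
    return renamed


def schedule(program):
    """Return the earliest step at which each instruction can run.

    Toy model: an instruction waits one step after any earlier
    instruction it conflicts with on a register name.
    """
    steps = []
    for i, (dst, a, b) in enumerate(program):
        step = 0
        for j in range(i):
            prev_dst, prev_a, prev_b = program[j]
            reads_prev_result = prev_dst in (a, b)      # true dependency
            overwrites_prev_dst = prev_dst == dst       # name reuse only
            overwrites_prev_src = dst in (prev_a, prev_b)  # name reuse only
            if reads_prev_result or overwrites_prev_dst or overwrites_prev_src:
                step = max(step, steps[j] + 1)
        steps.append(step)
    return steps


print(schedule(PROGRAM))          # without renaming
print(schedule(rename(PROGRAM)))  # with renaming
```

In this model the un-renamed program needs steps 0, 1, 2, 3, while the renamed one needs steps 0, 1, 0, 1: instructions 1 and 3 share a step, then 2 and 4 share the next. Real hardware has variable latencies and limited issue width, so treat the step counts as illustrative only.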



