I was appalled at the Stack Overflow answer; thankfully the author corrected it. This sort of exercise is a first homework assignment in a computer architecture course.
It's worth remembering that C is an abstraction. Even assembly is an abstraction. If you care about performance, you need to understand the limitations of the abstraction level you're working at - where your abstraction leaks, and what it is that leaks through.
This is a good article because it illustrates a counter-intuitive aspect of computer programming. At first we would expect both versions to take the same amount of time, but because of the way information is represented in the computer, the running times are vastly different.
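To make that concrete, here is a minimal sketch, assuming the article compares two loop orders over the same 2D array (the thread does not spell out the exact code). The names `grid`, `fill_grid`, `sum_row_major` and `sum_col_major` are made up for illustration. C stores a 2D array row by row, so the first function walks memory sequentially while the second jumps `COLS` elements on every step. Both compute the same sum.

```c
#include <stddef.h>

/* Hypothetical sizes, chosen only for illustration. */
#define ROWS 1024
#define COLS 1024

static int grid[ROWS][COLS];

/* Fill the array with a known pattern so the two sums can be compared. */
void fill_grid(void) {
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            grid[r][c] = (int)(r + c);
}

/* Inner loop runs over columns: consecutive accesses are adjacent in memory. */
long sum_row_major(void) {
    long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Inner loop runs over rows: consecutive accesses are COLS ints apart. */
long sum_col_major(void) {
    long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```

Call `fill_grid()` once, then time each sum function separately. How large the gap is depends on the array size, the cache hierarchy, and the compiler flags, so treat any measurement as specific to your machine.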
What would be even better is if the compiler could recognize that the loops are independent of each other and compile either version to the better machine code. I don't think that'd be possible with C, but possibly in C# or Java the JIT engine could recognize such a scenario and make the proper optimization.
Nice article. I know I, for one, give these issues little thought when coding at a higher level of abstraction, but this is certainly necessary knowledge for performance-critical code.