I don't really believe you ran either the Mojo or the Julia code. There's no way your single-threaded C code outperformed multi-threaded, SIMD-optimized Julia or Mojo. It's flat-out impossible.
The only other explanation is that you ran the non-SIMD Julia version under a single thread.
I did. Running with threads improves performance by 50%, but it's still nowhere near C performance. My machine only has two cores, so threading doesn't help much.
That's interesting. It makes sense that a two-core machine doesn't benefit much from multithreading, but "nowhere near C performance" is pretty surprising. I'll try out both programs this weekend on my own fairly anaemic machine and see how they fare for me. Thanks for responding!
Cool. If Julia runs much faster for you than for me, I'd be interested in hearing about it. I was honestly surprised the performance was so bad, so perhaps I did something wrong.