Give me a break; we're talking about vectors here.
I like Clojure; I spent some time messing around with it a couple of years ago and will one day actually use it for something, probably involving complex configuration, where code-as-data really shines, along with concurrency/performance.
But if you're talking about working over a ho-hum vector with 100-10k entries, a linear scan over a mutable, contiguous array will typically be faster than the most clever multithreaded code you can come up with, and take up less of the CPU while it's working. 10 cores are a Bad Idea for that kind of work.
Amdahl's law tells us we should look at larger units of concurrency in our architecture rather than getting all excited about some auto-parallelized map function. At that point, it starts being important how fast the (single-threaded!) individual tasks run.
Well, no. A linear scan over a large memory array is going to crap all over the CPU caches if you have to do it more than once.
Break the work into blocks smaller than the CPU cache and perform multiple stages on each block.
Having all that handy control-flow stuff makes it easier to get the block-oriented behavior you need to maximize performance, which in these cases is all about memory bandwidth.