
The disagreeable anecdote shows that the scale at which memory efficiency matters is significantly greater than that of the example where the issue is raised. It matters for "very large number[s]", not when a template is used to overload a function for numeric types; it matters when memory constraints show up on the back of the envelope... or in the profile.
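
For concreteness, the kind of template in question looks something like this minimal sketch (the function name is mine, not the article's):

    #include <iostream>

    // One definition that stands in for a whole overload set
    // across numeric types.
    template <typename T>
    T twice(T x) {
        return x + x;  // instantiated separately for each T the caller uses
    }

    int main() {
        std::cout << twice(21) << ' '     // instantiates twice<int>
                  << twice(1.5) << '\n';  // instantiates twice<double>
    }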

What bothered me about that section of the article is related to your comment "I won't have to worry about it either."

Templates are syntactic sugar. Cancer of the semicolon I can understand. But why would one fear that a standard feature of C++ is grossly memory inefficient?

The answer, I believe, goes back to the time of expensive kilobytes, when allocating an array of 500 elements versus 200 was usually a big deal (whereas today it rarely is). The CPU caches of performance-driven applications are often larger than the total memory of the systems for which C++ was designed. An idiot's guide should focus on draining the swamp, not on raising worries about the incubation of alligator eggs.

Even if that is the template for articles about the language.



> The disagreeable anecdote shows that the scale at which memory efficiency matters is significantly greater than that of the example where the issue is raised. It matters for "very large number[s]", not when a template is used to overload a function for numeric types; it matters when memory constraints show up on the back of the envelope... or in the profile.

Wrong.

It is true that computers are fast enough that lack of performance will not be an issue unless the volume of work to be done is very, very large. However, once you look at performance, you find that memory access patterns in the small matter a great deal.

For instance, a recent Intel Sandy Bridge CPU has, per core, 64 KB of L1 cache and 256 KB of L2 cache, plus a 1 to 8 MB L3 shared among the cores. The L1 is split evenly, 32 KB for code and 32 KB for data. If hyperthreading is turned on, these caches may also be shared by two concurrent threads. So if either the data accessed in, or the code for, a particular loop exceeds 32 KB, you will experience a noticeable slowdown, and if possible you'd like to keep it under 16 KB. It is therefore better to access large data structures sequentially and to avoid random access.
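
As a rough illustration (the sizes here are picked for the sketch, not taken from any measurement), a micro-benchmark like the following usually shows the effect: summing the same array in order versus in a shuffled order, where the shuffled walk misses the cache on most loads once the data outgrows L2.

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <numeric>
    #include <random>
    #include <vector>

    int main() {
        const std::size_t n = 1 << 22;   // 4M ints = 16 MB, bigger than L2
        std::vector<int> data(n, 1);
        std::vector<std::size_t> order(n);
        std::iota(order.begin(), order.end(), 0);
        std::shuffle(order.begin(), order.end(), std::mt19937{42});

        auto time_sum = [&](const char* label, auto index_of) {
            const auto t0 = std::chrono::steady_clock::now();
            long long sum = 0;
            for (std::size_t i = 0; i < n; ++i) sum += data[index_of(i)];
            const auto t1 = std::chrono::steady_clock::now();
            const auto ms =
                std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0);
            std::printf("%s: sum=%lld, %lld ms\n", label, sum,
                        static_cast<long long>(ms.count()));
        };

        // streams whole cache lines vs. missing on most loads
        time_sum("sequential", [](std::size_t i) { return i; });
        time_sum("random",     [&](std::size_t i) { return order[i]; });
    }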

Note that that is KB, not MB. Memory efficiency matters on very small data structures. You won't notice until you have a lot of data. But when you have to fix it, you have to think at multiple scales.

> Templates are syntactic sugar. Cancer of the semicolon I can understand. But why would one fear that a standard feature of C++ is grossly memory inefficient?

Historically there have been a lot of complaints floating around about how nice templated code blew up into monstrosities when compiled. Given that, rumor control does not seem unwarranted.
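
For what it's worth, one common remedy for that bloat, sketched here with hypothetical names, is to keep the type-agnostic bulk of the work in a single non-template function and leave the template as a thin typed shim, so only the shim gets stamped out per type:

    #include <cstring>
    #include <type_traits>
    #include <vector>

    // One copy of this code in the binary, shared by every instantiation.
    void fill_bytes(void* dst, const void* src,
                    std::size_t elem_size, std::size_t count) {
        auto* out = static_cast<unsigned char*>(dst);
        for (std::size_t i = 0; i < count; ++i)
            std::memcpy(out + i * elem_size, src, elem_size);
    }

    // Tiny per-type shim; this is all the compiler duplicates per T.
    template <typename T>
    void fill(std::vector<T>& v, const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "fill requires a trivially copyable T");
        fill_bytes(v.data(), &value, sizeof(T), v.size());
    }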


I've seen dramatic speedups from reducing an array from ~10 MB to ~1 MB. The problem was that the algorithm made multiple passes over the array, and on each pass it would pull blocks in from main memory only to evict them before they could be reused.
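
The pattern looks roughly like the sketch below (the operations are hypothetical stand-ins). Shrinking the array so it fits in cache, as above, is one fix; fusing the passes so each element is reused while it is still resident is another.

    #include <vector>

    // Two separate passes stream the whole array through the cache twice.
    void two_passes(std::vector<float>& a) {
        for (float& x : a) x *= 2.0f;   // pass 1: every block loaded...
        for (float& x : a) x += 1.0f;   // pass 2: ...and loaded again
    }

    // The fused loop touches each element once while it is still in cache.
    void fused_pass(std::vector<float>& a) {
        for (float& x : a) x = x * 2.0f + 1.0f;  // one trip through memory
    }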

It depends a lot on the workload, but you'll see dramatic performance differences well before you fill up main memory.



