
While reading, I was expecting to hit a discussion of moving/copying/compacting GC at some point. But no.

IMO, one of a GC's benefits is that it can maintain data locality. Without that, GC is not that different from reference-counting memory management (cycles in the ownership graph aside): it inherits the same memory-fragmentation problems from malloc/free.



Some memory allocators serve objects of different sizes from different blocks, which can be far away from each other. This reduces fragmentation, but how much data locality do we typically get, even in the best case, for a composite object made of different-sized parts?

It seems like it will vary a lot depending on details of the language implementation. An array of value types should be laid out linearly, but beyond that it seems hard to tell what's going on?
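
To make the value-type case concrete, here's a tiny C sketch (my own toy, not from the article; the actual gaps depend entirely on the allocator) contrasting an array of value types, which is one contiguous block, with an array of references, whose targets land wherever the allocator's size classes put them:

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { double x, y; } Point;

    int main(void) {
        Point values[4];   /* value array: 4 * 16 contiguous bytes */
        Point *refs[4];
        for (int i = 0; i < 4; i++)
            refs[i] = malloc(sizeof(Point));  /* targets placed by the allocator */

        printf("value stride: %td bytes\n",
               (char *)&values[1] - (char *)&values[0]);
        for (int i = 0; i < 3; i++)
            printf("ref gap %d: %td bytes\n", i,
                   (char *)refs[i + 1] - (char *)refs[i]);
        return 0;
    }

The value stride is always sizeof(Point); the ref gaps are whatever malloc happened to do, and interleaving other allocations in the loop would spread them further.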


I am a fan of simple garbage collectors. Let initial placement be decided by the running code.

Let later placement be decided by the order in which the heap is scavenged. That is what, e.g., reunites the target instances of an array of pointers: when you scavenge the array, in the default case the instances the pointers point to get lined up beautifully one after another, even if other memory was interleaved at initial allocation time. In that case your code gets faster after the data has been GCed for the first time.
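
For anyone who hasn't seen it, this is essentially what a Cheney-style copying collector does. A minimal toy sketch in C (my own simplification, not modeled on any particular runtime): the scan evacuates the targets of a pointer array one after another, so they end up adjacent in to-space even though junk was interleaved at allocation time:

    #include <stdio.h>
    #include <stddef.h>
    #include <string.h>

    typedef struct Obj {
        struct Obj *forward;   /* forwarding pointer once copied */
        int n_fields;          /* number of pointer fields */
        struct Obj *fields[];  /* child pointers */
    } Obj;

    static _Alignas(max_align_t) char from_space[1 << 16];
    static _Alignas(max_align_t) char to_space[1 << 16];
    static char *alloc_ptr = from_space;

    static size_t obj_size(int n_fields) {
        return sizeof(Obj) + n_fields * sizeof(Obj *);
    }

    static Obj *alloc(int n_fields) {        /* bump allocation in from-space */
        Obj *o = (Obj *)alloc_ptr;
        alloc_ptr += obj_size(n_fields);
        o->forward = NULL;
        o->n_fields = n_fields;
        return o;
    }

    /* Copy one object to to-space unless it has already been moved. */
    static Obj *evacuate(Obj *o, char **free_ptr) {
        if (o == NULL) return NULL;
        if (o->forward) return o->forward;
        Obj *copy = (Obj *)*free_ptr;
        memcpy(copy, o, obj_size(o->n_fields));
        *free_ptr += obj_size(o->n_fields);
        o->forward = copy;
        return copy;
    }

    /* Cheney scan: scavenging the array evacuates its targets
       back to back, so they end up adjacent in to-space. */
    static Obj *collect(Obj *root) {
        char *free_ptr = to_space, *scan = to_space;
        root = evacuate(root, &free_ptr);
        while (scan < free_ptr) {
            Obj *o = (Obj *)scan;
            for (int i = 0; i < o->n_fields; i++)
                o->fields[i] = evacuate(o->fields[i], &free_ptr);
            scan += obj_size(o->n_fields);
        }
        return root;
    }

    int main(void) {
        /* Targets interleaved with junk, then an array pointing at them. */
        Obj *a = alloc(0); alloc(3);          /* junk */
        Obj *b = alloc(0); alloc(5);          /* junk */
        Obj *arr = alloc(2);
        arr->fields[0] = a;
        arr->fields[1] = b;
        arr = collect(arr);
        /* After GC the two targets sit back to back in to-space. */
        printf("gap between targets: %td bytes\n",
               (char *)arr->fields[1] - (char *)arr->fields[0]);
        return 0;
    }

The copy order here is breadth-first, which is exactly what lines up the direct targets of the array; their own children get grouped after them.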


I don't see how we can know whether ordering things in memory by breadth-first or depth-first traversal will make things faster, though. It seems like it depends on whether it's more common to iterate over the array or to access single elements of it. And when iterating over the array, how deep does the code go when looking at each element?


When you say locality, you mean moving memory allocations next to each other so that there is no wasted space between them, right?


Not just wasted space. Unrelated data in between would do the same damage.

Keep in mind that you only have a tiny number of L1 data cache lines; they are gone so quickly. If you can get a couple more struct instances of an array into those cache lines (without the lines holding unrelated nonsense as a byproduct of a memory fetch), that is a huge win.

The number of L1 cache lines is more important than the raw size of the L1 cache. The line granularity uses up the cache very quickly if every line is padded with three quarters of nonsense that you don't need right now.
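
Back-of-envelope, with illustrative numbers (a 32 KiB L1d with 64-byte lines is common but by no means universal):

    #include <stdio.h>

    int main(void) {
        int l1_bytes   = 32 * 1024;  /* assumed L1d size */
        int line_bytes = 64;         /* assumed cache-line size */
        int struct_sz  = 64;         /* whole struct per element */
        int hot_bytes  = 16;         /* bytes the loop actually reads */

        int lines = l1_bytes / line_bytes;   /* only 512 lines total */
        printf("L1d lines: %d\n", lines);
        printf("useful bytes cached: %d of %d (%.0f%% of each line)\n",
               lines * hot_bytes, l1_bytes,
               100.0 * hot_bytes / struct_sz);
        return 0;
    }

With those numbers, each line carries 48 bytes of dead weight, so the "effective" L1 shrinks from 32 KiB to about 8 KiB of data you actually wanted.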


> no wasted space between them

That too, but also: if you have an array of objects, a compacting GC will place those objects in a contiguous area of memory, which optimizes access to them. Cache prefetching by modern processors, all that.
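
A quick way to see the prefetching effect (a toy benchmark of my own construction; exact numbers depend heavily on the machine and allocator): sum through the same kind of pointer array twice, once with targets scattered by individual mallocs, and once with targets packed in array order the way a compacting GC would leave them:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    enum { N = 1 << 20 };
    typedef struct { long v; char pad[56]; } Cell;  /* one cache line each */

    static long sum(Cell **p) {
        long s = 0;
        for (int i = 0; i < N; i++) s += p[i]->v;   /* pointer chase */
        return s;
    }

    int main(void) {
        static Cell *scattered[N], *packed[N];
        Cell *arena = malloc(sizeof(Cell) * N);

        /* Scattered: separate mallocs, interleaved with junk allocations. */
        for (int i = 0; i < N; i++) {
            scattered[i] = malloc(sizeof(Cell));
            scattered[i]->v = i;
            (void)malloc(200);   /* deliberately leaked to interleave junk */
        }
        /* Packed: contiguous, in array order, like after a compacting GC. */
        for (int i = 0; i < N; i++) {
            arena[i].v = i;
            packed[i] = &arena[i];
        }

        clock_t t0 = clock();
        long s1 = sum(scattered);
        clock_t t1 = clock();
        long s2 = sum(packed);
        clock_t t2 = clock();
        printf("scattered: %ld (%.0f ms)  packed: %ld (%.0f ms)\n",
               s1, 1000.0 * (t1 - t0) / CLOCKS_PER_SEC,
               s2, 1000.0 * (t2 - t1) / CLOCKS_PER_SEC);
        return 0;
    }

On most machines the packed pass is noticeably faster, because consecutive iterations touch consecutive addresses and the hardware prefetcher can stay ahead of the loop.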



