
What a bizarre article. Garbage collection is several orders of magnitude slower than manual memory management? I don't think that's ever been true.


Of course it is.

Programs are rarely CPU-bound from a number of instructions-per-second point-of-view. 80% of the time I've profiled high-performance applications, it's been memory allocations and deallocations that have been the issue.

Garbage collectors still aren't (and I doubt they ever will be for all use cases) smart enough to work out things like how to organise struct memory so it's aligned to L1/L2 cache boundaries, exact pre-allocation sizes (for things like slab allocators handling loads of small allocations), or thread contention when allocating memory concurrently... And that's ignoring stack allocation, which languages like Java can't do for anything other than base primitive types.

A garbage collector might be able to pick up some of these things to a primitive degree, but certainly not on the first run of the code, which means the first run will be slow.

Games and embedded systems often allocate a fixed size of memory up front and NEVER free it, re-using it instead.
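
That allocate-once, reset-instead-of-free pattern can be sketched roughly like this (the `FrameArena` name and sizes are invented for illustration, not taken from any particular engine): one fixed block is grabbed at startup, handed out by bumping an offset, and "freed" wholesale by resetting the offset each frame.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the "allocate up front, never free" pattern:
// one fixed buffer, bump-pointer hand-out, wholesale reset per frame.
class FrameArena {
public:
    explicit FrameArena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

    // Bump-pointer allocation: align the offset, advance it, return the
    // slot. No per-object bookkeeping, so it's a handful of instructions.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + size > buffer_.size()) return nullptr;  // exhausted
        offset_ = aligned + size;
        return buffer_.data() + aligned;
    }

    // "Freeing" is just resetting the offset; the memory is reused on
    // the next frame rather than returned to the OS.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::vector<std::uint8_t> buffer_;
    std::size_t offset_;
};
```

A real engine would layer object lifetimes and alignment policies on top, but the core is this cheap: no malloc, no free, no heap walk on the hot path.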


Garbage collection is not that bad.

Memory allocation being a bottleneck in high-performance applications? I'm not sure what your "high-performance" means (HPC? Games?), but profile an application without garbage collection and 80% of the time a significant part of the run time will be within malloc and free.

A garbage collector might even be more efficient, since it can delay/avoid the management. When will a GC actually do a collection run? At the moment when it cannot satisfy an allocation request from its pool. A program which needs little memory might never hit this barrier. This is like never calling 'free'. Of course, whether this can be done can only be decided at runtime.

There is nothing in a garbage collector which prevents alignment to cache boundaries, preallocation, or stack allocation. In the case of Java, the Hotspot JVM can allocate objects on the stack. However, this is a compiler optimization and cannot be directly controlled by the programmer. The upside is that the programmer cannot introduce memory bugs.

If you write a game with Java you can allocate a fixed size of memory up front as well. I do not see a problem with garbage collection here.


> Memory allocation being a bottleneck in high-performance applications? I'm not sure what your "high-performance" means (HPC? Games?)

Games, 3D raytracers, particle systems, fluid simulations. That's my experience, and in every one, GC would be a complete no-no for at least the main core algorithms. Games often use Lua or Python as the gameplay language (scripting events), but the number of times I know of those parts being re-written in C++ due to issues with memory allocation in the language is significant.


Alan Kay once mentioned in passing (in Early History of Smalltalk I think) that newer processors tend to be optimized for languages like C. Of course it would be difficult to make a garbage collector perform well on such platforms.

Really, we have it backward. The question shouldn't be which languages run faster on current platforms, but which languages are easier to use (depends on the problem of course). Once you know which programming patterns humans best deal with, you can optimize the implementation stack all the way to the Nand gates. It's a pity, a shame, that we currently have to stop before touching the silicon.


The sad fact is that most current computers are built to run Windows, meaning, yes, their processors are optimized to run code written in C. This is not entirely bad, because it helps make unixes run efficiently too, but we cannot expect anything very revolutionary at the ISA level unless we are willing to burn a couple of billions. Azul Systems built processors designed to run Java, and, in particular, to run garbage collection efficiently, but it is so costly to build silicon that they left the hardware business.


> Azul Systems built processors designed to run Java, and, in particular, to run garbage collection efficiently

That sounds really interesting. Do we know what kind of features made these processors more suitable to Java byte-code and garbage collection? How did it fare compared to then-current mainstream processors? Can we speculate on how it would have fared if Intel or AMD had built this kind of processor instead?


> Once you know which programming patterns humans best deal with, you can optimize the implementation stack all the way to the Nand gates.

This is a complete hand-waving fantasy.


Not at all. You only need to look at even simple CPU specs like number and type of registers to see this kind of thinking in action, and this has been going on for a long time.


Not really. Allocation and deallocation for a system that does not support object movement (and hence compaction) is inherently non-trivial. In most modern GCs allocation is bump-the-pointer with TLABs; you can't get much cheaper than that. If the first-generation GC is a copying GC, you pay absolutely nothing for collecting a dead object. Compare that to explicit allocation/deallocation schemes.
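
The TLAB fast path being described can be sketched roughly like this (a toy model, not actual Hotspot code; the `Tlab` struct and function names are invented): each thread owns a private chunk, so allocation is a compare and an add, with no lock and no free-list walk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a thread-local allocation buffer (TLAB): a private chunk
// per thread, allocated from the shared heap in one coarse-grained step.
struct Tlab {
    std::uint8_t* top;  // next free byte
    std::uint8_t* end;  // one past the last usable byte
};

// The fast path: bounds check, bump, return. A real JVM falls back to a
// slow path (fetch a fresh TLAB from the shared heap) on exhaustion;
// here we just signal failure with nullptr.
inline void* tlab_alloc(Tlab& t, std::size_t size) {
    if (t.top + size > t.end) return nullptr;  // slow path in a real GC
    void* p = t.top;
    t.top += size;
    return p;
}
```

Because each thread bumps its own pointer, there is no contention on the allocation fast path at all, which is the point being made about cheapness.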

The biggest problem with GC however is not the cost of allocation/deallocation but the fact that it's pretty hard to make GCs incremental at a very fine-grain level.

Java, btw, at least with Hotspot JVM, does escape analysis, that will allocate non-escaping objects on stack. Heck, these objects may even just end up in a register (with unused data members discarded).


Nit: Collecting a dead object is not free with a copying GC. You still have to scan it during the Cheney scan or mark phase.


The garbage collector has nothing to do with things like memory struct alignment. And a garbage collector makes concurrent allocation easier, not harder. And garbage collection doesn't prevent stack allocation. You seem to have no idea what you're talking about.


> The garbage collector has nothing to do with things like memory struct alignment.

Exactly - that's my point - with garbage collectors you can't do things like align to 16-byte boundaries for SSE, or make sure allocations fit within cache lines...
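
The kind of explicit control being claimed here looks like this in C/C++ (a minimal sketch; the `alloc_sse_floats` helper is invented for illustration). `std::aligned_alloc` is C++17 (C11) and requires the size to be a multiple of the alignment, so round up first; note MSVC does not provide it and offers `_aligned_malloc` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Request a specific alignment directly: 16 bytes for SSE loads here,
// but the same call works for 64-byte cache-line alignment.
float* alloc_sse_floats(std::size_t count) {
    std::size_t bytes = count * sizeof(float);
    bytes = (bytes + 15) & ~std::size_t{15};  // round up to multiple of 16
    return static_cast<float*>(std::aligned_alloc(16, bytes));
}
```

The returned pointer is then safe for aligned SSE loads such as `_mm_load_ps`, which fault on unaligned addresses; this per-allocation control is what typical GC'd-language allocators don't expose.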

> And a garbage collector makes concurrent allocation easier, not harder.

If it pre-allocates a huge amount of memory (i.e. a memory arena), yes. If it has to allocate the memory from the OS (i.e. it hasn't got any more available to the application), then it's a further painfully slow allocation. Allocating that extra memory is an extra step for the GC language (the GC allocates an amount at startup, but when the program runs that turns out not to be enough, so it needs to allocate more), whereas the C/C++ version can do it all in one allocation.

> And garbage collection doesn't prevent stack allocation.

I didn't say it didn't, I said Java did.

> You seem to have no idea what you're talking about.

If you say so.


I don't really understand why you think GCs are so limited, when you are comparing them not to the runtime heap provided by a C or C++ vendor, but instead some custom scheme. To level the playing field, you should be considering a GC specialized to the purpose. There's no reason why a GC can't allocate certain things specially, or have a specified initial heap size. And GC actually makes it easier to make better use of L1/L2 cache without needing specific optimization, because caches are nice sizes to use for generations in generational GC - they're extremely quick to collect.

Saying that games and embedded systems "allocate a fixed size of memory up front and NEVER free it, re-using it instead" is almost meaningless. They are necessarily implementing their own GC or memory allocator; pretty much the definition of an allocator is something which controls how memory is reused. The only thing they've insulated themselves from is the vendor or OS's memory allocator, by writing their own allocator. This can make lots of sense when your application is all written by a single relatively small team, and has nicely understandable lifetimes for various bits of memory - games are a good example, because memory can typically be classed as existing for the entire game, a level, a frame, or a call stack. When you have such a specialized use case, it makes sense to take advantage of it. But not all, or even most, applications are like that.
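
The game/level/frame lifetime classes described above can be sketched as separate pools, each reset wholesale when its lifetime ends (the `Pool`/`GameMemory` names and sizes are illustrative, not from any real engine):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One pool per lifetime class: everything inside a pool dies together,
// so "freeing" a whole level or frame is a single offset reset.
struct Pool {
    std::vector<std::byte> buf;
    std::size_t used = 0;
    explicit Pool(std::size_t n) : buf(n) {}
    void* alloc(std::size_t n) {
        if (used + n > buf.size()) return nullptr;  // pool exhausted
        void* p = buf.data() + used;
        used += n;
        return p;
    }
    void reset() { used = 0; }  // whole lifetime class dies at once
};

struct GameMemory {
    Pool game{1 << 20};   // lives for the entire run
    Pool level{1 << 20};  // reset on level load
    Pool frame{1 << 16};  // reset every frame
};
```

This is exactly the "writing your own allocator" move: it works because the team knows which lifetime class each allocation belongs to, which a general-purpose allocator (or GC) cannot assume.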


I'm not saying it's not theoretically possible for GC'd languages to do that, I'm just saying in my experience of some of them (Python, Lua, Java), limitations of the language in terms of how/where/when to allocate memory have limited their use significantly for the domain I work in.

> Saying that games and embedded systems "allocate a fixed size of memory up front and NEVER free it, re-using it instead" is almost meaningless.

I was using that as an example of being in complete control of the memory...

> But not all, or even most, applications are like that.

Well again, it must be the domains I've worked in, because at least when doing embedded and desktop software, it's often been a big concern.


Java has a lot of memory management problems, and as the flagship for garbage collection for so many years, many people confuse its issues with gc generally. So you're right, but in the applications most people work on, he's right too.


A Minecraft server will eat RAM like it's chasing the dragon.



