Thursday, September 23, 2010

JavaOne 2010: The Garbage Collection Mythbusters

As members of the Garbage Collector Group of the HotSpot Virtual Machine Development Team, John Coomes and Tony Printezis have the credentials for presenting "The Garbage Collection Mythbusters" at JavaOne 2010. The speakers took turns talking, and Coomes began by stating that they wanted to "cover the strengths and weaknesses of Java garbage collection or at least the perceived strengths and weaknesses of Java garbage collection." They then provided a brief background ("refresher course") on the basics of garbage collection.

Tracing-based garbage collectors are considered passive: they "discover the live objects" and reclaim those that aren't live.


Myth #1: Malloc/free always performs better than garbage collection

Garbage collection enables object relocation, which in turn provides many benefits: it eliminates fragmentation, decreases garbage collection overhead, and supports linear allocation. Other benefits of object relocation include compaction (improves page locality) and relocation ordering (improves cache locality).

They also stated that "Generational Garbage Collection is Fast!" They cited a "recent publication" showing that malloc/free outperforms GC when space is tight, but that GC can match or even beat malloc/free when there is "room to breathe."


Myth #2: Reference counting would solve all my Garbage Collection problems

Traditional reference counting has extra space overhead and extra time overhead. It is also non-moving and is not always incremental or prompt. Lastly, this approach cannot reclaim cyclic garbage. Advanced reference counting deals with some of the limitations of traditional reference counting. Two-bit reference counts can help with the extra space overhead problem, and it is also common to combine reference counting with copying GC. However, a backup GC algorithm is still required to deal with cyclic garbage, so there is the complexity of having two garbage collectors involved, and the approach is still non-moving. A convincing argument for busting this myth is that Coomes is not aware of any modern garbage collection mechanism that uses reference counting.
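The cyclic garbage limitation is easy to see in a toy model. The following is a minimal sketch of naive reference counting written by me for illustration, not anything shown in the session; the class and method names (RcNode, RefCountCycleDemo) are hypothetical. A simple chain is freed as soon as the root lets go, but two nodes that point at each other keep each other's count above zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy reference-counted node; hypothetical, for illustration only. */
class RcNode {
    final String name;
    int refCount = 0;                 // number of incoming references
    boolean freed = false;            // set when the count drops to zero
    final List<RcNode> outgoing = new ArrayList<>();

    RcNode(String name) { this.name = name; }

    /** Record a reference from this node to the target. */
    void addReferenceTo(RcNode target) {
        outgoing.add(target);
        target.refCount++;
    }

    /** Drop one incoming reference; free the node when the count reaches zero. */
    void release() {
        refCount--;
        if (refCount == 0) {
            freed = true;
            for (RcNode child : outgoing) {
                child.release();      // freeing a node releases what it referenced
            }
            outgoing.clear();
        }
    }
}

class RefCountCycleDemo {
    /** a -> b with no back edge: both nodes are freed when the root lets go. */
    static boolean chainReclaimed() {
        RcNode a = new RcNode("a");
        RcNode b = new RcNode("b");
        a.refCount++;                 // simulated external (root) reference to a
        a.addReferenceTo(b);
        a.release();                  // root drops a
        return a.freed && b.freed;
    }

    /** a -> b -> a: the counts never reach zero, so nothing is freed. */
    static boolean cycleReclaimed() {
        RcNode a = new RcNode("a");
        RcNode b = new RcNode("b");
        a.refCount++;                 // simulated external (root) reference to a
        a.addReferenceTo(b);
        b.addReferenceTo(a);          // closes the cycle
        a.release();                  // root drops a, but b still references it
        return a.freed && b.freed;
    }

    public static void main(String[] args) {
        System.out.println("chain reclaimed: " + chainReclaimed());
        System.out.println("cycle reclaimed: " + cycleReclaimed());
    }
}
```

A tracing collector has no such problem because it starts from the roots and never reaches the cycle at all, which is why a backup tracing algorithm is needed alongside reference counting.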


Myth #3: Garbage Collection with explicit deallocation would drastically improve performance

Printezis's first argument against this myth is the philosophical issue of an increased chance of compromised safety. He also made more "practical" arguments against the ability to explicitly deallocate. One thing that made particular sense to me was the concept that garbage collectors tend to "reclaim objects in bulk," so having to deal with explicit single deallocation cases could actually impact overall performance negatively.


Myth #4: Finalizers can (and should) be called as soon as objects become unreachable

Hopefully, most Java developers today know this important fact highlighted in this presentation: "Finalizers are not like C++ destructors." They have no guarantees. If you want "prompt external resource reclamation," then dispose of your resources explicitly.
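As a minimal sketch of explicit disposal (my own illustration, not from the slides), the example below uses try-with-resources, which arrived in Java 7, after this session; before that, the same effect required a try/finally block that called close(). NativeHandle is a hypothetical stand-in for a file handle, socket, or similar external resource.

```java
/** Hypothetical resource wrapper; stands in for a file handle, socket, etc. */
class NativeHandle implements AutoCloseable {
    private boolean open = true;

    boolean isOpen() { return open; }

    @Override
    public void close() {
        // Release the external resource deterministically instead of in finalize().
        open = false;
    }
}

class ExplicitDisposeDemo {
    /** Uses a handle, then returns it so the caller can observe that it was closed. */
    static NativeHandle useAndClose() {
        NativeHandle handle = new NativeHandle();
        try (NativeHandle h = handle) {
            // ... work with h while it is open ...
            if (!h.isOpen()) {
                throw new IllegalStateException("handle should be open inside the block");
            }
        } // close() runs here, even if the body throws
        return handle;
    }

    public static void main(String[] args) {
        System.out.println("open after block: " + useAndClose().isOpen());
    }
}
```

The point is that the release happens at a known place in the code rather than whenever (or if) the collector gets around to running a finalizer.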

In conjunction with the discussion of Myth #4, they referenced Reference Objects (WeakReference, SoftReference, etc.). They referred the audience to the "Garbage Collection-Friendly Programming" presentation they gave at the 2007 JavaOne conference for more details.


Myth #5: Garbage Collection eliminates all memory leaks

Printezis stated he wished this one were true. Sadly, it's not. They provided a slide with a code sample showing "Unused Reachable Objects." Their example's simple ImageMap class had a static reference to itself. They showed that the garbage collector could never reclaim the File added to the internal map. Although the garbage collector reclaims unreachable objects, it will not reclaim unused reachable objects. These memory leaks require effort to track down, though the tooling for finding them is getting better.
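I did not capture the slide's code, so the following is my own hypothetical reconstruction of the idea rather than the speakers' example. The method names and the LeakDemo driver are assumptions; only the ImageMap name, the static self-reference, and the File stored in an internal map come from the talk.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical reconstruction; not the code from the speakers' slide. */
class ImageMap {
    // The static field keeps this instance, and everything its map holds, reachable.
    private static final ImageMap INSTANCE = new ImageMap();

    private final Map<String, File> files = new HashMap<>();

    static ImageMap getInstance() { return INSTANCE; }

    void add(String key, File file) { files.put(key, file); }

    // Unless callers remember to invoke this, entries never become unreachable.
    void remove(String key) { files.remove(key); }

    int size() { return files.size(); }
}

class LeakDemo {
    /** Adds entries and never removes them; returns how many the map still holds. */
    static int addWithoutRemoving(int count) {
        ImageMap map = ImageMap.getInstance();
        for (int i = 0; i < count; i++) {
            // new File(...) only builds a path object; nothing is created on disk.
            map.add("img-" + i, new File("img-" + i + ".png"));
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("entries still reachable: " + addWithoutRemoving(1000));
    }
}
```

Every File stays reachable through the static instance even after the application has stopped caring about it, so the collector is doing exactly what it should and the leak is in the program's logic.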


Myth #6: I can get a garbage collector that delivers both very high throughput and very low latency

The speakers discussed throughput versus latency. Throughput garbage collectors try to shift most of the work to GC pauses to improve throughput for the application threads. The result is the least overall garbage collection overhead at the cost of garbage collection pauses. Latency garbage collectors move work out of the garbage collector pauses, putting more work on the application threads. The result is greater garbage collector overhead for the benefit of smaller pauses. As the above makes clear, their goals are conflicting. The bullet said it all: "One GC does not rule them all." Instead, "must choose the best GC for the job." In the future, hints might be helpful, but it will always be up to a human to decide based on the particular need.


Myth #7: I need to disable GC in critical sections of my code

Disabling the garbage collector often means not being able to allocate because the heap is full or nearly full. This might impact other threads as well. A possible solution to that conundrum is to allocate in advance, but that requires knowing exactly what data is necessary. Many Java libraries freely allocate objects, and it seems unlikely that Java developers can ensure that these allocations are avoided in the critical sections in which garbage collection is turned off. This approach has high potential for deadlocks, exceptions, and other unintended side effects. This approach might work in a "few, limited cases," but is "not a general-purpose solution" because it provides "too many ways to shoot yourself in the foot."
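To make the "allocate in advance" idea concrete, here is a small sketch of my own; the PreallocatedBuffers class and its sizes are hypothetical and were not part of the talk. It shows both the approach and its weakness: if the up-front estimate is too low, the critical section simply fails.

```java
/** Hypothetical fixed-size pool: all allocation happens before the critical section. */
class PreallocatedBuffers {
    private final double[][] pool;
    private int next = 0;

    PreallocatedBuffers(int count, int length) {
        pool = new double[count][length];   // the only allocation, done up front
    }

    /** Hands out a preallocated buffer; fails if the up-front estimate was too low. */
    double[] take() {
        if (next == pool.length) {
            throw new IllegalStateException("pool exhausted: needed more than " + pool.length);
        }
        return pool[next++];
    }

    /** Makes all buffers available again once the critical section is over. */
    void reset() { next = 0; }
}

class PreallocationDemo {
    public static void main(String[] args) {
        PreallocatedBuffers buffers = new PreallocatedBuffers(2, 1024);
        double[] first = buffers.take();
        double[] second = buffers.take();
        System.out.println("got buffers of length " + first.length + " and " + second.length);
        try {
            buffers.take();                  // a third request exceeds the estimate
        } catch (IllegalStateException e) {
            System.out.println("critical section failed: " + e.getMessage());
        }
    }
}
```

Even this only covers allocations the developer controls; any library call inside the critical section could still allocate behind the scenes.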


Myth #8: GC saves development time and doesn't cost anything

There is no need for "reclamation design" and there are fewer bugs, but the costs come out at deployment. Defaults typically work for "modest application requirements," but "stringent application requirements" require choices to be made. Applications get very little control over garbage collection (when, how much, how long).

The speakers agreed that part of this myth is true ("saves development time"), but they busted the myth because it does cost effort.


Myth #9: GC settings that worked for my last application will also work for my next application

The speaker began busting this myth by explaining how many different factors affect garbage collection performance. If somehow you could keep all of these factors exactly the same across environments and applications, then the same settings would likely work. There's not much chance of that happening, of course. "Transferring parameters" from an old application to a new application has "mixed results at best." If the applications are very similar, you might consider using the older application's parameters as a starting point only, but plan to spend time and effort to tune those parameters.
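Tuning starts with measuring, so a reasonable first step with inherited settings is to check which collectors the new application actually ends up running and how much work they do. The sketch below is my own addition, not something shown in the session. It uses the standard java.lang.management API; the GcReport class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that summarizes the collectors the running JVM exposes. */
class GcReport {
    /** Returns one line per collector with its cumulative count and time. */
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start; -1 means "undefined".
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

The collector names and numbers printed depend on the JVM version, the flags in effect, and the workload, which is the point of the myth: the same flags can produce very different behavior in a different application.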


Myth #10: Anything I can write in a system with GC, I can write with alloc/free

Technically, this is not a myth: anything written with a garbage collector can also be written with alloc/free, because the garbage collector uses that approach in its implementation. However, it is much more difficult to do this directly with alloc/free. To turn it into a myth, they added the words "just as easily": "Anything I can write in a system with garbage collection, I can write just as easily with alloc/free."


Conclusion

The speakers used a style that was a nice fit for this type of presentation. They took turns being the advocate for busting a particular myth, and the other speaker would then be the judge who consistently concluded that the point was proven and the myth debunked. This format allowed them to pretend to ask tough questions and be talked out of them. There was some acting involved that won't win any Oscars, but it did fit the format nicely and kept the presentation engaging. They also used humor at the end: Myth #11 was "This talk is over," and that myth was Confirmed rather than Busted. Most of the over-capacity audience stayed through the entire presentation.
