Managing GC-visible resources
Abstract
There is pressure to expose weak references to content in some form or, failing that, to at least use weak references internally for resource management. Weak references have unfortunate consequences for garbage collector (GC) tuning. If we wish to avoid weak references but still handle many (perhaps most) of the use cases that ask for them, it would help to document the actual use cases of interest and suggest alternative solutions for each.
Why not weak references?
In brief, a GC wants to balance mutator throughput, latency, and promptness. (Mutator = "the main program doing useful work". Promptness = "How quickly after an object becomes unreachable does it get collected?") Weak references make promptness visible, and many uses of them make a program's correctness depend on a certain level of promptness. This prevents an engine from delaying collection, yet a GC's performance (measured in throughput and latency) derives from delaying collection. (In the limit, if you maximize promptness, you have reference counting, plus something to deal with cycles.)
Determinism and GC visibility
Solutions that involve GC visibility can be described as "nondeterministic" because their behavior depends on the timing of GC. With a deterministic solution, the GC has the option of delaying collection as long as needed, because the user cannot tell the difference (other than in terms of memory usage and performance).
What about WeakMaps?
WeakMaps do not have the issues of weak references. They are completely deterministic with respect to GC. Their use of the term "weak" is deceptive; it has nothing to do with the "weak" in "weak reference" other than that both have something to do with GC. A WeakMap can be viewed as a collection of (strong) references from pairs of objects (the map and a key) to values. These references are "AND edges" in the reachability graph: both the map and the key must be live for the reference to keep the value alive.
WeakMaps do add implementation complexity to the GC engine. A straightforward iterative implementation will turn the graph traversal from linear to quadratic, but that is not fundamental: it is perfectly possible to implement a linear marking pass that will clean up cyclic references through WeakMaps (though it is arguable whether it is worth the trouble and the possible overhead in the common case). Nor is the basic marking pass the only performance concern: nursery-only collections in a generational GC may not want the overhead of handling WeakMaps, other graph traversals (e.g. for a separate cycle collector) might end up requiring read barriers on WeakMap accesses, and so on.
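The AND-edge view of a WeakMap can be illustrated with a small sketch. Only WeakMap itself is a standard API here; the names metadata, tag, and node are hypothetical and chosen purely for illustration.

```javascript
// Hypothetical example: attach side data to an object without extending it.
const metadata = new WeakMap();

function tag(node, info) {
  // `info` stays alive only while BOTH `metadata` and `node` are reachable.
  metadata.set(node, info);
}

let node = { id: "view-1" };
tag(node, { createdBy: "example" });

// Deterministic: the lookup succeeds for as long as the program holds the key,
// regardless of when the GC happens to run.
const info = metadata.get(node);

// WeakMap exposes no iteration and no size, so a program has no way to ask
// which entries the GC has or has not reclaimed.
const canEnumerate =
  typeof metadata.keys === "function" || "size" in metadata;

// Once the program drops its last reference to the key, the entry can no
// longer be reached by any code, so whether it has been collected yet is
// unobservable.
node = null;
```

Because the only way to reach a value is to already hold the key, delaying collection of a dead entry changes memory usage but never program behavior.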
Use cases
Model/View event handler cycles
Statement of problem
In MVC, you might have multiple views of the same model. Each view registers itself to receive events from the model. When the view dies (e.g. its containing window is closed), you want both (1) to allow the object structure to be discarded and (2) to allow the model to stop sending useless updates to the invisible view.
Object graph
The model has a set of references to its views. The views also have back-references to the model, but that is not relevant here. (It is true that you might want the model to be discarded when all views of it are dead, but that is not the problem explored here.)
GC visibility
If the invisible view has visible side effects, then it is detectable when the view is collected, because those side effects stop at that point. The view might update shared state or emit log messages, for example.
Solution with weak refs
The model's references to the views are weak. When the view is not otherwise referenced, it gets collected and frees any resources associated with it. Optionally, the event listener edge could get cleaned up from its containing table. (Not doing so is a memory leak, but a very small one.)
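As a rough sketch of this weak-reference approach, assuming an engine that provides the ES2021 WeakRef API. The class WeakModel and the methods addView, notify, and onModelChanged are hypothetical names, not part of any existing library.

```javascript
// Sketch only: requires an engine with WeakRef (ES2021).
class WeakModel {
  constructor() {
    this.views = new Set(); // holds WeakRef objects, not the views themselves
  }

  addView(view) {
    this.views.add(new WeakRef(view));
  }

  notify(event) {
    let delivered = 0;
    for (const ref of this.views) {
      const view = ref.deref();
      if (view === undefined) {
        // The view has been collected; clean up the stale listener edge.
        this.views.delete(ref);
      } else {
        view.onModelChanged(event);
        delivered++;
      }
    }
    return delivered;
  }
}

const weakModel = new WeakModel();
const received = [];
const liveView = { onModelChanged(e) { received.push(e); } };
weakModel.addView(liveView);

// `liveView` is still strongly referenced here, so it receives the event.
const deliveredCount = weakModel.notify("changed");
```

For a view that is no longer referenced elsewhere, whether notify still reaches it depends on whether the GC has run yet, which is exactly the GC visibility discussed above.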
Deterministic solutions
The straightforward solution is for whatever deactivates the view to explicitly unregister all event handlers. This can be complex and error-prone.
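A minimal sketch of explicit unregistration, under the assumption that the model hands back an unsubscribe function at registration time. Model, View, subscribe, and deactivate are hypothetical names used only for illustration.

```javascript
class Model {
  constructor() {
    this.listeners = new Set(); // strong references to listener callbacks
  }

  // Registers a listener and returns a function that unregisters it.
  subscribe(listener) {
    this.listeners.add(listener);
    return () => this.listeners.delete(listener);
  }

  notify(event) {
    for (const listener of this.listeners) listener(event);
  }
}

class View {
  constructor(model) {
    this.events = [];
    // Keep the unsubscribe function so deactivation can undo the registration.
    this.unsubscribe = model.subscribe((e) => this.events.push(e));
  }

  // Must be called by whatever tears the view down (e.g. a window close handler).
  deactivate() {
    this.unsubscribe();
  }
}

const model = new Model();
const view = new View(model);
model.notify("first");   // delivered
view.deactivate();
model.notify("second");  // not delivered: the listener is gone
```

The behavior is independent of GC timing, but every teardown path has to remember to call deactivate, which is where the complexity and the risk of errors come from.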
If the view can be proven (or forced) to be free of side effects as soon as it is deactivated, then the actual cleanup can happen at any time and the arguments against weak references do not apply. For example, if you could construct the view within some sort of managed global environment that guarantees to stop side-effecting anything after deactivation, then you could make the event listener references weak. A "console.log" call within this environment would then need to either do nothing or throw some sort of RevokedReferenceError when called after deactivation.
This would still require an explicit deactivation call, but if you want determinism you have to either have that or have some magical way of knowing immediately when a subgraph becomes unreachable, because *something* has to know when to make console.log stop logging.
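One way such a managed environment might look is sketched below. It is an assumption-laden illustration, not a proposal for an actual API: createViewEnvironment and the sink parameter are hypothetical, and RevokedReferenceError is defined here as an ordinary Error subclass because no such built-in exists.

```javascript
// Hypothetical error type; the name is taken from the text above.
class RevokedReferenceError extends Error {
  constructor(message) {
    super(message);
    this.name = "RevokedReferenceError";
  }
}

// Builds a per-view environment whose side-effecting entry points all check
// a single "active" flag. `sink` stands in for the real logging backend.
function createViewEnvironment(sink) {
  let active = true;
  const console = {
    log(...args) {
      if (!active) {
        throw new RevokedReferenceError("log called after deactivation");
      }
      sink(...args);
    },
  };
  return {
    console,
    // The explicit deactivation call: after this, no side effect escapes.
    deactivate() {
      active = false;
    },
  };
}

const lines = [];
const env = createViewEnvironment((msg) => lines.push(msg));

env.console.log("visible");
env.deactivate();

let errorName = null;
try {
  env.console.log("too late");
} catch (e) {
  errorName = e.name;
}
```

Once deactivate has run, the view cannot produce observable effects through this environment, so it no longer matters when, or whether, the GC reclaims it. The standard Proxy.revocable function offers a related built-in mechanism, where operations on a revoked proxy throw a TypeError.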