Sfink/Moving GC

From MozillaWiki
Jump to: navigation, search

All garbage collection requires a way to find the set of objects that are currently reachable. This requires knowing about all pointers into the JS heap ("roots"), as well as all pointers within the JS heap (the "heap graph"). GC will do a graph traversal starting from the roots and visiting everything reachable. This requires being able to find all roots during a GC.

A generational garbage collector (GGC) defines a subset of the JS heap as the "nursery", and allocates most objects out of this nursery in the expectation that many of them will die quickly (become unused and unreferenced). The collector periodically sweeps through the nursery, packing only the live objects into another area so that the nursery is empty and available for new allocations again. It is important that this nursery sweep (a "minor collection") only need to look at the *live* objects (plus some bookkeeping information), NOT the full heap (or even a percentage of it) -- you want the cost of minor collections to be proportional only to the amount of live data in the nursery, and independent of the amount of garbage.

This requires two main capabilities: (1) the ability to move objects around, and (2) a record of all external pointers into the nursery.

Incremental garbage collection (IGC) adds an additional requirement: at the beginning of an incremental collection, all roots (external pointers) into the heap are recorded, then the main program is allowed to continue running but is interrupted periodically so the incremental collector can gradually trace the whole heap graph. Because the graph can change during this sweep, any intra-heap reference that is deleted must be marked (treated as live): such references usually are either dead or reachable from somewhere else, but it's also possible that the main program moved a reference from a part of the heap that has not yet been scanned to a part that has already been scanned, and so it will not get marked during the rest of the incremental collection. This adds a required capability (3): a record of all pointers within the heap that were deleted while incremental GC was active. Note that this is actually accomplished by putting them on the same mark stack as is used to record the current state of the incremental GC.

Capability #1 means that we need to be able to find and update all pointers from outside the JS heap into the JS heap. (Intra-heap pointers can be found and updated by scanning, so they don't need any special handling.) The embedding uses APIs to register certain data structures as part of the root set, as well as callbacks to record other root pointers during the GC. For pointers on the stack, we use the Stack Rooting API; see js/public/RootingAPI.h.

Capabilities #2 and #3 mean that we need "write barriers" for all JS heap pointers stored within the C++ heap, which is a fancy way of saying that if you modify a pointer into the heap, you may need to do something before and/or after the write to update bookkeeping information. Capability #2 requires a "post-barrier", an action taken after the write occurs (or at least, using the written value), to store a record of any new pointer value that points (or might point) into the nursery. Capability #3 requires a "pre-barrier", an action taken before the write occurs, so it can mark the overwritten value as live (before it is lost from the graph).

---

You can divide memory into 4 partitions: the stack, the non-JS (C++) heap, the JS nursery, and the JS tenured area. (Or you could say there's the stack and the heap, and the heap is divided into JS and non-JS parts, and the JS part is separated into a nursery and a tenured area, but it's the pointers between these areas that matter so we'll keep them separate.)

Pointers from the stack to either part of the JS heap must be findable and updatable during a moving GC. This is usually accomplished with Rooted<T>, Handle<T>, and MutableHandle<T>, though aggregates of various sorts will also use the Auto*Rooter classes as well. Pointers from the stack to either the stack or the C++ heap don't matter.

Pointers from the C++ heap to the JS heap must be findable and updatable, and writes must be barriered. Pointers from the C++ heap to the C++ heap don't matter to the GC. Pointers from the C++ heap to the stack are kind of funky but don't matter to the GC.

Pointers from the tenured area to anywhere in the JS heap are automatically findable and updatable, since you can scan the JS heap. Writes do need to be barriered, in case they change to point to the nursery (for GGC) or move an unscanned subgraph into the already-scanned region (for IGC). Pointers from tenured to the stack (?!) or the C++ heap don't matter.

Pointers from the nursery are the same as those from tenured, except it doesn't matter for GGC if a nursery pointer points to the nursery vs tenured. So the post barrier is not needed. The pre barrier is not needed because incremental GC does a nursery collection before each slice.