Sfink/Draft - GC Pointer Handling

From MozillaWiki
Jump to: navigation, search

SpiderMonkey Garbage Collector Handle API

Background

Some background: the fundamental issue at hand is that SpiderMonkey uses garbage collection, which is a way of releasing unused memory, or "garbage". It works by scanning through memory that is known to be in use, and throwing out everything else.

How does the garbage collector (GC) know what memory is in use? The GC is informed of some memory via explicit "rooting" JSAPI calls like JS_AddNamedObjectRoot. If any code is currently running, anything needed by that code (eg all running scripts on the stack and their variables) will automatically be considered in use. XPConnect and similar code use various hooks to register objects as being in use. Then the GC scans the fields of all these in-use objects, recursively, and regards anything reachable as being in use. (This is the origin of the term "root": the GC scanner begins with a root set of objects and scans outward from them recursively.)

The one large category not covered by the above is local variables. Take for example:

 JSObject *obj1 = JS_NewObject(...);
 JSObject *obj2 = JS_NewObject(...);
 JS_DefineProperty(obj1, ...);

Consider what happens to obj1 if a garbage collection is triggered by the JS_NewObject() call for obj2. Nothing currently points to (refers to) obj1 other than the local variable 'obj1' itself. Without any other code or mechanism in place, the GC will delete obj1 and the JS_DefineProperty() will crash.

Currently, SpiderMonkey handles this problem with a conservative stack scanner. During a GC, it scans the C++ stack to find anything that looks like a pointer to a GC-able object ("gcthing"), and treats anything it finds as being in use. It may find some things that are no longer in use or that only look like a pointer to a GC-able thing (gcthing). In that case, it erroneously mark them as in use, as well as anything they point to, resulting in some garbage not getting collected. This is generally a short-term problem, however.

Note that, for correctness, we only need to find a single reference to each gcthing we need to mark as being in use. This is convenient because it is common to have extra gcthing pointers that are not traced through by the GC scanner -- eg external tables, local variables and function parameters, and back pointers of various sorts. In fact, any code that does not create new gcthings or store pointers to them between calls can completely ignore the garbage collector.

We now wish to implement some forms of garbage collection that move gcthings from one memory location to another. This is necessary for both generational and compacting GC. Unfortunately, that means it is now necessary to locate *all* gcthing pointers, because if a gcthing is moved then it is invalid for anything to use its old address. Or stated differently, *all* pointers to a moved gcthing must be updated. Also, it is no longer possible to use a conservative stack scanner, because that could result in a non-gcthing being mangled because it was incorrectly interpreted as a moved gcthing. As a result, the above code is now invalid.

Rooting

To deal with this, we have provided a new API for temporarily marking local variables as roots. The above code becomes:

 Rooted<JSObject*> obj1(cx, JS_NewObject(...));
 Rooted<JSObject*> obj2(cx, JS_NewObject(...));
 JS_DefineProperty(obj1, ...);

Note that obj1 parameter of JS_DefineProperty has been modified to be of type 'Handle<JSObject*>'.

A Rooted<JSObject*> is treated as a GC root until the function returns. A Handle<JSObject*> is a reference to a rooted object. (Handle<JSObject*> is like a const ref to a rooted JSObject*.) Anything stack-allocated (local variables, parameters) needs to be rooted if it is live across a call that could trigger the garbage collector. You should generally assume that any JSAPI call could invoke the garbage collector unless labeled otherwise -- you might at first expect that only JSAPI calls that create things could GC, but in fact many operations will create gcthings internally. For example, anything that could throw a JS exception can GC, because the exception object will be allocated. (And just about anything that accepts a gcthing pointer as a parameter can throw a JS exception -- eg, if the object is a proxy for something in a different compartment, the compartment crossing may be disallowed, which would trigger an exception to be thrown.)

The rule to follow is "any gcthing pointer must be rooted if it is ever live across a call that might GC." Combined with the fact that just about any JSAPI call can GC, that is pretty close to "any gcthing pointer must be rooted if is live across a JSAPI call." To root something, store it in a Rooted<T> type and never use the bare pointer value for anything. If you need to declare a function that accepts gcthing pointers, declare them as Handle<T>. For out or in/out parameters, use MutableHandle<T>. The resulting values may be used as if they were the original bare pointers for almost all purposes.

The available types are:

 JS::Rooted<JSObject*> / JS::Handle<JSObject*> / JS::MutableHandle<JSObject*>
 JS::Rooted<Value> / JS::Handle<Value> / JS::MutableHandle<Value>
 JS::Rooted<String> / JS::Handle<JSObject*>
 JS::Rooted<Id> / JS::Handle<JSObject*>

The canonical example is to convert something like:

 JSObject *foo(JSContext *cx, JSObject *obj, JSObject **inout)
 {
   JSObject *obj1 = JS_Foo(obj);
   JSObject *obj2 = JS_Bar(obj);
   *inout = obj1;
   return obj2;
 }

to

 JSObject *foo(JSContext *cx, Handle<JSObject*> obj, MutableHandle<JSObject*> inout)
 {
   Rooted<JSObject*> obj1(cx, JS_Foo(obj));
   Rooted<JSObject*> obj2(cx, JS_Bar(obj));
   inout.set(obj1);
   return obj2;
 }

Internals

RootedObject is a typedef of Rooted<JSObject*>. Rooted<T> is an RAII class template that tells the given JSContext that the contained pointer is a root, and should be updated if a GC occurs that moves the object pointed to. The destructor will unregister that root. The registration is implemented as a simple stack (LIFO queue), which means that registrations and unregistrations must be properly nested. Given that this is an RAII class, this is almost automatic. The compiler will prevent you from using them for temporaries as in:

 Rooted<JSObject*> result(cx, JS_Foo(cx, Rooted<JSObject*>(cx, obj)));

which would not work because the temporaries would be destroyed in the wrong order. You *can* still get into trouble by heap-allocating:

 Rooted<JSObject*> *obj1pointer;
 {
   Rooted<JSObject*> obj2(cx, JS_Foo());
   obj1pointer = new Rooted<JSObject*>(cx, obj1);
 }
 delete obj1pointer;

A separate static analysis will detect these errors, but you'll probably have to wait for that failure to be reported on tbpl to notice. Heap-allocating Rooted<T> values are rarely desireable, but very occasionally necessary. If you use them, you are required to maintain the LIFO ordering manually.

HandleObject is a typedef of Handle<JSObject*>. Handle<T> is another class template that serves as an additional layer of indirection on top of a (rooted) pointer, so that if the pointer is updated, users of the Handle will automatically use the new value. In other words, HandleObject is really just a JSObject** with some compilation-time checking added. If a Handle<T> escapes an enclosing Rooted<T>'s scope, Bad Things will happen -- the pointed-to pointer may be on a reused address on the stack, and so may be overwritten at any time, or a GC may occur that moves the pointer and does not update it. As long as Handle<T> is not used as a return value type, it is difficult to cause this to happen. (Ok, you could do it with a pointer or reference outparam, too.)

A MutableHandle<JSObject*> is also a pointer to a pointer to a gcthing, but it allows updating the gcthing pointer. It is essentially a pointer to a Rooted<T>, and will implicitly convert from that. Usage is:

 Rooted<JSObject*> obj(cx);
 JS_UpdateParameter(&obj);

where JS_UpdateParameter looks like:

 void JS_UpdateParameter(MutableHandle<JSObject*> obj)
 {
   obj.set(JS_Foo(obj));
 }

Collections

Collections of gcthing pointers, either heap- or stack-allocated, require special handling. A simple stack-allocated array:

 JSObject *objects[20];

will not be traced to keep the objects live, nor will those pointers get updated on a moving GC. For local (stack-allocated) vectors, the easiest fix is to use AutoValueVector, AutoIdVector, or AutoVectorRooter<JSObject*>, which register a GC callback to trace through the vector during a GC. For heap-allocated vectors, you should convert the vectors' contents to patterns of chicken entrails splattered onto a plate glass window, and infer which gcthing was intended by meditating while staring at the window while it is illuminated only by a green strobe light.

Instance Methods

One tricky case is methods on gcthings. The following code is invalid:

 class JSObject {
   void foo() {
     Rooted<JSObject*> tmp(cx, JS_NewObject(...));
     ...do something with data members...
   }
 };

The problem is that inside of JSObject::foo(), the 'this' pointer may be relocated, and the subsequent access of its data members will use the old pointer value.

To fix this, make any method static if it could GC. This will force the gcthing pointer to be passed as a parameter, and it can then be handled normally. (An alternative would be to explicitly root 'this' at the very beginning of the method and call it eg 'self', but this is error-prone in that any reference to a data member or virtual method without a preceding 'self->' will be a GC bug.)

Performance

Using a rooted value adds a layer of indirection. Creating a rooted value does a little bit of pointer chasing and appends to a stack stored in JSContext. Handles are pointers to pointers, so passing them around is no added cost. Rooted<T> is a set of 3 pointers, but so they don't get passed around that doesn't really matter.

In hot code, the overhead of rooting can be avoided if you are careful. Remember that the only thing that needs to be avoided is holding a gcthing pointer live across a call that might GC. So if a function takes a gcthing pointer but never GCs, it is valid for it to declare a bare pointer as a parameter. Even if a function might GC, it is ok to pass in a bare pointer variable as long as that variable is dead after the call. The function will have to root the value itself, if needed, but it can declare that capability by accepting a bare pointer instead of a Handle.

Persistent (heap-allocated) Pointers

The rules are different for storing pointers on the heap (i.e., into structures that have been allocated with new or malloc.) Once again, you must arrange for any pointers to be (1) discovered during tracing and (2) updated by the moving GC. For the most part, the same mechanism is used for both -- when tracing, the GC is handed an indirect pointer through which the gcthing pointer is updated if needed. Many exceptional cases may arise, however:

  • The gcthing pointer value may be used to construct a hash code
  • The gcthing pointer value may be tagged
  • The structure containing the pointer is sometimes allocated on the stack, sometimes on the heap

Also, persistent pointer updates may be subject to write barriers for incremental and/or generational GC, where any modification must be monitored to maintain various invariants. Basically, you can't add, remove, or change a heap-stored gcthing pointer without informing the GC about it. See https://developer.mozilla.org/En/SpiderMonkey/Internals/Garbage_collection for details.

The simplest approach is to use HeapPtr<T> as the type of gcthing pointer fields. Any writes through a HeapPtr will trigger the needed barriers. There's also EncapsulatedPtr<T>, which is the same but different. Or RelocatablePtr<T>, which is different but the same.

Funky examples

Double-rooting vs raw pointers

Rekeying

Hash-based collections that use the pointer value as an input to the hash function need to be "rekeyed" when that pointer value changes. js/public/HashTable.h has a rekeyFront() operation that may be used while enumerating the hash to perform the rekeying. ...((ignore rekeyFrontUnchecked() for now))...

lookupForAdd

tagged pointers

C code