GC SafetySpec
Garbage Collection Safety Analysis of SpiderMonkey
NOTE: Do not hack on this page in the wiki. I, Daniel Wilkerson, am maintaining it as a text file on disk and I am editing it there and then re-pasting the entire document. If someone else makes a change and I don't notice it then it will get wiped out unfortunately. Send changes to me instead.
Introduction
SpiderMonkey is a JavaScript (JS) interpreter implemented in C. Objects on the JavaScript heap are also C objects and can be pointed to by C code. Any C object is either a "JS object" if on the JS heap or a "non-JS object" if not.
The JS interpreter has a garbage collector (GC). Note that in many cases we assume that the program is synchronous; that is, we do not consider any asynchronous computation: threading, signals, nor exceptions. The GC has a set of pointers called the "(JS) root set". The set of objects reachable from the root set are the "JS-live" objects. Consider the C globals and stack to be the "C root set". The set of objects reachable from the C root set are the "C-live" objects.
Since the JS interpreter is implemented in C, JS-live is a subset of C-live. However it is possible for a JS object to be C-live but not JS-live. If the GC were to run under such a condition, the JS object could be collected and then the C pointer to the object would dangle. The goal of this analysis is to prevent that situation.
Theorem: At all times all JS objects that are C-live are JS-live.
Below we prove the theorem assuming some rules constraining the code are followed. After that we specify which rules have had an Oink checker written for them.
Rootedness
Annotate any JS object in the type-system with the qualifier $js. Pointers that can dangle are of the type $js*.
Definitions:
- A $js* instance is "directly rooted" if it's address is in the GC root set.
- A $js* instance is "transitively rooted" if it is member of a JavaScript object (not C object) a pointer to which is transitively rooted or rooted.
- A $js* instance is "copy-of-rooted" if it has the same value as an instance that is directly rooted or transitively rooted.
- A $js* instance is "rooted" that is directly rooted, transitively rooted, or copy-of-rooted.
If all $js*-s are rooted at all times then any C-live $js is JS-live and our goal is satisfied.
Lemma: All values of type $js* are rooted at all times.
Partition into allocation classes
There are six allocation classes of objects in C:
- global,
- sole-heap,
- member-heap,
- parameter,
- stack, and
- temporary.
By "sole-heap" we mean just a pointer or an array of pointers allocated on the heap and by "member-heap" we mean a member of an object on the heap. By "parameter" we mean stack variables that are on the parameter list and by "stack" we mean stack variables that are not on the parameter list. By "temporary" we mean stack variables computed in compiler-written code as arguments to functions or as temporaries in other expressions; they have no first-class names to the user.
We will adopt a set of rules that can be checked by an analysis and then show that a $js* cannot dangle in each of these classes. We establish three invariants on all $js* values in order to do this.
Rule global-0.0: We only need five cases for each invariant because we disallow global $js* variables completely, which handles this case.
A note on temporaries: A "Full Expression" is a C technical term; it means an expression maximal in AST height, usually terminated by a semi-colon. Consider a Full Expression e containing a temporary expression $js *e2; note that 'e2' is a meta-name not a first-class program name because temporaries do not have names in a program. Note that the entire lifetime of e2 is the execution Full Expression e.
Note that we assume that other than running the GC the program is otherwise memory safe.
Simon points out the usual problem with Oink/Cqual analyses, that the type-system is or model of the program; therefore we need to prevent casting to or away from $js*.
Rule cast-0.0: Check that nothing is ever cast to or from a data type containing $js.
Incomplete
Look for any implicit rules that I did not make explicit.
Simon suggests that we provide a set of legal idioms.
Invariant 1: $js* values do not dangle upon creation
Sole-heap: These are made by alloc_rooted_ptr() which we trust to initialize them.
Member-heap: We trust the containing JS object ctor to initialize them. Simon points out that we need this rule.
Rule member-heap-1.0: Check no C/C struct/class/union contain a $js*, only JS objects can contain a $js* member.
Stack: Rule stack-1.0: We could require them to simply be initialized and check this syntactically; however since we will require register(