GC SafetySpec
Garbage Collection Safety Analysis of SpiderMonkey
Introduction
SpiderMonkey is a JavaScript (JS) interpreter implemented in C. Objects on the JavaScript heap are also C objects and can be pointed to by C code. Any C object is either a "JS object" if on the JS heap or a "non-JS object" if not.
The JS interpreter has a garbage collector (GC). Note that in many cases we assume that the program is synchronous; that is, we do not consider any asynchronous computation: threading, signals, nor exceptions. The GC has a set of pointers called the "(JS) root set". The set of objects reachable from the root set are the "JS-live" objects. Consider the C globals and stack to be the "C root set". The set of objects reachable from the C root set are the "C-live" objects.
Since the JS interpreter is implemented in C, JS-live is a subset of C-live. However it is possible for a JS object to be C-live but not JS-live. If the GC were to run under such a condition, the JS object could be collected and then the C pointer to the object would dangle. The goal of this analysis is to prevent that situation.
Theorem: At all times all JS objects that are C-live are JS-live.
Below we prove the theorem assuming some rules constraining the code are followed. After that we specify which rules have had an Oink checker written for them.
Rootedness
Annotate any JS object in the type-system with the qualifier $js. Pointers that can dangle are of the type $js*.
Definitions:
- A $js* instance is "directly rooted" if it's address is in the GC root set.
- A $js* instance is "transitively rooted" if it is member of a JavaScript object (not C++ object) a pointer to which is transitively rooted or rooted.
- A $js* instance is "copy-of-rooted" if it has the same value as an instance that is directly rooted or transitively rooted.
- A $js* instance is "rooted" that is directly rooted, transitively rooted, or copy-of-rooted.
If all $js*-s are rooted at all times then any C-live $js is JS-live and our goal is satisfied.
Lemma: All values of type $js* are rooted at all times.
Partition into allocation classes
There are six allocation classes of objects in C:
- global,
- sole-heap,
- member-heap,
- parameter,
- stack, and
- temporary.
By "sole-heap" we mean just a pointer or an array of pointers allocated on the heap and by "member-heap" we mean a member of an object on the heap. By "parameter" we mean stack variables that are on the parameter list and by "stack" we mean stack variables that are not on the parameter list. By "temporary" we mean stack variables computed in compiler-written code as arguments to functions or as temporaries in other expressions; they have no first-class names to the user.
We will adopt a set of rules that can be checked by an analysis and then show that a $js* cannot dangle in each of these classes. We establish three invariants on all $js* values in order to do this.
Rule global-0.0: We only need five cases for each invariant because we disallow global $js* variables completely, which handles this case.
A note on temporaries: A "Full Expression" is a C++ technical term; it means an expression maximal in AST height, usually terminated by a semi-colon. Consider a Full Expression e containing a temporary expression $js *e2; note that 'e2' is a meta-name not a first-class program name because temporaries do not have names in a program. Note that the entire lifetime of e2 is the execution Full Expression e.
Note that we assume that other than running the GC the program is otherwise memory safe.
Simon points out the usual problem with Oink/Cqual++ analyses, that the type-system is or model of the program; therefore we need to prevent casting to or away from $js*.
Rule cast-0.0: Check that nothing is ever cast to or from a data type containing $js.
Incomplete
Look for any implicit rules that I did not make explicit.
Simon suggests that we provide a set of legal idioms.
Invariant 1: Initialized to NULL and therefore do not dangle upon creation
Sole-heap: These are made by alloc_rooted_ptr() which we trust to initialize them.
Member-heap: We trust the containing JS object ctor to initialize them. Simon points out that we need this rule.
Rule member-heap-1.0: Check no C/C++ struct/class/union contain a $js*, only JS objects can contain a $js* member.
Stack: Rule stack-1.0: We check syntactically that the initialization exists. Alternative: (dsw) Igor keeps saying that this is not necessary, since we will require register(&v) to be called immediately and it must initialize the variable.
Simon points out that a $js[] must only occur on the heap and be registered with alloc_rooted_ptr() upon creation or on the stack if register() were called (register also takes a length argument). Now, if there is a $js* that has pointer arithmetic done on it, we get a temporary that is copy-of-rooted because the initial pointer pointed into a rooted array and therefore all of the members of the array are rooted.
Rule stack-1.1: should check that register gets the right length argument for stack allocated arrays.
Temporary: It is not possible for them to fail to be initialized.
Parameter: It is not possible for them to fail to be initialized.
Invariant 2: Rooted upon creation
Sole-heap: We trust alloc_rooted_ptr() to put the pointer to the newly created $js* (or $js[]) object into the GC root set; therefore directly-rooted.
Member-heap: These are members of another object which by induction is pointed to by a rooted pointer; therefore transitively-rooted. NOTE: as pointed out above, we use the assumption that, other than sole-heap objects, only JS objects can contain $js*.
Stack: Rule stack-2.0: We check initialization is immediately followed by a call to register(&v); therefore directly-rooted. (dsw) I think Igor suggests that perhaps this is where they should be initialized; see above.
Temporary: By induction is copy-of-rooted upon creation.
Parameter: By induction is copy-of-rooted upon creation.
Invariant 3: Remain rooted until "about to cease to exist"
Definition: At a particular time an object is "about to be cease to exist" iff from that time until object de-allocation the garbage collector cannot run.
Sole-heap: We trust the garbage collector to not un-root the object until it is about to cease to exist.
Member-heap: We trust the containing JS object destructor to not un-root the object until it is about to cease to exist.
Stack: We must check that stack-allocated variables are only unregistered when the GC cannot run and the frame is about to cease to exist.
Rule stack-3.0: We check that unregister() can only take as an argument an expression of the form &v where v is a stack-allocated $js*.
Rule stack-3.1: We check that once unregister(&v) is called, v is not referred to again in the function body.
FIX: factor out the lack-of-alias to v to show that v cannot be used indirectly either.
Temporary: All temporaries are const: the language provides no way to write them. We must simply show that they are a copy of does not change during their lifetimes.
Rule temporary-3.1: After operator overloading, an expression of type $js * must be 1) a stack- or parameter-allocated variable or 2) in the context of a) an immediate '*' (and not subsequent '&') or b) in an initializer.
Prevents this situation since a->v is of type $js* but not allowed by the above rule:
   struct A $js {
     $js *v;
   };
   void g(struct A *a) {
     a->v = NULL;
     // invoke gc; left argument to plus (that is, a->v) in h() is not rooted
   }
   void h(struct A *a) {
     // assume that the arguments are evaluated in the order numbered
     *( /*1*/ a->v + ( /*2*/ g(a), /*3*/ 0) );
   }
Rule temporary-3.2: If a variable $js *v occurs in a Full Expression (it must be stack- or parameter-allocated), a) the expression '&v' may not occur in the same full expression b) nor may 'v' initialize a reference type variable (such as '$js*&v2=v') nor may c) 'v' be written-to directly in the full expression (such as 'v=3').
Note that rule temporary-3.1 partitions the cases. For case 1), rule temporary-3.2 prevents v from having any aliases in the FullExpression nor being changed directly. For case 2), we simply assume that the compiler will not write code to do anything at all between the creation and the destruction of the temporary.
Parameter: Consider a function f($js *v). Recall also that the entire lifetime of v is the execution of f().
Rule parameter-3.0: After operator overloading, an argument of type $js* to a function call must be a stack- or parameter-allocated variable $js *v2.
Recall from the rules about temporaries that a parameter $js *v is copy-of-rooted upon initialization from a temporary argument and that argument is const (since a temporary) and by rule parameter-3.0 a copy of a stack- or parameter-allocated variable in the caller $js *v2. We show below that, with some additional rules, the following two properties hold.
1) The parameter v itself cannot change during its lifetime.
2) No alias to v2 exists during the execution of f().
Note that 2) implies that v2 cannot change during the lifetime of v. Recalling that v2 is rooted upon creation, parameter v therefore remains copy-of-rooted (of v2) during its lifetime.
Situation:
   void f($js *v) {
     // point 1: v is const
     // point 2: no alias to v2 other than v is available here,
     //          so for the duration of f(), v2 is const
   }
   void g() {
     $js *v2;
     f(v2);
   }
Establishing invariant 3 parameter property 1
Lemma: The parameter v itself cannot change during its lifetime.
Rule parameter-3.1: We make all parameters $js *v into $js * const v and flow the const.
Establishing invariant 3 parameter property 2
Lemma: No alias to any stack- or parameter-allocated $js *v2 in the caller to f() exists during the execution of f().
Rule parameter-3.2: We check with a stackness analysis that the address of a stack- or parameter-allocated $js *v2 is never stored on the heap or in a global.
Prevents this situation:
   $js **x;
   void f($js *v) {
     *x = NULL;  // stomp on v2
     // v is no longer copy-of-rooted
     gc();
     *v; // can fail
   }
   void g() {
     $js *v2;
     x = &v2;
     f(v2);
   }
Rule parameter-3.3: For any argument to f() (other than 'v' or '&v' where v is stack-allocated in the caller frame as $js *v) annotate any $js* at any depth below further *'s or []'s in the expression type as non-stack.
Prevents this situation:
   $js *v2;
   $js **v3 = &v2;
   f(v3) // can't do this because the $js * withing the $js ** of v3
         // gets annotated as non-stack
Rule parameter-3.4: For stack- or parameter-allocated $js *v, prohibit both v and &v as arguments to the same function call site. Note that this rule is a consequence of rule temporary-3.2.
We now prove that the address of a parameter- or stack-allocated $js *v2 has no alias that is available to f(v2, ...) nor any other frame called from f(v2, ...). First no alias to v2 was stored on the heap nor in a global. Second other than for an argument of the form &v3, any $js* anywhere in a type passed in as an argument was marked non-stack and so does not point to $js *v2. Third for an argument of the form &v3 we check that the name 'v3' is not 'v2'.
Making the analysis function-local
Igor suggests the following rules which seem to make the analysis completely function-local. Such a solution is more low-tech as depending on inter-function dataflow of course depends on the oink linker-imitator being correct and it is known to have bugs.
Rule parameter-3.1': Parameters of type $js *v by not allowing the address &v to be taken nor for v to occur in an lvalue context.
Rule parameter 3.1 is a consequence of rule parameter-3.1' because the const qualifier is only relevant for pointer to const and we cannot make pointers to const in this case.
Rule parameter-3.2/3.3': You may not copy the value of an expression of type $js** anywhere except when using it as a function argument. You may only use a $js** expression when de-referencing (and also not take immediately the address: no '&*vp').
As a consequence you cannot take the address of a $js** unless you plan on immediately throwing it away or de-referencing it, which is pointless. Also as a consequence you cannot take the address of a $js* except as function argument.
Rule parameter-3.2 is a consequence because if you cannot store a $js** anywhere, you can't store it on the heap or in a global.
Rule parameter-3.3 is a consequence because if you cannot store a $js** anywhere you cannot build up a type where an underlying stack- or parameter-allocated $js *v because the value of the type would have to have been constructed by storing &v somewhere.
Rule summary
Important rules
General rules about registering stack $js* variables
Parametrize the register name with another flag.
Implement register_array and parametrize in the same way.
Implement checking that register(&v) is called sometime after v is declared, not just immediately. Brendan says it is not important to check that v is not used before it is called.
Implement checking that once unregister(&v) is called, then v is not used later.
Make '&*' illegal.
DONE: ./oink -fo-exclude-extra-star-amp
Rule global-0.0: Disallow global variables of type $js, $js * (or more layers of stars).
For each declarator, if global, check the type does not contain a $js, using a non-function-parameter type walk.
DONE: -q-exclude-global '$$!nonglobal'
Rule stack-2.0: Check stack-allocated $js* declarator is immediately followed by a call to register(&v).
for each compound statement, for each statement decl of $js *v, check next statement is call to register(&v), check it off.
DONE: -q-reg-stack '$$!js*'
Rule parameter-3.2/3.3': You may not copy the value of an expression of type $js** anywhere except when using it as a function argument. You may only use a $js** expression when de-referencing (and also not take immediately the address: no '&*vp').
as above, mark every de-reference-ed expression. for each expression, if it is of type $js**, must be a function argument or must be in the context of a de-reference.
one visitation marks function arguments the other marks dereference arguments the third finds all $js ** expressions and checks that they are in one of the above contexts
DONE: -q-exl-perm-alias '$$!js**'
Rule temporary-3.2: If a variable $js *v occurs in a Full Expression (it must be stack- or parameter-allocated), a) the expression '&v' may not occur in the same full expression b) nor may 'v' initialize a reference type variable (such as '$js*&v2=v') nor may c) 'v' be written-to directly in the full expression (such as 'v=3').
for each FullExpression fe, for each variable v in it in an r-value context, if it is of type $js*, check that there is no pointer or ref to it.
DONE: -q-exl-value-ref-call '$$!js*'
Rule parameter-3.0: After operator overloading, an argument of type $js* to a function call must be a stack- or parameter-allocated variable $js *v2.
for each function call, for each argument, if has type $js*, the argument is a variable expression and the variable is stack or parameter allocated
DONE: -q-arg-local-var '$$!js*'
Rule temporary-3.1: After operator overloading, an expression of type $js* must be 1) a stack- or parameter-allocated variable or 2) in the context of a) an immediate '*' (and not subsequent '&') or b) in an initializer.
for all expressions, if a deref or initializer as above, mark as such. for all expressions, if of type $js*, check the above predicate using the previous pass.
DONE: -q-temp-immediately-used '$$!js*'
Rule parameter-3.1': Parameters of type $js *v may not have the address &v taken nor may v occur in an lvalue context.
for each addr-of expression and reference-type expression, the argument is not a variable that is a parameter of type $js*
nah, this just means that if it occurs in a parameter that it is const. const_when_param will work for this. DONE: -q-config ../libqual/config/lattice -q-const-when-param '$$!js*'
Covered by other rules and therefore not implemented
Rule parameter-3.1: We make all parameters $js *v into $js * const v and flow the const.
(omit for now due to Rule parameter-3.1') for each parameter, if it is of type $js * make it $js * const. do a const analysis
Rule parameter-3.2: We check with a stackness analysis that the address of a stack- or parameter-allocated $js *v2 is never stored on the heap or in a global.
(omit for now due to Rule parameter-3.2/3.3')
Rule parameter-3.3: For any argument to f(), other than 'v' or '&v' where v is stack-allocated in the caller frame as $js *v, annotate any $js* at any depth in the expression type as non-stack.
(omit for now due to Rule parameter-3.2/3.3')
Less important and therefore not implemented
I considered these to be less important because for various reasons we thought they were less likely to be violated.
Typechecking for $js: we don't check that $js always flows to $js and non-$js always flows to non-$js.
Brendan says that this is not important.
Rule cast-0.0: Check that nothing is ever cast 1) to or 2) from a data type containing $js.
Add flag -q-exclude-cast '$!js' which excludes the qualifier from casts.
DONE for cast 'to': -q-exclude-cast '$$!noncast' do cast 'from':
Rule stack-1.1: should check that register gets the right length argument for stack allocated arrays.
Rule stack-3.0: Check that unregister() can only take as an argument an expression of the form &v where v is a stack-allocated $js*.
check off the ones above and on another pass, see if any were missed. also check that they all say register(&v), for some stack-allocated v
Rule stack-3.1: after unregister(&v) v is not referred to until the end of the frame.
Show that also v is not aliased (probably a consequence of the rules that we already have).
Make all $js*** and higher illegal also: I think Igor said this was ok.
for each expression and declarator variable, if it contains a $js, it must be a $js, $js*, or $js**.
Rule member-heap-1.0: Check no C/C++ struct/class/union contain a $js*, only JS objects can contain a $js* member.