GC SafetySpec

From MozillaWiki
Jump to: navigation, search

Garbage Collection Safety Analysis of SpiderMonkey

Daniel S. Wilkerson, Igor Bukanov, Taras Glek, Simon Goldsmith

NOTE: Do not hack on this page in the wiki. I, Daniel Wilkerson, am maintaining it as a text file on disk and I am editing it there and then re-pasting the entire document. If someone else makes a change and I don't notice it then it will get wiped out unfortunately. Send changes to me instead.

Introduction

SpiderMonkey is a JavaScript (JS) interpreter implemented in C. Objects on the JavaScript heap are also C objects and can be pointed to by C code. Any C object is either a "JS object" if on the JS heap or a "non-JS object" if not.

The JS interpreter has a garbage collector (GC). Note that in many cases we assume that the program is synchronous; that is, we do not consider any asynchronous computation: threading, signals, nor exceptions. The GC has a set of pointers called the "(JS) root set". The set of objects reachable from the root set are the "JS-live" objects. Consider the C globals and stack to be the "C root set". The set of objects reachable from the C root set are the "C-live" objects.

Since the JS interpreter is implemented in C, JS-live is a subset of C-live. However it is possible for a JS object to be C-live but not JS-live. If the GC were to run under such a condition, the JS object could be collected and then the C pointer to the object would dangle. The goal of this analysis is to prevent that situation.

Theorem: At all times all JS objects that are C-live are JS-live.

Below we prove the theorem assuming some rules constraining the code are followed. After that we specify which rules have had an Oink checker written for them.


Rootedness

Annotate any JS object in the type-system with the qualifier $js. Pointers that can dangle are of the type $js*.

Definitions:

  • A $js* instance is "directly rooted" if it's address is in the GC root set.
  • A $js* instance is "transitively rooted" if it is member of a JavaScript object (not C++ object) a pointer to which is transitively rooted or rooted.
  • A $js* instance is "copy-of-rooted" if it has the same value as an instance that is directly rooted or transitively rooted.
  • A $js* instance is "rooted" that is directly rooted, transitively rooted, or copy-of-rooted.

If all $js*-s are rooted at all times then any C-live $js is JS-live and our goal is satisfied.

Lemma: All values of type $js* are rooted at all times.


Partition into allocation classes

There are six allocation classes of objects in C:

  • global,
  • sole-heap,
  • member-heap,
  • parameter,
  • stack, and
  • temporary.

By "sole-heap" we mean just a pointer or an array of pointers allocated on the heap and by "member-heap" we mean a member of an object on the heap. By "parameter" we mean stack variables that are on the parameter list and by "stack" we mean stack variables that are not on the parameter list. By "temporary" we mean stack variables computed in compiler-written code as arguments to functions or as temporaries in other expressions; they have no first-class names to the user.

We will adopt a set of rules that can be checked by an analysis and then show that a $js* cannot dangle in each of these classes. We establish three invariants on all $js* values in order to do this.

Rule global-0.0: We only need five cases for each invariant because we disallow global $js* variables completely, which handles this case.

A note on temporaries: A "Full Expression" is a C++ technical term; it means an expression maximal in AST height, usually terminated by a semi-colon. Consider a Full Expression e containing a temporary expression $js *e2; note that 'e2' is a meta-name not a first-class program name because temporaries do not have names in a program. Note that the entire lifetime of e2 is the execution Full Expression e.

Note that we assume that other than running the GC the program is otherwise memory safe.

Simon points out the usual problem with Oink/Cqual++ analyses, that the type-system is or model of the program; therefore we need to prevent casting to or away from $js*.

Rule cast-0.0: Check that nothing is ever cast to or from a data type containing $js.


Incomplete

Look for any implicit rules that I did not make explicit.

Simon suggests that we provide a set of legal idioms.


Invariant 1: $js* values do not dangle upon creation

Sole-heap: These are made by alloc_rooted_ptr() which we trust to initialize them.

Member-heap: We trust the containing JS object ctor to initialize them. Simon points out that we need this rule.

Rule member-heap-1.0: Check no C/C++ struct/class/union contain a $js*, only JS objects can contain a $js* member.

Stack: Rule stack-1.0: We could require them to simply be initialized and check this syntactically; however since we will require register(&v) to be called before the first use, we will follow Igor's suggestion and trust that register initializes the pointer upon creation.

Simon points out that a $js*[] must either 1) be allocated on the heap and be registered with alloc_rooted_ptr() upon creation or 2) be allocated on the stack if register_array() were called to register it (register also takes a length argument). Now, if there is a $js* that has pointer arithmetic done on it, the result is a temporary that is copy-of-rooted; this is because the initial pointer pointed into a rooted array and therefore all of the members of the array are rooted.

Rule stack-1.1: should check that register gets the right length argument for stack allocated arrays.

Temporary: It is not possible for them to fail to be initialized.

Parameter: It is not possible for them to fail to be initialized.


Invariant 2: $js* values are rooted upon creation

Sole-heap: We trust alloc_rooted_ptr() to put the pointer to the newly created $js* (or $js*[]) object into the GC root set; therefore directly-rooted.

Member-heap: These are members of another object which by induction is pointed to by a rooted pointer; therefore transitively-rooted. NOTE: as pointed out above, we use the assumption that, other than sole-heap objects, only JS objects can contain $js*.

Stack: Rule stack-2.0: We check declaration is followed by a call to register(&v) before any use; therefore directly-rooted. (dsw) I think Igor suggests that perhaps this is where they should be initialized; see above.

Note that other rules establish that a stack-allocated variable v is not aliased except temporarily and on the stack in the frame of a called-function only. Therefore a check that v is not used before register(&v) nor after unregister(&v) is simply a syntactic check for the name of v.

Temporary: By induction is copy-of-rooted upon creation.

Parameter: By induction is copy-of-rooted upon creation.


Invariant 3: $js* values remain rooted until unusable

Definition: At a particular time an object is "unusable" iff from that time until object de-allocation the object cannot be used.

One way for it to be clear that an object is unusable is if it is neither named-in nor is an alias-to-it-available-in the code remaining to be executed until the object is destructed.

Sole-heap: We trust the garbage collector to not un-root the object until it is unusable.

Member-heap: We trust the containing JS object destructor to not un-root the object until it is unusable.

Stack: We must check that stack-allocated variables are only unregistered when unusable.

Rule stack-3.0: We check that unregister() can only take as an argument an expression of the form &v where v is a stack-allocated $js*.

Rule stack-3.1: We check that once unregister(&v) is called, v is not referred to again in the function body. See note regarding aliasing and checking at rule stack-2.0 above.

Temporary: All temporaries are const: the language provides no way to write them. We must simply show that they are a copy of does not change during their lifetimes.

Rule temporary-3.1: After operator overloading, an expression of type $js * must be 1) a stack- or parameter-allocated variable or 2) in the context of a) an immediate '*' (and not subsequent '&') or b) in an initializer.

Prevents this situation since a->v is of type $js* but not allowed by the above rule:

   struct A $js {
     $js *v;
   };
   void g(struct A *a) {
     a->v = NULL;
     // invoke gc; left argument to plus (that is, a->v) in h() is not rooted
   }
   void h(struct A *a) {
     // assume that the arguments are evaluated in the order numbered
     *( /*1*/ a->v + ( /*2*/ g(a), /*3*/ 0) );
   }

Rule temporary-3.2: If a variable $js *v occurs in a Full Expression (it must be stack- or parameter-allocated), a) the expression '&v' may not occur in the same full expression b) nor may 'v' initialize a reference type variable (such as '$js*&v2=v') nor may c) 'v' be written-to directly in the full expression (such as 'v=3').

Note that rule temporary-3.1 partitions the cases. For case 1), rule temporary-3.2 prevents v from having any aliases in the FullExpression nor being changed directly. For case 2), we simply assume that the compiler will not write code to do anything at all between the creation and the destruction of the temporary.

Parameter: Consider a function f($js *v). Recall also that the entire lifetime of v is the execution of f().

Rule parameter-3.0: After operator overloading, an argument of type $js* to a function call must be a stack- or parameter-allocated variable $js *v2.

Recall from the rules about temporaries that a parameter $js *v is copy-of-rooted upon initialization from a temporary argument and that argument is const (since a temporary) and by rule parameter-3.0 a copy of a stack- or parameter-allocated variable in the caller $js *v2. We show below that, with some additional rules, the following two properties hold.

1) The parameter v itself cannot change during its lifetime.

2) No alias to v2 exists during the execution of f().

Note that 2) implies that v2 cannot change during the lifetime of v. Recalling that v2 is rooted upon creation, parameter v therefore remains copy-of-rooted (of v2) during its lifetime.

Situation:

   void f($js *v) {
     // point 1: v is const
     // point 2: no alias to v2 other than v is available here,
     //          so for the duration of f(), v2 is const
   }
   void g() {
     $js *v2;
     f(v2);
   }


Establishing invariant 3 parameter property 1

Lemma: The parameter v itself cannot change during its lifetime.

Rule parameter-3.1: We make all parameters $js *v into $js * const v and flow the const.


Establishing invariant 3 parameter property 2

Lemma: No alias to any stack- or parameter-allocated $js *v2 in the caller to f() exists during the execution of f().

Rule parameter-3.2: We check with a stackness analysis that the address of a stack- or parameter-allocated $js *v2 is never stored on the heap or in a global.

Prevents this situation:

   $js **x;
   void f($js *v) {
     *x = NULL;  // stomp on v2
     // v is no longer copy-of-rooted
     gc();
     *v; // can fail
   }
   void g() {
     $js *v2;
     x = &v2;
     f(v2);
   }

Rule parameter-3.3: For any argument to f() (other than 'v' or '&v' where v is stack-allocated in the caller frame as $js *v) annotate any $js* at any depth below further *'s or []'s in the expression type as non-stack.

Prevents this situation:

   $js *v2;
   $js **v3 = &v2;
   f(v3) // can't do this because the $js * withing the $js ** of v3
         // gets annotated as non-stack

Rule parameter-3.4: For stack- or parameter-allocated $js *v, prohibit both v and &v as arguments to the same function call site. Note that this rule is a consequence of rule temporary-3.2.

We now prove that the address of a parameter- or stack-allocated $js *v2 has no alias that is available to f(v2, ...) nor any other frame called from f(v2, ...). First no alias to v2 was stored on the heap nor in a global. Second other than for an argument of the form &v3, any $js* anywhere in a type passed in as an argument was marked non-stack and so does not point to $js *v2. Third for an argument of the form &v3 we check that the name 'v3' is not 'v2'.


Making the analysis function-local

Igor suggests the following rules which seem to make the analysis completely function-local. Such a solution is more low-tech as depending on inter-function dataflow of course depends on the oink linker-imitator being correct and it is known to have bugs.

Rule parameter-3.1': Parameters of type $js *v by not allowing the address &v to be taken nor for v to occur in an lvalue context.

Rule parameter 3.1 is a consequence of rule parameter-3.1' because the const qualifier is only relevant for pointer to const and we cannot make pointers to const in this case.

Rule parameter-3.2/3.3': You may not copy the value of an expression of type $js** anywhere except when using it as a function argument. You may only use a $js** expression when de-referencing (and also not take immediately the address: no '&*vp').

As a consequence you cannot take the address of a $js** unless you plan on immediately throwing it away or de-referencing it, which is pointless. Also as a consequence you cannot take the address of a $js* except as function argument.

Rule parameter-3.2 is a consequence because if you cannot store a $js** anywhere, you can't store it on the heap or in a global.

Rule parameter-3.3 is a consequence because if you cannot store a $js** anywhere you cannot build up a type where an underlying stack- or parameter-allocated $js *v because the value of the type would have to have been constructed by storing &v somewhere.


Rule summary

We present and summarize all of the above rules together with implementation notes. They are categorized as follows:

  • Important rules: These are rules that we think are likely to be violated and therefore really need to be checked. The implementation plan and resulting oink flag(s) are noted. These rules may be checked today using Oink. More documentation on how to use the flags can be obtained by typing oink/qual --help. Tests that demonstrate their usage can be found by grepping for the flag name in oink/*.mk files.
  • Covered by other rules and therefore not implemented: These rules are implied by other rules.
  • Less important and therefore not implemented: I considered these to be less important because for various reasons we thought they were less likely to be violated.


Important rules

Make '&*' illegal. We need this so that when we locally see a '*' de-referencing a value, we know that this is not being defeated by being wrapped in a '&'. I see no good reason for '&*' or '*&' in any case, except that composing macros might lead to it.

DONE: ./oink -fo-exclude-extra-star-amp

Rule global-0.0: Disallow global variables of type $js, $js * (or more layers of stars).

For each declarator, if global, check the type does not contain a $js, using a non-function-parameter type walk.

DONE: -q-exclude-global '$!js'

Rule stack-2.0 and Rule stack-3.1: We check declaration is followed by a call to register(&v) before any use and unregister(&v) after all uses; therefore directly-rooted. Note that for stack-allocated arrays of $js*, we have an array version of register and unregister.

DONE: -q-reg-stack '$!js*' -q-reg-stack-regfunc register_var -q-reg-stack-un-regfunc unregister_var -q-reg-stack-array-regfunc register_array_var -q-reg-stack-array-un-regfunc unregister_array_var

Schema for the analysis:

   PtrMap<Variable_O, StackAllocVarLifecycle> var2lifecycle;
   class StackAllocVarLifecycle {
     Variable_O *var;
     bool array;
     int declPoint;
     int registerPoint;
     int unregisterPoint;
   };

For each S_compound:

  • Pass 1, shallow - We want to recognize a stack allocated $js* or $js*[]. We record the variable name and whether or not it is an array and the iteration count where we found it.
  • Pass 2, shallow - Record all of the locations where any of the variables of interest in the first pass have register(v) and unregister(&v) called on them. Report errors if these three points are not in monotonic increasing order.
  • Pass 3, variable - Check that each variable is both registered and unregistered; that they are done in the right order is checked above; this is only really necessary if the variable is never used as otherwise it could be checked at the usage point
  • Pass 4, deep - Check that all useages of the variable are between the register and unregister point inclusive-inclusive.

Rule parameter-3.2/3.3': You may not copy the value of an expression of type $js** anywhere except when using it as a function argument. You may only use a $js** expression when de-referencing (and also not take immediately the address: no '&*vp').

as above, mark every de-reference-ed expression. for each expression, if it is of type $js**, must be a function argument or must be in the context of a de-reference.

one visitation marks function arguments the other marks dereference arguments the third finds all $js ** expressions and checks that they are in one of the above contexts

DONE: -q-exl-perm-alias '$!js**'

Rule temporary-3.2: If a variable $js *v occurs in a Full Expression (it must be stack- or parameter-allocated), a) the expression '&v' may not occur in the same full expression b) nor may 'v' initialize a reference type variable (such as '$js*&v2=v') nor may c) 'v' be written-to directly in the full expression (such as 'v=3').

for each FullExpression fe, for each variable v in it in an r-value context, if it is of type $js*, check that there is no pointer or ref to it.

DONE: -q-exl-value-ref-call '$!js*'

Rule parameter-3.0: After operator overloading, an argument of type $js* to a function call must be a stack- or parameter-allocated variable $js *v2.

for each function call, for each argument, if has type $js*, the argument is a variable expression and the variable is stack or parameter allocated

DONE: -q-arg-local-var '$!js*'

Rule temporary-3.1: After operator overloading, an expression of type $js* must be 1) a stack- or parameter-allocated variable or 2) in the context of a) an immediate '*' (and not subsequent '&') or b) in an initializer.

for all expressions, if a deref or initializer as above, mark as such. for all expressions, if of type $js*, check the above predicate using the previous pass.

DONE: -q-temp-immediately-used '$!js*'

Rule parameter-3.1': Parameters of type $js *v may not have the address &v taken nor may v occur in an lvalue context.

for each addr-of expression and reference-type expression, the argument is not a variable that is a parameter of type $js*

nah, this just means that if it occurs in a parameter that it is const. const_when_param will work for this. DONE: -q-config ../libqual/config/lattice -q-const-when-param '$!js*'


Covered by other rules and therefore not implemented

Rule parameter-3.1: We make all parameters $js *v into $js * const v and flow the const.

(omit for now due to Rule parameter-3.1') for each parameter, if it is of type $js * make it $js * const. do a const analysis

Rule parameter-3.2: We check with a stackness analysis that the address of a stack- or parameter-allocated $js *v2 is never stored on the heap or in a global.

(omit for now due to Rule parameter-3.2/3.3')

Rule parameter-3.3: For any argument to f(), other than 'v' or '&v' where v is stack-allocated in the caller frame as $js *v, annotate any $js* at any depth in the expression type as non-stack.

(omit for now due to Rule parameter-3.2/3.3')


Less important and therefore not implemented

Typechecking for $js: we don't check that $js always flows to $js and non-$js always flows to non-$js.

Brendan says that this is not important.

Rule cast-0.0: Check that nothing is ever cast 1) to or 2) from a data type containing $js.

Add flag -q-exclude-cast '$!js' which excludes the qualifier from casts.

DONE for cast 'to': -q-exclude-cast '$!js' do cast 'from':

Rule stack-1.1: should check that register gets the right length argument for stack allocated arrays.

Rule stack-3.0: Check that unregister() can only take as an argument an expression of the form &v where v is a stack-allocated $js*.

check off the ones above and on another pass, see if any were missed. also check that they all say register(&v), for some stack-allocated v

Make all $js*** and higher illegal also: I think Igor said this was ok to do but I don't think we need it.

for each expression and declarator variable, if it contains a $js, it must be a $js, $js*, or $js**.

Rule member-heap-1.0: Check no C/C++ struct/class/union contain a $js*, only JS objects can contain a $js* member.