GC SafetySpec: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
(initial submission)
 
m (Reverted edit of BooM53, changed back to last version by Dsw)
 
(8 intermediate revisions by 2 users not shown)
Line 1: Line 1:
<center>Garbage Collection Safety Analysis of SpiderMonkey</center>
<div style="border-style:solid; border-width:1px; padding:5px; margin-bottom:0; text-align:center;">
<center>Daniel S. Wilkerson, Igor Bukanov, Taras Glek, Simon Goldsmith</center>
<big>'''Garbage Collection Safety Analysis of SpiderMonkey'''</big>
<center>[http://danielwilkerson.com/ Daniel S. Wilkerson], Igor Bukanov, Taras Glek, Simon Goldsmith</center>
</div>
 
<font color=red>NOTE: Do not hack on this page in the wiki.  I, Daniel
Wilkerson, am maintaining it as a text file on disk and I am editing
it there and then re-pasting the entire document.  If someone else
makes a change and I don't notice it then it will get wiped out
unfortunately.  Send changes to me instead.</font>


= Introduction =
= Introduction =
Line 31: Line 39:




= Rootedness =
== Rootedness ==


Annotate any JS object in the type-system with the qualifier $js.
Annotate any JS object in the type-system with the qualifier $js.
Line 52: Line 60:




= Partition into allocation classes =
== Partition into allocation classes ==


There are six allocation classes of objects in C:
There are six allocation classes of objects in C:
Line 96: Line 104:




= Incomplete =
== Incomplete ==


Look for any implicit rules that I did not make explicit.
Look for any implicit rules that I did not make explicit.
Line 103: Line 111:




= Invariant 1: Initialized to NULL and therefore do not dangle upon creation =
= Invariant 1: $js* values do not dangle upon creation =


'''Sole-heap''': These are made by alloc_rooted_ptr() which we trust to
'''Sole-heap''': These are made by alloc_rooted_ptr() which we trust to
Line 114: Line 122:
$js*, only JS objects can contain a $js* member.
$js*, only JS objects can contain a $js* member.


'''Stack''': <u>Rule stack-1.0</u>: We check syntactically that the
'''Stack''': <u>Rule stack-1.0</u>: We could require them to simply be
initialization exists.  Alternative: (dsw) Igor keeps saying that this
initialized and check this syntactically; however since we will
is not necessary, since we will require register(&v) to be called
require register(&v) to be called before the first use, we will follow
immediately and it must initialize the variable.
Igor's suggestion and trust that register initializes the pointer upon
creation.


Simon points out that a $js[] must only occur on the heap and be
Simon points out that a $js*[] must either 1) be allocated on the heap
registered with alloc_rooted_ptr() upon creation or on the stack if
and be registered with alloc_rooted_ptr() upon creation or 2) be
register() were called (register also takes a length argument).  Now,
allocated on the stack if register_array() were called to register it
if there is a $js* that has pointer arithmetic done on it, we get a
(register also takes a length argument).  Now, if there is a $js* that
temporary that is copy-of-rooted because the initial pointer pointed
has pointer arithmetic done on it, the result is a temporary that is
into a rooted array and therefore all of the members of the array are
copy-of-rooted; this is because the initial pointer pointed into a
rooted.
rooted array and therefore all of the members of the array are rooted.


<u>Rule stack-1.1</u>: should check that register gets the right length
<u>Rule stack-1.1</u>: should check that register gets the right length
Line 135: Line 144:




= Invariant 2: Rooted upon creation =
= Invariant 2: $js* values are rooted upon creation =


'''Sole-heap''': We trust alloc_rooted_ptr() to put the pointer to the
'''Sole-heap''': We trust alloc_rooted_ptr() to put the pointer to the
newly created $js* (or $js[]) object into the GC root set; therefore
newly created $js* (or $js*[]) object into the GC root set; therefore
directly-rooted.
directly-rooted.


Line 146: Line 155:
sole-heap objects, only JS objects can contain $js*.
sole-heap objects, only JS objects can contain $js*.


'''Stack''': <u>Rule stack-2.0</u>: We check initialization is
'''Stack''': <u>Rule stack-2.0</u>: We check declaration is followed
immediately followed by a call to register(&v); therefore
by a call to register(&v) before any use; therefore directly-rooted.
directly-rooted. (dsw) I think Igor suggests that perhaps this is
(dsw) I think Igor suggests that perhaps this is where they should be
where they should be initialized; see above.
initialized; see above.
 
Note that other rules establish that a stack-allocated variable v is
not aliased except temporarily and on the stack in the frame of a
called-function only.  Therefore a check that v is not used before
register(&v) nor after unregister(&v) is simply a syntactic check for
the name of v.


'''Temporary''': By induction is copy-of-rooted upon creation.
'''Temporary''': By induction is copy-of-rooted upon creation.
Line 156: Line 171:




= Invariant 3: Remain rooted until "about to cease to exist" =
= Invariant 3: $js* values remain rooted until unusable =
 
Definition: At a particular time an object is "unusable" iff from that
time until object de-allocation the object cannot be used.


Definition: At a particular time an object is "about to be cease to
One way for it to be clear that an object is unusable is if it is
exist" iff from that time until object de-allocation the garbage
neither named-in nor is an alias-to-it-available-in the code remaining
collector cannot run.
to be executed until the object is destructed.


'''Sole-heap''': We trust the garbage collector to not un-root the object
'''Sole-heap''': We trust the garbage collector to not un-root the
until it is about to cease to exist.
object until it is unusable.


'''Member-heap''': We trust the containing JS object destructor to not
'''Member-heap''': We trust the containing JS object destructor to not
un-root the object until it is about to cease to exist.
un-root the object until it is unusable.


'''Stack''': We must check that stack-allocated variables are only
'''Stack''': We must check that stack-allocated variables are only
unregistered when the GC cannot run and the frame is about to cease to
unregistered when unusable.
exist.


<u>Rule stack-3.0</u>: We check that unregister() can only take as an
<u>Rule stack-3.0</u>: We check that unregister() can only take as an
Line 176: Line 193:
$js*.
$js*.


<u>Rule stack-3.1</u>: We check that once unregister(&v) is called, v is not
<u>Rule stack-3.1</u>: We check that once unregister(&v) is called, v
referred to again in the function body.
is not referred to again in the function body. See note regarding
 
aliasing and checking at rule stack-2.0 above.
FIX: factor out the lack-of-alias to v to show that v cannot be used
indirectly either.


'''Temporary''': All temporaries are const: the language provides no way
'''Temporary''': All temporaries are const: the language provides no way
Line 253: Line 268:




= Establishing invariant 3 parameter property 1 =
== Establishing invariant 3 parameter property 1 ==


Lemma: The parameter v itself cannot change during its lifetime.
Lemma: The parameter v itself cannot change during its lifetime.
Line 261: Line 276:




= Establishing invariant 3 parameter property 2 =
== Establishing invariant 3 parameter property 2 ==


Lemma: No alias to any stack- or parameter-allocated $js *v2 in the
Lemma: No alias to any stack- or parameter-allocated $js *v2 in the
Line 317: Line 332:
linker-imitator being correct and it is known to have bugs.
linker-imitator being correct and it is known to have bugs.


* <u>Rule parameter-3.1'</u>: Parameters of type $js *v by not allowing the
'''<u>Rule parameter-3.1'</u>''': Parameters of type $js *v by not allowing the
address &v to be taken nor for v to occur in an lvalue context.
address &v to be taken nor for v to occur in an lvalue context.


Line 324: Line 339:
make pointers to const in this case.
make pointers to const in this case.


* <u>Rule parameter-3.2/3.3'</u>: You may not copy the value of an expression
'''<u>Rule parameter-3.2/3.3'</u>''': You may not copy the value of an expression
of type $js** anywhere except when using it as a function argument.
of type $js** anywhere except when using it as a function argument.
You may only use a $js** expression when de-referencing (and also not
You may only use a $js** expression when de-referencing (and also not
Line 341: Line 356:
or parameter-allocated $js *v because the value of the type would have
or parameter-allocated $js *v because the value of the type would have
to have been constructed by storing &v somewhere.
to have been constructed by storing &v somewhere.


= Rule summary =
= Rule summary =


== Important rules ==
We present and summarize all of the above rules together with
implementation notes.  They are categorized as follows:


* register
* Important rules: These are rules that we think are likely to be violated and therefore really need to be checked.  The implementation plan and resulting oink flag(s) are noted.  These rules may be checked today using Oink.  More documentation on how to use the flags can be obtained by typing oink/qual --help.  Tests that demonstrate their usage can be found by grepping for the flag name in oink/*.mk files.


Parametrize the register name with another flag.
* Covered by other rules and therefore not implemented: These rules are implied by other rules.


Implement register_array and parametrize in the same way.
* Less important and therefore not implemented: I considered these to be less important because for various reasons we thought they were less likely to be violated.


Implement checking that register(&v) is called sometime after v is
declared, not just immediately.  Brendan says it is not important to
check that v is not used before it is called.


Implement checking that once unregister(&v) is called, then v is not
== Important rules ==
used later.


* make '&*' illegal.
'''Make '&*' illegal'''.  We need this so that when we locally see a
'*' de-referencing a value, we know that this is not being defeated by
being wrapped in a '&'.  I see no good reason for '&*' or '*&' in any
case, except that composing macros might lead to it.


DONE: ./oink -fo-exclude-extra-star-amp
DONE: ./oink -fo-exclude-extra-star-amp


* <u>Rule global-0.0</u>: Disallow global variables of type $js, $js * (or more
'''<u>Rule global-0.0</u>''': Disallow global variables of type $js, $js * (or more
layers of stars).
layers of stars).


Line 371: Line 387:
using a non-function-parameter type walk.
using a non-function-parameter type walk.


DONE: -q-exclude-global '$$!nonglobal'
DONE: -q-exclude-global '$!js'


* <u>Rule stack-2.0</u>: Check stack-allocated $js* declarator is immediately
'''<u>Rule stack-2.0</u>''' and '''<u>Rule stack-3.1</u>''': We check
followed by a call to register(&v).
declaration is followed by a call to register(&v) before any use and
unregister(&v) after all uses; therefore directly-rooted.  Note that
for stack-allocated arrays of $js*, we have an array version of
register and unregister.


for each compound statement,
DONE: -q-reg-stack '$!js*'
for each statement decl of $js *v,
-q-reg-stack-regfunc register_var
check next statement is call to register(&v),
-q-reg-stack-un-regfunc unregister_var
check it off.
-q-reg-stack-array-regfunc register_array_var
-q-reg-stack-array-un-regfunc unregister_array_var


DONE: -q-reg-stack '$$!js*'
Schema for the analysis:


* <u>Rule parameter-3.2/3.3'</u>: You may not copy the value of an
    PtrMap<Variable_O, StackAllocVarLifecycle> var2lifecycle;
 
    class StackAllocVarLifecycle {
      Variable_O *var;
      bool array;
      int declPoint;
      int registerPoint;
      int unregisterPoint;
    };
 
For each S_compound:
 
* Pass 1, shallow - We want to recognize a stack allocated $js* or $js*[].  We record the variable name and whether or not it is an array and the iteration count where we found it.
 
* Pass 2, shallow - Record all of the locations where any of the variables of interest in the first pass have register(v) and unregister(&v) called on them.  Report errors if these three points are not in monotonic increasing order.
 
* Pass 3, variable - Check that each variable is both registered and unregistered; that they are done in the right order is checked above; this is only really necessary if the variable is never used as otherwise it could be checked at the usage point
 
* Pass 4, deep - Check that all useages of the variable are between the register and unregister point inclusive-inclusive.
 
'''<u>Rule parameter-3.2/3.3'</u>''': You may not copy the value of an
expression of type $js** anywhere except when using it as a function
expression of type $js** anywhere except when using it as a function
argument.  You may only use a $js** expression when de-referencing
argument.  You may only use a $js** expression when de-referencing
Line 399: Line 439:
are in one of the above contexts
are in one of the above contexts


DONE: -q-exl-perm-alias '$$!js**'
DONE: -q-exl-perm-alias '$!js**'


* <u>Rule temporary-3.2</u>: If a variable $js *v occurs in a Full Expression
'''<u>Rule temporary-3.2</u>''': If a variable $js *v occurs in a Full Expression
(it must be stack- or parameter-allocated), a) the expression '&v' may
(it must be stack- or parameter-allocated), a) the expression '&v' may
not occur in the same full expression b) nor may 'v' initialize a
not occur in the same full expression b) nor may 'v' initialize a
Line 412: Line 452:
check that there is no pointer or ref to it.
check that there is no pointer or ref to it.


DONE: -q-exl-value-ref-call '$$!js*'
DONE: -q-exl-value-ref-call '$!js*'


* <u>Rule parameter-3.0</u>: After operator overloading, an argument of type
'''<u>Rule parameter-3.0</u>''': After operator overloading, an argument of type
$js* to a function call must be a stack- or parameter-allocated
$js* to a function call must be a stack- or parameter-allocated
variable $js *v2.
variable $js *v2.
Line 424: Line 464:
and the variable is stack or parameter allocated
and the variable is stack or parameter allocated


DONE: -q-arg-local-var '$$!js*'
DONE: -q-arg-local-var '$!js*'


* <u>Rule temporary-3.1</u>: After operator overloading, an expression of
'''<u>Rule temporary-3.1</u>''': After operator overloading, an expression of
type $js* must be 1) a stack- or parameter-allocated variable or 2) in
type $js* must be 1) a stack- or parameter-allocated variable or 2) in
the context of a) an immediate '*' (and not subsequent '&') or b) in
the context of a) an immediate '*' (and not subsequent '&') or b) in
Line 437: Line 477:
check the above predicate using the previous pass.
check the above predicate using the previous pass.


DONE: -q-temp-immediately-used '$$!js*'
DONE: -q-temp-immediately-used '$!js*'


* <u>Rule parameter-3.1'</u>: Parameters of type $js *v may not have the
'''<u>Rule parameter-3.1'</u>''': Parameters of type $js *v may not have the
address &v taken nor may v occur in an lvalue context.
address &v taken nor may v occur in an lvalue context.


Line 447: Line 487:
nah, this just means that if it occurs in a parameter that it
nah, this just means that if it occurs in a parameter that it
is const.  const_when_param will work for this.
is const.  const_when_param will work for this.
DONE: -q-config ../libqual/config/lattice -q-const-when-param '$$!js*'
DONE: -q-config ../libqual/config/lattice -q-const-when-param '$!js*'
 


== Covered by other rules and therefore not implemented ==
== Covered by other rules and therefore not implemented ==


<u>Rule parameter-3.1</u>: We make all parameters $js *v into $js * const v
'''<u>Rule parameter-3.1</u>''': We make all parameters $js *v into $js * const v
and flow the const.
and flow the const.


Line 460: Line 501:
do a const analysis
do a const analysis


<u>Rule parameter-3.2</u>: We check with a stackness analysis that the
'''<u>Rule parameter-3.2</u>''': We check with a stackness analysis that the
address of a stack- or parameter-allocated $js *v2 is never stored on
address of a stack- or parameter-allocated $js *v2 is never stored on
the heap or in a global.
the heap or in a global.
Line 466: Line 507:
(omit for now due to Rule parameter-3.2/3.3')
(omit for now due to Rule parameter-3.2/3.3')


<u>Rule parameter-3.3</u>: For any argument to f(), other than 'v' or '&v'
'''<u>Rule parameter-3.3</u>''': For any argument to f(), other than 'v' or '&v'
where v is stack-allocated in the caller frame as $js *v, annotate any
where v is stack-allocated in the caller frame as $js *v, annotate any
$js* at any depth in the expression type as non-stack.
$js* at any depth in the expression type as non-stack.
Line 475: Line 516:
== Less important and therefore not implemented ==
== Less important and therefore not implemented ==


I considered these to be less important because for various reasons we
'''Typechecking for $js''': we don't check that $js always flows to $js
thought they were less likely to be violated.
 
* Typechecking for $js: we don't check that $js always flows to $js
and non-$js always flows to non-$js.
and non-$js always flows to non-$js.


Brendan says that this is not important.
Brendan says that this is not important.


* <u>Rule cast-0.0</u>: Check that nothing is ever cast 1) to or 2) from a
'''<u>Rule cast-0.0</u>''': Check that nothing is ever cast 1) to or 2) from a
data type containing $js.
data type containing $js.


Line 489: Line 527:
from casts.
from casts.


DONE for cast 'to': -q-exclude-cast '$$!noncast'
DONE for cast 'to': -q-exclude-cast '$!js'
do cast 'from':
do cast 'from':


* <u>Rule stack-1.1</u>: should check that register gets the right length
'''<u>Rule stack-1.1</u>''': should check that register gets the right length
argument for stack allocated arrays.
argument for stack allocated arrays.


* <u>Rule stack-3.0</u>: Check that unregister() can only take as an argument
'''<u>Rule stack-3.0</u>''': Check that unregister() can only take as an argument
an expression of the form &v where v is a stack-allocated $js*.
an expression of the form &v where v is a stack-allocated $js*.


Line 503: Line 541:
for some stack-allocated v
for some stack-allocated v


* <u>Rule stack-3.1</u>: after unregister(&v) v is not referred to until
'''Make all $js*** and higher illegal also''': I think Igor said this
the end of the frame.
was ok to do but I don't think we need it.
 
Show that also v is not aliased (probably a consequence of the rules
that we already have).
 
* I think we can make all $js*** and higher illegal also.


for each expression and declarator variable,
for each expression and declarator variable,
Line 515: Line 548:
it must be a $js, $js*, or $js**.
it must be a $js, $js*, or $js**.


* <u>Rule member-heap-1.0</u>: Check no C/C++ struct/class/union contain a
'''<u>Rule member-heap-1.0</u>''': Check no C/C++ struct/class/union contain a
$js*, only JS objects can contain a $js* member.
$js*, only JS objects can contain a $js* member.

Latest revision as of 18:00, 10 April 2007

Garbage Collection Safety Analysis of SpiderMonkey

Daniel S. Wilkerson, Igor Bukanov, Taras Glek, Simon Goldsmith

NOTE: Do not hack on this page in the wiki. I, Daniel Wilkerson, am maintaining it as a text file on disk and I am editing it there and then re-pasting the entire document. If someone else makes a change and I don't notice it then it will get wiped out unfortunately. Send changes to me instead.

Introduction

SpiderMonkey is a JavaScript (JS) interpreter implemented in C. Objects on the JavaScript heap are also C objects and can be pointed to by C code. Any C object is either a "JS object" if on the JS heap or a "non-JS object" if not.

The JS interpreter has a garbage collector (GC). Note that in many cases we assume that the program is synchronous; that is, we do not consider any asynchronous computation: threading, signals, nor exceptions. The GC has a set of pointers called the "(JS) root set". The set of objects reachable from the root set are the "JS-live" objects. Consider the C globals and stack to be the "C root set". The set of objects reachable from the C root set are the "C-live" objects.

Since the JS interpreter is implemented in C, JS-live is a subset of C-live. However it is possible for a JS object to be C-live but not JS-live. If the GC were to run under such a condition, the JS object could be collected and then the C pointer to the object would dangle. The goal of this analysis is to prevent that situation.

Theorem: At all times all JS objects that are C-live are JS-live.

Below we prove the theorem assuming some rules constraining the code are followed. After that we specify which rules have had an Oink checker written for them.


Rootedness

Annotate any JS object in the type-system with the qualifier $js. Pointers that can dangle are of the type $js*.

Definitions:

  • A $js* instance is "directly rooted" if it's address is in the GC root set.
  • A $js* instance is "transitively rooted" if it is member of a JavaScript object (not C++ object) a pointer to which is transitively rooted or rooted.
  • A $js* instance is "copy-of-rooted" if it has the same value as an instance that is directly rooted or transitively rooted.
  • A $js* instance is "rooted" that is directly rooted, transitively rooted, or copy-of-rooted.

If all $js*-s are rooted at all times then any C-live $js is JS-live and our goal is satisfied.

Lemma: All values of type $js* are rooted at all times.


Partition into allocation classes

There are six allocation classes of objects in C:

  • global,
  • sole-heap,
  • member-heap,
  • parameter,
  • stack, and
  • temporary.

By "sole-heap" we mean just a pointer or an array of pointers allocated on the heap and by "member-heap" we mean a member of an object on the heap. By "parameter" we mean stack variables that are on the parameter list and by "stack" we mean stack variables that are not on the parameter list. By "temporary" we mean stack variables computed in compiler-written code as arguments to functions or as temporaries in other expressions; they have no first-class names to the user.

We will adopt a set of rules that can be checked by an analysis and then show that a $js* cannot dangle in each of these classes. We establish three invariants on all $js* values in order to do this.

Rule global-0.0: We only need five cases for each invariant because we disallow global $js* variables completely, which handles this case.

A note on temporaries: A "Full Expression" is a C++ technical term; it means an expression maximal in AST height, usually terminated by a semi-colon. Consider a Full Expression e containing a temporary expression $js *e2; note that 'e2' is a meta-name not a first-class program name because temporaries do not have names in a program. Note that the entire lifetime of e2 is the execution Full Expression e.

Note that we assume that other than running the GC the program is otherwise memory safe.

Simon points out the usual problem with Oink/Cqual++ analyses, that the type-system is or model of the program; therefore we need to prevent casting to or away from $js*.

Rule cast-0.0: Check that nothing is ever cast to or from a data type containing $js.


Incomplete

Look for any implicit rules that I did not make explicit.

Simon suggests that we provide a set of legal idioms.


Invariant 1: $js* values do not dangle upon creation

Sole-heap: These are made by alloc_rooted_ptr() which we trust to initialize them.

Member-heap: We trust the containing JS object ctor to initialize them. Simon points out that we need this rule.

Rule member-heap-1.0: Check no C/C++ struct/class/union contain a $js*, only JS objects can contain a $js* member.

Stack: Rule stack-1.0: We could require them to simply be initialized and check this syntactically; however since we will require register(&v) to be called before the first use, we will follow Igor's suggestion and trust that register initializes the pointer upon creation.

Simon points out that a $js*[] must either 1) be allocated on the heap and be registered with alloc_rooted_ptr() upon creation or 2) be allocated on the stack if register_array() were called to register it (register also takes a length argument). Now, if there is a $js* that has pointer arithmetic done on it, the result is a temporary that is copy-of-rooted; this is because the initial pointer pointed into a rooted array and therefore all of the members of the array are rooted.

Rule stack-1.1: should check that register gets the right length argument for stack allocated arrays.

Temporary: It is not possible for them to fail to be initialized.

Parameter: It is not possible for them to fail to be initialized.


Invariant 2: $js* values are rooted upon creation

Sole-heap: We trust alloc_rooted_ptr() to put the pointer to the newly created $js* (or $js*[]) object into the GC root set; therefore directly-rooted.

Member-heap: These are members of another object which by induction is pointed to by a rooted pointer; therefore transitively-rooted. NOTE: as pointed out above, we use the assumption that, other than sole-heap objects, only JS objects can contain $js*.

Stack: Rule stack-2.0: We check declaration is followed by a call to register(&v) before any use; therefore directly-rooted. (dsw) I think Igor suggests that perhaps this is where they should be initialized; see above.

Note that other rules establish that a stack-allocated variable v is not aliased except temporarily and on the stack in the frame of a called-function only. Therefore a check that v is not used before register(&v) nor after unregister(&v) is simply a syntactic check for the name of v.

Temporary: By induction is copy-of-rooted upon creation.

Parameter: By induction is copy-of-rooted upon creation.


Invariant 3: $js* values remain rooted until unusable

Definition: At a particular time an object is "unusable" iff from that time until object de-allocation the object cannot be used.

One way for it to be clear that an object is unusable is if it is neither named-in nor is an alias-to-it-available-in the code remaining to be executed until the object is destructed.

Sole-heap: We trust the garbage collector to not un-root the object until it is unusable.

Member-heap: We trust the containing JS object destructor to not un-root the object until it is unusable.

Stack: We must check that stack-allocated variables are only unregistered when unusable.

Rule stack-3.0: We check that unregister() can only take as an argument an expression of the form &v where v is a stack-allocated $js*.

Rule stack-3.1: We check that once unregister(&v) is called, v is not referred to again in the function body. See note regarding aliasing and checking at rule stack-2.0 above.

Temporary: All temporaries are const: the language provides no way to write them. We must simply show that they are a copy of does not change during their lifetimes.

Rule temporary-3.1: After operator overloading, an expression of type $js * must be 1) a stack- or parameter-allocated variable or 2) in the context of a) an immediate '*' (and not subsequent '&') or b) in an initializer.

Prevents this situation since a->v is of type $js* but not allowed by the above rule:

   struct A $js {
     $js *v;
   };
   void g(struct A *a) {
     a->v = NULL;
     // invoke gc; left argument to plus (that is, a->v) in h() is not rooted
   }
   void h(struct A *a) {
     // assume that the arguments are evaluated in the order numbered
     *( /*1*/ a->v + ( /*2*/ g(a), /*3*/ 0) );
   }

Rule temporary-3.2: If a variable $js *v occurs in a Full Expression (it must be stack- or parameter-allocated), a) the expression '&v' may not occur in the same full expression b) nor may 'v' initialize a reference type variable (such as '$js*&v2=v') nor may c) 'v' be written-to directly in the full expression (such as 'v=3').

Note that rule temporary-3.1 partitions the cases. For case 1), rule temporary-3.2 prevents v from having any aliases in the FullExpression nor being changed directly. For case 2), we simply assume that the compiler will not write code to do anything at all between the creation and the destruction of the temporary.

Parameter: Consider a function f($js *v). Recall also that the entire lifetime of v is the execution of f().

Rule parameter-3.0: After operator overloading, an argument of type $js* to a function call must be a stack- or parameter-allocated variable $js *v2.

Recall from the rules about temporaries that a parameter $js *v is copy-of-rooted upon initialization from a temporary argument and that argument is const (since a temporary) and by rule parameter-3.0 a copy of a stack- or parameter-allocated variable in the caller $js *v2. We show below that, with some additional rules, the following two properties hold.

1) The parameter v itself cannot change during its lifetime.

2) No alias to v2 exists during the execution of f().

Note that 2) implies that v2 cannot change during the lifetime of v. Recalling that v2 is rooted upon creation, parameter v therefore remains copy-of-rooted (of v2) during its lifetime.

Situation:

   void f($js *v) {
     // point 1: v is const
     // point 2: no alias to v2 other than v is available here,
     //          so for the duration of f(), v2 is const
   }
   void g() {
     $js *v2;
     f(v2);
   }


Establishing invariant 3 parameter property 1

Lemma: The parameter v itself cannot change during its lifetime.

Rule parameter-3.1: We make all parameters $js *v into $js * const v and flow the const.


Establishing invariant 3 parameter property 2

Lemma: No alias to any stack- or parameter-allocated $js *v2 in the caller to f() exists during the execution of f().

Rule parameter-3.2: We check with a stackness analysis that the address of a stack- or parameter-allocated $js *v2 is never stored on the heap or in a global.

Prevents this situation:

   $js **x;
   void f($js *v) {
     *x = NULL;  // stomp on v2
     // v is no longer copy-of-rooted
     gc();
     *v; // can fail
   }
   void g() {
     $js *v2;
     x = &v2;
     f(v2);
   }

Rule parameter-3.3: For any argument to f() (other than 'v' or '&v' where v is stack-allocated in the caller frame as $js *v) annotate any $js* at any depth below further *'s or []'s in the expression type as non-stack.

Prevents this situation:

   $js *v2;
   $js **v3 = &v2;
   f(v3) // can't do this because the $js * withing the $js ** of v3
         // gets annotated as non-stack

Rule parameter-3.4: For stack- or parameter-allocated $js *v, prohibit both v and &v as arguments to the same function call site. Note that this rule is a consequence of rule temporary-3.2.

We now prove that the address of a parameter- or stack-allocated $js *v2 has no alias that is available to f(v2, ...) nor any other frame called from f(v2, ...). First no alias to v2 was stored on the heap nor in a global. Second other than for an argument of the form &v3, any $js* anywhere in a type passed in as an argument was marked non-stack and so does not point to $js *v2. Third for an argument of the form &v3 we check that the name 'v3' is not 'v2'.


Making the analysis function-local

Igor suggests the following rules which seem to make the analysis completely function-local. Such a solution is more low-tech as depending on inter-function dataflow of course depends on the oink linker-imitator being correct and it is known to have bugs.

Rule parameter-3.1': Parameters of type $js *v by not allowing the address &v to be taken nor for v to occur in an lvalue context.

Rule parameter 3.1 is a consequence of rule parameter-3.1' because the const qualifier is only relevant for pointer to const and we cannot make pointers to const in this case.

Rule parameter-3.2/3.3': You may not copy the value of an expression of type $js** anywhere except when using it as a function argument. You may only use a $js** expression when de-referencing (and also not take immediately the address: no '&*vp').

As a consequence you cannot take the address of a $js** unless you plan on immediately throwing it away or de-referencing it, which is pointless. Also as a consequence you cannot take the address of a $js* except as function argument.

Rule parameter-3.2 is a consequence because if you cannot store a $js** anywhere, you can't store it on the heap or in a global.

Rule parameter-3.3 is a consequence because if you cannot store a $js** anywhere you cannot build up a type where an underlying stack- or parameter-allocated $js *v because the value of the type would have to have been constructed by storing &v somewhere.


Rule summary

We present and summarize all of the above rules together with implementation notes. They are categorized as follows:

  • Important rules: These are rules that we think are likely to be violated and therefore really need to be checked. The implementation plan and resulting oink flag(s) are noted. These rules may be checked today using Oink. More documentation on how to use the flags can be obtained by typing oink/qual --help. Tests that demonstrate their usage can be found by grepping for the flag name in oink/*.mk files.
  • Covered by other rules and therefore not implemented: These rules are implied by other rules.
  • Less important and therefore not implemented: I considered these to be less important because for various reasons we thought they were less likely to be violated.


Important rules

Make '&*' illegal. We need this so that when we locally see a '*' de-referencing a value, we know that this is not being defeated by being wrapped in a '&'. I see no good reason for '&*' or '*&' in any case, except that composing macros might lead to it.

DONE: ./oink -fo-exclude-extra-star-amp

Rule global-0.0: Disallow global variables of type $js, $js * (or more layers of stars).

For each declarator, if global, check the type does not contain a $js, using a non-function-parameter type walk.

DONE: -q-exclude-global '$!js'

Rule stack-2.0 and Rule stack-3.1: We check declaration is followed by a call to register(&v) before any use and unregister(&v) after all uses; therefore directly-rooted. Note that for stack-allocated arrays of $js*, we have an array version of register and unregister.

DONE: -q-reg-stack '$!js*' -q-reg-stack-regfunc register_var -q-reg-stack-un-regfunc unregister_var -q-reg-stack-array-regfunc register_array_var -q-reg-stack-array-un-regfunc unregister_array_var

Schema for the analysis:

   PtrMap<Variable_O, StackAllocVarLifecycle> var2lifecycle;
   class StackAllocVarLifecycle {
     Variable_O *var;
     bool array;
     int declPoint;
     int registerPoint;
     int unregisterPoint;
   };

For each S_compound:

  • Pass 1, shallow - We want to recognize a stack allocated $js* or $js*[]. We record the variable name and whether or not it is an array and the iteration count where we found it.
  • Pass 2, shallow - Record all of the locations where any of the variables of interest in the first pass have register(v) and unregister(&v) called on them. Report errors if these three points are not in monotonic increasing order.
  • Pass 3, variable - Check that each variable is both registered and unregistered; that they are done in the right order is checked above; this is only really necessary if the variable is never used as otherwise it could be checked at the usage point
  • Pass 4, deep - Check that all useages of the variable are between the register and unregister point inclusive-inclusive.

Rule parameter-3.2/3.3': You may not copy the value of an expression of type $js** anywhere except when using it as a function argument. You may only use a $js** expression when de-referencing (and also not take immediately the address: no '&*vp').

as above, mark every de-reference-ed expression. for each expression, if it is of type $js**, must be a function argument or must be in the context of a de-reference.

one visitation marks function arguments the other marks dereference arguments the third finds all $js ** expressions and checks that they are in one of the above contexts

DONE: -q-exl-perm-alias '$!js**'

Rule temporary-3.2: If a variable $js *v occurs in a Full Expression (it must be stack- or parameter-allocated), a) the expression '&v' may not occur in the same full expression b) nor may 'v' initialize a reference type variable (such as '$js*&v2=v') nor may c) 'v' be written-to directly in the full expression (such as 'v=3').

for each FullExpression fe, for each variable v in it in an r-value context, if it is of type $js*, check that there is no pointer or ref to it.

DONE: -q-exl-value-ref-call '$!js*'

Rule parameter-3.0: After operator overloading, an argument of type $js* to a function call must be a stack- or parameter-allocated variable $js *v2.

for each function call, for each argument, if has type $js*, the argument is a variable expression and the variable is stack or parameter allocated

DONE: -q-arg-local-var '$!js*'

Rule temporary-3.1: After operator overloading, an expression of type $js* must be 1) a stack- or parameter-allocated variable or 2) in the context of a) an immediate '*' (and not subsequent '&') or b) in an initializer.

for all expressions, if a deref or initializer as above, mark as such. for all expressions, if of type $js*, check the above predicate using the previous pass.

DONE: -q-temp-immediately-used '$!js*'

Rule parameter-3.1': Parameters of type $js *v may not have the address &v taken nor may v occur in an lvalue context.

for each addr-of expression and reference-type expression, the argument is not a variable that is a parameter of type $js*

nah, this just means that if it occurs in a parameter that it is const. const_when_param will work for this. DONE: -q-config ../libqual/config/lattice -q-const-when-param '$!js*'


Covered by other rules and therefore not implemented

Rule parameter-3.1: We make all parameters $js *v into $js * const v and flow the const.

(omit for now due to Rule parameter-3.1') for each parameter, if it is of type $js * make it $js * const. do a const analysis

Rule parameter-3.2: We check with a stackness analysis that the address of a stack- or parameter-allocated $js *v2 is never stored on the heap or in a global.

(omit for now due to Rule parameter-3.2/3.3')

Rule parameter-3.3: For any argument to f(), other than 'v' or '&v' where v is stack-allocated in the caller frame as $js *v, annotate any $js* at any depth in the expression type as non-stack.

(omit for now due to Rule parameter-3.2/3.3')


Less important and therefore not implemented

Typechecking for $js: we don't check that $js always flows to $js and non-$js always flows to non-$js.

Brendan says that this is not important.

Rule cast-0.0: Check that nothing is ever cast 1) to or 2) from a data type containing $js.

Add flag -q-exclude-cast '$!js' which excludes the qualifier from casts.

DONE for cast 'to': -q-exclude-cast '$!js' do cast 'from':

Rule stack-1.1: should check that register gets the right length argument for stack allocated arrays.

Rule stack-3.0: Check that unregister() can only take as an argument an expression of the form &v where v is a stack-allocated $js*.

check off the ones above and on another pass, see if any were missed. also check that they all say register(&v), for some stack-allocated v

Make all $js*** and higher illegal also: I think Igor said this was ok to do but I don't think we need it.

for each expression and declarator variable, if it contains a $js, it must be a $js, $js*, or $js**.

Rule member-heap-1.0: Check no C/C++ struct/class/union contain a $js*, only JS objects can contain a $js* member.