MMgc thread safety annotations: Difference between revisions

From MozillaWiki
Jump to navigation Jump to search
Line 103: Line 103:
=== Thread state expressions ===
=== Thread state expressions ===


A '''thread state expression''' is a logical expression that combines
A '''thread state expression''' combines thread state variables with logical operators.


  tsexpr ::= ''tsvar'' | (''tsexpr'' AND ''tsexpr'') | (''tsexpr'' OR ''tsexpr'') | NOT ''tsexpr''
  tsexpr ::= ''tsvar'' | (''tsexpr'' AND ''tsexpr'') | (''tsexpr'' OR ''tsexpr'') | NOT ''tsexpr''


(Note on logic:  With a few applications of the distributive law and de Morgan's laws, some double negative elimination, and a little sorting, this can be simplified to a canonical form, which is like
When a thread state expression appears in C++ code, it looks like this:


  tsexpr ::= sorted list of ''case''s joined by OR
  tsexpr ::= ''tsvar''
case  ::= sorted list of ''simple-assertion''s joined by AND
          | MMGC_TS_OR(''tsexpr'', ''tsexpr'')
simple-assertion ::= ''tsvar'' | NOT ''tsvar''
          | MMGC_TS_AND(''tsexpr'', ''tsexpr'')
          | MMGC_TS_NOT(''tsexpr'')


Ignore the NOT symbol when you're sorting and this will put contradictions like "A AND NOT A" right next to each other like that.)
Same meaning, slightly different syntax.


When a thread state expression appears in C++ code, it looks like this instead:
(Note on logic:  You can define a canonical form on thread state expressions.  Here's an algorithm that defines a particularly verbose canonical form; there are lots of ways to improve on it:


tsexpr ::= ''tsvar'' | MMGC_TS_OR(''tsexpr'', ''tsexpr'') | MMGC_TS_AND(''tsexpr'', ''tsexpr'') | MMGC_TS_NOT(''tsexpr'')
# Get the sorted list ''V'' of all thread state variables that appear in the expression.
# Generate all 2<sup>|''V''|</sup> possibilities in some canonical order, e.g. if V=[A,B] then the possibilities are {A: true, B: true}, {A: true, B: false}, {A: false, B: true}, and {A: false, B: false}.
# For each possibility, plug the values into the thread state expression and evaluate.
# Your canonical form is a big OR-expression collecting all the possibilities for which the expression evaluates true.  For example, if the expression is true for the first three possibilities above, but not the fourth, then the canonical form is: <code>((A AND B) OR (A AND NOT B) OR (NOT A AND B))</code>.


Same meaning, slightly different syntax.
A smarter algorithm would make the canonical form for this expression something more like <code>(A OR B)</code>.
 
A canonical form like this one that's all "multiplied out", with ORs at the top tier, any ANDs below that, and any NOTs appearing only on individual variables, is easy to work with.)
 
A thread state expression is '''inconsistent''' if it's self-contradictory, like "(P AND NOT P)".  This means this thread state expression can never be satisfied; it's never true.  Note that "((P AND NOT P) OR Q)" is not inconsistent; it simplifies to "Q".
 
One thread state expression may '''imply'' another, as in logic: "(P AND Q)" implies "(P OR R)".  It means any time the former is true, the latter will be true.


=== Known thread state ===
=== Known thread state ===

Revision as of 14:08, 20 October 2007

Basic idea

I want to use static analysis to find race conditions in the MMGC_THREADSAFE stuff I'm working on.

So I've started putting annotations on each member function and each member variable of class MMgc::GC. Like this:

namespace MMgc {
	class GC  //...
	{
		// ...

		/** @access Requires(pageMapLock) */
		uintptr memStart;
		/** @access Requires(pageMapLock) */
		uintptr memEnd;

		/** @access Requires(m_lock) */
		size_t totalGCPages;

		/**
		 * This spinlock covers memStart, memEnd, and the contents of pageMap.
		 */
		mutable GCSpinLock pageMapLock;

		// ...

		/**
		 * Zero out the pageMap bits for the given address.
		 *
		 * @access Requires(pageMapLock)
		 */
		void ClearPageMapValue(uintptr addr);

		// ...

	}
}

For member variables, the @access restriction applies to any thread trying to read or write the variable. For member functions, it applies to the calling thread.

I also have some conditions I'd like to demand of the calling thread that aren't the names of actual lock objects. I've been using Requires(request) to mean "the thread must be in a matching BeginRequest/EndRequest pair", and likewise Requires(exclusiveGC) means we're in "stop-the-world" mode; all application threads have stopped and the calling thread is the single thread doing GC work.

Conditions can be combined, using && and ||, as in:

		/**
		 * Points to the head of a linked list of edge callback objects.
		 *
		 * @access Requires((request && m_callbackListLock) || exclusiveGC)
		 *
		 * In an MMGC_THREADSAFE build, this linked list is protected by the
		 * request model.  A thread must only access the list (a) from
		 * application code that is within a request AND holds
		 * m_callbackListLock; or (b) from MMgc code in the
		 * m_exclusiveGCThread.
		 *
		 * This policy is different from the one that covers m_callbacks for
		 * two reasons.  First, m_callbacks can fire the precollect callback
		 * even if the calling thread is not in a request at all, so this
		 * policy would be insufficient for m_callbacks.  Second,
		 * m_edgeCallbacks fires very frequently during marking, so a
		 * lock-free policy is probably much faster.
		 */
		GCEdgeCallback *m_edgeCallbacks;

Lastly, a ReadWrite access restriction indicates that one condition suffices for code that only wants to read the variable, while a different condition confers full read/write access.

		/**
		 * True if incremental marking is on and some objects have been marked.
		 * This means write barriers are enabled.
		 *
		 * @access ReadWrite(request, exclusiveGC)
		 *
		 * The GC thread may read and write this flag.  Application threads in
		 * requests have read-only access.
		 */
		bool marking;

--jorendorff 15:12, 19 October 2007 (PDT)

Taras suggests something more like

 /**
  * True if incremental marking is on and some objects have been marked.
  * This means write barriers are enabled.
  */
 MMGC_THREADSAFE_READ_WRITE(request, exclusiveGC)
 bool marking;

--jorendorff 17:21, 19 October 2007 (PDT)

Specification

(Incomplete.)

The goal is to prove a certain minimal consistency between what states we think a multi-threaded program can get into and what the code actually says.

We first declare a set of boolean variables that captures the status of each thread. These are not C++ variables. They're per-thread, not shared; and they are true-false statements about the thread, not physical memory locations.

MMGC_TS_VAR(name);
Syntactically this is a declaration, including the semicolon. It's only legal at top-level (outside all classes, namespaces, etc.).
Introduces a thread state variable with the given name. Regardless of C++'s scoping rules, all thread state variables logically inhabit the same global namespace.

Thread state expressions

A thread state expression combines thread state variables with logical operators.

tsexpr ::= tsvar | (tsexpr AND tsexpr) | (tsexpr OR tsexpr) | NOT tsexpr

When a thread state expression appears in C++ code, it looks like this:

tsexpr ::= tsvar
         | MMGC_TS_OR(tsexpr, tsexpr)
         | MMGC_TS_AND(tsexpr, tsexpr)
         | MMGC_TS_NOT(tsexpr)

Same meaning, slightly different syntax.

(Note on logic: You can define a canonical form on thread state expressions. Here's an algorithm that defines a particularly verbose canonical form; there are lots of ways to improve on it:

  1. Get the sorted list V of all thread state variables that appear in the expression.
  2. Generate all 2|V| possibilities in some canonical order, e.g. if V=[A,B] then the possibilities are {A: true, B: true}, {A: true, B: false}, {A: false, B: true}, and {A: false, B: false}.
  3. For each possibility, plug the values into the thread state expression and evaluate.
  4. Your canonical form is a big OR-expression collecting all the possibilities for which the expression evaluates true. For example, if the expression is true for the first three possibilities above, but not the fourth, then the canonical form is: ((A AND B) OR (A AND NOT B) OR (NOT A AND B)).

A smarter algorithm would make the canonical form for this expression something more like (A OR B).

A canonical form like this one that's all "multiplied out", with ORs at the top tier, any ANDs below that, and any NOTs appearing only on individual variables, is easy to work with.)

A thread state expression is inconsistent if it's self-contradictory, like "(P AND NOT P)". This means this thread state expression can never be satisfied; it's never true. Note that "((P AND NOT P) OR Q)" is not inconsistent; it simplifies to "Q".

One thread state expression may 'imply another, as in logic: "(P AND Q)" implies "(P OR R)". It means any time the former is true, the latter will be true.

Known thread state

For a given translation unit, for each position in a function, we'll define the known thread state at that point. The known thread state is a thread state expression.

  • At the top of a function, the known thread state is ts-expr if the function (definition or declaration) is annotated with MMGC_TS_METHOD_REQUIRES(ts-expr), and {} otherwise (indicating nothing is known).
  • Each arc may modify the state, as follows:
    • Stepping past an MMGC_TS_TRANSITION(name, newState) statement causes the variable name in the known state to transition from !newState to newState. Reaching this statement is an error if the known state (before) does not include the pair name: !newState.
    • Entering the scope of a sentry object is equivalent to stepping past an MMGC_TS_TRANSITION(name, true) statement.
    • Leaving the scope of a sentry object is equivalent to stepping past an MMGC_TS_TRANSITION(name, false) statement.

A ts-expr is one of:

name
The name of a thread state variable.
MMGC_TS_OR(ts-expr, ts-expr)
Logical OR.
MMGC_TS_AND(ts-expr)
Logical AND.

The function tsExprToState is defined as follows:

tsExprToState name = {name: true}
tsExprToState


MMGC_TS_METHOD_REQUIRES(ts-expr)
.
MMGC_TS_FIELD_REQUIRES(ts-expr)
.
MMGC_TS_FIELD_RW_REQUIRES(read-ts-expr, write-ts-expr)
.


MMGC_TS_LOCK(name)
This is allowed before any variable declaration.
Indicates that the variable being declared is a lock, and that the
MMGC_TS_LOCK_SENTRY_CLASS
This is allowed before any class declaration.
Indicates that the class is a lock sentry class. Objects of this class then cause state transitions to happen during flow analysis.
MMGC_TS_TRANSITION(name, {true|false});
Syntactically this is a statement, including the semicolon.
Passing this statement causes the named thread state variable to transition to the specified value.