IPC Protocols: Difference between revisions

no edit summary
No edit summary
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
There will be many IPC protocols, and they should all be as error-free as possible by design.  Having separate protocols for separate uses is a good thing, and they all shouldn't be multiplexed by hand over a shared transport layer.  Protocol implementations should only have to check for errors as infrequently as possible; the heavy lifting should be done generically, or by code generated from high-level specifications.  Finally, explicit protocols make automatically generating test cases (of "normal" ''and'' "failure" conditions) much easier.
There will be many IPC protocols, and they should all be as error-free as possible by design.  Having separate protocols for separate uses is a good thing, and they all shouldn't be multiplexed by hand over a shared transport layer.  Protocol implementations should only have to check for errors as infrequently as possible; the heavy lifting should be done generically, or by code generated from high-level specifications.  Finally, explicit protocols make automatically generating test cases (of "normal" ''and'' "failure" conditions) much easier.
This work has the imprimatur of the [[Content Processes|'''Electrolysis''']] project.


== Terminology ==
== Terminology ==
Line 30: Line 32:
== Protocol-definition Language (PDL) ==
== Protocol-definition Language (PDL) ==


A protocol is between a ''parent'' actor and a ''child'' actor.  Child actors are not trusted, and if there is any evidence the child is misbehaving, it is terminated.  Code utilizing the child actor therefore will never see type or protocol errors; if they occur, the child is killed.  The parent actor must, however, carefully handle errors.
A protocol is between a ''parent'' actor and a ''child'' actor.  A protocol is specified with respect to the parent; that is, messages the parent is allowed to receive are exactly the messages a child is allowed to sent, and ''vice versa''.
 
A protocol is specified with respect to the parent; that is, messages the parent is allowed to receive are exactly the messages a child is allowed to sent, and ''vice versa''.


A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''.  (This ties us to state-machine semantics.)  The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.
A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''.  (This ties us to state-machine semantics.)  The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.
Line 59: Line 59:
   MessageArgument :: Type Identifier
   MessageArgument :: Type Identifier
    
    
   Type :: SharingQualifier BasicType
   Type :: SharingQualifier BasicType | ActorHandle
   SharingQualifier :: (\epsilon | 'transfer' | 'share')
   SharingQualifier :: (\epsilon | 'transfer' | 'share')
   BasicType :: BuiltinType | ImportedType
   BasicType :: BuiltinType | ImportedType
  ActorHandle :: ('parent' | 'child') 'actor' ProtocolName (':' StateName)?


A few items are worth calling out.
A few items are worth calling out.
Line 77: Line 78:


A BasicType is a C++ type that can be transferred in a message.  We will provide a set of BuiltinTypes like void, int, and so on.  Protocol writers can also ''import'' foreign types for which marshall/unmarshall traits are defined, and/or that can be allocated in shared memory.  (We may need to support union types for NPAPI.)
A BasicType is a C++ type that can be transferred in a message.  We will provide a set of BuiltinTypes like void, int, and so on.  Protocol writers can also ''import'' foreign types for which marshall/unmarshall traits are defined, and/or that can be allocated in shared memory.  (We may need to support union types for NPAPI.)
An ActorHandle is a handle to an already-created protocol actor; this can be thought of by analogy to <code>sockaddr_t</code>; it's a way to refer to a created actor.  The actor is either a ''parent'' or ''child'', and may be required to be in a particular state.  '''TODO''': the semantics of this still aren't 100% clear to me.  I'll need to work through a concrete example to figure them out.


If desired, we can add a notion of "privilege" to the message type system.  For example, chrome is more privileged than content, which is more privileged than plugin.  With this system, we can cheaply guarantee that privileged objects don't flow to unprivileged actors.
If desired, we can add a notion of "privilege" to the message type system.  For example, chrome is more privileged than content, which is more privileged than plugin.  With this system, we can cheaply guarantee that privileged objects don't flow to unprivileged actors.
Line 100: Line 103:


'''TODO''': there are more things we can integrate into the transition grammar, but concrete use cases are necessary.
'''TODO''': there are more things we can integrate into the transition grammar, but concrete use cases are necessary.
== Error Handling ==
Errors include
* Message type errors: bad data sent
* Protocol errors: e.g., wrong message, wrong state
* Transport errors: e.g., nesting asynchronicity within synchronicity
* Security errors: e.g., trying to send privileged info to plugins.  '''TODO''': this is not officially part of the design yet
Error handling is different for parents and children.  Child actors are not trusted, and if the parent detects the least whiff of malfeasance, the parent will invoke <code>child.KILLITWITHFIRE()</code>.  Code utilizing the child actor therefore will never see type or protocol errors; if they occur, the child is killed.
If the child detects a parent error, there's not much it can do.  The child will execute <code>self.SEPPUKU()</code>, but we may wish to have it first attempt to notify the parent of the error.
The parent actor must, however, carefully handle and recover from errors.
'''TODO''' more details
== Deadlocks ==
Deadlocks can be caused by bad sequences of synchronous messages (possibly RPC also, but this isn't completely understood yet).  Luckily, it is easy and cheap to detect whether sending a synchronous IPC message will immediately cause deadlock.The algorithm is:
  Actor:  # something that can receive a synchronous message
    Multiset blocked = { };  # Actors blocked on this target
 
    synchronousSend(target, msg):
      blocked = blocked.union(self)
      if target in blocked:
        deadlock!
      transportLayer.send(blocked.serialize(), msg.serialize())
      blocked = blocked.difference(self)
 
    synchronousReceive(msg):
      blocked = blocked.union(msg.blocked)
      # [1] handle message
      transportLayer.reply()
      blocked = blocked.difference(msg.blocked)
Note that the <code>blocked</code> variable needs to be a Multiset, and <code>Multiset.difference()</code> needs to only remove one instance of each element in the differenced set, to make |synchronousReceive()| work correctly.  This should be efficient because the |blocked| multiset shouldn't grow very big, and we should be able to identify an IPC target by perhaps a 4-byte ID.  So serializing |blocked| is cheap.
Doing a dynamic "may-deadlock" analysis, like what we for locks and monitors, is more expensive.  First, we need to keep state that is global across all processes.  Second, when calling |synchronousSend()|, we need to send another asynchronous IPC message to whichever process manages the global state.  But it's doable, and we can choose the granularity at which we detect possible deadlocks:
# Actor x may deadlock with y
# Protocol x may deadlock with y
# ProtocolMessage x may deadlock with y
Each of these represents a compromise of precision with number of deadlocks detected.  Option (1) detects the most deadlocks, but is very imprecise.  Option (3) detects the fewest, but is most precise.  Option (2) is somewhere in the middle.  It may be best to start with option (1), and if there are too many false positives, go down to (2).  And again, if there are too many false positives, we can switch to (3).
Doing a static "may-deadlock" analysis appears to be hard, but tractable (for two processes at least).  Researchers have done similar things in the past, and it is known approximately what is required.  However, we will need some machinery in our static analysis toolkit that currently doesn't exist.


== Implementation ==
== Implementation ==
Line 117: Line 167:


Protocol instances will not be thread safe in the usual multi-threaded, shared-memory sense; they will instead be "thread safe" in the sense of HTML5 Worker Threads: share-nothing.  We should use these protocols to avoid multi-threaded, shared-memory code.
Protocol instances will not be thread safe in the usual multi-threaded, shared-memory sense; they will instead be "thread safe" in the sense of HTML5 Worker Threads: share-nothing.  We should use these protocols to avoid multi-threaded, shared-memory code.
We will do much of our static checking with the [http://spinroot.com/spin/whatispin.html SPIN model checker].  Our PDL compiler will know how to generate SPIN models, and additionally, we can use treehydra to create more complicated models from the Mozilla source itself.  This could be used to check invariants across protocols in addition to within protocols.
Confirmed users
699

edits