IPC Protocols: Difference between revisions

← Older edit

IPC Protocols (view source)

Revision as of 20:33, 18 May 2009

5,113 bytes added , 18 May 2009

no edit summary

Cgj

Confirmed users

699

edits

@@ Line 1: / Line 1: @@
 There will be many IPC protocols, and they should all be as error-free as possible by design.  Having separate protocols for separate uses is a good thing, and they all shouldn't be multiplexed by hand over a shared transport layer.  Protocol implementations should only have to check for errors as infrequently as possible; the heavy lifting should be done generically, or by code generated from high-level specifications.  Finally, explicit protocols make automatically generating test cases (of "normal" ''and'' "failure" conditions) much easier.
+This work has the imprimatur of the [[Content Processes|'''Electrolysis''']] project.
 == Terminology ==
@@ Line 30: / Line 32: @@
 == Protocol-definition Language (PDL) ==
-A protocol has a ''server'' and a ''client''.  Loosely speaking for IPC, the ''client'' '''MAY BE''' the process that forks the ''server'' process.  Again loosely speaking, the ''server'' provides capabilities not inherent to the ''client''.
+A protocol is between a ''parent'' actor and a ''child'' actor.  A protocol is specified with respect to the parent; that is, messages the parent is allowed to receive are exactly the messages a child is allowed to sent, and ''vice versa''.
-'''TODO''': this isn't a complete definition.  Forking a plugin process fits it, but a chrome-local addon to content doesn't.
+A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''.  (This ties us to state-machine semantics.)  The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.
-A protocol is specified with respect to the server; that is, messages the server is allowed to receive are exactly the messages a client is allowed to sent, and ''vice versa''.
-A protocol consists of declarations of ''messages'' and specifications of ''state machine transitions''.  (This ties us to state-machine semantics.)  The message declarations are essentially type declarations for the transport layer, and the state machine transitions capture the semantics of the protocol itself.
+A protocol is specified by first declaring the protocol
-A protocol is specified by first naming the protocol
+ Protocol :: (\epsilon | 'sync' | 'rpc') 'protocol' ProtocolName '{' Messages Transitions '}'
-  protocol Foo {
+This implies the creation of <code>FooParent</code> and <code>FooChild</code> actors.  (Hereafter, this document speaks from the perspective of the <code>FooParent</code>.)
-This implies the creation of a <code>FooServer</code> and <code>FooClient</code>.  (Hereafter, this document speaks from the perspective of the <code>FooServer</code>.)
+By default, protocols can only exchange asynchronous messages.  A protocol must explicitly allow synchronous and RPC (see below) messages by using the <code>sync</code> or <code>rpc</code> qualifiers.  This is enforced statically by the PDL type system.
-Conceptually (but not necessarily syntactically) next are message definitions.  What underlying types we allow in these, and with what qualifiers, is likely to be a central topic of debate.  Another hot topic will likely be what message semantics we provide; possibilities are
+Conceptually (but not necessarily syntactically) next are message definitions.  Messages definitions are somewhat analogous to function signatures in C/C++.  Messages can have one of three semantics
 * '''asynchronous''': the sending actor does not expect nor listen for a response to the sent message
 * '''synchronous''': the sending actor is completely blocked until it receives a response
 * '''RPC++''': the sending actor is partially blocked until it receives a response to message ''m''.  It is only allowed to process RPC++ messages sent by the actor receiving ''m'', direct resulting from the receiving actor receiving ''m''.  (This is intended to model function call semantics.)
-(From e-mail discussions, it appears that we may want RPC++ messages, excluding synchronous messages (since they're a special case of RPC++ messages).  However, I'm not convinced RPC++ is necessary, and I'm writing the strawman grammar below assuming we only require synchronous messages.)
+Asynchronous messages are the default.  The list above is sorted by decreasing simplicity and efficiency; synchronous and RPC++ messages should not be used without a compelling reason.
 === Strawman message grammar ===
-   Message :: (SyncMessage | AsyncMessage) ';'
+   Message :: (\epsilon | 'sync' | 'rpc') MessageBody ';'
-  SyncMessage :: 'sync' ('in' | 'out') Type MessageName '(' MessageArguments ')'
+   MessageBody :: Type MessageName '(' MessageArguments ')'
-   AsyncMessage :: 'async' ('in' | 'out') MessageName '(' MessageArguments ')'
-   MessageArguments :: (MessageArgument ',' | \epsilon)*
+   MessageArguments :: (MessageArgument ',')* MessageArgument?
    MessageArgument :: Type Identifier
-   Type :: SharingQualifier BasicType
+   Type :: SharingQualifier BasicType | ActorHandle
-   SharingQualifier :: ('share' | 'transfer' | \epsilon)
+   SharingQualifier :: (\epsilon | 'transfer' | 'share')
-   BasicType :: ('void' | 'int' | ... ???)
+   BasicType :: BuiltinType | ImportedType
+  ActorHandle :: ('parent' | 'child') 'actor' ProtocolName (':' StateName)?
 A few items are worth calling out.
-* As mentioned above, will SyncMessage be sufficient for us?
+SharingQualifiers define transport semantics for objects sent in a message.  By default, objects are sent "by value" (i.e., marshalled then unmarshalled).  How we classes are marshalled is not a concern of the protocol layer, but very important nonetheless.  This is likely to be another security concern.  But large objects can also be transported through shared memory.
-* SyncMessages have a return type, whereas AsyncMessages don't.
-* SharingQualifiers are a discussion unto themselves.
-'''share''' means that the object ''o'' named lives in shared memory, and is co-owned by the client and server.  If the receiving actor does not already co-own ''o'', it does after receiving the message.  A lower layer needs to enforce that this is implemented correctly:
+The qualifier '''share''' means that the object ''o'' named lives in shared memory, and is co-owned by the parent and child actors.  If the receiving actor does not already co-own ''o'', it does after receiving the message.  A lower layer needs to enforce that this is implemented correctly
 # ''o'' lives in shared memory
 # all objects reachable from ''o'' live in shared memory
 # all accesses to members of ''o'' are synchronized across the client and server
-'''transfer''' means that the sending actor owns ''o'', and when the receiving actor receives ''o'', ownership transfers from the sender to the receiver.  This means that requirement (3) above is removed for '''transfer''' types.
+'''transfer''' means that the sending actor owns ''o'', and when the receiving actor receives ''o'', ownership transfers from the sender to the receiver.  This means that requirement (3) above is removed for '''transfer''' types.  This is the preferred sharing semantics; '''share''' probably won't be implemented initially.
+'''NOTE''': '''share''' and '''transfer''' are optimizations.  These don't need to be included in the initial language implementation, but are worth keeping in mind.
-No SharingQualifier (\epsilon) means that that object sent is serialized.  How we classes are serialized is probably not a concern of the protocol layer, but very important nonetheless.  This is likely to be another security concern.
+A BasicType is a C++ type that can be transferred in a message.  We will provide a set of BuiltinTypes like void, int, and so on.  Protocol writers can also ''import'' foreign types for which marshall/unmarshall traits are defined, and/or that can be allocated in shared memory.  (We may need to support union types for NPAPI.)
-'''NOTE''': '''share''' and '''transfer''' are optimizations.  These don't need to be included in the initial language implementation, but are worth keeping in mind.
+An ActorHandle is a handle to an already-created protocol actor; this can be thought of by analogy to <code>sockaddr_t</code>; it's a way to refer to a created actor.  The actor is either a ''parent'' or ''child'', and may be required to be in a particular state.  '''TODO''': the semantics of this still aren't 100% clear to me.  I'll need to work through a concrete example to figure them out.
-'''NOTE''': what <code>BasicType</code> means should be a fruitful topic for discussion
+If desired, we can add a notion of "privilege" to the message type system.  For example, chrome is more privileged than content, which is more privileged than plugin.  With this system, we can cheaply guarantee that privileged objects don't flow to unprivileged actors.
 === Strawman transition grammar ===
    Transition :: 'state' StateName '{' Actions '}'
-   Actions :: (Action ';' | \epsilon)
+   Actions :: (Action ';')* Action?
+  Action :: MessageAction | RPCAction
-   Action :: MessageName ('!' | '?') '->' StateName
+   MessageAction :: ('send' | 'rcv') MessageName 'goto' StateName
+  RPCAction :: ('call' | 'answer') MessageName ('push' StateName)?
+'''TODO''': the above grammar may lead to unnecessarily verbose specifications, since there's only one "action" permitted per state transition.  We can add additional syntax to reduce verbosity if it becomes a problem.
+A transition starts from a particular state (the lexically first state is the start state), and then either sends or receives ("calls" or "answers" for RPC) one of a set of allowed messages.  The syntax <code>send MessageName</code> means "send MessageName", and <code>rcv MessageName</code> means "receive MessageName".  After the action, the actor then transitions into another state.  For RPC, an action causes the current state to be pushed on a stack, then the "push STATE" to be transitioned into.
+Unfortunately, the syntax for async/sync messages and RPC calls diverge because the semantics are so different.  Sync/async messages only model message passing, whereas RPC models function calls.  After a message-passing action occurs, the actor only makes a state transition (<code>goto STATE</code>).  However, an RPC action pushes a new state onto an "RPC stack" (<code>push STATE</code>).  When the RPC call returns, the "pushed" state is "popped."
+'''TODO''': this may be confusing.  Any ideas for simplifying it?
+We can check almost any property of the protocol specification itself statically, since it's a state machine + well-defined stack.  What all of these static invariants should be is not yet known; one invariant is that an asynchronous message can't be nested within a synchronous one.  From this static specification, we can generate a C++ dynamic checker to ensure that message processors (code utilizing actors) adhere to the protocol spec.  We may be able to check some of this statically as well, but it's harder.
+'''TODO''': there are more things we can integrate into the transition grammar, but concrete use cases are necessary.
+== Error Handling ==
+Errors include
+* Message type errors: bad data sent
+* Protocol errors: e.g., wrong message, wrong state
+* Transport errors: e.g., nesting asynchronicity within synchronicity
+* Security errors: e.g., trying to send privileged info to plugins.  '''TODO''': this is not officially part of the design yet
+Error handling is different for parents and children.  Child actors are not trusted, and if the parent detects the least whiff of malfeasance, the parent will invoke <code>child.KILLITWITHFIRE()</code>.  Code utilizing the child actor therefore will never see type or protocol errors; if they occur, the child is killed.
+If the child detects a parent error, there's not much it can do.  The child will execute <code>self.SEPPUKU()</code>, but we may wish to have it first attempt to notify the parent of the error.
+The parent actor must, however, carefully handle and recover from errors.
+'''TODO''' more details
+== Deadlocks ==
-This is a dirt-simple grammar but should capture all we need in a first pass.  A transition starts from a particular state (the lexically first state is the start state), and then either sends or receives one of a set of allowed messages.  The syntax <code>MessageName !</code> means "send MessageName", and <code>MessageName ?</code> means "receive MessageName".  The actor then transitions into another state.
+Deadlocks can be caused by bad sequences of synchronous messages (possibly RPC also, but this isn't completely understood yet).  Luckily, it is easy and cheap to detect whether sending a synchronous IPC message will immediately cause deadlock.The algorithm is:
-From a particular state, an actor can either ''only receive'' or ''only send'' messages (we could relax this, but it complicates the implementation).  This is extremely easy to check statically (we could make it part of the grammar, too).
+  Actor:  # something that can receive a synchronous message
+    Multiset blocked = { };  # Actors blocked on this target
+    synchronousSend(target, msg):
+      blocked = blocked.union(self)
+      if target in blocked:
+        deadlock!
+      transportLayer.send(blocked.serialize(), msg.serialize())
+      blocked = blocked.difference(self)
+    synchronousReceive(msg):
+      blocked = blocked.union(msg.blocked)
+      # [1] handle message
+      transportLayer.reply()
+      blocked = blocked.difference(msg.blocked)
-Transitions only happen when the underlying message operation was "completed."  For messages sent asynchronously, this means sent over the wire (resp., received).  For messages sent synchronously, this means sent over the wire ''and'' replied to by the other side (resp., received and reply sent).
+Note that the <code>blocked</code> variable needs to be a Multiset, and <code>Multiset.difference()</code> needs to only remove one instance of each element in the differenced set, to make |synchronousReceive()| work correctly.  This should be efficient because the |blocked| multiset shouldn't grow very big, and we should be able to identify an IPC target by perhaps a 4-byte ID.  So serializing |blocked| is cheap.
-We can support sending/receiving multiple messages per transition.  As this complicates the implementation, it's probably best to add that only when necessary.
+Doing a dynamic "may-deadlock" analysis, like what we for locks and monitors, is more expensive.  First, we need to keep state that is global across all processes.  Second, when calling |synchronousSend()|, we need to send another asynchronous IPC message to whichever process manages the global state.  But it's doable, and we can choose the granularity at which we detect possible deadlocks:
+# Actor x may deadlock with y
+# Protocol x may deadlock with y
+# ProtocolMessage x may deadlock with y
+Each of these represents a compromise of precision with number of deadlocks detected.  Option (1) detects the most deadlocks, but is very imprecise.  Option (3) detects the fewest, but is most precise.  Option (2) is somewhere in the middle.  It may be best to start with option (1), and if there are too many false positives, go down to (2).  And again, if there are too many false positives, we can switch to (3).
-'''TODO''': there are many things we can integrate here, but concrete use cases are necessary.  This should be a main point of discussion.
+Doing a static "may-deadlock" analysis appears to be hard, but tractable (for two processes at least).  Researchers have done similar things in the past, and it is known approximately what is required.  However, we will need some machinery in our static analysis toolkit that currently doesn't exist.
 == Implementation ==
@@ Line 104: / Line 155: @@
 The transport layer should hopefully be mainly chromium IPC code.  We will likely need to extend this to support objects sent through shared memory, if that feature is desired.
-The PDL compiler will take as input a FooProtocol.pdl source and generate FooProtocolServer.h and FooProtocolClient.h C++ headers.  These headers will declare message types for the transport layer, declare protocol states, and define functions that take care of state transitions and invoke message handlers.  The headers will leave undefined the ''implementations'' of message handlers; these are up to mozilla code to provide.  (Perhaps through subclassing.)
+The PDL compiler will take as input a FooProtocol.pdl source and generate FooProtocolParent.h and FooProtocolChild.h C++ headers.  These headers will declare message types for the transport layer, declare protocol states, and define functions that take care of state transitions and invoke message handlers.  The headers will leave undefined the ''implementations'' of message handlers; these are provided by code utilizing the protocol actors.  (Perhaps through subclassing.)
-Message handlers will be responsible for processing the received message (obviously) and also deciding which of the allowed states is transitioned to after processing.  The generated protocol header will enforce protocol safety here.  Lower-level enforcement of transport-layer properties, like not nesting asynchronous calls within synchronous handlers, can be enforced in a shared ProtocolBase class.  '''TODO''': should these kinds of properties be global, or defined in protocol specs?  What are all these properties?  Deadlock avoidance is another, if we provide synchronous messages.
-Protocol safety (only sending/receiving allowed messages from particular states) will be enforced by generating a finite-state machine in the protocol header, and checking sent/received messages against this state machine.
+Message handlers will be responsible for processing the received message (obviously) and also deciding which of the allowed states is transitioned to after processing.  The generated protocol header will enforce protocol safety here.  Lower-level enforcement of transport-layer properties can be enforced in a shared ProtocolBase class.  Hopefully much of this we can check statically.  Some of these properties are
+* not nesting asynchronous calls within synchronous handlers
+* avoid deadlock
+* don't fill the underlying pipe (rate limit)
-Very importantly, there will be an implicitly-defined ERROR state that occurs when an actor violates the protocol (or a type safety constraint).  How this is handled is very much open to discussion.
+Protocol safety (only sending/receiving allowed messages from particular states) will be enforced by generating a finite-state machine in the protocol header, and checking sent/received messages and transitions against this state machine.
-'''NOTE''': I think it's worthwhile leaving error detection on even in optimized builds.  It should be quite cheap compared to IPC overhead and will likely allow us to avoid security-related bugs.
+An ''actor'' will just be an instance of a protocol server or client class.  We need some way of addressing particular actors across threads and processes.  Hopefully the chromium IPC code already supports this.  Also, we can easily support process-to-process and thread-to-thread communication over these protocols.  The latter would be very useful for debugging, and also useful for writing threaded code that avoids shared memory as much as possible (even when we have multiple processes we will still have multiple threads).  Due to its apparent difficulty, this is a low-priority TODO.
-An ''actor'' will just be an instance of a protocol server or client class.  We need some way of addressing particular actors across threads and processes.  Hopefully the chromium IPC code already supports this.  Also, we can easily support process-to-process and thread-to-thread communication over these protocols.  The latter would be very useful for debugging, and also useful for writing threaded code that avoids shared memory as much as possible (even when we have multiple processes we will still have multiple threads).
+Protocol instances will not be thread safe in the usual multi-threaded, shared-memory sense; they will instead be "thread safe" in the sense of HTML5 Worker Threads: share-nothing.  We should use these protocols to avoid multi-threaded, shared-memory code.
-I '''do not''' think that protocol instances should be thread safe (in the usual shared-memory sense).  We should use these protocols to avoid this kind of threaded code, whether in-process or out-of-process.
+We will do much of our static checking with the [http://spinroot.com/spin/whatispin.html SPIN model checker].  Our PDL compiler will know how to generate SPIN models, and additionally, we can use treehydra to create more complicated models from the Mozilla source itself.  This could be used to check invariants across protocols in addition to within protocols.