
Overall Architecture

Note: see Media/WebRTC/WebRTCE10S for the architecture on B2G with E10S


At a high level, there are five major components we need to integrate to build a functional WebRTC stack into Firefox.

  • The MediaStream components that provide generic media support.
  • The contributed code that handles RTP and codecs.
  • The SIPCC signaling stack.
  • The DataChannel management code and the libsctp code that it drives.
  • The transport stack (mtransport) which drives DTLS, ICE, etc.

These are managed/integrated by the PeerConnection code which provides the PeerConnection API and maintains all the relevant state.

In addition, there is the GetUserMedia() [GUM] code which handles media acquisition. However, the GUM code has no direct contact with the rest of the WebRTC stack, since the stack itself solely manipulates MediaStreams and does not care how they were created.

Here is an example sequence of events from the caller's perspective:

  1. JS creates one or more MediaStream objects via the GetUserMedia() API. The GUM code works with the MediaStream code and returns a MediaStream object.
  2. JS calls new PeerConnection() which creates a PeerConnection object. [QUESTION: does this create a CC_Call right here?]
  3. JS calls pc.AddStream() to add a stream to the PeerConnection.
  4. JS calls pc.CreateOffer() to create an offer.
  5. Inside PeerConnection.createOffer(), the following steps happen:
    1. A Create offer request is sent to the CCAPP_Task
    2. An appropriate number of WebRTC streams are set up to match the number of added MediaStreams.
    3. Some number of mtransports are set up (to match the appropriate number of streams) [OPEN QUESTION: is this done by PeerConnection or inside SIPCC?]
  6. Asynchronously, SIPCC creates the SDP and it gets passed up to the PeerConnection.
  7. The PeerConnection forwards the SDP response to the DOM which fires the JS createOffer callback.
  8. The JS forwards the offer and then calls pc.SetLocalDescription(). This causes:
    1. Attachment of the mtransport to the streams via ExternalRenderer/ExternalTransport
  9. When the remote SDP is received, the JS calls pc.SetRemoteDescription() which forwards to the CCAPP_Task in the same manner as createOffer() and setLocalDescription(). This causes:
    1. Forwarding of the ICE candidates to mtransport. At the time when the first candidates are received, mtransport can start ICE negotiation.
  10. Once ICE completes, DTLS negotiation starts.
  11. Once DTLS negotiation completes, media can flow. [QUESTION: Should we hold off on attaching the mtransport until this is ready?]

The big questions for me have to do with object lifetime and exactly how to plumb ICE and DTLS into the SDP system.

PeerConnection vs. CC_Call

The PeerConnection is 1:1 with CC_Call, so when you do CreateOffer, it effectively kicks off the offer process on an existing call. We can then reuse the SIPCC state machine to some extent to manage the call state. Subsequent calls to the other JSEP APIs such as setRemoteDescription and localDescription run on the same call and use the same state machine. There is a global singleton PeerConnectionCtx which handles callbacks/notifications from SIPCC.

Mtransport Interface

The mtransport (ICE, DTLS) subsystem is pretty independent of SIPCC. Roughly speaking, they are wired up as follows:

  • The PeerConnection creates ICE objects (NrIceCtx) as soon as it starts up. It creates as many as the number of m-lines we expect to need.
  • When SIPCC (lsm) determines that a new media flow is required, it stands up a MediaPipeline (containing the MediaConduit [codecs], SRTP contexts, and a TransportFlow (DTLS, ICE, etc.)).

Note that each MediaPipeline is one-way, so a two-way audio flow has two media pipelines. However, since we are doing symmetric RTP, you likely have two MediaPipelines for each TransportFlow; conversely, there may be two TransportFlows for each MediaPipeline if RTCP is not muxed.

Internal vs 3rd party code


Colors in the diagram:

  • White: Mozilla's own code
  • Orange: 3rd party code
  • Green: code shared with Google Chrome

List of Components

The system has the following individual components, in no particular order:

  • PeerConnection
    • PeerConnection.js -- shim translation layer to let us do API adaptation to the C++
    • PeerConnectionImpl -- C++ implementation of the PeerConnection interface.
    • SIPCC -- handles SDP and media negotiation. Provided by Cisco but not a downstream.
  • Media
    • -- handles media encoding and decoding. Downstream from Google.
    • MediaConduit -- Generic wrapper around
    • MediaPipeline -- Wrapper to hold the MediaConduit, mtransport subsystem, and the SRTP contexts, as well as interface with MediaStreams.
  • Transport
    • mtransport -- generic transport subsystem with implementations for ICE, DTLS, etc.
    • NSS -- new DTLS stack. Mentioned because we need to land the new version of NSS
    • nICEr -- ICE stack; downstream from reSIProcate project
    • nrappkit -- portable runtime, utility library; downstream from
  • DataChannel
    • DataChannel implementation in the DOM
    • libsctp -- SCTP implementation; downstream from the BSD SCTP guys

Thread Diagram

The system operates in a number of different threads. The following diagrams show how these threads interact during some common operations.

Overview of threading models and inter-thread communication

At a high-level we need to deal with two kinds of threads:

  • Standard Firefox threads (nsThreads) [1]
  • Non-Firefox threads generated internally to SIPCC (these are a thin wrapper around OS threads).

Unfortunately, these threads have rather different operating and dispatch models, as described below.


The primary way to communicate between nsThreads is to use the Dispatch() method of the thread. This pushes an event into an event queue for the thread. For instance:

 nsCOMPtr<nsIRunnable> r = new PiCalculateTask(callback, digits);
 thread->Dispatch(r, NS_DISPATCH_NORMAL);


SIPCC threads communicate via an explicit message passing system based on Unix domain sockets. The actual message passes are wrapped in subroutine calls. E.g.,

   if (cprSendMessage(sip_msgq /*sip.msgQueue */ , (cprBuffer_t)msg, (void **)&syshdr)
       == CPR_FAILURE) {
       return CPR_FAILURE;
   }
Internally, cprSendMessage() is a write to a unix domain socket, and the receiving thread needs to loop around cprGetMessage(), as in ccapp_task.c:

   /*
    * CCApp Provider main routine.
    * @param   arg - CCApp msg queue
    * @return  void
    * @pre     None
    */
   void CCApp_task(void * arg)
   {
       static const char fname[] = "CCApp_task";
       phn_syshdr_t   *syshdr = NULL;
       appListener *listener = NULL;
       void * msg;

       //initialize the listener list
       while (1) {
           msg = cprGetMessage(ccapp_msgq, TRUE, (void **) &syshdr);
           if ( msg) {
               CCAPP_DEBUG(DEB_F_PREFIX"Received Cmd[%d] for app[%d]\n", DEB_F_PREFIX_ARGS(SIP_CC_PROV, fname),
                       syshdr->Cmd, syshdr->Usr.UsrInfo);
               listener = getCcappListener(syshdr->Usr.UsrInfo);
               if (listener != NULL) {
                   (* ((appListener)(listener)))(msg, syshdr->Cmd);
               } else {
                   CCAPP_DEBUG(DEB_F_PREFIX"Event[%d] doesn't have a dedicated listener.\n", DEB_F_PREFIX_ARGS(SIP_CC_PROV, fname),
                           syshdr->Cmd);
               }
           }
       }
   }
Interoperating between SIPCC Threads and nsThreads

The right idiom here is that when we write to a thread, we use that thread's idiom. This means that when we have a pair of threads of different types, each needs to know about the other's idiom, but neither needs to change its master event loop. The idea is shown below:

Note that this makes life kind of unpleasant when we want to write to an nsThread from C code in SIPCC, because nsThread communication requires instantiating an nsIRunnable class, which is linguistically foreign to C. However, looking at the existing code, it appears that we can isolate these calls to C++ code or build bridges to make them available.

In the diagrams below, CPR-style messages are indicated with IPC() and nsThread-style messages are indicated with Dispatch()

Important threads

There are a number of threads that do the heavy lifting here.

  • DOM Thread: the thread where calls from the JS end up executing. Anything running here blocks the DOM and the JS engine (existing Firefox thread)
  • SocketTransportService (STS): the thread with the main networking event loop (existing Firefox thread)
  • MediaStream: where media from devices is delivered. (same as DOM thread?) (existing Firefox thread)
  • PeerConnection Thread: the thread where the PeerConnection operates (new thread; QUESTION: how many are there? one total or one per PC?)
  • CCAPP_Task: the outermost of the SIPCC threads, running in ccapp_task.c:CCApp_task(). This is the API for SIPCC.
  • GSMTask: the internal SIPCC state machine, running in gsm.c::GSMTask(). This is where most SIPCC SDP processing and state management occurs.
  • Media Input Thread: the thread used by WebRTC for incoming media decoding (new thread)
  • Media Output Thread: the thread used by WebRTC for outgoing media encoding (new thread; may be the same as Media Input thread).

A note about this taxonomy:

  1. In general, expensive operations prompted by (DOM, STS, MediaStream) need to be done on other threads, so your event handlers should just Dispatch to some other thread which does the work. This is particularly true for media encoding and decoding.

Signaling System: CreateOffer

[TODO: EKR. Parallel forking and cloning.]

The sequence for CreateOffer is shown in the link above.

As described above, most of the heavy lifting for the API happens off the DOM thread. Thus, when the JS invokes CreateOffer(), this turns into a Dispatch to the PeerConnection thread which (after some activity of its own) invokes SIPCC's FEATURE_CREATE_OFFER via a SIPCC IPC message to CCAPP_Task, which, using the GSMTask thread, constructs the SDP based on local streams added using AddStream. The appropriate transport flows (mtransport) are then constructed. (Note that they may already exist if some have been pre-created.) The SDP offer is then Dispatched back all the way to the DOM thread, which eventually calls the JS CreateOffer callback. [Question: is this actually not Dispatch but some fancier call?]

In the meantime, the ICE gathering process has been running on the STS thread. As each candidate is gathered, the STS thread does an IPC call to the GSMTask thread to return the ICE candidate. These candidates are then Dispatched back to the CCAPP_Task thread and then to the PC thread and then eventually to the DOM thread where the ICE candidate events fire.

At this point it may be necessary to create some streams in order to determine what the acceptable media parameters are for the SDP (for instance, whether a given codec is supported). [TODO: Enda?] However, they would not be hooked up to anything at this point.

Signaling System: SetLocal(Caller)

Once the caller has generated the offer, the JS can call SetLocalDescription(). This just bubbles its way down to the GSMTask, which verifies that the SDP is correct (currently this means it matches the outstanding SDP) and, if so, partially assembles the appropriate media streams and transport objects. Note that we cannot actually plug them together until we have the DTLS fingerprint from the other side (or we will need some clever UI). See the SetRemote(Caller) section below.

Signaling System: SetRemote(Callee)

The callee receives an offer through its signaling mechanism and calls the JS API setRemoteDescription passing in the SDP. Internally in the GSMTask the offer SDP is parsed, the remote media streams are constructed and events are fired back to the DOM Thread signaling the addition of remote media streams. If there are ICE candidates in the offer SDP they are dispatched to the STS thread for ICE processing.

[TODO: the question of what WebRTC remote streams can be constructed here is tricky. You need to construct some sort of representation in order to generate onaddstream events, but you don't know what the codecs will be until setLocal is called. Enda, you will need to figure this out.]

Signaling System: CreateAnswer(Callee)

The JS calls CreateAnswer passing in the offer SDP to create the answer SDP. This calls down to the GSMTask thread, passing in the offer SDP. The local SDP will now get created; for this to occur, local streams must have been added via the JS API AddStream. If the local SDP is successfully created, then the offer SDP will be negotiated; the output of this will be the answer SDP. To create the SDP, flows will need to be created and queried. If there are ICE candidates to be added to this answer SDP, they will get added. This answer SDP will then get dispatched back to the DOM thread.

Signaling System: SetLocal(Callee)

Invoked by the JS, SetLocalDescription must be called after CreateAnswer and takes as input the answer SDP, which is passed down to the GSMTask thread. Here the SDP is compared to the internal answer SDP; if there is a difference, an error is reported back to the DOM Thread. Following this, if ICE completes processing successfully, then media can start sending and receiving on the flows that were created in CreateAnswer.

Signaling System: SetRemote(Caller)

The above diagram shows the caller's SetRemote sequence.

The process starts with receiving the remote description and the JS calling SetRemoteDescription(). This is Dispatched() onto the PC thread and then to the CCAPP_Task API thread, which passes it to the internal SIPCC GSMTask thread for processing. Assuming everything is OK, it parses the SDP and creates the remote media streams, which are bubbled up to the DOM Thread. The ICE candidates in the answer SDP are parsed and passed to the STS thread. Next, the answer SDP is negotiated with the internal local SDP used in the earlier offer. At this point a Connected event is returned to the CCAPP_Task thread and then to the PC thread, where it gets translated to a JSEP ReadyState of Active. Once ICE and DTLS handshaking are complete, a message is sent up to the GSMTask, which can then start sending and receiving media on the working mtransport that was initialized in CreateOffer earlier.

Signaling System: localDescription

Here are two options for implementing this API.

(1). This, like the other JSEP APIs, will be a function on the call object. It will send feature CC_FEATURE_LOCALDESC to the CCAPP thread, which will send the message CC_MSG_SETLOCALDESC to GSMTask. The event handler in GSM will return the local SDP string stored with that call's DCB (Dial Control Block). AddStream and CreateOffer or CreateAnswer will need to have been called for this to work successfully. This approach is asynchronous, so the SDP will need to be returned similarly to how it is returned for the other JSEP APIs, via the UI interface and back through the CCAPP thread.

(2). Another approach would be to not use the SIPCC APIs and to store the SDP returned from CreateOffer or CreateAnswer in a string in the PeerConnection Interface backend code. This would allow this API to be synchronous and highly performant. One downside would be that if AddStream or RemoveStream is called, the SDP we now store in the PC would not get dynamically updated.

Signaling System: remoteDescription

Much the same applies to this API as to localDescription, including the two possible approaches to implementing it.

remoteDescription would require setRemoteDescription to be called first; that is how the remote SDP is set into the PeerConnection.

Signaling System: AddStream

There are probably many ways to implement the AddStream API but I present this one for discussion.

AddStream would be an API on the PeerConnection backend interface that takes as a parameter a MediaStream pointer.

When called the MediaStreams are stored in a container in the PeerConnection backend. When CreateOffer or CreateAnswer are called to generate the local SDP then the GSMTask thread that is generating the SDP will interrogate the MediaStream container and assemble the media lines based on information it reads from the media streams.

Thread diagram for transport subsystem

The above diagram shows the threads for media transport as well as ICE/DTLS.

As shown above, the CCAPP thread sets up the mtransport stack which runs on the STS thread. ICE and DTLS run on their own on the STS thread and eventually the mtransport becomes ready to read and write. The CCAPP thread is then notified and it notifies the Media In and Media Out threads that they can start playout in each direction. Prior to this, Media In was passive and Media Out either wasn't registered for playout or discarded NotifyOutput.

  • Incoming media originates on the STS thread but is then dispatched to the Media In thread for decoding and playout.
  • Outgoing media originates on the Media Stream thread but is then dispatched to the Media Out thread for encoding and transmission.

Note that real care will need to be taken to make sure that the lifetime of objects shared across threads is right. We should be using nsRefPtr with thread-safe ref count objects to assist this process.