Necko: Electrolysis design and subprojects

This page covers the design issues involved in making necko work under electrolysis (i.e. moving all network traffic to the chrome process, and using IPDL to communicate with the content process(es)).

Protocols that need work for e10s

In the long term, we do not want to allow any network connections from child processes, for security reasons (we may enforce this via operating system mechanisms). In the short term, we are more concerned with memory footprint and correctness. HTTP requires caching and NSS, both of which are memory-heavy, and it also uses cookies and auth, which require a single central database. So HTTP will need to be centralized in chrome immediately. FTP (and gopher, if it's still supported) will initially continue to run unmodified in child processes as an interim step; eventually they too will need to do network traffic solely in chrome. Web sockets will also need to be modified to run only in chrome.

Eventually we may sandbox all file access, and thus other protocols (about://, file://, jar://) will need some work. But this is for "Stage 2".

High-level implementation description

Most of necko's logic is going to happen in the chrome process.

  • The content process will open Channels, but these channels will essentially be stubs (HttpChannelChild) that are connected to a real Channel in the chrome process via IPDL.
  • When an HTTP request is made, for instance, all the nitty-gritty of socket creation, authentication, SSL, cookies, and caching will be handled in the chrome process.
  • The content process will then be notified with the usual OnStartRequest, OnDataAvailable, and OnStopRequest notifications.

So, the most obvious place for inter-process communication will be AsyncOpen (from content->chrome, to queue the network request) and the various "On..." events back from chrome to the content process. There will also be IPDL traffic for notifications, cancellations, and probably other things. For HTTP, this traffic is handled by the netwerk/protocol/http/src/HttpChannel{Parent|Child} classes.
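
Here is a minimal, self-contained sketch of that split, with IPC modeled as direct calls. All names are simplified stand-ins for the real HttpChannelParent/HttpChannelChild and IPDL-generated code; in reality each cross-process call below is an asynchronous IPDL message.

    // Simplified model of the parent/child channel split. In the real code
    // (netwerk/protocol/http/src/HttpChannel{Parent|Child}) every arrow
    // below is an async IPDL message, not a direct call.
    #include <iostream>
    #include <string>

    struct StreamListener {                  // stand-in for nsIStreamListener
        virtual void OnStartRequest() = 0;
        virtual void OnDataAvailable(const std::string& data) = 0;
        virtual void OnStopRequest(int status) = 0;
        virtual ~StreamListener() {}
    };

    // Chrome process: owns the real channel -- sockets, SSL, auth, cookies,
    // cache -- and reports progress back to the child.
    struct HttpChannelParent {
        StreamListener* mChild;              // really SendOnXxx() over IPDL
        void AsyncOpen(const std::string& uri) {
            // ... real nsHttpChannel work happens here ...
            mChild->OnStartRequest();
            mChild->OnDataAvailable("<html>...</html>");
            mChild->OnStopRequest(0 /* NS_OK */);
        }
    };

    // Content process: a stub that forwards AsyncOpen to chrome and replays
    // the On* notifications to the ordinary necko listener.
    struct HttpChannelChild : StreamListener {
        HttpChannelParent* mParent;          // really SendAsyncOpen() over IPDL
        StreamListener* mListener;
        void AsyncOpen(const std::string& uri, StreamListener* aListener) {
            mListener = aListener;
            mParent->AsyncOpen(uri);         // queue the request in chrome
        }
        void OnStartRequest() { mListener->OnStartRequest(); }
        void OnDataAvailable(const std::string& d) { mListener->OnDataAvailable(d); }
        void OnStopRequest(int s) { mListener->OnStopRequest(s); }
    };

    int main() {
        struct Logger : StreamListener {     // ordinary necko client code
            void OnStartRequest() { std::cout << "start\n"; }
            void OnDataAvailable(const std::string& d) { std::cout << d << "\n"; }
            void OnStopRequest(int s) { std::cout << "stop " << s << "\n"; }
        } logger;

        HttpChannelParent parent;
        HttpChannelChild child;
        parent.mChild = &child;              // wire the two "processes" up
        child.mParent = &parent;
        child.AsyncOpen("http://example.com/", &logger);
        return 0;
    }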

Design/architecture issues

Types of network requests under electrolysis

The most common case of necko usage under electrolysis will be network requests that originate in a child (tab) process, and are serviced by the parent. But it is important to realize that there are actually three logically different types of network request:

  1. Traditional, non-e10s single process mode.
    • e10s will be disabled for firefox on certain platforms, and may not be used at all in Thunderbird or other necko applications. In such cases MOZ_IPC will not be #defined, and necko should work exactly as it used to.
      • So any code dealing with IPC must be guarded by #ifdef MOZ_IPC.
  2. Requests from within the parent.
    • While most necko requests will be initiated from child (tab) processes, there will still be some (update checks, safebrowsing) that originate within the parent.
      • Note that some IPC may still be performed for these requests (an observer of the request may still live in a child process).
      • Otherwise, these requests should work essentially the same as in "traditional" mode.
  3. Requests from a child process
    • The basic design for requests from child processes is that they should "wrap" a regular request on the parent, using IPDL. In other words, we hope to keep the existing nsHttpChannel, nsHttpHandler, etc. code as unmodified as possible; this code should not need to know whether it is servicing a child or a parent request. The logic for supporting a child request should be handled within the IPDL protocol classes instead whenever possible (for instance, HttpChannelParent).
      • Some of the existing necko architecture will exist in both the parent and the child, and will sometimes need to know where it is running; for instance, nsHttpHandler needs to know whether to hand out an nsHttpChannel (if it's in the parent) or an HttpChannelChild (if it's in the child). For this there is an IsNeckoChild() function, which returns true if called within the child process (see the sketch below). For certain optimizations (ex: consolidating OnDataAvail and OnStatus/OnProgress into one IPDL message) it is also useful for the chrome nsHttpChannel to know if it is servicing a remote child request: this is true if "mRemoteChannel" is set.
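
A minimal sketch of that dispatch. Only IsNeckoChild() is a real necko function; the rest (including the empty channel classes and the bare NewChannel signature) are stand-ins for the real logic in nsHttpHandler:

    // Handler-side dispatch: hand out the stub in the child, the real
    // channel in the parent.
    static bool gIsChildProcess = false;      // set during process startup
    bool IsNeckoChild() { return gIsChildProcess; }

    struct Channel {};
    struct nsHttpChannel    : Channel {};     // real channel: chrome only
    struct HttpChannelChild : Channel {};     // IPDL stub: content only

    Channel* NewChannel() {
        if (IsNeckoChild())
            return new HttpChannelChild();    // wraps a parent channel via IPDL
        return new nsHttpChannel();           // does the actual network work
    }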

Sync and async IPC messages

IPDL traffic between the parent/child channels should generally be asynchronous, both for performance and to avoid the tricky issues that can come up with synchronous messaging. In particular, the child must avoid making sync requests to the parent for channel state information, as the parent channel may have diverged in state (see http://tinyurl.com/yaa9p7s for some discussion of this). The child channel must instead cache all the state info that is needed to answer any queries from necko client code.
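
As an illustration of that caching, here is a hypothetical sketch (names simplified): the parent piggybacks channel state on the async OnStartRequest message, and the child answers later getter calls from its own copy.

    // The child caches state delivered with OnStartRequest, so synchronous
    // getters never have to ask the parent over sync IPC.
    #include <string>

    struct HttpChannelChild {
        // Mirrored parent state, filled in when the async message arrives.
        unsigned    mResponseStatus;
        std::string mContentType;
        bool        mStateReceived;

        HttpChannelChild() : mResponseStatus(0), mStateReceived(false) {}

        // Handler for the asynchronous OnStartRequest IPDL message: cache
        // everything client code might later ask the channel about.
        void RecvOnStartRequest(unsigned status, const std::string& type) {
            mResponseStatus = status;
            mContentType    = type;
            mStateReceived  = true;
            // ... then forward OnStartRequest to the necko client ...
        }

        // Necko clients call this synchronously; we answer from the cache.
        // A sync query to the parent would be wrong: its channel may
        // already have moved on (e.g. been redirected or canceled).
        unsigned GetResponseStatus() const { return mResponseStatus; }
    };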

Validating data

Lucas suggested that IPDL parent actors should take measures to validate data passed to them by children, in case the latter have been compromised (ex: validating that strings are actually UTF-16). In necko's case, that mainly means the URI and other data for the initial request, but keep this in mind for other IPC exchanges, too.
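
A sketch of what such a check might look like (hypothetical helper, not an existing necko function): verify that a string the child claims is UTF-16 contains no unpaired surrogates before using it.

    // Parent-side sanity check on child-supplied data. A compromised child
    // can send arbitrary bytes, so validate before use.
    #include <cstddef>

    bool IsValidUTF16(const unsigned short* s, size_t len) {
        for (size_t i = 0; i < len; ++i) {
            unsigned short c = s[i];
            if (c >= 0xD800 && c <= 0xDBFF) {        // high surrogate...
                if (i + 1 >= len) return false;      // ...at end of string
                unsigned short next = s[++i];
                if (next < 0xDC00 || next > 0xDFFF)  // ...must pair with low
                    return false;
            } else if (c >= 0xDC00 && c <= 0xDFFF) { // stray low surrogate
                return false;
            }
        }
        return true;
    }

    // In the parent actor's message handler (sketch):
    //   if (!IsValidUTF16(uriChars, uriLength))
    //       return false;   // treat the child as misbehaving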

Building, running, and testing necko e10s

Building

The code lives in the main e10s repo, so follow the regular e10s instructions at Content_Processes#Building_.26_debugging to build.

Running

The necko e10s code is currently turned on by default in the e10s tree. You can also revert to our starting-point mode, where each child process gets its own full necko stack (more feature-complete, but it takes up more memory, and separate processes will not have a synchronized cookie database), by setting NECKO_SEPARATE_STACKS in your environment. This is mainly useful if you're trying to test something other than necko, but the e10s/necko code doesn't yet implement some feature that you need.

Depending on what you're trying to do, you'll either be running xpcshell tests or the remote tab demo (a.k.a. test-ipc.xul). Running xpcshell tests is quite lightweight (text-based, with fewer XPCOM things to load, so it's fast to run on a remote machine over ssh, etc.), while test-ipc.xul fires up a version of the whole browser. Develop with xpcshell when you can; unfortunately some important features--like load groups, notifications, etc.--are not tested by xpcshell, so for those you currently need to run test-ipc.xul.

Testing and Logging

Testing is currently mainly done in the form of xpcshell tests (see netwerk/test/unit_ipc). You'll need the patches for bug 521922 and (if you want to run the full set of existing necko tests) bug 526335. The xpcshell framework is set up to run first on chrome (the tests in 'netwerk/test/unit') and then on the child ('netwerk/test/unit_ipc'), since we need both codepaths to work. If you are not interested in the chrome tests (which should all pass, unless you've broken something in chrome-side necko), you can turn them off in /netwerk/test/Makefile.in by setting XPCSHELL_TESTS = unit_ipc (i.e. changing the "+=" to "="), after which "make xpcshell-tests" will run only the child-side tests.

NOTE: by default the xpcshell tests currently do NOT run the e10s version of necko; instead they run the "each process gets the full necko stack" version. This is both because we are still making sure that the xpcshell tests all work correctly on the child side with necko unmodified (bug 526335) before we test them against the experimental code, and because many of the existing tests would certainly fail against our buggy, incomplete e10s necko code. You will want to turn on the e10s code if you're trying to test it. You may also want HTTP logging to work :), which is currently broken. To turn on e10s HTTP for xpcshell and fix HTTP logging, apply my workaround patch for bug 534764. I do not yet have a fix for HTTP logging for test-ipc.xul.

Bugzilla tracking

The necko e10s work is currently divided into two broad stages: features that are needed for the next release of fennec (https://bugzilla.mozilla.org/show_bug.cgi?id=516730), and later work which will be needed for e10s firefox (https://bugzilla.mozilla.org/show_bug.cgi?id=535725). All necko e10s work should be grouped under one of these two tracking bugs.

Old notes

The remainder of this page consists of notes from the original revision of this page. They are now superseded by the various bugs that have been created for them, but I'm keeping them around for reference. Please update the bugs, not these notes (unless you feel like doing both :)

HTTP Headers

https://bugzilla.mozilla.org/show_bug.cgi?id=536279 and https://bugzilla.mozilla.org/show_bug.cgi?id=536283

HTTP headers will need to be parsed first in the chrome process, so that things like auth, cache directives, cookies, etc. can be handled there. We will also need to provide some or all headers to the content process, via IPDL.

  • bz suggests we "whitelist" the headers and provide only needed headers to the content process (sketched below).
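
A sketch of that whitelist. The names and the header list here are hypothetical; real code would pull headers from the parsed response and compare them case-insensitively.

    // Only whitelisted headers cross the IPDL wire to the content process;
    // auth, cookie, and caching headers stay chrome-only.
    #include <map>
    #include <set>
    #include <string>

    typedef std::map<std::string, std::string> HeaderMap;

    static const char* const kChildVisibleHeaders[] = {
        "Content-Type", "Content-Length", "Content-Disposition"
    };
    static const size_t kNumVisible =
        sizeof(kChildVisibleHeaders) / sizeof(kChildVisibleHeaders[0]);

    HeaderMap FilterHeadersForChild(const HeaderMap& parsed) {
        std::set<std::string> allowed(kChildVisibleHeaders,
                                      kChildVisibleHeaders + kNumVisible);
        HeaderMap out;
        for (HeaderMap::const_iterator it = parsed.begin();
             it != parsed.end(); ++it) {
            if (allowed.count(it->first))   // drop everything not whitelisted
                out.insert(*it);
        }
        return out;
    }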

HTTP redirects

https://bugzilla.mozilla.org/show_bug.cgi?id=513086 and https://bugzilla.mozilla.org/show_bug.cgi?id=536294

The current architecture swaps out the original channel and replaces it with a channel to the new destination.

  • This will probably require IPDL traffic to tell the content process to do something similar on its end.
  • There are also listeners that are notified of redirects, and have the ability to cancel if they don't approve (plugins such as flash use this).
    • There will be listeners that need to be notified in both the chrome and content processes, so use IPC.
    • But right now these notifications are synchronous: we should change them to async on mozilla-central, and then merge into the e10s tree, before trying to implement them over IPC (see the sketch after this list).
  • Security concerns
    • Is it possible for an observer to rewrite/redirect a navigation rather than just canceling it? If so, how would that change be propagated?
    • Is the navigation blocked until all observers return? Otherwise we might run the risk of a race condition.
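
One possible shape for the async version (hypothetical names; today's synchronous notification would grow a callback so the verdict can arrive later, e.g. after an IPDL round-trip to a content-process listener):

    // Sketch of an asynchronous redirect veto. The listener keeps the
    // callback and answers whenever it is ready, instead of returning a
    // verdict from the notification call itself.
    struct Channel;

    struct RedirectCallback {
        virtual void OnRedirectVerified(bool approved) = 0;
        virtual ~RedirectCallback() {}
    };

    struct ChannelEventSink {
        // May complete immediately or after arbitrary IPC traffic.
        virtual void AsyncOnChannelRedirect(Channel* oldChannel,
                                            Channel* newChannel,
                                            RedirectCallback* callback) = 0;
        virtual ~ChannelEventSink() {}
    };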

HTTPS

https://bugzilla.mozilla.org/show_bug.cgi?id=536301

e10s will hopefully only need minor mods for HTTPS.

  • One issue: to determine whether the lock icon is needed, the content process will need a handle to the securityInfo, which is currently pointed to by nsHttpChannel. It shouldn't need to actually read the security info--see what exactly is needed.


Download Manager

The download manager should live in the chrome process. The chrome process is responsible for displaying "Where do you want to save this file?" and should be responsible for the network transfer and disk access.

Issues arise in this architecture (as I understand it) because necko currently resides primarily in the content processes.

In the ideal world, necko would live in the chrome process entirely and would proxy requests for content processes. Content processes would "subscribe" to the kinds of content they're able to handle for a given request, and if no handler is subscribed, the download manager should be invoked.

How should this work in the existing incarnation of necko/e10s?

LoadGroups

https://bugzilla.mozilla.org/show_bug.cgi?id=536292

LoadGroups will live in the content process.

  • They need to be able to cancel Channels, which should be fairly trivial IPC.
  • When Channels are added to a LoadGroup, notifications must be sent, both to the DocShell (in content) and to various listeners in chrome. Right now these notifications are synchronous; we might want to change them to async to make them play better with the e10s model, but for now bz/bsmedberg suggest we wait.
  • We will need to keep track (via LoadGroups or otherwise) of which channels are "owned" by which content processes, so if one dies, we know to cancel its requests (sketched below).
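
A sketch of that bookkeeping (hypothetical names; which class actually owns the table is an open question):

    // Track which parent channels belong to which content process, so a
    // dying process's requests can be canceled.
    #include <map>
    #include <set>

    struct HttpChannelParent {
        void Cancel() { /* abort the underlying nsHttpChannel */ }
    };

    struct ChannelRegistry {
        // Keyed by content-process id.
        std::map<int, std::set<HttpChannelParent*> > mByChild;

        void Add(int childId, HttpChannelParent* chan)    { mByChild[childId].insert(chan); }
        void Remove(int childId, HttpChannelParent* chan) { mByChild[childId].erase(chan); }

        // Called when a content process exits or crashes.
        void OnChildGone(int childId) {
            std::set<HttpChannelParent*>& chans = mByChild[childId];
            for (std::set<HttpChannelParent*>::iterator it = chans.begin();
                 it != chans.end(); ++it) {
                (*it)->Cancel();
            }
            mByChild.erase(childId);
        }
    };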

Application cache

https://bugzilla.mozilla.org/show_bug.cgi?id=536295

The application cache is fairly recent and is used for offline mode for web apps like Gmail. Unlike the regular HTTP cache, which will be effectively invisible to the content process, the app cache will be visible to the content process, which needs (among other things) to parse HTTP headers to see if data needs to be cached.

  • Presumably cache lives in chrome, and content processes read from it? Not sure.
  • This isn't getting lots of use yet, so it doesn't need to work immediately.
  • Honza Bambas and/or Doug Camp (ex-Mozilla) are the ones to ask about this code.

EventSink listeners

https://bugzilla.mozilla.org/show_bug.cgi?id=536292

These are notification listeners (Firebug uses them, for instance). We'll need to propagate these events to both the chrome and content processes, as listeners will live in both.

  • Ideally we send one IPC msg per event, not one per remote listener (see the sketch below).
  • These events don't carry much data (HTTP/Channel status?); we might need to send it along in the IPC msg.
  • These calls are already async, so no API change should be needed.
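
A sketch of that fan-out (hypothetical names): chrome sends a single message per event, and the child-side actor distributes it to however many listeners registered locally.

    // One IPDL message per event; fan-out to listeners happens locally in
    // the child, with no extra IPC per listener.
    #include <vector>

    struct ProgressListener {
        virtual void OnStatus(long status) = 0;
        virtual ~ProgressListener() {}
    };

    struct ChildEventSink {
        std::vector<ProgressListener*> mListeners;   // all child-local listeners

        void AddListener(ProgressListener* aListener) {
            mListeners.push_back(aListener);
        }

        // Handler for the single OnStatus IPDL message from chrome.
        void RecvOnStatus(long status) {
            for (size_t i = 0; i < mListeners.size(); ++i)
                mListeners[i]->OnStatus(status);     // local, synchronous fan-out
        }
    };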

Suspending channels

https://bugzilla.mozilla.org/show_bug.cgi?id=536321

We suspend Channels from the Download manager, and also when plugins don't consume arriving Channel data quickly enough. Also HTTP auth?

  • We will need to keep around whatever data has arrived but hasn't been delivered, so that resume works correctly. This buffering could happen on either the chrome or content end (sketched below).
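
A sketch of that buffering (hypothetical and simplified; real code deals in nsIInputStream segments rather than strings):

    // While suspended, segments that keep arriving are queued; Resume()
    // replays them to the listener in arrival order.
    #include <deque>
    #include <string>

    struct Listener {
        virtual void OnDataAvailable(const std::string& segment) = 0;
        virtual ~Listener() {}
    };

    struct SuspendableDelivery {
        Listener* mListener;
        bool mSuspended;
        std::deque<std::string> mPending;   // arrived but not yet delivered

        explicit SuspendableDelivery(Listener* l)
            : mListener(l), mSuspended(false) {}

        void Suspend() { mSuspended = true; }

        void Resume() {
            mSuspended = false;
            // Flush in order; the listener may legally re-suspend us.
            while (!mSuspended && !mPending.empty()) {
                std::string segment = mPending.front();
                mPending.pop_front();
                mListener->OnDataAvailable(segment);
            }
        }

        void OnIncomingData(const std::string& segment) {
            if (mSuspended)
                mPending.push_back(segment);          // hold for later
            else
                mListener->OnDataAvailable(segment);  // deliver immediately
        }
    };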

Http Auth dialog

https://bugzilla.mozilla.org/show_bug.cgi?id=537782

How will chrome know which window to pop the auth prompt up in when HTTP auth is needed? Right now the Channel has callbacks which can be used to get the originating window. We'll need to make this work with e10s, possibly by including window info in the chrome Channel when we create it during AsyncOpen.

File form POSTs

https://bugzilla.mozilla.org/show_bug.cgi?id=536273

bz suggests that we can accomplish this without having the content process read the file, by creating a new kind of stream class in which the content process marks the name of the file and the chrome process actually reads it from disk and uploads it (sketched below).
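
A sketch of that stream split (hypothetical names; the real thing would hang off necko's upload-stream machinery):

    // Child side never opens the file: the "stream" is just the file name,
    // and that is all IPDL serialization would send to the parent.
    #include <fstream>
    #include <string>

    struct RemoteFileStream {
        std::string mPath;                   // the only serialized state
        explicit RemoteFileStream(const std::string& path) : mPath(path) {}
    };

    // Parent side reconstructs a real stream from the path and feeds it to
    // the upload code, so the disk access happens only in chrome.
    struct ParentUploadStream {
        std::ifstream mFile;
        explicit ParentUploadStream(const std::string& path)
            : mFile(path.c_str(), std::ios::binary) {}

        // Read the next chunk to put on the wire; returns bytes read.
        std::streamsize ReadSegment(char* buf, std::streamsize max) {
            mFile.read(buf, max);
            return mFile.gcount();
        }
    };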

Channel "owner"

Channels have Set/GetOwner methods, used to store the principal responsible for the request.

  • Apparently not used much for HTTP?
    • jst: CSP (Content Security Policy) may use it?
  • Not clear if get & set happen only on the content side, or if set is content-only but reads happen in both chrome and content. Find out and propagate as needed.
  • bz would like to rework this API anyway

FTP

https://bugzilla.mozilla.org/show_bug.cgi?id=536289

  • Doug Turner knows this code.
  • Proposal from bsmedberg: we keep FTP purely within the chrome process, so that we don't need to make it IPC-savvy.
    • We would only allow files to be downloaded (not rendered within browser) via FTP.
    • Browsing of an FTP server's tree would be done by hiding the content-process tab behind a chrome one when an FTP directory is the target.
    • FTP also supports upload--used by SeaMonkey and Composer--but this happens entirely in chrome, so no changes needed?

Security Issues

For a general Electrolysis threat model, see Security/ProcessIsolation/ThreatModel.

Race conditions

  • Could one frame navigate another frame without permission?
  • Could one window script into another window without permission?
  • Redirects: could a redirect happen in chrome while content is performing a security check?

Domain isolation

  • Do we try to restrict cookies to per-process / per-window?
  • Can we actually authenticate a network request from a given content process?
  • How do we handle access to file:// and related schemes? Trusting the content process might be too much (see below).
  • If we do try to isolate, how do we verify the validity of a request? (Do we need a stateful proxy to determine which content is valid to access cross-domain and which isn't?)
  • Can we do anything to protect cookies? Password manager? Cache? Local storage?

Do we want to check principals in the chrome process?

(heard on IRC...)

   bz: long term we want to move CheckLoadURI checks into the chrome process and
   not trust any self-reported principals of content processes right?

   bsmedberg: hrm, I'm not sure that's ever going to be feasible.  it's certainly
   not part of releasing anything, or even releasing anything with a sandbox

   bz: it seems like it lets you trivially escape the sandbox....  if not done

   bsmedberg: depends on what the sandbox is for, though.  If it's only to
   prevent viral infections and such, I think you're fine.  And protecting
   against XSS/cookiedata leaks is much harder due to interior iframes,
   document.cookie scripting, loading arbitrary JS/images

   bz is not sure why it's fine: As long as you can ask the chrome process to do
   network requests for you and it trusts your self-reported principal you can
   read arbitrary files and phone home.  Not a viral infection, but much worse
   than just XSS