Gecko:Overview

This document attempts to give an overview of the different parts of Gecko: what they do and why, where the code for them lives within the repository, and where to find more specific documentation (when available) covering the details of that code. It is not yet complete. Maintainers of these areas of code should correct errors, add information, and add links to more detailed documentation (since this document is intended to remain an overview, not complete documentation).

Browsers, Frames, and Document Navigation

Docshell

The user of a Web browser can change the page shown in that browser in many ways: by clicking a link, loading a new URL, using the forward and back buttons, or other ways. This can happen inside a space we'll call a browsing context; this space can be a browser window, a tab, or a frame or iframe within a document. The toplevel data structures within Gecko represent this browsing context; they contain other data structures representing the individual pages displayed inside of it (most importantly, the current one). In terms of implementation, these two types of navigation, the top level of a browser and the frames within it, largely use the same data structures.

In Gecko, the docshell is the toplevel object responsible for managing a single browsing context. It, and the associated session history code, manage the navigation between pages inside of a docshell. (Note the difference between session history, which is a sequence of pages in a single browser session, used for recording information for back and forward navigation, and global history, which is the history of pages visited and associated times, regardless of browser session, used for things like link coloring and address autocompletion.)

There are relatively few objects in Gecko that are associated with a docshell itself rather than with a particular one of the pages displayed inside it; most such objects are attached to the docshell. An important object associated with the docshell is the nsGlobalWindowOuter, which is what the HTML5 spec refers to as a WindowProxy (into which Window objects, as implemented by nsGlobalWindowInner, are loaded). See the DOM section below for more information on this.

The toplevel object for managing the contents of a particular page being displayed within a docshell is the document viewer (see layout). Other important objects associated with this presentation are the document (see DOM) and the pres(entation) shell and pres(entation) context (see layout).

[Figure: <xul:browser>, frameloader, and docshell in a single-process configuration]

Docshells are organized into a tree. If a docshell has a non-null parent, then it corresponds to a subframe in whatever page is currently loaded in the parent docshell, and the corresponding subframe element (for example, an iframe) is called a browsing context container. In Gecko, a browsing context container implements nsIFrameLoaderOwner to hold a frameloader, which holds and manages the docshell.

One of the most important interfaces a docshell implements is nsIWebNavigation. It defines the major operations of a browsing context, such as loadURI, goBack, goForward, and reload. In the single-process configuration of desktop Firefox, <xul:browser> (which represents a tab) operates on the docshell through nsIWebNavigation, as shown in the figure above.
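
For example (a hedged sketch; nsIWebNavigation method signatures have varied across Gecko versions, and GoBackIfPossible is a hypothetical helper):

  #include "nsCOMPtr.h"
  #include "nsIDocShell.h"
  #include "nsIWebNavigation.h"

  void GoBackIfPossible(nsIDocShell* aDocShell) {
    // A docshell can be QI'd to nsIWebNavigation.
    nsCOMPtr<nsIWebNavigation> webNav = do_QueryInterface(aDocShell);
    if (!webNav) {
      return;
    }
    bool canGoBack = false;
    webNav->GetCanGoBack(&canGoBack);
    if (canGoBack) {
      webNav->GoBack();  // session-history navigation, like the Back button
    }
  }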

  • code: mozilla/docshell/
  • bugzilla: Core::Document Navigation
  • documentation: DocShell:Home Page

Session History

In order to keep the session history of subframes after the root document is unloaded and docshells of subframes are destroyed, only the root docshell of a docshell tree manages the session history (this does not match the conceptual model in the HTML5 spec and may be subject to change).

As an example to explain the structure of the session history implementation in Gecko, consider a tab A that loads document 1; document 1 contains iframes B and C, which load documents 2 and 3, respectively. Later, a user navigates iframe B to document 4. The following figures show the conceptual model of the example and the corresponding session history structure.

[Figure: Conceptual model representation; color denotes the currently visible active documents]
[Figure: Session history structure; color denotes the current transaction / entries]

In Gecko, a session history object holds a list of transactions (denoted as T1 and T2 in the figure); each transaction points to a tree of entries; each entry records the docshell it is associated with, and a URL. If we now navigate the tab to a different root document 5, which includes an iframe D loading document 6, the structure becomes:

[Figure: Conceptual model representation]
[Figure: Session history structure]
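
The shape of this structure can be pictured with a purely illustrative sketch (these are not the actual Gecko classes; the real implementation lives in docshell/shistory/):

  struct Entry {                  // one document load in one docshell
    nsIDocShell* docShell;        // the docshell this entry is associated with
    nsCString uri;                // the URL that was loaded
    nsTArray<Entry*> children;    // entries for subframe documents
  };
  struct Transaction {            // T1, T2, ... in the figures
    Entry* root;                  // the tree of entries for one navigation
  };
  struct SessionHistory {
    nsTArray<Transaction> transactions;
  };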

Embedding

To be written (and maybe rewritten if we get an IPC embedding API).

Multi-process and IPC

[Figure: <xul:remote-browser>, frameloader, and docshell in a multiprocess configuration. The horizontal dashed lines represent inter-process communication.]

In multi-process desktop Firefox, a tab is managed by <xul:remote-browser>. It still operates through nsIWebNavigation, but in this case the docshell is in a remote process and cannot be accessed directly. The encapsulation of the remote nsIWebNavigation is done by the JavaScript-implemented RemoteWebNavigation.

In this configuration, the frameloader of a root docshell lives in the parent process, so it cannot access the docshell directly either. Instead, it holds a TabParent instance to interact with the TabChild in the child process. At the C/C++ level, the communication across processes for a single tab goes through the PBrowser IPC protocol, implemented by TabParent and TabChild, while at the JavaScript level it is done by the message manager (which is built on top of PBrowser). RemoteWebNavigation, for example, sends messages to browser-child.js in the content process through the message manager.

Networking

The network library Gecko uses is called Necko. Necko APIs are largely organized around three concepts: URI objects, protocol handlers, and channels.

Protocol handlers

A protocol handler is an XPCOM service associated with a particular URI scheme or network protocol. Necko includes protocol handlers for HTTP, FTP, the data: URI scheme, and various others. Extensions can implement protocol handlers of their own.

A protocol handler implements the nsIProtocolHandler API, which serves three primary purposes:

  1. Providing metadata about the protocol (its security characteristics, whether it requires actual network access, what the corresponding URI scheme is, what TCP port the protocol uses by default).
  2. Creating URI objects for the protocol's scheme.
  3. Creating channel objects for the protocol's URI objects.

Typically, the built-in I/O service (nsIIOService) is responsible for finding the right protocol handler for URI object creation and channel creation, while a variety of consumers query protocol metadata. Querying protocol metadata is the recommended way to handle any code that needs different behavior for different URI schemes. In particular, unlike whitelists or blacklists, it correctly handles the addition of new protocols.

A service can register itself as a protocol handler by registering for the contract ID "@mozilla.org/network/protocol;1?name=SCHEME", where SCHEME is the URI scheme for the protocol (e.g. "http", "ftp", and so forth).
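
As a sketch of how a consumer uses this (hedged; error handling is elided, and GetProtocolFlags reflects the classic nsIProtocolHandler API):

  #include "nsCOMPtr.h"
  #include "nsIIOService.h"
  #include "nsIProtocolHandler.h"
  #include "nsServiceManagerUtils.h"  // do_GetService

  nsCOMPtr<nsIIOService> ios = do_GetService("@mozilla.org/network/io-service;1");
  nsCOMPtr<nsIProtocolHandler> handler;
  ios->GetProtocolHandler("http", getter_AddRefs(handler));

  uint32_t flags = 0;
  handler->GetProtocolFlags(&flags);  // protocol metadata, e.g. URI_LOADABLE_BY_ANYONE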

URI objects

URI objects, which implement the nsIURI API, are a way of representing URIs and IRIs. Their main advantage over strings is that they do basic syntax checking and canonicalization on the URI string that they're provided with. They also provide various accessors to extract particular parts of the URI and provide URI equality comparisons. URIs that correspond to hierarchical schemes implement the additional nsIURL interface, which exposes even more accessors for breaking out parts of the URI.

URI objects are typically created by calling the newURI method on the I/O service, or in C++ by calling the NS_NewURI utility function. This makes sure to create the URI object using the right protocol handler, which ensures that the right kind of object is created. Direct creation of URIs via createInstance is reserved for protocol handler implementations.
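
In C++ this looks roughly like the following (a sketch; the "_ns" string-literal suffix assumes a reasonably recent tree):

  #include "nsCOMPtr.h"
  #include "nsIURI.h"
  #include "nsNetUtil.h"  // NS_NewURI

  nsCOMPtr<nsIURI> uri;
  nsresult rv = NS_NewURI(getter_AddRefs(uri), "HTTPS://Example.org/a/../b"_ns);
  if (NS_SUCCEEDED(rv)) {
    nsAutoCString spec;
    uri->GetSpec(spec);  // canonicalized, e.g. "https://example.org/b"
    nsAutoCString host;
    uri->GetHost(host);  // "example.org"
  }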

Channels

Channels are the Necko representation of a single request/response interaction with a server. A channel is created by calling the newChannel method on the I/O service, or in C++ by calling the NS_NewChannel utility function. The channel can then be configured as needed, and finally its asyncOpen method can be called. This method takes an nsIStreamListener as an argument.

If asyncOpen has returned successfully, the channel guarantees that it will asynchronously call the onStartRequest and onStopRequest methods on its stream listener. This will happen even if there are network errors that prevent Necko from actually performing the requests. Such errors will be reported in the channel's status and in the status argument to onStopRequest.

If the channel ends up being able to provide data, it will make one or more onDataAvailable calls on its listener after calling onStartRequest and before calling onStopRequest. For each call, the listener is responsible for either returning an error or reading the entire data stream passed in to the call.

If an error is returned from either onStartRequest or onDataAvailable, the channel must act as if it has been canceled with the corresponding error code.

A channel has two URI objects associated with it. The originalURI of the channel is the URI that was originally passed to newChannel to create the channel that then had asyncOpen called on it. The URI of the channel is the URI from which the channel is reading data. These can be different in various cases involving protocol handlers that forward network access to other protocol handlers, as well as in situations in which a redirect occurs (e.g. following an HTTP 3xx response). In redirect situations, a new channel object will be created, but the originalURI will be propagated from the old channel to the new channel.

Note that the nsIRequest that's passed to onStartRequest must match the one passed to onDataAvailable and onStopRequest, but need not be the original channel that asyncOpen was called on. In particular, when an HTTP redirect happens the request argument to the callbacks will be the post-redirect channel.
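
A hedged sketch of this flow (signatures follow recent mozilla-central, where NS_NewChannel takes a loading principal and security flags whose names have changed over time; older trees used a shorter form):

  #include <algorithm>
  #include "nsCOMPtr.h"
  #include "nsContentUtils.h"  // GetSystemPrincipal
  #include "nsIChannel.h"
  #include "nsIContentPolicy.h"
  #include "nsILoadInfo.h"
  #include "nsIStreamListener.h"
  #include "nsNetUtil.h"  // NS_NewURI, NS_NewChannel

  class Listener final : public nsIStreamListener {
   public:
    NS_DECL_ISUPPORTS
    NS_IMETHOD OnStartRequest(nsIRequest* aRequest) override { return NS_OK; }
    NS_IMETHOD OnDataAvailable(nsIRequest* aRequest, nsIInputStream* aStream,
                               uint64_t aOffset, uint32_t aCount) override {
      // The listener must consume all aCount bytes or return an error.
      char buf[4096];
      while (aCount > 0) {
        uint32_t read = 0;
        nsresult rv = aStream->Read(
            buf, std::min<uint32_t>(aCount, sizeof(buf)), &read);
        if (NS_FAILED(rv)) return rv;
        aCount -= read;
        // ... process buf[0..read) ...
      }
      return NS_OK;
    }
    NS_IMETHOD OnStopRequest(nsIRequest* aRequest, nsresult aStatus) override {
      // aStatus carries any network error, even if no data ever arrived.
      return NS_OK;
    }
   private:
    ~Listener() = default;
  };
  NS_IMPL_ISUPPORTS(Listener, nsIStreamListener, nsIRequestObserver)

  void Fetch() {
    nsCOMPtr<nsIURI> uri;
    NS_NewURI(getter_AddRefs(uri), "https://example.org/"_ns);
    nsCOMPtr<nsIChannel> channel;
    NS_NewChannel(getter_AddRefs(channel), uri,
                  nsContentUtils::GetSystemPrincipal(),
                  nsILoadInfo::SEC_ALLOW_CROSS_ORIGIN_SEC_CONTEXT_IS_NULL,
                  nsIContentPolicy::TYPE_OTHER);
    RefPtr<Listener> listener = new Listener();
    channel->AsyncOpen(listener);  // onStartRequest/onStopRequest guaranteed
  }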

TODO: crypto?

Document rendering pipeline

Some of the major components of Gecko can be described as steps on the path from an HTML document coming in from the network to the graphics commands needed to render that document. An HTML document is a serialization of a tree structure. (FIXME: add diagram) The HTML parser and content sink create an in-memory representation of this tree, which we call the DOM tree or content tree. Many JavaScript APIs operate on the content tree. Then, in layout, we create a second tree, the frame tree (or rendering tree), which has a shape similar to the content tree but where each node in the tree represents a rectangle (except in SVG, where they represent other shapes). We then compute the positions of the nodes in the frame tree (called frames) and paint them using our cross-platform graphics APIs (which, underneath, map to platform-specific graphics APIs).

Parser

The parser's job is to transform a character stream into a tree structure, with the help of the content sink classes.

HTML is parsed using a parser that implements the parsing algorithm given in the HTML specification (starting with HTML5). Much of this parser is translated from Java, and changes are made to the Java version. This parser lives in parser/html/. The parser is driven by the output of the networking layer (see nsHtml5StreamParser::OnDataAvailable). The HTML5 parser is capable of parsing off the main thread, which is the normal case; it can also parse on the main thread to be able to synchronously handle things such as innerHTML modifications.

The codebase still has the previous generation HTML parser, which is still used for a small number of things, though we hope to be able to remove it entirely soon. This parser is in parser/htmlparser/.

XML is parsed using the expat library (parser/expat/) and code that wraps it (parser/xml/). This is a non-validating parser; however, it loads certain DTDs to support XUL localization.

DOM / Content

The content tree or DOM tree is the central data structure for Web pages. It is a tree structure, initially created from the tree structure expressed in the HTML or XML markup. The nodes in the tree implement major parts of the DOM (Document Object Model) specifications. The nodes themselves are part of a class hierarchy rooted at nsINode; different derived classes are used for things such as text nodes, the document itself, HTML elements, SVG elements, etc., with further subclasses of many of these types (e.g., for specific HTML elements). Many of the APIs available to script running in Web pages are associated with these nodes. The tree structure persists while the Web page is displayed, since it stores much of the state associated with the Web page. The code for these nodes lives in the content/ directory.
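
As a small illustration (a hedged sketch; the accessor names follow nsINode in current mozilla-central), a depth-first walk over a content subtree looks like this:

  #include "nsIContent.h"
  #include "nsINode.h"

  static void VisitSubtree(nsINode* aNode) {
    // ... examine aNode (document, element, text node, etc.) ...
    for (nsIContent* child = aNode->GetFirstChild(); child;
         child = child->GetNextSibling()) {
      VisitSubtree(child);  // children are themselves nsINodes
    }
  }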

The DOM APIs are not threadsafe. DOM nodes can be accessed only from the main thread (also known as the UI (user interface) thread) of the application.

There are also many other APIs available to Web pages that are not APIs on the nodes in the DOM tree. Many of these other APIs also live in the same directories, though some live in content/ and some in dom/. These include APIs such as the DOM event model.

The dom/ directory also includes some of the code needed to expose Web APIs to JavaScript (in other words, the glue code between JavaScript and these APIs). For an overview, see DOM API Implementation. See Scripting below for details of the JS engine.

TODO: Internal APIs vs. DOM APIs.

TODO: Mutation observers / document observers.

TODO: Reference counting and cycle collection.

TODO: specification links

Style System

Quantum CSS (Stylo)

Starting with Firefox 57, Gecko makes use of the parallel style system written in Rust that comes from Servo. There's an overview of this with graphics to help explain what's going on. The Servo wiki has some more details.

Gecko

The rest of this style section describes the Gecko style system used in Firefox 56 and earlier. Some bits may still apply, but it likely needs revising.

In order to display the content, Gecko needs to compute the styles relevant to each DOM node. It does this based on the model described in the CSS specifications: this model applies to style specified in CSS (e.g. by a 'style' element, an 'xml-stylesheet' processing instruction or a 'style' attribute), style specified by presentation attributes, and the default style specified by our own user agent style sheets. There are two major sets of data structures within the style system:

  • first, data structures that represent sources of style data, such as CSS style sheets or data from stylistic HTML attributes
  • second, data structures that represent computed style for a given DOM node.

These sets of data structures are mostly distinct (for example, they store values in different ways).

The loading of CSS style sheets from the network is managed by the CSS loader; they are then tokenized by the CSS scanner and parsed by the CSS parser. Those that are attached to the document also expose APIs to script that are known as the CSS Object Model, or CSSOM.

The style sheets that apply to a document are managed by a class called the style set. The style set interacts with the different types of style sheets (representing CSS style sheets, presentational attributes, and 'style' attributes) through two interfaces: nsIStyleSheet for basic management of style sheets and nsIStyleRuleProcessor for getting the style data out of them. Usually the same object implements both interfaces, except in the most important case, CSS style sheets, where there is a single rule processor for all of the CSS style sheets in each origin (user/UA/author) of the CSS cascade.

The computed style data for an element/frame are exposed to the rest of Gecko through the class mozilla::ComputedStyle (previously called nsStyleContext). Rather than having a member variable for each CSS property, it breaks up the properties into groups of related properties called style structs. These style structs obey the rule that all of the properties in a single struct either inherit by default (what the CSS specifications call "Inherited: yes" in the definition of properties; we call these inherited structs) or all are not inherited by default (we call these reset structs). Separating the properties in this way improves the ability to share the structs between similar ComputedStyle objects and reduce the amount of memory needed to store the style data. The ComputedStyle API exposes a method for getting each struct, so you'll see code like sc->GetStyleText()->mTextAlign for getting the value of the text-align CSS property. (Frames (see the Layout section below) also have the same GetStyle* methods, which just forward the call to the frame's ComputedStyle.)
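
Putting that together (a hedged sketch using the pre-Stylo GetStyle* names mentioned above; newer trees spell these StyleText(), StyleDisplay(), etc.):

  // aFrame is any nsIFrame; each call returns one style struct.
  const nsStyleText* text = aFrame->GetStyleText();        // an inherited struct
  uint8_t align = text->mTextAlign;                        // computed text-align
  const nsStyleDisplay* disp = aFrame->GetStyleDisplay();  // a reset struct
  // disp->mDisplay holds the computed value of the display property.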

The ComputedStyles form a tree structure, in a shape somewhat like the content tree (except that we coalesce identical sibling ComputedStyles rather than keeping two of them around; if the parents have been coalesced then this can apply recursively and coalesce cousins, etc.; we do not coalesce parent/child ComputedStyles). The parent of a ComputedStyle has the style data that the ComputedStyle inherits from when CSS inheritance occurs. This means that the parent of the ComputedStyle for a DOM element is generally the ComputedStyle for that DOM element's parent, since that's how CSS says inheritance works.

The process of turning the style sheets into computed style data goes through three main steps, the first two of which closely relate to the nsIStyleRule interface, which represents an immutable source of style data, conceptually representing (and for CSS style rules, directly storing) a set of property:value pairs. (It is similar to the idea of a CSS style rule, except that it is immutable; this immutability allows for significant optimization. When a CSS style rule is changed through script, we create a new style rule.)

The first step of going from style sheets to computed style data is finding the ordered sequence of style rules that apply to an element. The order represents which rules override which other rules: if two rules have a value for the same property, the higher ranking one wins. (Note that there's another difference from CSS style rules: declarations with !important are represented using a separate style rule.) This is done by calling one of the nsIStyleRuleProcessor::RulesMatching methods. The ordered sequence is stored in a trie called the rule tree: the path from the root of the rule tree to any (leaf or non-leaf) node in the rule tree represents a sequence of rules, with the highest ranking farthest from the root. Each rule node (except for the root) has a pointer to a rule, but since a rule may appear in many sequences, there are sometimes many rule nodes pointing to the same rule. Once we have this list we create a ComputedStyle (or find an appropriate existing sibling) with the correct parent pointer (for inheritance) and rule node pointer (for the list of rules), and a few other pieces of information (like the pseudo-element).
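
The shape of the rule tree can be pictured with an illustrative (non-Gecko) declaration; the real class is nsRuleNode:

  // Illustrative only: the path from any node up to the root is the ordered
  // sequence of rules matching some element, highest-ranking farthest from
  // the root. Many rule nodes may point to the same (immutable) rule.
  struct RuleNode {
    RuleNode* mParent;    // next-lower-ranking rule in the sequence
    nsIStyleRule* mRule;  // null only at the root
    // ... plus child pointers and cached style structs (see below) ...
  };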

The second step of going from style sheets to computed style data is getting the winning property:value pairs from the rules. (This only provides property:value pairs for some of the properties; the remaining properties will fall back to inheritance or to their initial values depending on whether the property is inherited by default.) We do this step (and the third) for each style struct, the first time it is needed. This is done in nsRuleNode::WalkRuleTree, where we ask each style rule to fill in its property:value pairs by calling its MapRuleInfoInto function. When called, the rule fills in only those pairs that haven't been filled in already, since we're calling from the highest priority rule to the lowest (since in many cases this allows us to stop before going through the whole list, or to do partial computation that just adds to data cached higher in the rule tree).

The third step of going from style sheets to computed style data (which various caching optimizations allow us to skip in many cases) is actually doing the computation; this generally means we transform the style data into the data type described in the "Computed Value" line in the property's definition in the CSS specifications. This transformation happens in functions called nsRuleNode::Compute*Data, where the * in the middle represents the name of the style struct. This is where the transformation from the style sheet value storage format to the computed value storage format happens.

Once we have the computed style data, we then store it: if a style struct in the computed style data doesn't depend on inherited values or on data from other style structs, then we can cache it in the rule tree (and then reuse it, without recomputing it, for any ComputedStyles pointing to that rule node). Otherwise, we store it on the ComputedStyle (in which case it may be shared with the ComputedStyle's descendant ComputedStyles). This is where keeping inherited and non-inherited properties separate is useful: in the common case of relatively few properties being specified, we can generally cache the non-inherited structs in the rule tree, and we can generally share the inherited structs up and down the ComputedStyle tree.

The ownership models in style sheet structures are a mix of reference counted structures (for things accessible from script) and directly owned structures. ComputedStyles are reference counted, and own their parents (from which they inherit), and rule nodes are garbage collected with a simple mark and sweep collector (which often never needs to run).

Layout

Much of the layout code deals with operations on the frame tree (or rendering tree). In the frame tree, each node represents a rectangle (or, for SVG, other shapes). The frame tree has a shape similar to the content tree, since many content nodes have one corresponding frame, though it differs in a few ways, since some content nodes have more than one frame or don't have any frames at all. When elements are display:none in CSS or undisplayed for certain other reasons, they won't have any frames. When elements are broken across lines or pages, they have multiple frames; elements may also have multiple frames when multiple frames nested inside each other are needed to display a single element (for example, a table, a table cell, or many types of form controls).

Each node in the frame tree is an instance of a class derived from nsIFrame. As with the content tree, there is a substantial type hierarchy, but the type hierarchy is very different: it includes types like text frames, blocks and inlines, the various parts of tables, flex and grid containers, and the various types of HTML form controls.

Frames are allocated within an arena owned by the PresShell. Each frame is owned by its parent; frames are not reference counted, and code must not hold on to pointers to frames. To mitigate the security impact of bugs where code uses a pointer to a destroyed frame, we use frame poisoning, which has two parts. When a frame is destroyed other than at the end of life of the presentation, we fill its memory with a pattern consisting of a repeated pointer to inaccessible memory, and then put the memory on a per-frame-class freelist. This means that if code accesses the memory through a dangling pointer, it will either crash quickly by dereferencing the poison pattern or it will find a valid frame.

Like the content tree, frames must be accessed only from the UI thread.

The frame tree should not store any important data, i.e. any data that cannot be recomputed on-the-fly. While the frame tree does usually persist while a page is being displayed, frames are often destroyed and recreated in response to certain style changes, and in the future we may do the same to reduce memory use for pages that are currently inactive. There were a number of cases where this rule was violated in the past and we stored important data in the frame tree; however, most (though not quite all) such cases are now fixed.

The rectangle represented by the frame is what CSS calls the element's border box. This is the outside edge of the border (or the inside edge of the margin). The margin lives outside the border; and the padding lives inside the border. In addition to nsIFrame::GetRect, we also have the APIs nsIFrame::GetPaddingRect to get the padding box (the outside edge of the padding, or inside edge of the border) and nsIFrame::GetContentRect to get the content box (the outside edge of the content, or inside edge of the padding). These APIs may produce out of date results when reflow is needed (or has not yet occurred).
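
In code (a sketch; these are the nsIFrame accessors named above):

  nsRect border = aFrame->GetRect();          // border box (the frame's rect)
  nsRect padding = aFrame->GetPaddingRect();  // inside the border
  nsRect content = aFrame->GetContentRect();  // inside the padding
  // All three may be stale if a reflow is pending.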

In addition to tracking a rectangle, frames also track two overflow areas: ink overflow and scrollable overflow. These overflow areas represent the union of the area needed by the frame and by all its descendants. The ink overflow is used for painting-related optimizations: it is a rectangle covering all of the area that might be painted when the frame and all of its descendants paint. The scrollable overflow represents the area that the user should be able to scroll to in order to see the frame and all of its descendants. In some cases differences between the frame's rect and its overflow happen because of descendants that stick out of the frame; in other cases they occur because of some characteristic of the frame itself. The two overflow areas are similar, but there are differences: for example, margins are part of scrollable overflow but not ink overflow, whereas text-shadows are part of ink overflow but not scrollable overflow.

When frames are broken across lines, columns, or pages, we create multiple frames representing the multiple rectangles of the element. The first one is the primary frame, and the rest are its continuations (which are more likely to be destroyed and recreated during reflow). These frames are linked together as continuations: they have a doubly-linked list that can be used to traverse the continuations using nsIFrame::GetPrevContinuation and nsIFrame::GetNextContinuation. (Currently continuations always have the same style data, though we may at some point want to break that invariant.)

Continuations are sometimes siblings of each other (i.e. nsIFrame::GetNextContinuation and nsIFrame::GetNextSibling might return the same frame), and sometimes not. For example, if a paragraph contains a span which contains a link, and the link is split across lines, then the continuations of the span are siblings (since they are both children of the paragraph), but the continuations of the link are not siblings (since each continuation of the link is descended from a different continuation of the span). Traversing the entire frame tree does not require explicit traversal of any frames' continuations-list, since all of the continuations are descendants of the element containing the break.

We also use continuations for cases (most importantly, bidi reordering, where left-to-right text and right-to-left text need to be separated into different continuations since they may not form a contiguous rectangle) where the continuations should not be rewrapped during reflow: we call these continuations fixed rather than fluid. nsIFrame::GetNextInFlow and nsIFrame::GetPrevInFlow traverse only the fluid continuations and do not cross fixed continuation boundaries.
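
A sketch of the two traversals (using the accessors named above):

  // Visits aFrame and every continuation after it, fluid or fixed (e.g. bidi):
  for (nsIFrame* f = aFrame; f; f = f->GetNextContinuation()) {
    // ...
  }
  // Visits only fluid continuations, stopping at fixed boundaries:
  for (nsIFrame* f = aFrame; f; f = f->GetNextInFlow()) {
    // ...
  }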

If an inline frame has non-inline children, then we split the original inline frame into parts. The original inline's children are distributed into these parts like so: The children of the original inline are grouped into runs of inline and non-inline, and runs of inline get an inline parent, while runs of non-inline get an anonymous block parent. We call this 'ib-splitting' or 'block-inside-inline splitting'. This splitting proceeds recursively up the frame tree until all non-inlines inside inlines are ancestors of a block frame with anonymous block wrappers in between. This splitting maintains the relative order between these child frames, and the relationship between the parts of a split inline is maintained using an ib-sibling chain. It is important to note that any wrappers created during frame construction (such as for tables) might not be included in the ib-sibling chain depending on when this wrapper creation takes place.

TODO: nsBox craziness from https://bugzilla.mozilla.org/show_bug.cgi?id=524925#c64

TODO: link to documentation of block and inline layout

TODO: link to documentation of scrollframes

TODO: link to documentation of XUL frame classes

Code (note that most files in base and generic have useful one-line descriptions at the top that show up in DXR):

  • layout/base/ contains objects that coordinate everything and a bunch of other miscellaneous things
  • layout/generic/ contains the basic frame classes as well as support code for their reflow methods (ReflowInput, ReflowOutput)
  • layout/forms/ contains frame classes for HTML form controls
  • layout/tables/ contains frame classes for CSS/HTML tables
  • layout/mathml/ contains frame classes for MathML
  • layout/svg/ contains frame classes for SVG
  • layout/xul/ contains frame classes for the XUL box model and for various XUL widgets

Bugzilla:

  • All of the components whose names begin with "Layout" in the "Core" product

Frame Construction

Frame construction is the process of creating frames. This is done when styles change in ways that require frames to be created or recreated or when nodes are inserted into the document. The content tree and the frame tree don't have quite the same shape, and the frame construction process does some of the work of creating the right shape for the frame tree. It handles the aspects of creating the right shape that don't depend on layout information. So for example, frame construction handles the work needed to implement table anonymous objects but does not handle frames that need to be created when an element is broken across lines or pages.

The basic unit of frame construction is a run of contiguous children of a single parent element. When asked to construct frames for such a run of children, the frame constructor first determines, based on the siblings and parent of the nodes involved, where in the frame tree the new frames should be inserted. Then the frame constructor walks through the list of content nodes involved and for each one creates a temporary data structure called a frame construction item. The frame construction item encapsulates various information needed to create the frames for the content node: its style data, some metadata about how one would create a frame for this node based on its namespace, tag name, and styles, and some data about what sort of frame will be created. This list of frame construction items is then analyzed to see whether constructing frames based on it and inserting them at the chosen insertion point will produce a valid frame tree. If it will not, the frame constructor either fixes up the list of frame construction items so that the resulting frame tree would be valid or throws away the list of frame construction items and requests the destruction and re-creation of the frame for the parent element so that it has a chance to create a list of frame construction items that it can fix up.

Once the frame constructor has a list of frame construction items and an insertion point that would lead to a valid frame tree, it goes ahead and creates frames based on those items. Creation of a non-leaf frame recursively attempts to create frames for the children of that frame's element, so in effect frames are created in a depth-first traversal of the content tree.

The vast majority of the code in the frame constructor, therefore, falls into one of these categories:

  • Code to determine the correct insertion point in the frame tree for new frames.
  • Code to create, for a given content node, frame construction items. This involves some searches through static data tables for metadata about the frame to be created.
  • Code to analyze the list of frame construction items.
  • Code to fix up the list of frame construction items.
  • Code to create frames from frame construction items.

Code: layout/base/nsCSSFrameConstructor.h and layout/base/nsCSSFrameConstructor.cpp

Physical Sizes vs. Logical Sizes

TODO: Discuss inline-size (typically width) and block size (typically height), writing modes, and the various logical vs. physical size/rect types.

Reflow

Reflow is the process of computing the positions and sizes of frames. (After all, frames represent rectangles, and at some point we need to figure out exactly *what* rectangle.) Reflow is done recursively, with each frame's Reflow method calling the Reflow methods on that frame's descendants.

In many cases, the correct results are defined by CSS specifications (particularly CSS 2.1). In some cases, the details are not defined by CSS, though in some (but not all) of those cases we are constrained by Web compatibility. When the details are defined by CSS, however, the code to compute the layout is generally structured somewhat differently from the way it is described in the CSS specifications, since the CSS specifications are generally written in terms of constraints, whereas our layout code consists of algorithms optimized for incremental recomputation.

The reflow generally starts from the root of the frame tree, though some other types of frame can act as "reflow roots" and start a reflow from them (nsTextControlFrame is one example; see the NS_FRAME_REFLOW_ROOT frame state bit). Reflow roots must obey the invariant that a change inside one of their descendants never changes their rect or overflow areas (though currently scrollbars are reflow roots but don't quite obey this invariant).

In many cases, we want to reflow a part of the frame tree, and we want this reflow to be efficient. For example, when content is added or removed from the document tree or when styles change, we want the amount of work we need to redo to be proportional to the amount of content. We also want to efficiently handle a series of changes to the same content.

To do this, we maintain two bits on frames: NS_FRAME_IS_DIRTY indicates that a frame and all of its descendants require reflow. NS_FRAME_HAS_DIRTY_CHILDREN indicates that a frame has a descendant that is dirty or has had a descendant removed (i.e., that it has a child that has NS_FRAME_IS_DIRTY or NS_FRAME_HAS_DIRTY_CHILDREN or it had a child removed). These bits allow coalescing of multiple updates; this coalescing is done in PresShell, which tracks the set of reflow roots that require reflow. The bits are set during calls to PresShell::FrameNeedsReflow and are cleared during reflow.

The layout algorithms used by many of the frame classes are those specified in CSS, which are based on the traditional document formatting model, where widths are input and heights are output.

In some cases, however, widths need to be determined based on the content. This depends on two intrinsic widths: the minimum intrinsic width (see nsIFrame::GetMinISize) and the preferred intrinsic width (see nsIFrame::GetPrefISize). The concept of what these widths represent is best explained by describing what they are on a paragraph containing only text: in such a paragraph the minimum intrinsic width is the width of the longest word, and the preferred intrinsic width is the width of the entire paragraph laid out on one line.

Intrinsic widths are invalidated separately from the dirty bits described above. When a caller informs the pres shell that a frame needs reflow (PresShell::FrameNeedsReflow), it passes one of three options (a sketch of such a call follows the list):

  • eResize indicates that no intrinsic widths are dirty
  • eTreeChange indicates that intrinsic widths on it and its ancestors are dirty (which happens, for example, if new children are added to it)
  • eStyleChange indicates that intrinsic widths on it, its ancestors, and its descendants are dirty (for example, if the font-size changes)
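
A sketch of such a call (names as in the era this document describes; the IntrinsicDirty enum later moved off nsIPresShell):

  // The font-size changed: mark the frame dirty, and dirty the cached
  // intrinsic widths on it, its ancestors, and its descendants.
  shell->FrameNeedsReflow(aFrame, nsIPresShell::eStyleChange,
                          NS_FRAME_IS_DIRTY);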

Reflow is the area where the XUL frame classes (those that inherit from nsBoxFrame or nsLeafBoxFrame) are most different from the rest. Instead of using nsIFrame::Reflow, they do their layout computations using intrinsic size methods called GetMinSize, GetPrefSize, and GetMaxSize (which report intrinsic sizes in two dimensions) and a final layout method called Layout. In many cases these methods defer some of the computation to a separate object called a layout manager.

When an individual frame's Reflow method is called, most of the input is provided on an object called ReflowInput and the output is filled in to an object called ReflowOutput. After reflow, the caller (usually the parent) is responsible for setting the frame's size based on the metrics reported. (This can make some computations during reflow difficult, since the new size is found in either the ReflowInput or the ReflowOutput, but the frame's size is still the old size. However, it's useful for invalidating the correct areas that need to be repainted.)

One major difference worth noting is that in XUL layout, the size of the child is set prior to its parent calling its Layout method. (Once invalidation uses display lists and is no longer tangled up in Reflow, it may be worth switching non-XUL layout to work this way as well.)

Painting

TODO: display lists (and event handling)

TODO: layers

Pagination

The concepts behind pagination (also known as fragmentation) are a bit complicated, so for now we've split them off into a separate document: Gecko:Continuation_Model. This code is used for printing, print-preview, and multicolumn frames.

Dynamic change handling along the rendering pipeline

The ability to make changes to the DOM from script is a major feature of the Web platform. Web authors rely on the concept (though there are a few exceptions, such as animations) that changing the DOM from script leads to the same rendering that would have resulted from starting from that DOM tree. They also rely on the performance characteristics of these changes: small changes to the DOM that have small effects should have proportionally small processing time. This means that Gecko needs to efficiently propagate changes from the content tree to style, the frame tree, the geometry of the frame tree, and the screen.

For many types of changes, however, there is substantial overhead to processing a change, no matter how small. For example, reflow must propagate from the top of the frame tree down to the frames that are dirty, no matter how small the change. One very common way around this is to batch up changes. We batch up changes in lots of ways, for example:

  • The content sink adds multiple nodes to the DOM tree before notifying listeners that they've been added. This allows notifying once about an ancestor rather than for each of its descendants, or notifying about a group of descendants all at once, which speeds up the processing of those notifications.
  • We batch up nodes that require style reresolution (recomputation of selector matching and processing the resulting style changes). This batching is tree based, so it not only merges multiple notifications on the same element, but also merges a notification on an ancestor with a notification on its descendant (since some of these notifications imply that style reresolution is required on all descendants).
  • We wait to reconstruct frames that require reconstruction (after destroying frames eagerly). This, like the tree-based style reresolution batching, avoids duplication both for same-element notifications and ancestor-descendant notifications, even though it doesn't actually do any tree-based caching.
  • We postpone doing reflows until needed. As for style reresolution, this maintains tree-based dirty bits (see the description of NS_FRAME_IS_DIRTY and NS_FRAME_HAS_DIRTY_CHILDREN under Reflow).
  • We allow the OS to queue up multiple invalidates before repainting (though we will likely switch to controlling that ourselves). This leads to a single repaint of some set of pixels where there might otherwise have been multiple (though it may also lead to more pixels being repainted if multiple rectangles are merged to a single one).

Having changes buffered up means, however, that various pieces of information (layout, style, etc.) may not be up-to-date. Some things require up-to-date information: for example, we don't want to expose the details of our buffering to Web page script since the programming model of Web page script assumes that DOM changes take effect "immediately", i.e., that the script shouldn't be able to detect any buffering. Many Web pages depend on this.

We therefore have ways to flush these different sorts of buffers. There are methods called FlushPendingNotifications on nsIDocument and nsIPresShell that take an argument indicating what things to flush (an example follows the list):

  • Flush_Content: create all the content nodes from data buffered in the parser
  • Flush_ContentAndNotify: the above, plus notify document observers about the creation of all nodes created so far
  • Flush_Style: the above, plus make sure style data are up-to-date
  • Flush_Frames: the above, plus make sure all frame construction has happened (currently the same as Flush_Style)
  • Flush_InterruptibleLayout: the above, plus perform layout (Reflow), but allow interrupting layout if it takes too long
  • Flush_Layout: the above, plus ensure layout (Reflow) runs to completion
  • Flush_Display (should never be used): the above, plus ensure repainting happens
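
For example (a sketch using the era-appropriate names above; newer trees use mozilla::FlushType):

  // Make sure layout is up to date before reading geometry from frames.
  doc->FlushPendingNotifications(Flush_Layout);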

The major way that notifications of changes propagate from the content code to layout and other areas of code is through the nsIDocumentObserver and nsIMutationObserver interfaces. Classes can implement this interface to listen to notifications of changes for an entire document or for a subtree of the content tree.

WRITE ME: ... layout document observer implementations

TODO: how style system optimizes away rerunning selector matching

TODO: style changes and nsChangeHint

Refresh driver

Graphics

WebRender vs Layers

Until recently, there were two different rendering paths in Gecko. The new one is WebRender. "Layers" is the name of the previous / non-WebRender architecture, and was removed in bug 1541472.

From a high level architectural point of view, the main differences are:

  • Layers separate rendering into a painting phase and a compositing phase which usually happen in separate processes. For WebRender, on the other hand, all of the rendering happens in a single operation.
  • Layers internally expresses rendering commands through a traditional canvas-like immediate-mode abstraction (Moz2D) which has several backends (some of which use the GPU). WebRender works directly at a lower, GPU-centric level, dealing in terms of batches, draw calls, and shaders.
  • Layers code is entirely C++, while WebRender is mostly Rust with some integration glue in C++.

WebRender

  • WebRender's main entry point is a display list. It is a description of a visible subset of the page produced by the layout module.
  • WebRender and Gecko's display list representations are currently different so there is a translation between the two.
  • The WebRender display list is serialized on the content process and sent to the process that will do the rendering (either the main process or the GPU process).
  • The rendering process deserializes the display list and builds a "Scene".
  • From this scene, frames can be built. Frames represent the actual drawing operations that need to be performed for content to appear on screen. A single scene can produce several frames, for example if an element of the scene is scrolled.
  • The Renderer consumes the frame and produces actual OpenGL drawing commands. Currently WebRender is based on OpenGL, and on Windows the OpenGL commands are transparently turned into D3D ones using ANGLE. In the long run we plan to work with a Vulkan-like abstraction called gfx-rs.

The main phases in WebRender are therefore:

  • Display list building
  • Scene building
  • Frame building
  • GPU command execution

All of these phases are performed on different threads. The first two happen whenever the layout of the page changes or the user scrolls past the part of the page covered by the current scene. They don't necessarily happen at a high frequency. Frame building and GPU command execution, on the other hand, happen at a high frequency (the monitor's refresh rate during scrolling and animations), which means that they must fit in the frame budget (typically 16ms). In order to avoid going over the frame budget, frame building and GPU command execution can overlap. Scene building and frame building can also overlap: it is possible to continue generating frames while a new scene is being built asynchronously.

WebRender has a fallback mechanism called "blob images" for content that it does not handle (for example some SVG drawing primitives). It consists of recording and serializing a list of drawing commands (the "blob") on the content process and sending it to WebRender along with the regular display list. Blobs are then turned into images on the CPU during the scene building phase. Most of WebRender treats blob images as regular images.

The WebRender code path reuses the layers IPC infrastructure for sharing textures between the content process and the renderer. For example, the ImageBridge protocol described in the Compositing section is also used to transfer video frames when WebRender is enabled. Asynchronous Panning and Zooming (APZ), described in a later section, is also relevant to WebRender.

Painting/Rasterizing (Layers aka Non-WebRender)

[Figure: Layers pipeline overview]

Painting/rendering/rasterizing is the step where graphical primitives (such as a command to fill an SVG circle, or the internally produced command to draw a rounded rectangle) are used to color in the pixels of a surface so that the surface "displays" those rasterized primitives.

The platform-independent library that Gecko uses to render is Moz2D. Gecko code paints into Moz2D DrawTarget objects (the DrawTarget base class has multiple platform-dependent subclasses).
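
A hedged sketch of painting with Moz2D (names from gfx/2d as best recalled; the backend enum and color types have shifted across versions):

  using namespace mozilla::gfx;

  RefPtr<DrawTarget> dt = Factory::CreateDrawTarget(
      BackendType::SKIA, IntSize(256, 256), SurfaceFormat::B8G8R8A8);
  // Immediate-mode, canvas-like drawing: fill a red rectangle.
  dt->FillRect(Rect(10, 10, 100, 50),
               ColorPattern(DeviceColor(1.0f, 0.0f, 0.0f, 1.0f)));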

Compositing

The compositing stage of the rendering pipeline is operated by the gfx/layers module.

Different parts of a web page can sometimes be painted into intermediate surfaces retained by layers. It can be convenient to think of layers as the layers in image manipulation programs like The Gimp or Photoshop. Layers are organized as a tree. Layers are primarily used to minimize invalidating and repainting when elements are being animated independently of one another in a way that can be optimized using layers (e.g. opacity or transform animations), and to enable some effects (such as transparency and 3D transforms).

Compositing is the action of flattening the layers into the final image that is shown on the screen.

We paint and composite on separate threads (this is called off-main-thread compositing, or OMTC; a long time ago Firefox did not have this separation).

The Layers architecture is built on the following notions:

  • Compositor: an object that can draw quads on the screen (or on an off-screen render target).
  • Texture: an object that contains image data.
  • Compositable: an object that can manipulate one or several textures, and knows how to present them to the compositor.
  • Layer: an element of the layer tree. A layer usually doesn't know much about compositing: it uses a compositable to do the work. Layers are mostly nodes of the layer tree.

[Figure: New layers architecture, with OGL backend]

Since painting and compositing are performed on different threads/processes, we need a mechanism to synchronize a client layer tree that is constructed on the content thread with a host layer tree that is used for compositing on the compositor thread. This synchronization process must ensure that the host layer tree reflects the state of the client layer tree while remaining in a consistent state, and must transfer texture data from one thread to the other. Moreover, on some platforms the content and compositor threads may live in separate processes.

To perform this synchronization, we use the IPDL IPC framework. IPDL lets us describe communication protocols and actors in a domain-specific language. For more information about IPDL, read https://wiki.mozilla.org/IPDL

When the client layer tree is modified, we record modifications into messages that are sent together to the host side within a transaction. A transaction is basically a list of modifications to the layer tree that have to be applied all at once to preserve consistency in the state of the host layer tree.

Texture transfer is done by synchronizing texture objects across processes/threads using a TextureClient/TextureHost pair that wraps shared data and provides access to it on both sides. A TextureHost provides access to one or several TextureSources, which have the API needed to actually composite the texture (TextureClient/Host being more about IPC synchronization than actual compositing). The logic behind texture transfer (single/double/triple buffering, texture tiling, etc.) is operated by the CompositableClient and CompositableHost.

It is important to understand the separation between layers and compositables. Compositables handle all the logic around texture transfer, while layers define the shape of the layer tree. A compositable is created independently from a layer and attached to it on both sides. While layers transactions can only originate from the content thread, this separation makes it possible to have separate compositable transactions between any thread and the compositor thread. We use the ImageBridge IPDL protocol to that end. The idea of ImageBridge is to create a compositable that is manipulated on the ImageBridge thread and that can transfer/synchronize textures without using the content thread at all. This is very useful for smooth video compositing: video frames are decoded and passed into the ImageBridge without ever touching the content thread, which could be busy processing reflows or heavy JavaScript workloads.

To visualize how a web page is layered, turn on the pref "layers.draw-borders" in about:config. When this pref is on, the borders of layers and tiles are displayed on top of the content. This only works when using the Compositor API, that is, when WebRender is disabled, since WebRender does not have the same concept of layers.

Async Panning and Zooming

On some of our mobile platforms we have some code that allows the user to do asynchronous (i.e. off-main-thread) panning and zooming of content. This code is the Async Pan/Zoom module (APZ) code and is further documented at Platform/GFX/APZ.

Scripting

JavaScript Engine

Gecko embeds the SpiderMonkey JavaScript engine (located in js/src). The JS engine includes a number of Just-In-Time compilers (or JITs). SpiderMonkey provides a powerful and extensive embedding API (called the JSAPI). Because of its complexity, using the JSAPI directly is frowned upon, and a number of abstractions have been built in Gecko to allow interacting with the JS engine without using JSAPI directly. Some JSAPI concepts are important for Gecko developers to understand, and they are listed below:

  • Global object: The global object is the object on which all global variables/properties/methods live. In a web page this is the 'window' object. In other things such as XPCOM components or sandboxes XPConnect creates a global object with a small number of builtin APIs.
  • Compartments: A compartment is a subdivision of the JS heap. The mapping from global objects to compartments is one-to-one; that is, every global object has a compartment associated with it, and no compartment is associated with more than one global object. (NB: There are compartments associated with no global object, but they aren't very interesting). When JS code is running, SpiderMonkey has a concept of a "current" compartment. New objects that are created will be created in the "current" compartment and associated with the global object of that compartment. (See the sketch after this list.)
  • When objects in one compartment can see objects in another compartment (via e.g. iframe.contentWindow.foo) cross-compartment wrappers or CCWs mediate the interaction between the two compartments. By default CCWs ensure that the current compartment is kept up-to-date when execution crosses a compartment boundary. SpiderMonkey also allows embeddings to extend wrappers with custom behavior to implement security policies, which Gecko uses extensively (see the Security section for more information).
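
A sketch at the JSAPI level (the era described here used JSAutoCompartment; newer SpiderMonkey spells compartment entry JSAutoRealm):

  void UseObject(JSContext* cx, JS::HandleObject obj) {
    // Make obj's compartment the "current" compartment for this scope;
    // objects created below are allocated in it.
    JSAutoCompartment ac(cx, obj);
    JS::RootedObject newObj(cx, JS_NewPlainObject(cx));
    // newObj lives in the same compartment (and global) as obj.
  }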

XPConnect

XPConnect is a bridge between C++ and JS used by XPCOM components exposed to or implemented in JavaScript. XPConnect allows two XPCOM components to talk to each other without knowing or caring which language they are implemented in. XPConnect allows JS code to call XPCOM components by exposing XPCOM interfaces as JS objects, and mapping function calls and attributes from JS into virtual method calls on the underlying C++ object. It also allows JS code to implement an XPCOM component that can be called from C++ by faking the vtable of an interface and translating calls on the interface into JS calls. XPConnect accomplishes this using the vtable data stored in XPCOM TypeLibs (or XPT files) by the XPIDL compiler and a small layer of platform specific code called xptcall that knows how to interact with and impersonate vtables of XPCOM interfaces.

XPConnect is also used for a small (and decreasing) number of legacy DOM objects. At one time all of the DOM used XPConnect to talk to JS, but we have been replacing XPConnect with the WebIDL bindings (see the next section for more information). Arbitrary XPCOM components are not exposed to web content. XPCOM components that want to be accessible to web content must provide class info (that is, they must QI to nsIClassInfo) and must be marked with the nsIClassInfo::DOM_OBJECT flag. Most of these objects also are listed in dom/base/nsDOMClassInfo.cpp, where the interfaces that should be visible to the web are enumerated. This file also houses some "scriptable helpers" (classes ending in "SH" and implementing nsIXPCScriptable), which are a set of optional hooks that can be used by XPConnect objects to implement more exotic behaviors (such as indexed getters or setters, lazily resolved properties, etc). This is all legacy code, and any new DOM code should use the WebIDL bindings.

WebIDL Bindings

The WebIDL bindings are a separate bridge between C++ and JS that is specifically designed for use by DOM code. We intend to replace all uses of XPConnect to interface with content JavaScript with the WebIDL bindings. Extensive documentation on the WebIDL bindings is available, but the basic idea is that a binding generator takes WebIDL files from web standards and some configuration data provided by us and generates at build time C++ glue code that sits between the JS engine and the DOM object. This setup allows us to produce faster, more specialized code for better performance at the cost of increased codesize.

The WebIDL bindings also implement all of the behavior specified in WebIDL, making it possible for new additions to the DOM to behave correctly "out of the box". The binding layer also provides C++ abstractions for types that would otherwise require direct JSAPI calls to handle, such as typed arrays.

Reflectors

Whether using XPConnect or Web IDL, the basic idea is that there is a JSObject called a reflector that represents some C++ object. The JSObject ensures that the C++ object remains alive while the JSObject is alive. The C++ object does not necessarily keep the JSObject alive; this allows the JSObject to be garbage-collected as needed. If this happens, the next time JS needs access to that C++ object a new reflector is created. If something happens that requires that the actual object identity of the JSObject be preserved even though it's not explicitly reachable, the C++ object will start keeping the JSObject alive as well. After this point, the cycle collector would be the only way for the pair of objects to be deallocated. Examples of operations that require identity preservation are adding a property to the object, giving the object a non-default prototype, or using the object as a weakmap key or weakset member.

In the XPConnect case, reflectors are created by XPCWrappedNative::GetNewOrUsed. This finds the right JSClass to use for the object, depending on information the C++ object provides via the nsIClassInfo interface about special class hooks it might need. If the C++ object has no classinfo, XPC_WN_NoHelper_JSClass is used. GetNewOrUsed also determines the right global to use for the reflector, again based on nsIClassInfo information, with a fallback to the JSContext's current global.

In the Web IDL case, reflectors are created by the Wrap method that the binding code generator spits out for the relevant Web IDL interface. So for the Document interface, the relevant method is mozilla::dom::Document_Binding::Wrap. The JSClass that is used is also created by the code generator, as part of the DOMJSClass struct it outputs. Web IDL objects provide a GetParentObject() method that is used to determine the right global to use for the reflector.

Web IDL objects typically inherit from nsWrapperCache, and use that class to store a pointer to their reflector and to keep it alive as needed. The only exception is a Web IDL object that only needs a reflector created once during its lifetime, typically as the result of a Web IDL constructor being called from JS (see TextEncoder for an example). In that case there is no need to get the JS object from the C++ one, and no way to end up in any of the situations that require preserving the JS object from the C++ side, so the Web IDL object is allowed not to inherit from nsWrapperCache.
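
Putting these pieces together, the C++ side of a Web IDL interface typically looks something like the following hedged sketch. The interface "Thing" and everything derived from its name are made up, and the code assumes the Gecko tree (the cycle-collection NS_IMPL_* boilerplate is omitted):

    #include "nsCOMPtr.h"
    #include "nsCycleCollectionParticipant.h"
    #include "nsWrapperCache.h"
    #include "mozilla/dom/ThingBinding.h"  // generated from a hypothetical Thing.webidl

    class Thing final : public nsISupports, public nsWrapperCache {
    public:
      NS_DECL_CYCLE_COLLECTING_ISUPPORTS
      NS_DECL_CYCLE_COLLECTION_SCRIPT_HOLDER_CLASS(Thing)

      // Used by the bindings to determine the right global for the reflector.
      nsISupports* GetParentObject() const { return mParent; }

      // Called when JS first needs a reflector for this object; forwards to
      // the generated Wrap method.
      JSObject* WrapObject(JSContext* aCx,
                           JS::Handle<JSObject*> aGivenProto) override {
        return mozilla::dom::Thing_Binding::Wrap(aCx, this, aGivenProto);
      }

    private:
      ~Thing() = default;  // refcounted; destroyed only via Release()
      nsCOMPtr<nsISupports> mParent;
    };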

Security

Gecko's JS security model is based on the concept of compartments. Every compartment has a principal associated with it that contains security information such as the origin of the compartment, whether or not it has system privileges, etc. For the following discussion, wrappee refers to the underlying object being wrapped, scope refers to the compartment the object is being wrapped for use in, and wrapper refers to the object created in scope that allows it to interact with wrappee. "Chrome" refers to JS that is built into the browser or is part of an extension, which runs with full privileges; it is contrasted with "content", web-provided script that is subject to security restrictions and the same-origin policy.

  • Xray wrappers: Xray wrappers are used when the scope has chrome privileges and the wrappee is the JS reflection of an underlying DOM object. Beyond the usual cross-compartment wrapper (CCW) duties of making sure that content code does not run with chrome privileges, Xray wrappers also ensure that calling methods or accessing attributes on the wrappee has the "expected" effect. For example, a web page could replace the close method on its window with a JS function that does something completely different (e.g. window.close = function() { alert("This isn't what you wanted!") };). Properties overwritten in this way (or created from scratch) are referred to as expando properties. Xrays see through expando properties, invoking the original method/getter/setter if the expando overwrites a builtin, or seeing undefined if the expando creates a new property. This allows chrome code to call methods on content DOM objects without worrying about how the page has changed the object.
  • Waived Xray wrappers: The Xray behavior is not always desirable. It is possible for chrome to "waive" the Xray behavior and see the actual JS object. The wrapper still guarantees that code runs with the correct privileges, but methods/getters/setters may not behave as expected. This is equivalent to the behavior chrome sees when it looks at non-DOM content JS objects.
  • For more information, see the MDN page
  • bholley's Enter the Compartment talk also provides an overview of our compartment architecture.

Images

Plugins

Platform-specific layers

  • widget
  • native theme
  • files, networking, other low-level things
  • Accessibility APIs
  • Input. Touch-input stuff is somewhat described at Platform/Input/Touch.

Editor

Base layers

NSPR

NSPR is a library providing cross-platform APIs for various platform-specific functions. We try to use it as little as possible, although there are a number of areas (particularly some network-related APIs and threading/locking primitives) where we still use it quite a bit.
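
As a concrete illustration, here is a minimal standalone sketch of NSPR's threading and locking primitives (real NSPR APIs, but an illustrative program rather than actual Gecko code):

    #include "prlock.h"
    #include "prthread.h"
    #include <stdio.h>

    static PRLock* gLock;
    static int gCounter;

    static void Worker(void* aArg) {
      (void)aArg;  // unused
      // Cross-platform mutual exclusion, regardless of the underlying OS.
      PR_Lock(gLock);
      ++gCounter;
      PR_Unlock(gLock);
    }

    int main() {
      gLock = PR_NewLock();
      PRThread* thread =
          PR_CreateThread(PR_USER_THREAD, Worker, nullptr, PR_PRIORITY_NORMAL,
                          PR_GLOBAL_THREAD, PR_JOINABLE_THREAD, 0);
      PR_JoinThread(thread);
      printf("counter = %d\n", gCounter);
      PR_DestroyLock(gLock);
      return 0;
    }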

XPCOM

XPCOM is a cross-platform modularity library, modeled on Microsoft COM. It is an object system in which all objects inherit from the nsISupports interface.
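
A minimal XPCOM object in C++ looks something like this hedged sketch (the class name is made up; the macros and the nsIObserver interface are the standard pattern):

    #include "nsIObserver.h"

    class MyObserver final : public nsIObserver {
    public:
      NS_DECL_ISUPPORTS   // declares QueryInterface/AddRef/Release
      NS_DECL_NSIOBSERVER // declares Observe()

    private:
      ~MyObserver() = default;  // refcounted; destroyed only via Release()
    };

    // Defines the nsISupports methods, including QueryInterface to
    // nsIObserver and nsISupports.
    NS_IMPL_ISUPPORTS(MyObserver, nsIObserver)

    NS_IMETHODIMP
    MyObserver::Observe(nsISupports* aSubject, const char* aTopic,
                        const char16_t* aData) {
      // React to the notification here.
      return NS_OK;
    }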

components and services, contract IDs and CIDs

prior overuse of XPCOM; littering with XPCOM does not produce modularity

Base headers (part of xpcom/base/) and data structures. See also mfbt.

Threading

xptcall, proxies

reference counting, cycle collection

Further documentation:

String

XPCOM has string classes for representing sequences of characters. Typical C++ string classes aim to encapsulate the memory management of buffers and to avoid the buffer overruns common in traditional C string handling. Mozilla's string classes share these goals, and add the goal of reducing copying of strings.

(There is a second set of string classes in xpcom/glue for callers outside of libxul; these classes are partially source-compatible but have different (worse) performance characteristics. This discussion does not cover those classes.)

We have two parallel sets of classes, one for strings with 1-byte units (char, which may be signed or unsigned), and one for strings with 2-byte units (char16_t, always unsigned). The classes are named such that the class for 2-byte characters ends with String and the corresponding class for 1-byte characters ends with CString. 2-byte strings are almost always used to encode UTF-16. 1-byte strings are usually used to encode either ASCII or UTF-8, but are sometimes also used to hold data in some other encoding or just byte sequences.
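
As an illustration, converting between the two families is done with helper classes (a hedged sketch: NS_ConvertUTF16toUTF8 and NS_ConvertUTF8toUTF16 are real, nsAString is the spelling of nsSubstring used in code, and the function is made up):

    #include "nsString.h"

    void ConversionExample(const nsAString& aUtf16) {
      NS_ConvertUTF16toUTF8 utf8(aUtf16);    // 2-byte UTF-16 -> 1-byte UTF-8
      NS_ConvertUTF8toUTF16 roundTrip(utf8); // 1-byte UTF-8 -> 2-byte UTF-16
    }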

The string classes distinguish, as part of the type hierarchy, between strings that must have a null-terminator at the end of their buffer (ns[C]String) and strings that are not required to have a null-terminator (ns[C]Substring). ns[C]Substring is the base of the string classes (since it imposes fewer requirements) and ns[C]String is a class derived from it. Functions taking strings as parameters should generally take one of these four types.
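
For example, typical signatures look like this hedged sketch:

    #include "nsString.h"

    // Accepts any 2-byte string, null-terminated or not.
    void SetTitle(const nsAString& aTitle);

    // Requires a null-terminated 1-byte string.
    void SetHeader(const nsCString& aHeader);

    // Returns a string through an out parameter, which lets the callee share
    // an existing buffer instead of copying.
    void GetTitle(nsAString& aResult);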

In order to avoid unnecessary copying of string data (which can have significant performance cost), the string classes support different ownership models. All string classes support the following three ownership models dynamically:

  • reference-counted, copy-on-write buffers (the default)
  • adopted buffers (a buffer that the string class owns but does not reference count, because it came from somewhere else)
  • dependent buffers, that is, an underlying buffer that the string class does not own, but that the caller that constructed the string guarantees will outlive the string instance

In addition, there is a special string class, nsAuto[C]String, that additionally contains an internal 64-unit buffer (intended primarily for use on the stack), leading to a fourth ownership model:

  • storage within an auto string's stack buffer

Auto strings will prefer reference counting an existing reference-counted buffer over their stack buffer, but will otherwise use their stack buffer for anything that will fit in it.
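
A hedged sketch showing the four ownership models in practice (real string APIs; the function and values are made up):

    #include "nsString.h"
    #include "nsReadableUtils.h"  // for ToNewUnicode

    void OwnershipModels() {
      // 1. Default: reference-counted, copy-on-write buffer.
      nsString a(u"shared text");
      nsString b(a);  // shares a's buffer; nothing is copied until a mutation

      // 2. Adopted buffer: take ownership of a heap buffer from elsewhere.
      nsString c;
      c.Adopt(ToNewUnicode(a));  // c now owns the freshly allocated copy

      // 3. Dependent buffer: alias storage the string does not own; the
      //    caller must guarantee that the buffer outlives the string.
      const char16_t kBuffer[] = u"caller-owned";
      nsDependentString d(kBuffer);

      // 4. Auto string: small values live in the internal 64-unit buffer.
      nsAutoString e;
      e.Assign(u"fits on the stack");
    }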

There are a number of additional string classes, particularly nsDependent[C]String, nsDependent[C]Substring, and the NS_LITERAL_[C]STRING macros (which construct an nsLiteral[C]String); these exist primarily as constructors for the other types. They are really just convenient notation for constructing an ns[C]S[ubs]tring with a non-default ownership mode, and should not be thought of as distinct types. (The Substring, StringHead, and StringTail functions are also constructors for dependent [sub]strings.) Non-default ownership modes can also be set up using the Rebind and Adopt methods, although the Rebind methods actually live on the derived types, which is probably a mistake (moving them up would require some care to avoid creating an API that makes it easy to assign a non-null-terminated buffer to a string whose static type says it is null-terminated).
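
For example (a hedged sketch; DoSomething is a made-up callee):

    #include "nsString.h"

    void DoSomething(const nsAString& aArg);

    void ConstructorExamples(const nsString& aText) {
      // Alias a piece of aText without copying; the result is only valid
      // while aText is alive and unmodified.
      const nsDependentSubstring middle = Substring(aText, 1, 4);
      DoSomething(middle);

      // Wrap a compile-time literal without copying or allocating.
      DoSomething(NS_LITERAL_STRING("example"));
    }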

Note that the presence of all of these classes imposes some awkwardness, in that the same distinctions are available both as static type distinctions and as dynamic ones. In general, the only distinctions that should be made statically are 1-byte vs. 2-byte units (CString vs. String) and whether the buffer is null-terminated or not (Substring vs. String). (Does the code actually do a good job of dynamically enforcing the Substring vs. String restriction?)

TODO: buffer growth, concatenation optimizations

TODO: encoding conversion, what's validated and what isn't

TODO: "string API", nsAString (historical)

  • code: xpcom/string/
  • bugzilla: Core::String

Further documentation: