Gecko:Overview
This document attempts to give an overview of the different parts of Gecko: what they do and why, where their code lives in the repository, and links to more specific documentation (when available) covering the details of that code. Maintainers of these areas of code should correct errors, add information, and add links to more detailed documentation (since this document is intended to remain an overview, not complete documentation).
Docshell and Session History
The user of a Web browser can change the page shown in that browser in many ways: by clicking a link, loading a new URL, using the forward and back buttons, or other ways. This can happen inside a space we'll call a browser session; this space can be a browser window, a tab, or a frame or iframe within a document. The toplevel data structures within Gecko represent this browser session; they contain other data structures representing the individual pages displayed inside of it (most importantly, the current one). In terms of implementation, these two kinds of browser session, the top level of a browser and the frames within it, largely use the same data structures.
In Gecko, the docshell is the toplevel object responsible for managing a single browser session. It, and the associated session history code, manage the navigation between pages inside of a docshell. (Note the difference between session history, which is a sequence of pages in a single browser, used for recording information for back and forward navigation, and global history, which is the history of pages visited and associated times, regardless of browser session, used for things like link coloring and address autocompletion.)
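To make the distinction concrete, here is a minimal C++ sketch. SessionHistory, GlobalHistory and Visit are hypothetical names, not Gecko's session history or global history interfaces: each browser session keeps its own back/forward list, while a single global history records visits from every session.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of session history: one back/forward list per session.
class SessionHistory {
 public:
  // Loading a new page drops any "forward" entries past the current one.
  void Load(const std::string& url) {
    if (!entries_.empty()) entries_.resize(index_ + 1);
    entries_.push_back(url);
    index_ = entries_.size() - 1;
  }
  bool CanGoBack() const { return index_ > 0; }
  bool CanGoForward() const { return index_ + 1 < entries_.size(); }
  void GoBack() { if (CanGoBack()) --index_; }
  void GoForward() { if (CanGoForward()) ++index_; }
  const std::string& Current() const { return entries_.at(index_); }

 private:
  std::vector<std::string> entries_;
  std::size_t index_ = 0;
};

// Hypothetical sketch of global history: shared by all sessions,
// mapping each URL to its last visit time.
using GlobalHistory = std::map<std::string, long long>;

// A visit updates both structures.
void Visit(SessionHistory& session, GlobalHistory& global,
           const std::string& url, long long visitTime) {
  session.Load(url);
  global[url] = visitTime;
}
```

Going back and then loading a new page drops the old forward entry from the session history, but the global history still remembers it.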
There are relatively few objects in Gecko that are associated with a docshell rather than being associated with a particular one of the pages inside of it. Most such objects are attached to the docshell. However, one important such object that lives outside the docshell code is the outer window object in the DOM code (both the outer and inner window objects are implemented by nsGlobalWindow, though HTML5 describes the outer window as a WindowProxy and the inner window as a Window). See DOM for more information on this.
The highest-level object for managing the contents of a particular page being displayed within a docshell is a document viewer (see layout). Other important objects associated with this presentation are the document (see DOM) and the pres(entation) shell and pres(entation) context (see layout).
- code: mozilla/docshell/
- bugzilla: Core::Document Navigation
- documentation: DocShell:Home Page
Embedding
To be written (and maybe rewritten if we get an IPC embedding API).
Multi-process and IPC
To be written.
Networking
- URIs (and how to create them; allowance for extensibility)
- protocol handlers (and how to create them; allowance for extensibility), channels
- request model
- crypto
Document rendering pipeline
Some of the major components of Gecko can be described as steps on the path from an HTML document coming in from the network to the graphics commands needed to render that document. An HTML document is a serialization of a tree structure. (FIXME: add diagram) The HTML parser and content sink create an in-memory representation of this tree, which we call the DOM tree or the content tree. Many APIs that script can use operate on the content tree. Then, in layout, we create a second tree, the frame tree (or rendering tree) that is a similar shape to the content tree, but where each node in the tree represents a rectangle (except in SVG where they represent other shapes). We then compute the positions of the nodes in the frame tree (called frames) and paint them using our cross-platform graphics APIs (which, underneath, map to platform-specific graphics APIs).
Parser
The parser's job is to transform a character stream into a tree structure, with the help of the content sink classes.
HTML is parsed using a parser implementing the parsing algorithm in the HTML specification (starting with HTML5). Much of this parser is translated from Java, so changes are made to the Java version rather than directly to the translated code. This parser is in parser/html/.
The codebase still has the previous generation HTML parser, which is still used for a small number of things, though we hope to be able to remove it entirely soon. This parser is in parser/htmlparser/.
XML is parsed using the expat library (parser/expat/) and code that wraps it (parser/xml/). This is a non-validating parser; however, it loads certain DTDs to support XUL localization.
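A rough C++ sketch of the division of labor between a parser and a content sink, assuming a much simplified interface: the hypothetical ContentSink receives "open element", "close element" and "text" notifications (as a tokenizer would produce them) and builds the in-memory tree. Gecko's real sink interfaces are richer and differ between the HTML and XML parsers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, greatly simplified node for the in-memory tree.
struct Node {
  std::string name;  // tag name, or "#text" for text nodes
  std::string text;  // text content for text nodes
  std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical sink interface: the parser reports structure, the sink builds the tree.
class ContentSink {
 public:
  virtual ~ContentSink() = default;
  virtual void OpenElement(const std::string& name) = 0;
  virtual void CloseElement() = 0;
  virtual void AddText(const std::string& text) = 0;
};

class TreeBuildingSink : public ContentSink {
 public:
  TreeBuildingSink() : root_(std::make_unique<Node>()) {
    root_->name = "#document";
    stack_.push_back(root_.get());
  }
  void OpenElement(const std::string& name) override {
    auto node = std::make_unique<Node>();
    node->name = name;
    Node* raw = node.get();
    stack_.back()->children.push_back(std::move(node));
    stack_.push_back(raw);  // subsequent content goes inside this element
  }
  void CloseElement() override {
    if (stack_.size() > 1) stack_.pop_back();  // never pop the document itself
  }
  void AddText(const std::string& text) override {
    auto node = std::make_unique<Node>();
    node->name = "#text";
    node->text = text;
    stack_.back()->children.push_back(std::move(node));
  }
  const Node& Document() const { return *root_; }

 private:
  std::unique_ptr<Node> root_;
  std::vector<Node*> stack_;  // currently open elements
};
```

For markup like <p>Hello <b>world</b></p>, the parser would call OpenElement("p"), AddText("Hello "), OpenElement("b"), AddText("world"), CloseElement(), CloseElement().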
- code: parser/ (though content sink classes are in content/)
DOM / Content
The content tree or DOM tree is the central data structure for Web pages. It is a tree structure, initially created from the tree structure expressed in the HTML or XML markup. The nodes in the tree implement major parts of the DOM (Document Object Model) specifications. The nodes themselves are part of a class hierarchy rooted at nsINode; different derived classes are used for things such as text nodes, the document itself, HTML elements, SVG elements, etc., with further subclasses of many of these types (e.g., for specific HTML elements). Many of the APIs available to script running in Web pages are associated with these nodes. The tree structure persists while the Web page is displayed, since it stores much of the state associated with the Web page. The code for these nodes lives in the content/ directory.
The DOM APIs are not threadsafe. DOM nodes can be accessed only from the main thread (also known as the UI, or user interface, thread) of the application.
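The single-thread rule can be sketched as an assertion in each accessor. This is an illustration only: MainThreadId, IsOnMainThread and DomNode are hypothetical, and Gecko code uses its own helpers (such as NS_IsMainThread()) for this check. The sketch assumes MainThreadId() is first called from the main thread at startup.

```cpp
#include <cassert>
#include <thread>

// Hypothetical: remembers the thread that first calls it, which must be
// the main (UI) thread during startup.
inline const std::thread::id& MainThreadId() {
  static const std::thread::id id = std::this_thread::get_id();
  return id;
}

inline bool IsOnMainThread() {
  return std::this_thread::get_id() == MainThreadId();
}

// Hypothetical DOM node whose accessors refuse off-main-thread use
// (in debug builds, where assert is active).
class DomNode {
 public:
  int ChildCount() const {
    assert(IsOnMainThread() && "DOM nodes are main-thread only");
    return child_count_;
  }
  void AppendChild() {
    assert(IsOnMainThread() && "DOM nodes are main-thread only");
    ++child_count_;
  }

 private:
  int child_count_ = 0;
};
```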
There are also many other APIs available to Web pages that are not APIs on the nodes in the DOM tree. Many of these other APIs are split between content/ and dom/. These include APIs such as the DOM event model.
The dom/ directory also includes some of the code needed to expose Web APIs to JavaScript (in other words, the glue code between JavaScript and these APIs). See [#Scripting Scripting] below for details of this code.
TODO: Internal APIs vs. DOM APIs.
TODO: Mutation observers / document observers.
TODO: Reference counting and cycle collection.
TODO: specification links
Style System
In order to display the content, Gecko needs to compute the styles relevant to each DOM node. It does this based on the model described in the CSS specifications: this model applies to style specified in CSS, style specified in HTML, and our own default style. There are two major sets of data structures within the style system:
- first, data structures that represent sources of style data, such as CSS style sheets or data from stylistic HTML attributes
- second, data structures that represent computed style for a given DOM node.
These sets of data structures are mostly distinct (for example, they store values in different ways).
The loading of CSS style sheets from the network is managed by the CSS loader; they are then tokenized by the CSS scanner and parsed by the CSS parser. Those that are attached to the document also expose APIs to script that are known as the CSS Object Model, or CSSOM.
The style sheets that apply to a document are managed by a class called the style set. The style set interacts with the different types of style sheets (representing CSS style sheets, HTML presentational attributes, and style attributes) through two interfaces: [http://mxr.mozilla.org/mozilla-central/source/layout/style/nsIStyleSheet.h nsIStyleSheet] for basic management of style sheets and [http://mxr.mozilla.org/mozilla-central/source/layout/style/nsIStyleRuleProcessor.h nsIStyleRuleProcessor] for getting the style data out of them. Usually the same object implements both interfaces, except in the most important case, CSS style sheets, where there is a single rule processor for all of the CSS style sheets in each level of the CSS cascade.
The computed style data are exposed to the rest of Gecko through a class called nsStyleContext. Rather than having a member variable for each CSS property, it breaks up the properties into groups of related properties called style structs. These style structs obey the rule that all of the properties in a single struct either inherit by default (what the CSS specifications call "Inherited: yes" in the definition of properties; we call these inherited structs) or all are not inherited by default (we call these reset structs). Separating the properties in this way makes it easier to share the structs between similar style contexts, which reduces the amount of memory needed to store the style data.
The API nsStyleContext exposes involves a method for getting each struct, so you'll see code like sc->GetStyleText()->mTextAlign for getting the value of the text-align CSS property. (Frames (see layout) also have the same GetStyle* methods, which just forward the call to the frame's style context.)
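A minimal C++ sketch of inherited versus reset structs, with hypothetical StyleText and StyleMargin structs and a simplified StyleContext (not Gecko's nsStyleContext): an element that specifies no text properties shares its parent's StyleText object, while an unspecified reset struct falls back to shared initial values rather than to the parent's.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical inherited struct: text-align inherits by default.
struct StyleText {
  std::string mTextAlign = "start";
};

// Hypothetical reset struct: margins do not inherit by default.
struct StyleMargin {
  int mMarginTop = 0;
};

class StyleContext {
 public:
  explicit StyleContext(std::shared_ptr<StyleContext> parent = nullptr)
      : parent_(std::move(parent)) {}

  // Inherited struct: if nothing was specified here, reuse the parent's
  // struct object instead of copying it.
  std::shared_ptr<const StyleText> GetStyleText() {
    if (!text_) {
      text_ = parent_ ? parent_->GetStyleText() : std::make_shared<StyleText>();
    }
    return text_;
  }

  // Reset struct: if nothing was specified here, use the initial values.
  std::shared_ptr<const StyleMargin> GetStyleMargin() {
    if (!margin_) margin_ = InitialMargin();
    return margin_;
  }

  void SetText(StyleText t) { text_ = std::make_shared<StyleText>(std::move(t)); }
  void SetMargin(StyleMargin m) { margin_ = std::make_shared<StyleMargin>(m); }

 private:
  static std::shared_ptr<const StyleMargin> InitialMargin() {
    static const std::shared_ptr<const StyleMargin> initial =
        std::make_shared<StyleMargin>();
    return initial;  // one shared copy of the initial values
  }

  std::shared_ptr<StyleContext> parent_;  // contexts own their parents
  std::shared_ptr<const StyleText> text_;
  std::shared_ptr<const StyleMargin> margin_;
};
```

As with the real API, a caller reads a value through the struct getter, e.g. sc->GetStyleText()->mTextAlign.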
The style contexts form a tree structure, in a shape somewhat like the content tree (except that we coalesce identical sibling style contexts rather than keeping two of them around; if the parents have been coalesced then this can apply recursively and coalesce cousins, etc.). The parent of a style context has the style data that the style context inherits from when CSS inheritance occurs. This means that the parent of the style context for a DOM element is generally the style context for that DOM element's parent, since that's how CSS says inheritance works.
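Sibling coalescing can be sketched as a lookup among a parent context's existing children before creating a new one. In this hypothetical C++ sketch, RuleNode is an empty placeholder and StyleContext is a simplified stand-in; children own their parents while parents hold only weak references to their children. Because coalesced siblings are the same object, their children (the cousins) also coalesce on the next lookup.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct RuleNode {};  // hypothetical placeholder: identifies a matched rule sequence

class StyleContext : public std::enable_shared_from_this<StyleContext> {
 public:
  StyleContext(std::shared_ptr<StyleContext> parent, const RuleNode* rules,
               std::string pseudo)
      : parent_(std::move(parent)), rules_(rules), pseudo_(std::move(pseudo)) {}

  // Return an existing child with the same rule node and pseudo-element,
  // or create one; identical siblings therefore share a single context.
  std::shared_ptr<StyleContext> GetChild(const RuleNode* rules,
                                         const std::string& pseudo) {
    for (const auto& weak : children_) {
      if (auto child = weak.lock()) {
        if (child->rules_ == rules && child->pseudo_ == pseudo) return child;
      }
    }
    auto child = std::make_shared<StyleContext>(shared_from_this(), rules, pseudo);
    children_.push_back(child);
    return child;
  }

 private:
  std::shared_ptr<StyleContext> parent_;  // strong: children own parents
  const RuleNode* rules_;
  std::string pseudo_;
  std::vector<std::weak_ptr<StyleContext>> children_;  // weak: no cycles
};
```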
The process of turning the style sheets into computed style data goes through three main steps, the first two of which closely relate to the [http://mxr.mozilla.org/mozilla-central/source/layout/style/nsIStyleRule.h nsIStyleRule] interface, which represents an immutable source of style data, conceptually representing (and for CSS style rules, directly storing) a set of property:value pairs. (It is similar to the idea of a CSS style rule, except that it is immutable; this immutability allows for significant optimization. When a CSS style rule is changed through script, we create a new style rule.)
The first step of going from style sheets to computed style data is finding the ordered sequence of style rules that apply to an element. The order represents which rules override which other rules: if two rules have a value for the same property, the higher ranking one wins. (Note that there's another difference from CSS style rules: declarations with !important are represented using a separate style rule.) This is done by calling one of the nsIStyleRuleProcessor::RulesMatching methods. The ordered sequence is stored in a trie called the rule tree: the path from the root of the rule tree to any (leaf or non-leaf) node in the rule tree represents a sequence of rules, with the highest ranking farthest from the root. Each rule node (except for the root) has a pointer to a rule, but since a rule may appear in many sequences, there are sometimes many rule nodes pointing to the same rule. Once we have this list we create a style context (or find an appropriate existing sibling) with the correct parent pointer (for inheritance) and rule node pointer (for the list of rules), and a few other pieces of information (like the pseudo-element).
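The rule tree is a trie keyed by rule pointers. A minimal C++ sketch, with hypothetical StyleRule and RuleNode types: inserting the same ordered sequence of matched rules twice yields the same node, and sequences that share a prefix share the nodes for that prefix.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

struct StyleRule { int id; };  // hypothetical stand-in for an immutable style rule

// Hypothetical rule node: the path from the root to a node is an ordered
// list of rules, with the lowest ranking rule nearest the root.
class RuleNode {
 public:
  RuleNode() = default;
  RuleNode(RuleNode* parent, const StyleRule* rule) : parent_(parent), rule_(rule) {}

  // Find or create the child for `rule`, so equal sequences share nodes.
  RuleNode* Transition(const StyleRule* rule) {
    auto& slot = children_[rule];
    if (!slot) slot = std::make_unique<RuleNode>(this, rule);
    return slot.get();
  }
  RuleNode* Parent() const { return parent_; }
  const StyleRule* Rule() const { return rule_; }

 private:
  RuleNode* parent_ = nullptr;
  const StyleRule* rule_ = nullptr;  // null only for the root
  std::map<const StyleRule*, std::unique_ptr<RuleNode>> children_;
};

// Insert matched rules ordered lowest ranking first; return the node that
// represents the whole sequence.
RuleNode* InsertMatchedRules(RuleNode& root,
                             const std::vector<const StyleRule*>& rules) {
  RuleNode* node = &root;
  for (const StyleRule* rule : rules) node = node->Transition(rule);
  return node;
}
```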
The second step of going from style sheets to computed style data is getting the winning property:value pairs from the rules. (This only provides property:value pairs for some of the properties; the remaining properties will fall back to inheritance or to their initial values depending on whether the property is inherited by default.) We do this step (and the third) for each style struct, the first time it is needed. This is done in nsRuleNode::WalkRuleTree, where we ask each style rule to fill in its property:value pairs by calling its MapRuleInfoInto function. When called, the rule fills in only those pairs that haven't been filled in already, since we're calling from the highest priority rule to the lowest (since in many cases this allows us to stop before going through the whole list, or to do partial computation that just adds to data cached higher in the rule tree).
The third step of going from style sheets to computed style data (which various caching optimizations allow us to skip in many cases) is actually doing the computation; this generally means we transform the style data into the data type described in the "Computed Value" line in the property's definition in the CSS specifications. This transformation happens in functions called nsRuleNode::Compute*Data, where the * in the middle represents the name of the style struct. This is where the transformation from the style sheet value storage format to the computed value storage format happens.
Once we have the computed style data, we then store it: if it doesn't depend on inherited values or on data from other style structs, then we can cache it in the rule tree (and then reuse it, without recomputing it, for any style contexts pointing to that rule node). Otherwise, we store it on the style context. This is where keeping inherited and non-inherited properties separate is useful: in the common case of relatively few properties being specified, we can generally cache the non-inherited structs in the rule tree, and we can generally share the inherited structs up and down the style context tree.
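The second and third steps, together with the decision about where to store the result, can be sketched for a single hypothetical reset struct holding two margins. CollectWinningDeclarations plays the role of nsRuleNode::WalkRuleTree and ComputeMarginData the role of a Compute*Data function, but both are simplifications: they assume every value is either an integer in px or inherit, they never stop the walk early, and they ignore interactions between structs.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical immutable rule: a set of property:value pairs.
struct StyleRule {
  std::map<std::string, std::string> declarations;
};

// Computed data for a hypothetical reset struct (values in px).
struct MarginData {
  int top = 0;     // initial value
  int bottom = 0;  // initial value
};

struct RuleNode {
  RuleNode* parent = nullptr;        // toward lower ranking rules
  const StyleRule* rule = nullptr;   // null at the root
  std::optional<MarginData> cached;  // filled only when context-independent
};

// Walk from the highest ranking rule toward the root. Each rule fills in
// only properties that are still unset, so higher ranking rules win.
std::map<std::string, std::string> CollectWinningDeclarations(const RuleNode* node) {
  std::map<std::string, std::string> winning;
  for (; node; node = node->parent) {
    if (!node->rule) continue;
    for (const auto& [prop, value] : node->rule->declarations) {
      winning.emplace(prop, value);  // no-op if a higher rule already set it
    }
  }
  return winning;
}

// Compute the struct. "inherit" makes the result depend on the parent, so
// such results are not cached in the rule tree.
MarginData ComputeMarginData(RuleNode* node, const MarginData& parentData,
                             bool* cacheable) {
  if (node->cached) { *cacheable = true; return *node->cached; }
  auto winning = CollectWinningDeclarations(node);
  MarginData result;
  *cacheable = true;
  auto apply = [&](const char* prop, int MarginData::*field) {
    auto it = winning.find(prop);
    if (it == winning.end()) return;  // unspecified: keep the initial value
    if (it->second == "inherit") {
      result.*field = parentData.*field;
      *cacheable = false;
    } else {
      result.*field = std::stoi(it->second);  // "10px" -> 10
    }
  };
  apply("margin-top", &MarginData::top);
  apply("margin-bottom", &MarginData::bottom);
  if (*cacheable) node->cached = result;  // reused by every context on this node
  return result;
}
```

When cacheable comes back false, the caller would store the result on the style context instead of on the rule node.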
The ownership models in style sheet structures are a mix of reference counted structures (for things accessible from script) and directly owned structures. Style contexts are reference counted, and own their parents (from which they inherit), and rule nodes are garbage collected with a simple mark and sweep collector (which often never needs to run).
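Rule node collection can be sketched as a simple mark-and-sweep pass. This C++ sketch is hypothetical: it assumes the caller marks the rule node of every live style context, which keeps that node and its ancestors (its rule sequence) alive, and the sweep then frees every unmarked subtree and clears the marks.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical rule node that can be marked during collection.
struct RuleNode {
  RuleNode* parent = nullptr;
  bool marked = false;
  std::vector<std::unique_ptr<RuleNode>> children;
};

RuleNode* AddChild(RuleNode& parent) {
  parent.children.push_back(std::make_unique<RuleNode>());
  parent.children.back()->parent = &parent;
  return parent.children.back().get();
}

// Mark a node in use plus its ancestors; stop early at an already-marked
// node, since its ancestors were marked too.
void Mark(RuleNode* node) {
  for (; node && !node->marked; node = node->parent) node->marked = true;
}

int CountNodes(const RuleNode& node) {
  int n = 1;
  for (const auto& child : node.children) n += CountNodes(*child);
  return n;
}

// Free unmarked subtrees (an unmarked node has no marked descendants) and
// clear marks for the next collection. Returns the number of nodes freed.
int Sweep(RuleNode& node) {
  int freed = 0;
  auto& kids = node.children;
  for (auto& child : kids) {
    if (!child->marked) {
      freed += CountNodes(*child);
      child.reset();
    } else {
      freed += Sweep(*child);
    }
  }
  kids.erase(std::remove(kids.begin(), kids.end(), nullptr), kids.end());
  node.marked = false;
  return freed;
}
```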
- code: layout/style/
- Bugzilla: Style System (CSS)
- specifications
- CSS 2.1
- CSS 2010, listing stable CSS3 modules
- CSS WG editors drafts (often more current, but sometimes more unstable than the drafts on the technical reports page)
- [http://dbaron.org/mozilla/visited-privacy Preventing attacks on a user's history through CSS :visited selectors]
- documentation
- style system documentation (somewhat out of date)
Layout
Much of the layout code deals with operations on the frame tree (or rendering tree). In the frame tree, each node represents a rectangle (or, for SVG, other shapes). The frame tree has a shape similar to the content tree, since many content nodes have one corresponding frame, though it differs in a few ways, since some content nodes have more than one frame or don't have any frames at all. When elements are display:none in CSS or undisplayed for certain other reasons, they won't have any frames. When elements are broken across lines or pages, they have multiple frames; elements may also have multiple frames when multiple frames nested inside each other are needed to display a single element (for example, a table, a table cell, or many types of form controls).
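A toy C++ sketch of how the frame tree's shape can differ from the content tree's. Content, Frame and ConstructFrames are hypothetical, and the rules are deliberately reduced to two: display:none content gets no frame, and a table gets two nested frames (an outer wrapper frame and the table frame itself), both pointing at the same content node.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical content node with its computed 'display' value.
struct Content {
  std::string tag;
  std::string display;  // e.g. "block", "none", "table"
  std::vector<Content> children;
};

// Hypothetical frame: a rectangle-producing node in the frame tree.
struct Frame {
  std::string type;
  const Content* content = nullptr;
  std::vector<std::unique_ptr<Frame>> children;
};

std::unique_ptr<Frame> ConstructFrames(const Content& content) {
  if (content.display == "none") return nullptr;  // no frames at all
  auto frame = std::make_unique<Frame>();
  frame->content = &content;
  Frame* childParent = frame.get();
  if (content.display == "table") {
    // One element, two frames: children go inside the inner table frame.
    frame->type = "table-wrapper";
    auto inner = std::make_unique<Frame>();
    inner->type = "table";
    inner->content = &content;
    childParent = inner.get();
    frame->children.push_back(std::move(inner));
  } else {
    frame->type = content.display;
  }
  for (const Content& child : content.children) {
    if (auto childFrame = ConstructFrames(child)) {
      childParent->children.push_back(std::move(childFrame));
    }
  }
  return frame;
}
```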
Each node in the frame tree is an instance of a class derived from nsIFrame. As with the content tree, there is a substantial type hierarchy, but the type hierarchy is very different: it includes types like text frames, blocks and inlines, the various parts of tables, and the various types of HTML form controls.
Frames are allocated within an arena owned by the pres shell. Each frame is owned by its parent; frames are not reference counted, and code must not hold on to pointers to frames. To mitigate potential security bugs when code uses pointers to destroyed frames, we use [http://robert.ocallahan.org/2010/10/mitigating-dangling-pointer-bugs-using_15.html frame poisoning], which has two parts. When a frame is destroyed other than at the end of life of the presentation, we fill its memory with a pattern consisting of a repeated pointer to inaccessible memory, and then put the memory on a per-frame-class freelist. This means that if code accesses the memory through a dangling pointer, it will either crash quickly by dereferencing the poison pattern or it will find a valid frame.
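The two parts of frame poisoning can be sketched as a small arena. FrameArena, the integer frame class IDs and kPoisonValue are hypothetical; in particular, a real implementation must choose a poison value that points to memory guaranteed to be inaccessible, which the constant here does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical poison word; assumed (not guaranteed) to be an unmapped address.
constexpr std::uintptr_t kPoisonValue = 0xF0DEAD00u;

class FrameArena {
 public:
  ~FrameArena() {
    for (void* p : blocks_) ::operator delete(p);  // all memory freed at the end
  }

  // Reuse memory only from the freelist of the same frame class, so a
  // dangling pointer sees either poison or an object of its own class.
  void* Allocate(int frameClassId, std::size_t size) {
    auto& list = freeLists_[frameClassId];
    if (!list.empty()) {
      void* p = list.back();
      list.pop_back();
      return p;
    }
    void* p = ::operator new(size);
    blocks_.push_back(p);
    return p;
  }

  // Called when a frame is destroyed before the presentation ends.
  void Free(int frameClassId, void* p, std::size_t size) {
    auto* words = static_cast<std::uintptr_t*>(p);
    for (std::size_t i = 0; i < size / sizeof(std::uintptr_t); ++i) {
      words[i] = kPoisonValue;  // part one: poison the memory
    }
    freeLists_[frameClassId].push_back(p);  // part two: per-class freelist
  }

 private:
  std::map<int, std::vector<void*>> freeLists_;
  std::vector<void*> blocks_;
};
```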
Like the content tree, frames must be accessed only from the UI thread.
The frame tree should not store any important data. While it does usually persist while a page is being displayed, frames are often destroyed and recreated in response to certain style changes, and in the future we may do the same to reduce memory use for pages that are currently inactive. There were a number of cases where this rule was violated in the past and we stored important data in the frame tree; however, most (though not quite all) such cases are now fixed.
The rectangle represented by the frame is what CSS calls the element's border box. This is the outside edge of the border (or the inside edge of the margin). The margin lives outside the border, and the padding lives inside the border. In addition to nsIFrame::GetRect, we also have the APIs nsIFrame::GetPaddingRect to get the padding box (the outside edge of the padding, or inside edge of the border) and nsIFrame::GetContentRect to get the content box (the outside edge of the content, or inside edge of the padding). These APIs may produce out of date results when reflow is needed (or has not yet occurred).
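The relationships between the boxes come down to shrinking or growing a rectangle by per-side widths. A minimal sketch with hypothetical Rect and Sides types using plain integers (Gecko uses its own geometry types and units); the frame's rect is the border box.

```cpp
#include <cassert>

// Hypothetical geometry types for the sketch.
struct Rect { int x, y, width, height; };
struct Sides { int top, right, bottom, left; };

Rect Deflate(const Rect& r, const Sides& s) {
  return Rect{r.x + s.left, r.y + s.top,
              r.width - s.left - s.right, r.height - s.top - s.bottom};
}

Rect Inflate(const Rect& r, const Sides& s) {
  return Rect{r.x - s.left, r.y - s.top,
              r.width + s.left + s.right, r.height + s.top + s.bottom};
}

// Border box -> padding box: remove the border.
Rect PaddingRect(const Rect& borderBox, const Sides& border) {
  return Deflate(borderBox, border);
}

// Padding box -> content box: remove the padding.
Rect ContentRect(const Rect& borderBox, const Sides& border, const Sides& padding) {
  return Deflate(PaddingRect(borderBox, border), padding);
}

// The margin lies outside the border box.
Rect MarginRect(const Rect& borderBox, const Sides& margin) {
  return Inflate(borderBox, margin);
}
```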
In addition to tracking a rectangle, frames also track two overflow areas: visual overflow and scrollable overflow. These overflow areas represent the union of the area needed by the frame and by all its descendants. The visual overflow is used for painting-related optimizations: it is a rectangle covering all of the area that might be painted when the frame and all of its descendants paint. The scrollable overflow represents the area that the user should be able to scroll to to see the frame and all of its descendants. In some cases differences between the frame's rect and its overflow happen because of descendants that stick out of the frame; in other cases they occur because of some characteristic of the frame itself. The two overflow areas are similar, but there are differences: for example, margins are part of scrollable overflow but not visual overflow, whereas text-shadows are part of visual overflow but not scrollable overflow.
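A hypothetical C++ sketch of how the two overflow areas might be accumulated. It assumes children's overflow areas are already in the parent's coordinate space and that the caller supplies the frame's text-shadow area and the margin area that should count toward scrollable overflow; an empty rect means there is nothing to add.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Rect {
  int x = 0, y = 0, width = 0, height = 0;
  bool IsEmpty() const { return width <= 0 || height <= 0; }
  int XMost() const { return x + width; }
  int YMost() const { return y + height; }
};

// Smallest rect containing both; empty rects contribute nothing.
Rect Union(const Rect& a, const Rect& b) {
  if (a.IsEmpty()) return b;
  if (b.IsEmpty()) return a;
  int x = std::min(a.x, b.x);
  int y = std::min(a.y, b.y);
  return Rect{x, y, std::max(a.XMost(), b.XMost()) - x,
              std::max(a.YMost(), b.YMost()) - y};
}

struct Overflow {
  Rect visual;
  Rect scrollable;
};

Overflow ComputeOverflow(const Rect& frameRect, const Rect& shadowArea,
                         const Rect& marginArea,
                         const std::vector<Overflow>& children) {
  Overflow o{frameRect, frameRect};
  o.visual = Union(o.visual, shadowArea);          // text-shadow: visual only
  o.scrollable = Union(o.scrollable, marginArea);  // margins: scrollable only
  for (const Overflow& c : children) {             // descendants sticking out
    o.visual = Union(o.visual, c.visual);
    o.scrollable = Union(o.scrollable, c.scrollable);
  }
  return o;
}
```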
When frames are broken across lines, columns, or pages, we create multiple frames representing the multiple rectangles of the element. The first one is the primary frame, and the rest are its continuations (which are more likely to be destroyed and recreated during reflow). These frames are linked together as continuations: they have a doubly-linked list that can be used to traverse the continuations using nsIFrame::GetPrevContinuation and nsIFrame::GetNextContinuation. (Currently continuations always have the same style data, though we may at some point want to break that invariant.)
Continuations are sometimes siblings of each other, and sometimes not. For example, if a paragraph contains a span which contains a link, and the link is split across lines, then the continuations of the span are siblings (since they are both children of the paragraph), but the continuations of the link are not siblings (since each continuation of the link is descended from a different continuation of the span). Traversing the entire frame tree does not require considering continuations, since all of the continuations are descendants of the element containing the break.
We also use continuations for cases (most importantly, bidi reordering, where left-to-right text and right-to-left text need to be separated into different continuations since they may not form a contiguous rectangle) where the continuations should not be rewrapped during reflow: we call these continuations fixed rather than fluid. nsIFrame::GetNextInFlow and nsIFrame::GetPrevInFlow traverse only the fluid continuations and do not cross fixed continuation boundaries.
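A minimal C++ sketch of a continuation chain with fluid and fixed links. The Frame struct, LinkContinuation and PrimaryFrame are hypothetical, though the accessor names mirror the nsIFrame methods mentioned above; here the fluid flag on a frame describes the boundary between it and its previous continuation.

```cpp
#include <cassert>

struct Frame {
  Frame* prevContinuation = nullptr;
  Frame* nextContinuation = nullptr;
  bool fluid = true;  // is the link to prevContinuation fluid (rewrappable)?

  Frame* GetPrevContinuation() const { return prevContinuation; }
  Frame* GetNextContinuation() const { return nextContinuation; }
  // In-flow traversal stops at fixed (e.g. bidi) boundaries.
  Frame* GetNextInFlow() const {
    return nextContinuation && nextContinuation->fluid ? nextContinuation : nullptr;
  }
  Frame* GetPrevInFlow() const {
    return prevContinuation && fluid ? prevContinuation : nullptr;
  }
};

// Append `next` after `prev` in the doubly-linked continuation list.
void LinkContinuation(Frame& prev, Frame& next, bool fluid) {
  prev.nextContinuation = &next;
  next.prevContinuation = &prev;
  next.fluid = fluid;
}

// The first frame in the chain is the primary frame.
Frame* PrimaryFrame(Frame* f) {
  while (f->prevContinuation) f = f->prevContinuation;
  return f;
}
```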
TODO: nsBox craziness from https://bugzilla.mozilla.org/show_bug.cgi?id=524925#c64
TODO: describe block-within-inline splits
TODO: link to documentation of block and inline layout
TODO: link to documentation of scrollframes
TODO: link to documentation of XUL frame classes
Links:
Frame Construction
TODO: describe
Reflow
Reflow is the process of computing the positions of frames. (After all, frames represent rectangles, and at some point we need to figure out exactly *what* rectangle.) Reflow is done recursively, with each frame's Reflow method calling the Reflow methods on that frame's descendants.
The reflow generally starts from the top of the tree, though some other types of frame can act as reflow roots and start a reflow from them. Reflow roots must obey the invariant that a change inside one of their descendants never changes their rect or overflow areas (though currently scrollbars are reflow roots but don't quite obey this invariant).
In many cases, we want to reflow a part of the frame tree, and we want this reflow to be efficient. For example, when content is added to or removed from the document tree or when styles change, we want the amount of work we need to redo to be proportional to the amount of content that changed. We also want to efficiently handle a series of changes to the same content.
To do this, we maintain two bits on frames: NS_FRAME_IS_DIRTY indicates that a frame and all of its descendants require reflow. NS_FRAME_HAS_DIRTY_CHILDREN indicates that a frame has a descendant that is dirty or has had a descendant removed (i.e., that it has a child that has NS_FRAME_IS_DIRTY or NS_FRAME_HAS_DIRTY_CHILDREN or it had a child removed). These bits allow coalescing of multiple updates; this coalescing is done in nsPresShell, which tracks the set of reflow roots that require reflow. The bits are set during calls to nsPresShell::FrameNeedsReflow and are cleared during reflow.
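The dirty-bit bookkeeping can be sketched as follows. The constants stand in for NS_FRAME_IS_DIRTY and NS_FRAME_HAS_DIRTY_CHILDREN with made-up values, and FrameNeedsReflow and Reflow are simplified hypothetical free functions rather than the pres shell's or nsIFrame's real methods. A second call on a sibling stops as soon as it finds an ancestor that is already marked, which is the coalescing described above.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins for the two frame state bits.
constexpr unsigned kIsDirty = 1u << 0;
constexpr unsigned kHasDirtyChildren = 1u << 1;

struct Frame {
  Frame* parent = nullptr;
  std::vector<Frame*> children;
  bool isReflowRoot = false;
  unsigned state = 0;
};

void Attach(Frame* parent, Frame* child) {
  child->parent = parent;
  parent->children.push_back(child);
}

// Mark `frame` dirty and its ancestors as having dirty children, up to a
// reflow root (or the top of the tree), which is recorded in dirtyRoots.
void FrameNeedsReflow(Frame* frame, std::set<Frame*>& dirtyRoots) {
  frame->state |= kIsDirty;
  Frame* f = frame;
  while (!f->isReflowRoot && f->parent) {
    Frame* parent = f->parent;
    bool alreadyMarked = (parent->state & kHasDirtyChildren) != 0;
    parent->state |= kHasDirtyChildren;
    if (alreadyMarked) return;  // an earlier call already marked the path above
    f = parent;
  }
  dirtyRoots.insert(f);
}

// Visit only frames the bits say need reflow, clearing the bits.
// Returns the number of frames visited.
int Reflow(Frame* f, bool ancestorDirty = false) {
  bool dirty = ancestorDirty || (f->state & kIsDirty) != 0;
  if (!dirty && (f->state & kHasDirtyChildren) == 0) return 0;
  int visited = 1;
  for (Frame* child : f->children) visited += Reflow(child, dirty);
  f->state &= ~(kIsDirty | kHasDirtyChildren);
  return visited;
}
```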
The layout algorithms used by many of the frame classes are those specified in CSS, which are based on the traditional document formatting model, where widths are input and heights are output.
In some cases, however, widths need to be determined based on the content. This depends on two intrinsic widths: the minimum intrinsic width (see nsIFrame::GetMinWidth) and the preferred intrinsic width (see nsIFrame::GetPrefWidth). The concept of what these widths represent is best explained by describing what they are on a paragraph containing only text: in such a paragraph the minimum intrinsic width is the width of the longest word, and the preferred intrinsic width is the width of the entire paragraph laid out on one line.
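For a paragraph of text, the two intrinsic widths can be computed as in this C++ sketch, which assumes a hypothetical monospace measurement where every character is one unit wide and whitespace collapses to single spaces. ShrinkToFitWidth applies the CSS 2.1 shrink-to-fit formula, min(max(minimum width, available width), preferred width), to show how the two widths are used.

```cpp
#include <algorithm>
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical measurement: one unit per character.
int MeasureText(const std::string& s) { return static_cast<int>(s.size()); }

// Minimum intrinsic width: the longest word.
int MinIntrinsicWidth(const std::string& paragraph) {
  std::istringstream words(paragraph);
  std::string word;
  int longest = 0;
  while (words >> word) longest = std::max(longest, MeasureText(word));
  return longest;
}

// Preferred intrinsic width: the whole paragraph on one line.
int PrefIntrinsicWidth(const std::string& paragraph) {
  std::istringstream words(paragraph);
  std::string word, line;
  while (words >> word) {
    if (!line.empty()) line += ' ';
    line += word;
  }
  return MeasureText(line);
}

int ShrinkToFitWidth(int minWidth, int prefWidth, int available) {
  return std::min(std::max(minWidth, available), prefWidth);
}
```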
Intrinsic widths are invalidated separately from the dirty bits described above. When a caller informs the pres shell that a frame needs reflow (nsIPresShell::FrameNeedsReflow), it passes one of three options:
- eResize indicates that no intrinsic widths are dirty
- eTreeChange indicates that intrinsic widths on it and its ancestors are dirty (which happens, for example, if new children are added to it)
- eStyleChange indicates that intrinsic widths on it, its ancestors, and its descendants are dirty (for example, if the font-size changes)
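The effect of the three options on cached intrinsic widths can be sketched as below. The enum mirrors the three option names but is not Gecko's declaration, and a single validity flag stands in for the cached minimum and preferred widths. Ancestors are always invalidated (except for eResize) because their intrinsic widths depend on their descendants'.

```cpp
#include <cassert>
#include <vector>

// Hypothetical mirror of the three options.
enum IntrinsicDirty { eResize, eTreeChange, eStyleChange };

struct Frame {
  Frame* parent = nullptr;
  std::vector<Frame*> children;
  bool intrinsicWidthsValid = true;  // stands in for cached min/pref widths
};

void MarkSubtree(Frame* f) {
  f->intrinsicWidthsValid = false;
  for (Frame* c : f->children) MarkSubtree(c);
}

void MarkIntrinsicWidthsDirty(Frame* frame, IntrinsicDirty type) {
  if (type == eResize) return;  // sizes changed, intrinsic widths did not
  for (Frame* f = frame; f; f = f->parent) {
    f->intrinsicWidthsValid = false;  // the frame and all its ancestors
  }
  if (type == eStyleChange) {
    for (Frame* c : frame->children) MarkSubtree(c);  // and its descendants
  }
}
```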
Reflow is the area where the XUL frame classes (those that inherit from nsBoxFrame or nsLeafBoxFrame) are most different from the rest. Instead of using nsIFrame::Reflow, they do their layout computations using intrinsic size methods called GetMinSize, GetPrefSize, and GetMaxSize (which report intrinsic sizes in two dimensions) and a final layout method called Layout. In many cases these methods defer some of the computation to a separate object called a layout manager.
When an individual frame's Reflow method is called, most of the input is provided on an object called nsHTMLReflowState and the output is filled into an object called nsHTMLReflowMetrics. After reflow, the caller (usually the parent) is responsible for setting the frame's size based on the metrics reported. (This can make some computations during reflow difficult, since the new size is found in either the reflow state or the metrics, but the frame's size is still the old size. However, it's useful for invalidating the correct areas that need to be repainted.)
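A minimal C++ sketch of this calling convention: the hypothetical ReflowInput and ReflowOutput stand in for nsHTMLReflowState and nsHTMLReflowMetrics, the frame lays out its children with the available width as input and heights as output, and the parent (not the child) applies each child's reported size. For simplicity, every child takes the full available width and children stack vertically.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the reflow input and output objects.
struct ReflowInput { int availableWidth; };
struct ReflowOutput { int width = 0; int height = 0; };

struct Rect { int x = 0, y = 0, width = 0, height = 0; };

struct Frame {
  Rect rect;                // still the old size while Reflow runs
  int intrinsicHeight = 0;  // leaf content height, for the sketch
  std::vector<Frame*> children;

  // Width comes in, height goes out.
  void Reflow(const ReflowInput& input, ReflowOutput& output) {
    int y = 0;
    for (Frame* child : children) {
      ReflowOutput childOutput;
      child->Reflow(ReflowInput{input.availableWidth}, childOutput);
      // The parent applies the size the child reported.
      child->rect = Rect{0, y, childOutput.width, childOutput.height};
      y += childOutput.height;
    }
    output.width = input.availableWidth;
    output.height = children.empty() ? intrinsicHeight : y;
  }
};
```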
One major difference worth noting is that in XUL layout, the size of the child is set prior to its parent calling its Layout method. (Once invalidation uses display lists and is no longer tangled up in Reflow, it may be worth switching non-XUL layout to work this way as well.)
Painting
TODO: display lists (and event handling)
TODO: layers
Dynamic change handling along the rendering pipeline
TODO: flushing, different mozFlushType and what they mean
TODO: document observers and mutation observers
TODO: how style system optimizes away rerunning selector matching
TODO: style changes and nsChangeHint
Refresh driver
Graphics
2D Graphics API
- main use is at the end of the document pipeline, so could be part of it
- also used more directly from canvas
TODO: much more goes here
Scripting
- JavaScript Engine
- XPConnect
- quickstubs
- security (caps, wrappers)
- DOM glue, classinfo, etc.
Images
Plugins
Platform-specific layers
- widget
- native theme
- files, networking, other low-level things
- Accessibility APIs
Editor
Base layers
NSPR
NSPR is a library for providing cross-platform APIs for various platform-specific functions. We try to use it as little as possible, although there are a number of areas (particularly some network-related APIs and threading/locking primitives) where we use it quite a bit.
XPCOM
XPCOM is a cross-platform modularity library, modeled on Microsoft COM. It is an object system in which all objects inherit from the nsISupports interface.
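A reduced C++ sketch of the nsISupports pattern: every interface can be queried for other interfaces and is reference counted. The interface IDs, class names and the bool return value are hypothetical simplifications (real XPCOM uses 128-bit IIDs and nsresult return codes), so this shows the shape of the pattern rather than the actual XPCOM declarations.

```cpp
#include <cassert>

using IID = int;  // hypothetical; real IIDs are 128-bit
constexpr IID kSupportsIID = 1;
constexpr IID kGreeterIID = 2;

// Modeled on nsISupports: interface discovery plus reference counting.
class ISupportsSketch {
 public:
  virtual bool QueryInterface(IID iid, void** result) = 0;
  virtual unsigned AddRef() = 0;
  virtual unsigned Release() = 0;

 protected:
  virtual ~ISupportsSketch() = default;  // destroy only via Release()
};

class IGreeter : public ISupportsSketch {
 public:
  virtual int Greet() = 0;
};

class Greeter final : public IGreeter {
 public:
  bool QueryInterface(IID iid, void** result) override {
    if (iid == kGreeterIID) {
      *result = static_cast<IGreeter*>(this);
    } else if (iid == kSupportsIID) {
      *result = static_cast<ISupportsSketch*>(this);
    } else {
      *result = nullptr;
      return false;
    }
    AddRef();  // a successful query hands out an owning reference
    return true;
  }
  unsigned AddRef() override { return ++refcnt_; }
  unsigned Release() override {
    unsigned count = --refcnt_;
    if (count == 0) delete this;
    return count;
  }
  int Greet() override { return 42; }

 private:
  ~Greeter() override = default;
  unsigned refcnt_ = 0;
};
```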
components and services, contract IDs and CIDs
prior overuse of XPCOM; littering with XPCOM does not produce modularity
Base headers (part of xpcom/base/) and data structures. See also mfbt.
Threading
xptcall, proxies
reference counting, cycle collection
String
1-byte vs. 2-byte strings; encodings;
different ownership models: managed (with reference counted buffers), adopting external buffers, dependent buffers
dynamic vs. static typing;
concatenation optimizations