Gecko:Layers

From MozillaWiki
Revision as of 06:19, 29 October 2009 by Roc (talk | contribs) (→‎Old proposal)

Layer API Proposals

Things we'd like to be able to do

  • Accelerated compositing and scaling
  • Accelerated color space conversion
  • Accelerated SVG filters (more speculative)
  • Off-main thread compositing and animation (including completely-off-main-thread video playback)
  • Implement CSS perspective transforms
  • Keep a backing store for a chunk of rendered content
  • Avoid having to read back WebGL surfaces to composite them
  • Possibly hw accelerated windowless plugins
  • Support targeting Xv or other specialized framebuffers for very low-end devices?
  • Support hardware-accelerated video decoding and playback?
  • Accelerated composition of content from multiple processes, including untrusted processes that may lack access to native accelerated graphics APIs

Requirements

  • Should have an efficient software fallback
  • Cross-platform API
  • Don't make clients implement multiple rendering code paths
  • Abstract over hardware acceleration APIs (e.g. D3D vs GL)
  • Be able to use platform layer APIs (e.g. Core Animation) if we get backed into a corner with undocumented interfaces?
  • Support VRAM management; we'd like to be able to discard buffers and re-render them when VRAM is full, including with fine granularity via a tile cache

Open questions

  • Should we build the layer tree if we are not hardware accelerated?
    • Roc: I want code above gfx to have a single code path that works regardless of whether we're using acceleration or not. That means we want to be using the layer API all the time. However, in non-accelerated situations the layer API could permit an implementation that's mostly immediate-mode drawing --- that's the approach I took in my proposal.
  • Should layers always have a backing store?
    • Jeff is leaning toward yes. I believe Roc is leaning toward no. One thing to think about would be the idea of layers having a virtual backing store, i.e. layers that don't currently have a backing store would be "faulted in" by doing the drawing.
    • Jeff: One advantage of them always having a backing store is that it gives the users of the API some performance guarantees. I'd argue that without these the layer abstraction becomes less valuable and perhaps duplicates the frame tree?
    • Roc: assuming that "container" layers are always formed by compositing together their children, what is the point of having backing store for them? For leaf layers, I assume we would keep them in backing store (as long as clients hold references to them, which may only be as long as one paint cycle). My proposal makes no provision for "faulting in" the contents of discarded layers.
    • Jeff: My response to that is: what's the point of creating layers for things that we don't want a backing store for?
    • Roc: so we can do hardware accelerated operations on them
    • It seems that after IRC conversation, we currently don't have any argument for having the API support drawing directly into container layers. Whether the rendering of a container layer is cached in a surface is an implementation detail, which may vary based on speed-vs-VRAM tradeoffs.
  • How is layer z-order maintained?
    • Roc: child order, i.e. the nth child is beneath the (n+1)th child

Use case

Assume we have a webpage with a container containing some web content and a video. This container has a 3d-transform applied to it. What should the layer hierarchy look like?

  • Roc:
    • Master layer for the page or window (no drawn content of its own)
      • Retained content that's behind the transformed container
      • Transformed container layer (no drawn content of its own)
        • Retained content behind the video
        • Video (YUV)
        • Retained content in front of the video (not needed if there is no such content)
      • Retained content that's in front of the transformed container (not needed if there is no such content)

Roc

Display list processing

Taking a step back, let me write down how I think layout and display lists could/should build and maintain a layer tree.

Currently when we paint we only build a display list for the items that intersect the bounding rectangle of the invalid area. Instead let's assume we have to build a display list for the entire visible window area. This is not likely to be a performance problem. This does not mean we actually have to repaint the entire visible window every time, it just means we figure out what would be painted.

Then traversing the display list lets us (re)build the layer tree.

There are three kinds of layers:

  1. Explicit container layers that we create because we need to take advantage of hardware acceleration. Example: element with transform, element with opacity.
  2. Explicit leaf layers that we create because we need to take advantage of hardware acceleration. Example: video YUV layer, image element with transform or opacity.
  3. Implicit "in between" layers that represent all the content rendered between two explicit layers, or before the first explicit layer of a container, or after the last explicit layer of a container. These are always leaves. All cairo-rendered content is in these layers.

Observation: An explicit layer is always associated with a single frame. (Although a single frame may render more than just its layer, e.g. an nsVideoFrame can have background, border and outline display items as well as the video YUV layer.) We can record in such a frame its explicit layer(s), if any. I think the maximum number of explicit layers a frame might need is 2: one "intrinsic" layer, e.g. the YUV layer for video, and one "extrinsic" layer, e.g. the layer induced by transforms or opacity. For example an nsVideoFrame with borders and a 3D transform would need two layers. I think each frame can have 0 or 1 explicit container layers and 0 or 1 explicit leaf layers. Any kind of frame can have an explicit container layer, but only certain types of frames have explicit leaf layers.

Observation: in-between layers can span arbitrary chunks of the frame tree. Worse, different parts of the same frame can be at different z-levels, and since an in-between layer represents a run of z-contiguous display items, different display items of the same frame can be in different in-between layers, and a single in-between layer can contain display items for frames very far apart in the frame tree. HOWEVER, every in-between layer is either the layer before some explicit layer, or the last layer of some container.

The display list system actually builds a tree of lists. A display item corresponding to an explicit container layer will have a child list with the container contents. If we treat the outermost list for the window as the children of a root container layer, then our job is, given a display list representing the children of a container layer, determine the child layers:

  • For any items in the list that represent the explicit container layer for a frame, that layer will be a child layer. We can retrieve this layer if it's cached in the frame. But we need to recursively descend into these display items to determine their child layers.
  • For any items in the list that represent the explicit leaf layer for a frame, that layer will be a child layer. We can retrieve this layer if it's cached in the frame.
  • For each run of consecutive items in the list that are not an explicit container or leaf layer, we have one implicit layer. It seems painful to attach this layer to any frame or another layer. However, we can keep a list of them in their container layer.
    • An item in the display list might not have an explicit layer, but might contain a child list containing an item with an explicit layer, etc. This is a problem. We basically need to disable explicit layers for children in this situation, OR guarantee this never happens, by ensuring that all our display items that contain their own display lists can be implemented as explicit layers. For example, consider an element with an SVG filter applied that contains an element with a 3D transform. If we're going to accelerate the 3D transform and do it off the main thread, we also need the compositing thread to apply the SVG filter (preferably accelerated), so SVG filters need to be supported by the layer API. What needs to be supported:
      • Rectangle clipping (easy)
      • Opacity (easy)
      • Transforms (easy)
      • SVG filters (hard)
      • SVG path clipping (unsure, probably easy if we're willing to rasterize the path ahead of time)
      • SVG mask (easy)
      • SVG foreignObject inside arbitrary SVG gunk ... hmm. Maybe if we support all the above, *and* render SVG using display lists (which we currently don't), that would be enough.
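To make the grouping concrete, here is a minimal sketch of partitioning a container's display list into explicit and implicit child layers: each item with an explicit layer becomes its own child, and every maximal run of remaining items becomes one implicit "in between" layer. The types and names here are illustrative stand-ins, not Gecko code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for a display item; real items carry frames, bounds, etc.
struct DisplayItem {
  std::string name;
  bool hasExplicitLayer;
};

// Returns one entry per child layer of the container:
// "explicit:<name>" or "implicit:<number of items in the run>".
std::vector<std::string> BuildChildLayers(const std::vector<DisplayItem>& aList) {
  std::vector<std::string> children;
  size_t run = 0;  // length of the current run of non-explicit items
  auto flush = [&]() {
    if (run > 0) {
      children.push_back("implicit:" + std::to_string(run));
      run = 0;
    }
  };
  for (const auto& item : aList) {
    if (item.hasExplicitLayer) {
      flush();  // close the preceding implicit run, if any
      children.push_back("explicit:" + item.name);
    } else {
      ++run;
    }
  }
  flush();  // trailing run becomes the last implicit layer of the container
  return children;
}
```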

The hard remaining problem is to know how to reuse or update the implicit layers belonging to a container. The content rendered in those layers may have been invalidated. Updating the invalid rectangle in each "implicit" layer is not that hard. The hard part is knowing when to create and/or destroy implicit layers because content has changed z-index, or an explicit layer child has been inserted or removed. I think we can do this with special flags to tag invalidations caused by a frame with an explicit layer child being inserted or removed.

Thought experiment: suppose we have a regular Web page so everything can be rendered in one implicit layer. Then suppose someone inserts a small element with its own layer (say a 3D-transformed button) into the page, or it gets scrolled into view from offscreen. Suddenly the page content splits into two implicit layers, the stuff below the button and the stuff above the button. From the display list we can easily compute the bounding regions of these two layers. If they're both non-empty, then we need to split the old layer into two layers. We need to re-render at least the intersection of the two regions. I think this can be done fairly efficiently, but it will be tricky.
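The region arithmetic in this thought experiment can be sketched with sets of grid cells standing in for pixel regions (illustrative only): when an explicit layer appears in the middle of one implicit layer, the content divides into a "below" and an "above" region, and at minimum their intersection must be re-rendered.

```cpp
#include <cassert>
#include <set>

// Grid cells stand in for nsIntRegion in this toy model.
using Region = std::set<int>;

Region intersect(const Region& a, const Region& b) {
  Region r;
  for (int c : a)
    if (b.count(c)) r.insert(c);
  return r;
}

// Minimal region to re-render after splitting one implicit layer into a
// "below" and an "above" layer around a newly inserted explicit layer.
Region MinimalRepaintOnSplit(const Region& aBelow, const Region& aAbove) {
  return intersect(aBelow, aAbove);
}
```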

Implicit layers can also change size, need to be scrolled, etc.

Proposal

LayerManager

Every layer belongs to a LayerManager. The idea is that a window will provide an associated LayerManager. (Documents being rendered by their own process will probably have their own LayerManagers.) Every layer in a layer tree must have the same LayerManager.

Updates to the layer tree are performed within a transaction. Nested transactions are not needed or allowed. Only layer tree states between transactions will be rendered. Unless otherwise noted, all layer-related APIs may only be used within a transaction on the appropriate LayerManager, and only on the main thread.

class LayerManager {
  void beginTransaction();
  void setRoot(Layer*);
  void endTransaction();
};

LayerManager::setRoot sets a layer to be the root layer. This is how you get a layer to be displayed.
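A minimal usage sketch, with stand-in implementations of the proposed interfaces just to show the transaction discipline (these are not real Gecko classes):

```cpp
#include <cassert>

// Stand-in for the Layer superclass described below.
class Layer {};

class LayerManager {
public:
  void beginTransaction() { mInTransaction = true; }
  // Only legal inside a transaction, on the main thread.
  void setRoot(Layer* aLayer) {
    assert(mInTransaction);
    mRoot = aLayer;
  }
  // The layer tree state at this point becomes eligible for rendering.
  void endTransaction() { mInTransaction = false; }

  Layer* root() const { return mRoot; }
  bool inTransaction() const { return mInTransaction; }

private:
  Layer* mRoot = nullptr;
  bool mInTransaction = false;
};
```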

Layer

Layer is the superclass of all layers. A Layer could be anything that can be rendered into a destination surface. It has methods to set various properties that affect the rendering of a layer into its parent: opacity, transform, filter, etc... (For simplicity I'm only showing 'opacity' here.)

class Layer {
  LayerManager getManager();
  void setOpacity(TimeStamp, float);
};

Animated properties are supported. To animate, call a setter method one or more times, passing in different timestamped values. When we render, we use the last supplied property value that is before the current time. When calling a setter method with TimeStamp T, all values for times >= T are discarded. To set a constant value, just pass a null TimeStamp, which is interpreted as being at the beginning of time.
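A sketch of these timestamped-setter semantics (the class and names are hypothetical; the real setter would live on Layer, and a null timestamp would map to "beginning of time"):

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model of one animated property. Timestamps are doubles here; the
// proposal uses TimeStamp. Interpretation: sampling uses the last value whose
// timestamp is at or before the current time.
class AnimatedOpacity {
public:
  // Setting a value at time T discards all values for times >= T.
  void set(double aTime, float aValue) {
    while (!mValues.empty() && mValues.back().first >= aTime)
      mValues.pop_back();
    mValues.push_back({aTime, aValue});
  }

  // When rendering, use the last supplied value not after aNow.
  float sample(double aNow) const {
    float result = 1.0f;  // default opacity when nothing is set yet
    for (const auto& kv : mValues) {
      if (kv.first <= aNow)
        result = kv.second;
      else
        break;
    }
    return result;
  }

private:
  std::vector<std::pair<double, float>> mValues;  // kept sorted by timestamp
};
```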

Layers should be reference counted. The LayerManager holds a reference to its root layer and parent layers hold references to their children.

RenderedLayer

A RenderedLayer is the basic leaf layer that you can render into using cairo/Thebes.

class RenderedLayer : public Layer {
  RenderedLayer(LayerManager, ContainerLayer parent);
  gfxContext* beginDraw(const nsIntRegion& aVisibleRegion,
                        const nsIntRegion& aChangedRegion,
                        nsIntRegion* aRegionToDraw);
  void endDraw();
  void copy(RenderedLayer, const nsIntRegion& aRegion,
            nsIntPoint aDelta);
};

RenderedLayers are conceptually infinite in extent. Each RenderedLayer has an internal "valid region" which is finite. (An implementation would create a surface large enough to hold the entire valid region.) The initial valid region is empty. The implementation is allowed to discard all or part of the buffered contents of a RenderedLayer between transactions. Drawing into the RenderedLayer adds to the valid region, and discarding parts of the buffer removes from the valid region.

When calling beginDraw, the caller specifies in aVisibleRegion a region that needs to be valid when drawing is done. (This is the area that will be visible to the user.) The caller can also specify aChangedRegion to indicate that content in that region has changed and will need to be repainted. The implementation returns aRegionToDraw to indicate the area that must be repainted. Typically this will be aVisibleRegion minus (the currently valid region minus aChangedRegion). aRegionToDraw is added to the valid region.

The content in a RenderedLayer can change size. If the size decreases, aChangedRegion will include the area of content that has gone away, and aVisibleRegion will exclude that area. The implementation may trim its buffers appropriately. If the size increases the implementation will need to increase the buffer.

It is possible for aRegionToDraw to return empty, e.g. when nothing changed and the entire visible area is still buffered. The caller should optimize by skipping painting in this case.
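The region arithmetic above can be sketched with sets of grid cells standing in for nsIntRegion (illustrative only):

```cpp
#include <cassert>
#include <set>

// Grid cells stand in for nsIntRegion in this toy model.
using Region = std::set<int>;

Region subtract(const Region& a, const Region& b) {
  Region r;
  for (int c : a)
    if (!b.count(c)) r.insert(c);
  return r;
}

Region unite(const Region& a, const Region& b) {
  Region r = a;
  r.insert(b.begin(), b.end());
  return r;
}

// regionToDraw = visible - (valid - changed); the drawn area then becomes
// part of the valid region.
Region beginDraw(Region& aValid, const Region& aVisible,
                 const Region& aChanged) {
  Region toDraw = subtract(aVisible, subtract(aValid, aChanged));
  aValid = unite(aValid, toDraw);
  return toDraw;
}
```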

For scrolling, and to enable intelligent reuse of parts of RenderedLayers by other layers, there is a copy method that copies aRegion of another RenderedLayer's contents, offset by aDelta.

ContainerLayer

class ContainerLayer : public Layer {
  // Add an existing layer
  addLayer(Layer);

  // Open a child container layer. This child must finish()
  // before another child can be added or this builder finishes.
  ContainerLayerBuilder addContainerChild(size, format);
  // Open a child rendered layer. This child must finish()
  // before another child can be added or this builder finishes.
  // RenderedLayers constructed this way may not need a temporary surface.
  RenderedLayerBuilder addRenderedChild(size, format);
};
class RenderedLayerBuilder : LayerBuilder {
  // format can be RGB, ARGB (eventually ARAGAB?)
  // This constructs a layer rendered via gfx that can be used anywhere
  // (and therefore requires a temporary surface).
  RenderedLayerBuilder(size, format);

  // create a (conceptual) copy of the given RenderedLayer so we can modify its
  // parameters or draw into it. The underlying buffer can be managed with
  // copy on write so if we don't ever call getContext, the buffer need not
  // be copied.
  RenderedLayerBuilder(Layer layer);

  // This can only be called after all LayerBuilder property setters are
  // done. The context cannot be used after finish() is called.
  gfxContext* getContext();
};
class YUVLayerBuilder : LayerBuilder {
  // Create a YUV layer with given size and format, and adopt the memory buffer
  YUVLayerBuilder(size, format, bufferToAdopt);
};
class WebGLBufferLayerBuilder : LayerBuilder {
  // Create a layer that's a logical copy (ideally copy on write) of the
  // underlying buffer managed by a WebGL canvas
  WebGLBufferLayerBuilder(webGLBuffer);
};

Add a method gfxContext::SetSource(Layer).

Add a way to return a Layer from a paint event (or just set it directly on the widget), so it gets rendered, possibly asynchronously on another thread.

Clients can use a mixture of retained Layers and recursive painting with each recursion level delimited by ContainerLayerBuilder::addContainerChild followed by finish() on the child.

The goal is to allow a pure cairo implementation of this API that's as efficient as we have today. In that implementation RenderedLayerBuilder::getContext tries to return a context that renders directly into the underlying surface for some ancestor. Of course we also want to have a GL or D3D implementation that's fast, but will require more temporary surfaces if we're not using cairo-gl.

When we go to off-main-thread compositing we'll want to add support for animation and other stuff. For example we might want a YUVSeriesLayerBuilder that can select from a queue of timestamped frames based on the current time. The rendering property setters on LayerBuilder would be extended with animating setters that take a list of timestamped values, or perhaps the parameters of actual transition functions.

Jeff

Layers have two basic operations:

  1. InvalidateRect()/Draw() - Gets content into a layer. Content is drawn on the main thread using cairo, a video decoder etc.
  2. Composite() - Composites a layer into the scene. Will perform color conversion, filter etc. as needed.

Possible example of how filter layers could work:

Assume we have 5 layers:

  • 3 layers in the background
  • 1 filter layer
  • 1 layer on top

To render this scene we:

  • create a texture the size of the filter layer
  • set the render target to the texture
  • composite the 3 background layers into the texture
  • set the render target to the framebuffer
  • composite the 3 background layers into the framebuffer (if the background layers are completely occluded by the filter layer, we can omit this)
  • composite the filter layer, applying a blur kernel to the texture we rendered earlier
  • composite the top layer
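The pass ordering above can be sketched with a logging stub in place of a real GPU API (all names hypothetical; only the order of operations matters here):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the sequence of compositing operations for the 5-layer filter
// scene described above: 3 background layers, 1 filter layer, 1 top layer.
std::vector<std::string> RenderFilterScene(bool aBackgroundFullyOccluded) {
  std::vector<std::string> ops;
  ops.push_back("create texture (size of filter layer)");
  ops.push_back("set render target = texture");
  for (int i = 0; i < 3; ++i)
    ops.push_back("composite background layer into texture");
  ops.push_back("set render target = framebuffer");
  // If the filter layer completely occludes the background, this pass can
  // be omitted.
  if (!aBackgroundFullyOccluded) {
    for (int i = 0; i < 3; ++i)
      ops.push_back("composite background layer into framebuffer");
  }
  ops.push_back("composite filter layer (blur the texture)");
  ops.push_back("composite top layer");
  return ops;
}
```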

A note about how to implement imperative animation in this world: CoreAnimation will ask a layer to redraw its contents at a particular timestamp. The layer can choose to do so, or ignore the request.

WebKit

Here are some observations on WebKit's implementation: Adding the style "-webkit-perspective: 800;" to a div will promote it to a layer. This seems to cause two regressions:

  1. This content will move separately from the background. i.e. The div and the rest of the content do not move together, the div lags behind.
  2. Having this layer also causes us to repaint all of the background content when scrolling, instead of just the newly exposed area.

It would be nice if we could avoid these problems.

For the example above it looks like webkit creates three layers:

  1. One for the view
  2. One for the document content (the size of the entire document)
  3. One for the div - 100x100

Bas

// Interface implemented by 'renderers'. This interface can for example
// be implemented by Cairo(possibly with cairo-gl), OpenGL, Direct3D or
// any other framework we want. This should allow easy switching between
// different renderers, and provide needed software fallback mechanisms.
class IRenderer
{
  // Set the widget this renders to.
  void SetWidget(widget);
};
// The controlling class that controls the composition of frames. This
// lives on a rectangular area on the client's screen, and controls the
// composition of all layers on the compositor. This runs its own thread
// internally from which all OpenGL/D3D operations are executed. All re-
// scheduling of drawing and invalidations are run based on operations
// executed on the compositor and its layers.
class Compositor
{
  // Create a layer that can be used to render to. The size here
  // describes the size in pixels; the format describes the format of
  // the data, which can be RGB, RGBA, or YUV. The compositor will know
  // what to do with these layers, and how to render them properly.
  // When the last external reference to the layer dies, the compositor
  // holds the only remaining reference, and the layer is ready to be
  // destroyed. Type can be one of hardware or managed. Only managed
  // layers can be drawn to directly from software. Any created layer
  // can contain other layers inside, placed anywhere on its surface.
  // The layer is initially locked, meaning it's not shown until
  // unlocked.
  Layer *CreateLayer(size, format, type);
  // This sets the renderer that this compositor uses, without a renderer
  // the compositor essentially does nothing.
  void SetRenderer(renderer);
};
// These are operations that can be executed on all layers.
class ILayer
{
  // Color by which the layer's pixels are multiplied.
  // This contains an alpha value, so opacity can implicitly
  // be controlled.
  void SetColor(color); 

  // Sets an affine transformation to place the layer with.
  void SetTransform(matrix);

  // Add a child layer to this layer. The child may be blitted onto
  // this layer's hardware surface.
  void AddLayer(ILayer);

  // Optional pixel shader program to run on this layer. This can be
  // used to apply a variety of effects to the layer when rendered.
  void SetShader(shader);

  // Lock the layer; no changes take effect while in the
  // locked state.
  void Lock();

  // Unlock the layer; this will cause the compositor to traverse
  // past this layer in the tree when compositing.
  void Unlock();

  // Clone an instance of this layer; it will contain a copy-on-write
  // reference to the contents of this layer. The clone will initially
  // be locked.
  ILayer *Clone();
};
// Layers exposing this interface allow access to the surface. The surface
// is double buffered: if it's currently being drawn to, the compositor
// will simply draw the previous texture. This ensures rendering of the
// compositor area doesn't stall waiting on an expensive software render.
class ILockableLayer
{
  // Lock the surface of this layer. Returns a gfxContext to draw to.
  gfxContext *Lock();

  // Unlock the surface; this means we're done, and will signal the
  // compositor to update the associated texture and redraw.
  void Unlock();
};
// Layers exposing this interface can have their hardware surface accessed,
// which can then be used as a render target for other accelerated parts of
// the code.
class IHardwareLayer
{
  // Return hardware surface in whatever structure we pick. Might need
  // locking/unlocking logic.
  HardwareSurface *Surface();
};
// This class controls animations on objects; any class can be made to
// implement it, but we'd most likely provide some standard implementations.
// Any state it wants to maintain is contained on an implementation level.
class IAnimator
{
  // Called by the compositor when starting a rendering cycle, with
  // the elapsed time.
  virtual void AdvanceTime(double aTime);

  // This assigns the animator to a frame and registers with its compositor.
  void Assign(ILayer *aLayer);
};

Integration with windowing systems

WebKit/CoreAnimation

AppKit supports hosting a layer tree in a NSView using the following technique:

[aView setLayer:rootLayer];

[aView setWantsLayer:YES];

It seems that this method creates a transparent CGSurface the size of the viewport that all of the layers are drawn onto. The window server then takes care of compositing the background content and layered content.

If we set an animating div to have a z-index of -1, we seem to get a large texture and a bunch of tiles?

Scrolling

What should we do to scroll?

Bas: Use a tile cache.

Anholt: destination tile cache sounds pretty nice to me, and with APPLE_object_purgeable use the GPU could throw things out of the cache when appropriate (rather than you having to guess).

Comparison with other APIs

CoreAnimation

Rendering Model

http://developer.apple.com/mac/library/documentation/Cocoa/Conceptual/CoreAnimation_guide/Articles/CoreAnimationArchitecture.html#//apple_ref/doc/uid/TP40006655

CoreAnimation has three trees that it uses.

  1. Layer tree - the model -- this is the tree that applications mostly interact with
  2. Presentation tree - the view -- contains the current values of the animating properties
  3. Render tree - performs the actual rendering; private to CoreAnimation

Having separate trees sort of corresponds to Roc's LayerBuilder infrastructure described above.

Classes

CALayer - CALayer is the model class for layer-tree objects. It encapsulates the position, size, and transform of a layer, which defines its coordinate system. It also encapsulates the duration and pacing of a layer and its animations by adopting the CAMediaTiming protocol, which defines a layer’s time space.

CARenderer - renders a layer tree

CAShapeLayer - Added in 10.6 presumably provides hardware accelerated rendering of paths

Clutter

ClutterActor - Every actor is a 2D surface positioned and optionally transformed in 3D space. The actor is positioned relative to the top left corner of its parent, with the child's origin being its anchor point (also top left by default).

ClutterGroup - A group of Actors positioned relative to the group position

ClutterStage - a place where Actors are composited