Gecko:CrossProcessLayers
Revision as of 21:39, 29 March 2010
Proposal
- Have a dedicated "GPU Process" responsible for all access to the GPU via D2D/D3D/OpenGL
- This process can be killed and restarted to fix leaks or cover for other driver bugs
- This process would be privileged, allowing content processes to be sandboxed but still use the GPU remotely
- In some configurations the GPU process could be a set of threads in a regular browser process (the master process)
- Browser processes (content and chrome) maintain a layer tree on their main threads
- Layer tree is maintained by layout code
- Each transaction that updates a layer tree pushes a set of changes over to a "shadow layer tree"
- This shadow layer tree is what we use for rendering off the main thread
- This is necessary for layer-based animation to work without being blocked by the main thread
- Since we're pushing changes across threads anyway, we might as well push them across process boundaries at the same time, all the way to the GPU process
- Therefore, the GPU process maintains a set of shadow layer trees and is responsible for compositing them
- Question: why "set of" rather than "layer tree"? Latter option would have main thread maintain "master tree", push updates of this to compositor process. Scheduling could be done by setting "frame-rate goal" attributes on layers (or something).
- Compositing a shadow layer tree either results in a buffer that's rendered into a window, or a buffer that is later composited into some other layer tree
- We can reduce VRAM usage at the expense of increased recomposition work by recompositing a content process layer tree every time we composite its parent layer tree and not having a persistent intermediate buffer
- Question: would this require a synchronous request to content to re-composite, or would composition be done on content pixels in shared memory?
- We can control the scheduling of content layer tree composition
This proposal lets us use a generic remoting layer backend. Hardware/platform specific backends are isolated to the GPU process and do not need to do their own remoting.
Implementation Steps
The immediate need is to get something working for Fennec. Proposal:
- Initially, let the GPU process be a thread in the master process
- Build the remoting layers backend
- Publish changes from the content process layer trees and the master process chrome layer tree to shadow trees managed by the GPU thread
TODO stuart/mobile guys/gfx guys.
- Strawman proposal 1: move tile manager into content process, publish canvas tiles as layers.
- Strawman proposal 2: drop tile manager, start from desktop-browser layer manager in content process, add fennec heuristics to that.
Implementation Details for Fennec
- Key question: what cairo backend do we use to draw into ThebesLayers?
- Image backend?
- Allocate shared system memory or bc-cat buffers for the regions of ThebesLayers to update
- Gecko processes draw into those areas using cairo image backend
- GL backend uploads textures from system memory or acquires texture handle for bc-cat buffer across processes (e.g. using texture_to_pixmap); composites those changes into its ThebesLayer buffers
- Windowless plugins suffer, except for Flash where we have NPP_DrawImage and can use layers to composite those images together
- GTK theme rendering suffers
- do we care?
- Xlib backend?
- Allocate textures for changed ThebesLayer areas and map to pixmap using texture_to_pixmap
- Gecko processes draw into those pixmaps using cairo Xlib backend
- GL backend acquires texture handle across processes; composites those changes into its ThebesLayer buffers
- Maybe we can have some XShm or bc-cat hack that lets us do it all ... an X pixmap that we can also poke directly through shared memory and that's also a texture!
Future Details
- How to handle D2D?
- Direct access
- Allocate D3D buffer for changed ThebesLayer areas
- Gecko processes draw into it using cairo D2D backend
- Indirect access (for sandboxed content processes etc)
- Remote cairo calls across to the GPU process, creating a command queue that gets posted instead of a new buffer
Important cases on fennec
Will the browser process need to see all layers in a content process, or just a single container/image/screen layer? We always want video in its own layer, to decouple frame rate from content- and browser-process main-thread work. So at minimum, we need to publish a tree consisting of a layer "below" the video and a layer "above" the video. We need a plan for optimal performance (responsiveness and frame rate) in the following cases.
- Panning: browser immediately translates coords of container layer before delivering event to content process. Content (or browser) later uses event to update region painting heuristics.
- Volume rocker zoom: browser immediately sets scaling matrix for container layer (fuzzy zoom). Content (or browser) later uses event to update region painting heuristics.
- Double-tap zoom
- Question: How long does it typically take to determine the zoom target?
- Container layer: can we use layer-tree heuristics to do a fuzzy zoom while content process figures out target? (Better perceived responsiveness.) Might want three-step process: (i) send event to content process, have it determine new viewport (?); (ii) publish that update to browser, browser initiates fuzzy scale; (iii) content process initiates "real" upscale.
- Video: video layers connect directly to the compositor process from a non-main thread in the content process, so that new frames can be published independently of the content- and browser-process main threads.
- canvas: Initially can store in SysV shmem. Better would be mapping memory accessible by the video card in a privileged process (e.g. bc-cat), and sharing this mapping to the content process.
- CSS transforms and SVG filters: low priority, assume SW-only in content process for now. CSS transforms might be easy to accelerate, SVG filters may never be (Bas cites problems with the N900's GPU).
- Animations: low priority, not discussed wrt fennectrolysis.
Question: for a given layer subtree, can we reasonably estimate how much CPU and GPU time the transformation/compositing operations will take? Could use this information for distributed scheduling. Low priority.
Concurrency model for remote layers
Somewhat a lower-level implementation detail, somewhat not.
Assume we have a master process M and a slave process S. M and S maintain their own local layer trees M_l and S_l. M_l may have a leaf RemoteContainer layer R into which updates from S are published. The contents of R are immutable wrt M (M cannot modify the published subtree), but M may freely modify R itself, e.g. its transform. R contains the "shadow layer tree" R_s published by S. R_s is semantically a copy of a (possibly) partially-composited S_l.
Updates to R_s are atomic wrt painting. When S wishes to publish updates to M, it sends an "Update(cset)" message to M containing all R_s changes to be applied. This message is processed in its own "task" ("event") in M. This task will (?? create a layer tree transaction and ??) apply cset. cset will include layer additions, removals, and attribute changes. Initially we probably want Update(cset) to be synchronous. (cjones believes it can be made asynchronous, but that would add unnecessary complexity for a first implementation.) Under the covers (opaque to M), in-place updates will be made to existing R_s layers.
Question: how should M publish updates of R_s to its own master MM? One approach is to apply Update(cset) to R_s, then synchronously publish Update(cset union M_cset) to its master MM. This is an optimization that allows us to maintain copy semantics without actually copying.
A cset C can be constructed by implementing a TransactionRecorder interface for layers (layer managers?). The recorder will observe all mutations performed on a tree and package them into an IPC message. (This interface could also be used for debugging, to dump layer modifications to stdout.)
Video decoding fits into this model: each decoder acts as a slave S running on a thread in a content process, publishing updates directly to a master M, the compositor process. The content and browser main threads publish special "placeholder" video layers that reference the "real" layers in the compositor process.