The surfaces discussed here are graphics surfaces that may have to be passed around between a producer and one or multiple consumers living either in the same process or in a different process.
The intended use cases revolve around compositing (layers); but we are interested in developing good abstractions for the producer's side, which shouldn't depend on compositing details. While the compositor is an area where interesting surface-passing takes place, it is not the only one. For example, canvas rendering contexts run their own swap-chain upstream of the compositor, and video playback manages a queue of decoded frames upstream of the compositor. Even once a frame has been passed to the compositor, the compositor isn't necessarily its only consumer. For example, screenshotting is another possible consumer.
We will be using some semi-standard vocabulary like "producer" and "consumer" that is well explained in this document:
Our goals are:
- A. Understand our current surface abstractions and our current surface-passing mechanisms, and how they interact.
- B. Develop a single MozSurface concept that should be the only type of surface that producer-side code needs to reference, when it needs to render to a surface that will possibly have to be passed around, possibly over IPC.
- C. Try to unify as much as possible of the logic of passing around surfaces, under a general 'stream' mechanism. If some code really cannot be unified with the rest under such a 'stream' mechanism, then explain exactly why.
Understanding our current code
Let's focus on B2G as an example, both to make things concrete and because it is in some respects the most complex platform.
Let's study our different surface abstractions and surface-passing mechanisms on B2G, starting from the lowest-level.
At the lowest level, B2G reuses Android's native surface type: android::GraphicBuffer. It's what we often refer to as 'gralloc'; more details here:
In Android (as opposed to B2G), android::GraphicBuffer is really the single universal surface type. Indeed, in Android, it handles a lot of things, including:
- Fallback to regular shmem when shareable-with-GPU memory is unavailable;
- Serialization and deserialization over IPC;
- Locking.
But more on Android below. How are we doing all this on B2G? At the moment, we have separate layers of abstraction to handle these different features. It's worth detailing them as that is a good way to continue our exploration of our current graphics platform.
We do not use android::GraphicBuffer's own IPC mechanisms, because these assume that the IPC system is Android's Binder. So in order to pass around android::GraphicBuffer's over IPC, we wrap them in our own class, GrallocBufferActor, which is the actor for an IPDL protocol, PGrallocBuffer.
We have generic IPC code that needs to pass around surfaces of arbitrary types. Our abstraction for that is SurfaceDescriptor, a union of all the different types of surfaces that we know how to pass over IPC. The gralloc case is SurfaceDescriptorGralloc, but there are other types of SurfaceDescriptor's, such as SurfaceDescriptorShmem (for regular shmems, which unlike gralloc are not shareable directly with the GPU) and SurfaceDescriptorMemory (for regular memory buffers that aren't shmems).
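To illustrate the idea, SurfaceDescriptor behaves roughly like a tagged union over the per-backend descriptor types. The sketch below is hand-written and simplified: the real class is generated from IPDL union declarations, and the member fields shown here are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the per-backend descriptor payloads.
struct SurfaceDescriptorShmem   { uintptr_t shmemHandle; };
struct SurfaceDescriptorMemory  { uintptr_t dataPtr; };
struct SurfaceDescriptorGralloc { int32_t bufferActorId; };

class SurfaceDescriptor {
public:
  enum Type { TShmem, TMemory, TGralloc };
  explicit SurfaceDescriptor(SurfaceDescriptorShmem s)   : mType(TShmem),   mShmem(s) {}
  explicit SurfaceDescriptor(SurfaceDescriptorMemory m)  : mType(TMemory),  mMemory(m) {}
  explicit SurfaceDescriptor(SurfaceDescriptorGralloc g) : mType(TGralloc), mGralloc(g) {}
  Type type() const { return mType; }
  // Note: no reference counting -- a descriptor does NOT keep alive the
  // surface it describes; it is only a recipe for finding it over IPC.
private:
  Type mType;
  union {
    SurfaceDescriptorShmem   mShmem;
    SurfaceDescriptorMemory  mMemory;
    SurfaceDescriptorGralloc mGralloc;
  };
};
```

The important property this models is the one discussed below: a SurfaceDescriptor is plain data for the wire, not an owning handle.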
So at this point we can draw a diagram of our different surface abstractions at the lowest levels, where we actually create surfaces and pass them around over IPC:
*** Figure 1: our low-level IPC surface abstractions ***

              Each arrow means "wraps"

               SurfaceDescriptor
                  /         \
                 /           \
                V             V
 SurfaceDescriptorGralloc    Other types of SurfaceDescriptor's,
           |                 like e.g. SurfaceDescriptorShmem
           |
           V
   GrallocBufferActor
           |
           |
           V
  android::GraphicBuffer
Thus SurfaceDescriptor is our universal abstraction for a surface that can be passed over IPC. It should only be used for passing things over IPC, especially as it does not keep alive the surface that it wraps (it doesn't refcount). But as, for a long time, it was our only abstraction for this wide variety of surface types, people resorted to using it outside of IPC code to refer to surfaces in cross-platform code. That's why at the moment you can see SurfaceDescriptor's all over our codebase.
Let's continue walking up to higher levels.
We mentioned that in the Android world, android::GraphicBuffer provided two key features: serialization, and locking. As we mentioned, in our world, serialization is done by GrallocBufferActor and wrapped by SurfaceDescriptor in cross-platform code. If we walk one level of abstraction up, we're now going to see where we take care of locking.
Locking is currently taken care of by a pair of classes, TextureClient and TextureHost. The producer sees the surface as a TextureClient, and the consumer receives it as a TextureHost.
For example, here is the diagram for the B2G case:
*** Figure 2: TextureClient/TextureHost, our current best approximation ***
***           of a sane surface concept                                 ***

                         paired / IPC
     TextureClient ================== TextureHost
          |                                |
          | implemented                    | implemented
          | by                             | by
          V                                V
 GrallocTextureClientOGL          GrallocTextureHostOGL
          |                                |
          | wraps                          | wraps
          V                                V
                         paired / IPC
   GrallocBufferActor ============== GrallocBufferActor
Thus, from the perspective of the producer, TextureClient is the right abstraction for "a surface that we can lock, render to, and share over IPC".
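The producer-side pattern that this enables can be sketched as follows. `MockTextureClient`, `OpenMode` and the exact method names are illustrative stand-ins approximating the real TextureClient API, not the real classes; the point is the lock/draw/unlock/share discipline.

```cpp
#include <cassert>

enum class OpenMode { Read, Write, ReadWrite };

// Illustrative mock: a real TextureClient's Lock() would coordinate with the
// consumer side; here we only model the "can't lock twice" behavior.
class MockTextureClient {
public:
  bool Lock(OpenMode) { if (mLocked) return false; mLocked = true; return true; }
  void Unlock() { mLocked = false; }
  bool IsLocked() const { return mLocked; }
private:
  bool mLocked = false;
};

// Typical producer-side flow: lock, draw, unlock, then share over IPC.
bool RenderFrame(MockTextureClient& texture) {
  if (!texture.Lock(OpenMode::Write)) {
    return false;       // surface busy (e.g. consumer still reading): skip
  }
  // ... draw into the surface here ...
  texture.Unlock();     // after this, the surface can safely be shared
  return true;
}
```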
But there are a couple of things at the moment that still prevent TextureClient from being universally used (see next section), so at the moment we still have many places that manipulate SurfaceDescriptor's directly, for lack of a better abstraction.
Let's further continue walking up to ever higher levels of abstraction. We are done describing our abstractions for "a surface"; let's now describe how these surfaces are passed around.
The thing that knows how to pass around surfaces is a Compositable. On the client side, that's a CompositableClient. Currently, each type of CompositableClient (CanvasClient, ImageClient, ...) implements its own logic to pass around surfaces, for example: implementing double-buffering or triple-buffering, queuing produced frames before they are consumed, etc.
That's suboptimal, as there is a great deal of nontrivial logic there that can be shared by using a suitable abstraction. It is nontrivial to find a suitable abstraction for passing surfaces around, but that has been done rather successfully as the notion of "EGL streams" emerged. The spec for EGL_KHR_stream, already mentioned above, is highly recommended reading:
In fact, some particular kinds of CompositableClients already effectively use their own implementation of EGLStream to implement their surfaces-handling logic. Specifically, CanvasClients use a class named SurfaceStream that is basically a standard implementation of EGLStream, to handle triple-buffering for canvases.
Other CompositableClients have their own custom code that is not currently using anything like an EGLStream, but could be.
Understanding what Android 4.3 is doing
In Android 4.3, as we said above, there is a universal surface type, android::GraphicBuffer. Moreover, there are shared surface-passing mechanisms (BufferQueue). So it looks like this:
*** Figure 3: overview of Android's surface-passing ***

               .                     .      +----------+
               .                     .      |EGLSurface|
               .                     .      |  handle  |
               .                     .      +----------+
               .                     .           |
               .                     .           | references
               .                     .           v                 dequeue()
+--------+     .     +-----------+   .   +------------------+  --------------> +------+
|Consumer| <---------|BufferQueue| <---- |  Surface class   |                  |Client|
+--------+     .     +-----------+   .   |e.g. ANativeWindow|                  |      |
    |   consumed by       |  presented   +------------------+  <-------------- +------+
    |          .          |    to    .           |                  queue()
    | holds    .          | wraps    .           | holds
    v          .          v          .           v
+-------------+.  +---------------+  .    +-------------+
|GraphicBuffer|.  |  queue impl.  |  .    |GraphicBuffer|
+-------------+.  +---------------+  .    +-------------+
               .          |          .
               .          | holds    .
               .          v          .
               .   +-------------+   .
               .   |GraphicBuffer|   .
               .   +-------------+   .
               .                     .
+----------------------+   +----------------------+
|  In Mozilla's case,  |   |  In Android's case,  |
|   the IPC frontier   |   |   the IPC frontier   |
|       is here        |   |       is here        |
+----------------------+   +----------------------+
Developing a standard MozSurface abstraction
As explained in the previous section, we currently have two different abstractions, at two different levels, for "a surface that a producer can render to, and then pass over IPC". One is TextureClient, the other is SurfaceDescriptor.
As explained above, TextureClient is closest to being the right abstraction that we want everything to use, while SurfaceDescriptor is too low-level and has only become widely used because we didn't have anything else to do that job (it predates TextureClient). The key difference between TextureClient and SurfaceDescriptor is that TextureClient handles locking and cross-process memory management; so the key problem with code that uses SurfaceDescriptor's directly is that it is hard for such code to get locking and memory management right.
Let's use the term "MozSurface" to refer to that universal abstraction for "a surface that a producer can render to, and then pass over IPC".
TextureClient isn't quite ready yet to be blessed as "MozSurface", for the following reasons:
- 1. Not all layer types use the new TextureClient/Host API yet, some still use the DeprecatedTextureClient/Host. We can't use MozSurface with layers that are still on the deprecated API.
- 2. MozSurface should be free of any dependency on Layers classes, because even though ultimately graphics aren't visible to the user until they get composited as layers, we may have to start creating and even passing around surfaces before layers are created. A good example is a canvas that hasn't been inserted into the DOM yet. Currently TextureClient/Host refer to compositables to identify themselves over IPC, and the lifetime of shared texture data is constrained by the lifetime of the Compositable. This is getting fixed with the introduction of the PTexture protocol in bug 897452.
The goal is to make TextureClient evolve incrementally into an abstraction that fulfills all the requirements of MozSurface. Note that the design of the (non-deprecated) TextureClient was driven by almost the same needs as MozSurface. MozSurface pushes the requirements a little further, making the abstraction the "universal" surface abstraction, while TextureClient's goal was to be the required abstraction for any surface that may be shared with the compositor (potentially leaving out some use cases where we may use MozSurfaces that we know will not be used for compositing).
As a longer term goal, we may want to merge some of TextureHost's functionalities into TextureClient/MozSurface, for instance the ability to expose one or several TextureSources for compositing. This will only be useful if we decide to prioritize being able to use the Compositor API outside the compositor process.
A few design principles
- We need a clear separation between abstractions that are about data and abstractions that are about logic. MozSurface is strictly about data. Its purpose is to handle safety through reference counting, locking and eventually IPC synchronization, ensuring that data is always used in safe ways, without double-deletes, use-after-frees or other read-write races. By contrast, DrawTarget is an abstraction for the drawing logic. MozSurface may expose APIs like "GetAsDrawTarget" to delegate drawing to the DrawTarget abstraction, but should not itself be considered a drawing tool.
- A MozSurface should be the only object owning its underlying buffer. Users of this buffer *must* go through a MozSurface. This is what makes it possible for MozSurface to safely control the access and the lifetime of the underlying memory.
- MozSurface should not compromise performance: no implicit surface copies under the hood, etc.
- MozSurface wraps one and only one texture/buffer/piece of memory. Tracking the underlying memory should be equivalent to tracking the MozSurface.
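The sole-ownership and delegation principles above can be sketched like this. The `MozSurface` class below is purely hypothetical (the abstraction does not exist yet); `DrawWith` is an illustrative stand-in for a "GetAsDrawTarget"-style API, and a plain byte vector stands in for the real texture/buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical MozSurface: the one and only owner of its underlying buffer.
// Users never get the buffer itself, only scoped access through the surface,
// which is what lets the surface control access and lifetime.
class MozSurface {
public:
  explicit MozSurface(size_t bytes) : mBuffer(bytes) {}
  MozSurface(const MozSurface&) = delete;             // no second owner
  MozSurface& operator=(const MozSurface&) = delete;

  size_t SizeInBytes() const { return mBuffer.size(); }

  // Delegation in the spirit of "GetAsDrawTarget": drawing logic gets scoped
  // access to the pixels for one call, but never ownership of the buffer.
  template <typename F>
  void DrawWith(F&& drawFunc) { drawFunc(mBuffer.data(), mBuffer.size()); }

private:
  std::vector<uint8_t> mBuffer;  // the single copy of the underlying memory
};
```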
Concrete short-term plan to move towards "MozSurface"
In the short term, we just iteratively switch more and more things to use TextureClient instead of less suitable or less generic surface abstractions (at least SurfaceDescriptor and probably SharedSurface), extending and fixing TextureClient along the way as needed.
Eventually, TextureClient will be "MozSurface" (or whatever it will be called then).
*** Figure 4: The plan for MozSurface ***

   Land PTexture 897452
    Refcount ISurfaceAllocator
    Split it away from ClientLayerManager
    Allow checking if connection still up
              |                            933082
              |                            Do not invoke
              |                            RemoveTextureClient
              |                            manually
              |                            926745
              |                           /
              |                          /
              V                         V
   Replace all bad uses of   <------   Finish port to
   SurfaceDescriptor                   new textures
   by TextureClient                    941389
   893300
              |
              V
   Replace SharedSurface
   by TextureClient
   941390
              |
              |      Keep iterating on other
              |      existing "surface"-like
              |      abstractions as needed
              V
   Use TextureClient also in          Merge TextureHost use cases
   main-thread-compositing     <---   (at least GetTextureSource)
   (if there still are any)           into TextureClient

              call that "MozSurface"
Developing a standard MozStream abstraction
The discussion below is about a separate abstraction. MozSurface and MozStream are two separate discussions, although MozStream would use MozSurface.
As explained earlier, having a good MozSurface abstraction is only half the way towards a good architecture. The other half is to share as much surface-handling code as possible in a common abstraction, and the standard type of abstraction for doing this is that of a "stream", as described in the above-mentioned EGLStream specification, http://www.khronos.org/registry/egl/extensions/KHR/EGL_KHR_stream.txt.
As we already said, we even have an implementation of a fairly standard stream, SurfaceStream, and are already using it for <canvas>.
The question is how much more of our surface-passing code could we unify behind a shared stream mechanism, maybe by suitably extending SurfaceStream's capabilities?
Is there anything that we need to do, that fundamentally cannot be unified behind such a stream abstraction?
The problem with typical streams is that they do not allow for consumers to be on multiple processes, and we do have use cases where we need to consume a surface on two different processes:
- 1. ThebesLayers need to get back the front buffer to do partial updates;
- 2. Drawing video frames into a canvas. It also seems that WEBGL_dynamic_texture would hit the same problem.
- 3. More importantly we have different plans for some of the <video> use cases which are based on a stream abstraction that is not a swap chain.
- 4. We currently do some screenshotting on the content side.
These use cases fall out of the scope of standard streams because they need a surface to be consumed by both compositor and content.
During the graphics sessions however, Jeff G and Dan proposed a solution to some of these problems.
The reason why typical streams don't like the idea of a surface being consumed on two different processes, is that typical streams want to own the surfaces that they pass around. Having multiple processes hold references to the same surface makes that impossible (or would require one process to wait for the other to be done with that surface).
The idea proposed by Jeff G and Dan was to have a stream that wouldn't own surfaces; instead, surfaces would be reference-counted, and multiple consumers could simply hold references to a surface if e.g. they want a screenshot.
*** Figure 5. JeffG and Dan's magic swap chain that allows ***
***           consumers on multiple processes              ***

         Producer on content process
                    |
                    | draws into
                    V
               +---------+
               | frame N |
               +---------+
                    |
                    | producer is about to call
                    | presentFrame() to insert the frame N
                    | into the swap chain
                    V
              The swap chain
            +---------------+
            |               |
            | +-----------+ |   "Working" frames
            | | frame N-1 | |   These frames (here, there is one frame)
            | +-----------+ |   are still being asynchronously rendered
            |       |       |   (e.g. by the GL)
            +-------|-------+
            |       V       |
            | +-----------+ |
            | | frame N-2 | |   "Done" frame, ready to be consumed
            | +-----------+ |
            |               |
            +---------------+
 getFrontFrame() is   /   \    getFrontFrame() is called again,
 about to pull       /     \   might get another reference to
 frame N-2 from     /       \  the same frame
 the swap chain    /         \
                  V           V
       Consumer #1 on        Consumer #2
       compositor process    on another process
       gets a reference      might also get a reference
       to frame N-2          to frame N-2; might also still
                             hold references to older frames...
If the surfaces are being allocated from a "surface pool", then they aren't returned to the pool as long as anything holds a reference to them. Thus, e.g. a "screenshotting" consumer can hold screenshots alive as long as it wants without blocking the swap chain --- as long as the surface pool doesn't run out.
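The refcounted, non-owning swap chain described above can be sketched as follows. All names (`Frame`, `SwapChain`) are illustrative, and `std::shared_ptr` stands in for whatever cross-process reference counting we would actually need; the point is only that the chain holds references rather than owning surfaces, so any number of consumers can keep a frame alive.

```cpp
#include <cassert>
#include <deque>
#include <memory>

struct Frame { int id; };               // stand-in for a real surface
using FramePtr = std::shared_ptr<Frame>;

// A swap chain that does NOT own its surfaces: it only holds references.
// Multiple consumers (compositor, screenshotter, ...) can each take a
// reference to the front frame without blocking each other.
class SwapChain {
public:
  void PresentFrame(FramePtr f) { mQueue.push_back(std::move(f)); }
  FramePtr GetFrontFrame() {            // every caller shares the same frame
    return mQueue.empty() ? nullptr : mQueue.front();
  }
  void AdvanceFrame() { if (!mQueue.empty()) mQueue.pop_front(); }
private:
  std::deque<FramePtr> mQueue;          // references, not exclusive owners
};
```

A frame that was "advanced past" by the chain stays alive as long as some consumer still holds its reference, which is exactly what lets a screenshotting consumer keep old frames without stalling the producer.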
Another complicated use case to take into account is that Gecko's media framework does not use the notion of swap chain. Media produces frames that are, if possible, already in shared memory, and present the same frames (so the same underlying MozSurfaces) to several swap chains, without any form of restriction: MozSurfaces can be passed to any number of swap chains in any order, maybe at the same time, maybe not.
*** Figure 6. JeffG and Dan's magic swap chain that allows ***
***           a frame to be in multiple swap chains        ***

         Producer on content process
                    |
                    | draws into
                    V
               +---------+
               | frame N |
               +---------+
                |       \
                |        \
                V         V
       The swap chain          Another swap chain
     +---------------+        +---------------+
     |               |        |               |
     | +-----------+ |        | +-----------+ |
     | | frame N-1 | |        | | frame N-1 | |
     | +-----------+ |        | +-----------+ |
     |       |       |        |       |       |
     +-------|-------+        +-------|-------+
     |       V       |        |       V       |
     | +-----------+ |        | +-----------+ |
     | | frame N-2 | |        | | frame N-2 | |
     | +-----------+ |        | +-----------+ |
     |               |        |               |
     +---------------+        +---------------+
 getFrontFrame() is  /   \               \
 about to pull      /     \               \
 frame N-2 from    /       \               \
 the swap chain   /         \               \
                 V           V               V
      Consumer #1 on        Consumer #2          Consumer #3
      compositor process    on another process
      gets a reference      might also get a reference
      to frame N-2          to frame N-2; might also still
                            hold references to older frames...
Note that video compositing is a bit special because we want to queue frames on the compositor side with timestamps, so that A/V synchronization can be done at presentation time. Currently A/V sync is done on the content side and the compositor just consumes the last frame that was shared over IPC, so we suffer the latency that comes with IPC.
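A compositor-side timestamped queue of the kind described here could look roughly like the sketch below. It is purely illustrative: `TimedFrameQueue` is a made-up name, `double` stands in for a real TimeStamp type, and a real implementation would need a proper frame-dropping policy.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Compositor-side queue of (presentation timestamp, frame id) pairs,
// queued in presentation order. At composition time we present the newest
// frame whose timestamp has passed, dropping any frames we went past.
class TimedFrameQueue {
public:
  void Queue(double timestamp, int frameId) {
    mFrames.push_back({timestamp, frameId});
  }
  // Returns the frame id to present at time `now`, or -1 if no frame is due.
  int FrameAt(double now) {
    int chosen = -1;
    while (!mFrames.empty() && mFrames.front().first <= now) {
      chosen = mFrames.front().second;   // newest frame that is already due
      mFrames.pop_front();               // older due frames are dropped
    }
    return chosen;
  }
private:
  std::deque<std::pair<double, int>> mFrames;
};
```

Doing this selection at composition time is what removes the IPC latency from A/V sync: the content side only has to queue frames early enough, not at the exact right moment.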
In the figure below, we use the generic term "stream" rather than swap chain because the implementation of the stream may not require all the features of the swap chains described above, and already differs by the need to queue frames in advance on the compositor side with presentation times.
*** Figure 7. Video compositing with timestamped frames ***

                          Producer
                             |
                             V
                       +-----------+
                       | frame N+1 |
                       +-----------+
                        /         \
                       /           \     The frame is put into several
                      /             \    streams with different presentation
                     /               \   times.
                    V                 V
        Stream 1                               Stream 2
 +------------------+                  +------------------+
 | +--------------+ |   +-----------+  | +--------------+ |
 | | timestamp T1 |-+-->|  Frame N  |<-+-| timestamp T3 | |
 | +--------------+ |   +-----------+  | +--------------+ |
 |                  |                  |                  |
 | +--------------+ |   +-----------+  | +--------------+ |
 | | timestamp T1 |-+-->| Frame N-1 |<-+-| timestamp T3 | |
 | +--------------+ |   +-----------+  | +--------------+ |
 |                  |                  |                  |
 | +--------------+ |   +-----------+  |                  |
 | | timestamp T1 |-+-->| Frame N-2 |  |                  |
 | +--------------+ |   +-----------+  |                  |
 |                  |                  |                  |
 +------------------+                  +------------------+
The need for being able to let a MozSurface be used by several swap chains comes from two things:
- A) we want several layers to be able to use the same MozSurface, and right now each layer has its own equivalent of a swap chain;
- B) the media framework may pass MozSurfaces to any different layers without any restriction.
B) is a stronger constraint but it is limited to video, which may have a separate implementation. A) could be solved by either letting the same MozSurface be in several swap chains, or making it possible for several layers to use the same swap chain (which probably makes more sense, but requires us to think about how we expose this functionality).
Some of the video use cases are going to need a separate implementation from the MozStream swap chain. However, these two streams could share some components.
- Client-side buffer pool. We need faster surface allocation, especially with gralloc surfaces, which can only be allocated on the compositor process. Keeping surface pools avoids the cost of actual allocation, and both video streams and MozStream can benefit from this.
- Both video and MozStream should be implemented on top of MozSurface.
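The client-side buffer pool idea above can be sketched as follows. `SurfacePool` and `PooledSurface` are illustrative names, and a plain byte vector stands in for a real (possibly gralloc) surface; a real pool would also have to match sizes/formats and go through the compositor process for gralloc allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

struct PooledSurface { std::vector<uint8_t> pixels; };  // stand-in surface

// Recycle surfaces instead of paying the (for gralloc: cross-process)
// allocation cost on every frame.
class SurfacePool {
public:
  explicit SurfacePool(size_t surfaceBytes) : mBytes(surfaceBytes) {}

  std::unique_ptr<PooledSurface> Acquire() {
    if (!mFree.empty()) {                     // fast path: reuse a surface
      auto s = std::move(mFree.back());
      mFree.pop_back();
      return s;
    }
    ++mAllocations;                           // slow path: real allocation
    auto s = std::make_unique<PooledSurface>();
    s->pixels.resize(mBytes);
    return s;
  }

  void Release(std::unique_ptr<PooledSurface> s) {
    mFree.push_back(std::move(s));            // return to the pool for reuse
  }

  int AllocationCount() const { return mAllocations; }

private:
  size_t mBytes;
  int mAllocations = 0;
  std::vector<std::unique_ptr<PooledSurface>> mFree;
};
```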