Platform/GFX/DeferredGPUs

From MozillaWiki
Revision as of 23:02, 28 March 2013 by Bjacob

Everybody's heard stories about mobile GPUs being different from desktop GPUs in that they do "deferred" rendering instead of "immediate" rendering.

Do we actually know what this means? What are the implications for the performance of our mobile gfx code? What should we change?

Gathering raw documentation from the source

There doesn't seem to exist a good source of information on "deferred" GPUs in general. Worse, "deferred" means different things on different GPUs with different performance implications.

So for lack of something better, let's start with the only information available: documentation from GPU vendors, which is going to be biased toward their own GPUs. Once we understand that, we'll hopefully be able to aggregate it into a big vendor-neutral picture with only well-identified vendor-specific parts.

ARM (Mali)

Intel (not clear what their mobile GPUs are? licensed PowerVR?)

Qualcomm (Adreno)

NVIDIA (Tegra)

Imagination Technologies (PowerVR)

What does "deferred" actually mean in various GPUs?

Performance implications of "deferred"

Draw-calls are very expensive: we can only do 50 per frame and still get 60 FPS

...At least on ARM Mali, as ARM said in a session at GDC. They said the reason is that this GPU does "deferred rasterization" and somehow this makes each draw-call very expensive. Need to read ARM documentation carefully to make sense of this and understand to what extent that applies to other GPUs.
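To put the quoted figure in perspective, here is a minimal sketch of the arithmetic behind it. The function names are hypothetical, and the calculation assumes the whole frame time is available for draw calls, so the result is only an upper bound on the average cost per draw-call, not a measured number.

```cpp
#include <cassert>
#include <cmath>

// Time available for one frame, in milliseconds, at a given frame rate.
constexpr double FrameBudgetMs(double framesPerSecond) {
  return 1000.0 / framesPerSecond;
}

// Upper bound on the average cost of one draw-call if `drawCalls` of them
// must fit in a single frame. Assumption: nothing else uses the frame time.
constexpr double MaxMsPerDrawCall(double framesPerSecond, int drawCalls) {
  return FrameBudgetMs(framesPerSecond) / drawCalls;
}
```

With 60 FPS and 50 draw-calls, this gives a frame budget of roughly 16.7 ms and at most about 0.33 ms per draw-call.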

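Corollary: we should batch draw-calls. That would mean grouping textures into bigger textures, so that several quads sampling from the same big texture can go into a single draw-call. Individual images would be uploaded into regions of the big texture with glTexSubImage2D. The idea was suggested by ARM people, so it's not crazy.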
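As a rough illustration of the texture-grouping idea, here is a CPU-side sketch of a simple "shelf" allocator that hands out regions of one big texture. Everything in it (ShelfAtlas, AtlasRect, UVRect, the packing strategy) is hypothetical and not taken from our gfx code; the actual upload is only indicated in a comment, since it needs a live GL context.

```cpp
#include <cassert>

// Pixel rectangle inside the big (atlas) texture.
struct AtlasRect {
  int x, y, width, height;
};

// Normalized texture coordinates for a region of the atlas.
struct UVRect {
  float u0, v0, u1, v1;
};

// Hypothetical shelf packer: fills a row left to right and opens a new row
// (a "shelf") below it when the current one has no horizontal room left.
class ShelfAtlas {
public:
  ShelfAtlas(int width, int height) : mWidth(width), mHeight(height) {
    assert(width > 0 && height > 0);
  }

  // Reserves a w x h region. Returns false if the image does not fit.
  bool Allocate(int w, int h, AtlasRect* out) {
    if (w <= 0 || h <= 0 || w > mWidth) return false;
    if (mCursorX + w > mWidth) {  // current shelf is full, open a new one
      mShelfY += mShelfHeight;
      mCursorX = 0;
      mShelfHeight = 0;
    }
    if (mShelfY + h > mHeight) return false;
    *out = AtlasRect{mCursorX, mShelfY, w, h};
    mCursorX += w;
    if (h > mShelfHeight) mShelfHeight = h;
    // With the atlas texture bound, the pixels would be uploaded here, e.g.:
    // glTexSubImage2D(GL_TEXTURE_2D, 0, out->x, out->y, w, h,
    //                 GL_RGBA, GL_UNSIGNED_BYTE, pixels);
    return true;
  }

  // Converts a pixel rectangle to the texture coordinates a quad would use.
  UVRect ToUV(const AtlasRect& r) const {
    return UVRect{float(r.x) / mWidth, float(r.y) / mHeight,
                  float(r.x + r.width) / mWidth,
                  float(r.y + r.height) / mHeight};
  }

private:
  int mWidth, mHeight;
  int mCursorX = 0;      // next free x position on the current shelf
  int mShelfY = 0;       // top of the current shelf
  int mShelfHeight = 0;  // tallest image placed on the current shelf
};
```

Once all images live in one texture, quads that differ only in their UVRect can share a texture binding and be submitted in one draw-call. A shelf packer wastes space when image heights vary a lot; it is used here only because it is short.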

Multiple passes with FBOs introduce stalls

Traditional GPU wisdom says that glReadPixels is evil because it introduces stalls. Deferred GPU wisdom says that any multi-pass rendering using an FBO as an intermediate surface also introduces stalls, because it puts a barrier on how much rendering can be deferred. On the other hand, MRTs (multiple render targets) are said to be deferred-friendly.
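One way to get a feel for how many such barriers a frame contains is to count render-target changes in the recorded sequence of draw-calls. The sketch below is a hypothetical diagnostic, not existing code: DrawCommand and CountTargetSwitches are made-up names, and it assumes that every change of render target counts as one potential barrier, which may overestimate or underestimate the real cost on a given GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record of one draw-call; only the render target matters here.
// Target 0 stands for the default framebuffer, other values for FBO ids.
struct DrawCommand {
  unsigned target;
};

// Counts how many times consecutive draw-calls change render target.
std::size_t CountTargetSwitches(const std::vector<DrawCommand>& frame) {
  std::size_t switches = 0;
  for (std::size_t i = 1; i < frame.size(); ++i) {
    if (frame[i].target != frame[i - 1].target) {
      ++switches;
    }
  }
  return switches;
}
```

A frame that renders everything for one FBO before moving on to the next target keeps this number low; a frame that ping-pongs between targets drives it up.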

How bad are we currently?

I mean throughout our gfx/layers code?

What can we do to be better?