| Everybody has heard that mobile GPUs differ from desktop GPUs in that they do "deferred" rendering instead of "immediate" rendering.
| | This page moved to [[Platform/GFX/MobileGPUs|MobileGPUs]] as we realized that it was more useful to have a page for all mobile GPUs. |
| | |
| Do we actually know what this means? What are the implications for the performance of our mobile gfx code? What should we change?
| |
| | |
| = Gathering raw documentation from the source =
| |
| | |
| There doesn't seem to be a good source of information on "deferred" GPUs in general. Worse, "deferred" means different things on different GPUs, with different performance implications.
| |
| | |
| So, for lack of something better, let's start with the only information available: documentation from GPU vendors, which is going to be biased toward their own GPUs. Once we understand that, we'll hopefully be able to aggregate it into a big vendor-neutral picture with only well-identified vendor-specific parts.
| |
| | |
| == ARM (Mali) ==
| |
| | |
| == Intel (not clear what their mobile GPUs are? licensed PowerVR?) ==
| |
| | |
| == Qualcomm (Adreno) ==
| |
| | |
| == NVIDIA (Tegra) ==
| |
| | |
| == Imagination Technologies (PowerVR) ==
| |
| | |
| = What does "deferred" actually mean in various GPUs? =
| |
| | |
| = Performance implications of "deferred" =
| |
| | |
| == Draw-calls are very expensive, can only do 50 per frame to get 60 FPS ==
| |
| | |
| ...At least on ARM Mali, according to what ARM said in a session at GDC. They said the reason is that this GPU does "deferred rasterization", and that this somehow makes each draw-call very expensive. We need to read the ARM documentation carefully to make sense of this and to understand to what extent it applies to other GPUs.
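To put the "50 draw-calls per frame at 60 FPS" figure in perspective, here is a minimal arithmetic sketch. It assumes (this is our simplification, not something ARM stated) that draw-call overhead is the only cost in a frame, so the result is an upper bound on the time available per draw-call. The function name is hypothetical.

```python
def per_draw_call_budget_ms(fps, draw_calls_per_frame):
    """Upper bound on the time available per draw-call, in milliseconds.

    Assumption (ours): the whole frame time is spent on draw-call overhead.
    """
    frame_ms = 1000.0 / fps  # time available for one frame
    return frame_ms / draw_calls_per_frame


# Figures quoted from the GDC session: 50 draw-calls per frame at 60 FPS.
budget = per_draw_call_budget_ms(60, 50)
print(f"{budget:.3f} ms per draw-call")
```

At 60 FPS a frame lasts about 16.7 ms, so 50 draw-calls leave roughly a third of a millisecond each under this assumption.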
| |
| | |
| === Corollary: we should batch draw-calls ===
| |
| | |
| Batching draw-calls would mean grouping textures into bigger textures (texture atlases), which would be done with glTexSubImage2D. The idea was suggested by ARM people, so it's not crazy.
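As a rough illustration of grouping textures into a bigger texture, the sketch below computes where each small texture would land inside one large atlas, using a simple "shelf" layout. The packing strategy, `Placement`, and `pack_shelves` are assumptions made for illustration only; they are not taken from ARM's material or from our gfx/layers code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    name: str
    x: int       # would be the xoffset argument of glTexSubImage2D
    y: int       # would be the yoffset argument of glTexSubImage2D
    width: int
    height: int


def pack_shelves(sizes, atlas_width, atlas_height):
    """Place (name, width, height) rectangles left to right on horizontal shelves."""
    placements = []
    x = y = shelf_height = 0
    # Tallest first, so each shelf wastes less vertical space.
    for name, w, h in sorted(sizes, key=lambda s: s[2], reverse=True):
        if w > atlas_width:
            raise ValueError(f"{name} is wider than the atlas")
        if x + w > atlas_width:
            # Current shelf is full: start a new one below it.
            y += shelf_height
            x = 0
            shelf_height = 0
        if y + h > atlas_height:
            raise ValueError(f"{name} does not fit in the atlas")
        placements.append(Placement(name, x, y, w, h))
        x += w
        shelf_height = max(shelf_height, h)
    return placements


if __name__ == "__main__":
    textures = [("a", 64, 64), ("b", 32, 32), ("c", 100, 16)]
    for p in pack_shelves(textures, 128, 128):
        print(p)
```

Each placement would then correspond to one upload of the form glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, width, height, format, type, pixels) into the atlas texture. Quads that sample different regions of the same atlas no longer need a texture rebind between them, which is what would allow them to share a draw-call.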
| |
| | |
| == Multiple passes with FBOs introduce stalls ==
| |
| | |
| Traditional GPU wisdom says that glReadPixels is evil because it introduces stalls. Deferred GPU wisdom says that any multi-pass rendering using an FBO as an intermediate surface also introduces stalls, because it puts a barrier on how much rendering can be deferred. On the other hand, MRTs (multiple render targets) are said to be deferred-friendly.
| |
| | |
| = How bad are we currently? =
| |
| | |
| I mean throughout our gfx/layers code?
| |
| | |
| = What can we do to be better? =
| |