Platform/GFX/DeferredGPUs: Difference between revisions

Replaced content with "This page moved to MobileGPUs as we realized that it was more useful to have a page for all mobile GPUs."
Everybody's heard stories about mobile GPUs being different from desktop GPUs in that they do "deferred" rendering instead of "immediate" rendering.
This page moved to [[Platform/GFX/MobileGPUs|MobileGPUs]] as we realized that it was more useful to have a page for all mobile GPUs.
 
Do we actually know what this means? What are the implications for the performance of our mobile gfx code? What should we change?
 
= Gathering raw documentation from the source =
 
There doesn't seem to be a good source of information on "deferred" GPUs in general. Worse, "deferred" means different things on different GPUs, with different performance implications.
 
So for lack of something better, let's start with the only information available: documentation from GPU vendors, which is going to be biased toward their own GPUs. Once we understand that, we'll hopefully be able to aggregate it into a big vendor-neutral picture with only well-identified vendor-specific parts.
 
== ARM (Mali) ==
 
[http://malideveloper.arm.com/documentation/ Mali Developer Center]
 
[http://infocenter.arm.com/help/topic/com.arm.doc.dui0555a/DUI0555A_mali_optimization_guide.pdf Mali GPU Application Optimization Guide (2011)]
 
== Qualcomm (Adreno) ==
 
[https://developer.qualcomm.com/download/adreno200performanceoptimizationopenglestipsandtricksmarch10.pdf Adreno 200 Performance Optimization (2010)]
 
== NVIDIA (Tegra) ==
 
Tegras are the only major mobile GPUs that are immediate, like desktop GPUs, rather than deferred like other mobile GPUs.
 
[http://www.nvidia.ca/object/white-papers.html NVIDIA White Papers]
 
[http://www.nvidia.ca/docs/IO/116757/Tegra_4_GPU_Whitepaper_FINALv2.pdf NVIDIA Tegra 4 Family GPU Architecture]
 
== Imagination Technologies (PowerVR) ==
 
[http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf POWERVR Series5 Graphics]
 
[http://www.imgtec.com/powervr/insider/powervr_presentations/GDC%20HardwareAndOptimisation.pdf PowerVR -- A Master Class in Graphics Technology and Optimization (2012)]
 
More documentation seems to exist at their "PowerVR Insider" site, but it appears to be behind a paywall.
 
== Intel ==
 
Intel currently just licenses PowerVR. They seem to be preparing different hardware for the future, but for now we only need to care about PowerVR-based Intel.
 
== Other GPU vendors without known public documentation ==
 
[http://www.broadcom.com/products/technology/mobmm_videocore.php Broadcom VideoCore]
 
[http://www.vivantecorp.com/index.php/en/technology/3d Vivante]
 
= What does "deferred" actually mean in various GPUs? =
 
The best document that I could find on this is [http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf POWERVR Series5 Graphics], section 3. However, we won't use exactly the terminology of this document because it reserves the word "deferred" solely for PowerVR's version of "deferred".
 
In our terminology here, there are three types of GPUs: immediate, tile-based deferred rasterization, and tile-based deferred HSR, where '''HSR stands for Hidden Surface Removal'''.
 
We will abbreviate "tile-based deferred rasterization" as '''tbd-rast''' and "tile-based deferred HSR" as '''tbd-hsr'''.
 
Here's a table summarizing how this maps to various GPU vendors' terminology, what GPUs fall into which category, and what each term actually means.
 
{|class="wikitable"
!rowspan="2"|Our terminology
!rowspan="2"|Immediate
!colspan="2" style="text-align: center" |Deferred
|-
|'''Tile-based deferred rasterization''', abbreviated as '''tbd-rast'''
|'''Tile-based deferred HSR''', abbreviated as '''tbd-hsr'''
|-
![http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf ImgTec terminology]
|Immediate rendering
|Tile-based rendering (TBR)
|Tile-based deferred rendering (TBDR)
|-
![http://infocenter.arm.com/help/topic/com.arm.doc.dui0555a/DUI0555A_mali_optimization_guide.pdf ARM terminology]
|Immediate rendering
| scope="row" colspan="2" style="text-align: center" |Interchangeably "tile-based rendering" or "tile-based deferred rendering"
|-
!Hardware
|NVIDIA Tegra, desktops
|ARM Mali, Qualcomm Adreno
|ImgTec PowerVR
|-
!Meaning
|Submitted geometry is immediately rendered; no tiling is used.
|Submitted geometry is immediately transformed and stored in per-tile lists. Rasterization is then done separately for each tile.
|Submitted geometry is immediately transformed and stored in per-tile lists. HSR is then done for each tile, yielding a list of visible fragments.
|-
!Performance implications
|Good old desktop GPU optimization
|Optimizations discussed below for deferred GPUs
|Optimizations discussed below for deferred GPUs; the only difference is that there is no need for front-to-back sorting, as HSR is efficiently handled by hardware.
|}
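
To make the "per-tile lists" row of the table more concrete, here is a minimal CPU-side sketch of the binning step that both kinds of deferred GPU are described as doing. Everything in it is an assumption made for illustration: the <code>Triangle</code> and <code>TileGrid</code> types, the 16-pixel tile size, and the bounding-box overlap test are hypothetical and are not taken from any vendor's documentation. Real hardware does this in dedicated logic and may use a more precise coverage test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not from any vendor's driver or docs.
struct Triangle {
    float x[3];  // screen-space x of each vertex, in pixels (already transformed)
    float y[3];  // screen-space y of each vertex, in pixels
};

struct TileGrid {
    int tileSize;  // tile edge in pixels (an assumed value; real sizes vary)
    int tilesX;    // number of tile columns
    int tilesY;    // number of tile rows
    std::vector<std::vector<std::size_t>> bins;  // per-tile triangle indices
};

static int ClampInt(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

TileGrid MakeGrid(int width, int height, int tileSize) {
    assert(width > 0 && height > 0 && tileSize > 0);
    TileGrid g;
    g.tileSize = tileSize;
    g.tilesX = (width + tileSize - 1) / tileSize;   // round up
    g.tilesY = (height + tileSize - 1) / tileSize;  // round up
    g.bins.resize(static_cast<std::size_t>(g.tilesX) * g.tilesY);
    return g;
}

// Binning pass: record each triangle in every tile that its bounding box
// overlaps. This is conservative: a tile may list a triangle that does not
// actually cover any of its pixels. Triangles are assumed to be clipped to
// the viewport already; off-screen coordinates are clamped to edge tiles.
void BinTriangles(TileGrid& g, const std::vector<Triangle>& tris) {
    for (std::size_t i = 0; i < tris.size(); ++i) {
        const Triangle& t = tris[i];
        const float minX = std::min({t.x[0], t.x[1], t.x[2]});
        const float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        const float minY = std::min({t.y[0], t.y[1], t.y[2]});
        const float maxY = std::max({t.y[0], t.y[1], t.y[2]});

        const int tx0 = ClampInt(static_cast<int>(minX) / g.tileSize, 0, g.tilesX - 1);
        const int tx1 = ClampInt(static_cast<int>(maxX) / g.tileSize, 0, g.tilesX - 1);
        const int ty0 = ClampInt(static_cast<int>(minY) / g.tileSize, 0, g.tilesY - 1);
        const int ty1 = ClampInt(static_cast<int>(maxY) / g.tileSize, 0, g.tilesY - 1);

        for (int ty = ty0; ty <= ty1; ++ty) {
            for (int tx = tx0; tx <= tx1; ++tx) {
                g.bins[static_cast<std::size_t>(ty) * g.tilesX + tx].push_back(i);
            }
        }
    }
}
```

After binning, a deferred GPU would walk the tiles one at a time and process only the triangles listed in that tile's bin, keeping the tile's color and depth data in fast on-chip memory. Whether the per-tile step is plain rasterization (tbd-rast) or HSR followed by shading of visible fragments (tbd-hsr) is the distinction drawn in the table above.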
 
= Performance implications of "deferred" =
 
== Draw-calls are very expensive, can only do 50 per frame to get 60 FPS ==
 
...At least on ARM Mali, as ARM said in a session at GDC. They said the reason is that this GPU does "deferred rasterization" and somehow this makes each draw-call very expensive. We need to read the ARM documentation carefully to make sense of this and to understand to what extent it applies to other GPUs.
 
=== Corollary: we should batch draw-calls ===
 
That would mean grouping textures into bigger textures (texture atlases), which would be done with glTexSubImage2D. The idea was suggested by ARM people, so it's not crazy.
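
As a rough illustration of grouping textures into a bigger texture, here is a minimal sketch of a "shelf" allocator that hands out sub-rectangles of one large atlas texture. The <code>ShelfAtlas</code>, <code>Rect</code> and <code>UV</code> names are hypothetical and are not existing gfx/layers code, and shelf packing is just one simple strategy picked for the example. The GL upload itself is left as a comment so that the sketch runs without a GL context.

```cpp
#include <cassert>

// Hypothetical helper types for illustration; not existing gfx/layers code.
struct Rect { int x, y, w, h; };
struct UV { float u0, v0, u1, v1; };

// Simple "shelf" packer: regions are placed left to right on a shelf, and a
// new shelf is started below the tallest region when a row is full.
class ShelfAtlas {
public:
    ShelfAtlas(int width, int height) : mWidth(width), mHeight(height) {
        assert(width > 0 && height > 0);
    }

    // Reserves a w x h region of the atlas. Returns false, leaving the
    // atlas unchanged, if the region does not fit.
    bool Allocate(int w, int h, Rect* out) {
        if (w <= 0 || h <= 0 || w > mWidth) return false;
        int x = mCursorX, y = mShelfY, shelfH = mShelfHeight;
        if (x + w > mWidth) {  // current shelf is full: start a new one
            y += shelfH;
            x = 0;
            shelfH = 0;
        }
        if (y + h > mHeight) return false;
        mCursorX = x + w;
        mShelfY = y;
        mShelfHeight = (h > shelfH) ? h : shelfH;
        *out = Rect{x, y, w, h};
        // With a GL context, the pixels for this region would be uploaded
        // into the already-allocated atlas texture with something like:
        //   glTexSubImage2D(GL_TEXTURE_2D, 0, out->x, out->y, w, h,
        //                   GL_RGBA, GL_UNSIGNED_BYTE, pixels);
        return true;
    }

    // Normalized texture coordinates to use when drawing from the region.
    UV ToUV(const Rect& r) const {
        return UV{static_cast<float>(r.x) / mWidth,
                  static_cast<float>(r.y) / mHeight,
                  static_cast<float>(r.x + r.w) / mWidth,
                  static_cast<float>(r.y + r.h) / mHeight};
    }

private:
    int mWidth;
    int mHeight;
    int mCursorX = 0;      // next free x on the current shelf
    int mShelfY = 0;       // top of the current shelf
    int mShelfHeight = 0;  // height of the tallest region on the current shelf
};
```

Once several images share one atlas, quads that sample from it can in principle be drawn with a single texture bind and a single draw-call, each quad using the coordinates returned by <code>ToUV</code>. Whether that actually reduces our draw-call count depends on how the layers code issues its draws, which this sketch does not address.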
 
== Multiple passes with FBOs introduce stalls ==
 
Traditional GPU wisdom says that glReadPixels is evil because it introduces stalls. Deferred GPU wisdom says that any multi-pass rendering using an FBO as an intermediate surface also introduces stalls, because it puts a barrier on how much rendering can be deferred. On the other hand, MRTs (multiple render targets) are said to be deferred-friendly.
 
= How bad are we currently? =
 
I mean throughout our gfx/layers code?
 
= What can we do to be better? =