| Everybody's heard stories about mobile GPUs being different from desktop GPUs in that they do "deferred" rendering instead of "immediate" rendering.
| | This page moved to [[Platform/GFX/MobileGPUs|MobileGPUs]] as we realized that it was more useful to have a page for all mobile GPUs. |
| | |
| Do we actually know what this means? What are the implications for the performance of our mobile gfx code? What should we change?
| |
| | |
| = Gathering raw documentation from the source =
| |
| | |
| There doesn't seem to be a good source of information on "deferred" GPUs in general. Worse, "deferred" means different things on different GPUs, with different performance implications.
| |
| | |
| So for lack of something better, let's start with the only information available: documentation from GPU vendors, which is going to be biased toward their own GPUs. Once we understand that, we'll hopefully be able to aggregate it into a big vendor-neutral picture with only well-identified vendor-specific parts.
| |
| | |
| == ARM (Mali) ==
| |
| | |
| [http://malideveloper.arm.com/documentation/ Mali Developer Center] | |
| | |
| [http://infocenter.arm.com/help/topic/com.arm.doc.dui0555a/DUI0555A_mali_optimization_guide.pdf Mali GPU Application Optimization Guide (2011)] | |
| | |
| == Qualcomm (Adreno) ==
| |
| | |
| [https://developer.qualcomm.com/download/adreno200performanceoptimizationopenglestipsandtricksmarch10.pdf Adreno 200 Performance Optimization (2010)]
| |
| | |
| == NVIDIA (Tegra) ==
| |
| | |
| Tegras are the only major mobile GPUs that are immediate, like desktop GPUs, rather than deferred like other mobile GPUs.
| |
| | |
| [http://www.nvidia.ca/object/white-papers.html NVIDIA White Papers]
| |
| | |
| [http://www.nvidia.ca/docs/IO/116757/Tegra_4_GPU_Whitepaper_FINALv2.pdf NVIDIA Tegra 4 Family GPU Architecture]
| |
| | |
| == Imagination Technologies (PowerVR) ==
| |
| | |
| [http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf POWERVR Series5 Graphics]
| |
| | |
| [http://www.imgtec.com/powervr/insider/powervr_presentations/GDC%20HardwareAndOptimisation.pdf PowerVR -- A Master Class in Graphics Technology and Optimization (2012)]
| |
| | |
| More documentation seems to exist at their "PowerVR Insider" site, but there appears to be a paywall there.
| |
| | |
| == Intel ==
| |
| | |
| Intel currently just licenses PowerVR. They seem to be preparing different hardware for the future, but for now we only need to care about PowerVR-based Intel.
| |
| | |
| == Other GPU vendors without known public documentation ==
| |
| | |
| [http://www.broadcom.com/products/technology/mobmm_videocore.php Broadcom VideoCore]
| |
| | |
| [http://www.vivantecorp.com/index.php/en/technology/3d Vivante]
| |
| | |
| = What does "deferred" actually mean in various GPUs? =
| |
| | |
| The best document that I could find on this is [http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf POWERVR Series5 Graphics], section 3. However, we won't use exactly the terminology of this document because it reserves the word "deferred" solely for PowerVR's version of "deferred".
| |
| | |
| In our terminology here, there are 3 types of GPUs: immediate, tile-based deferred rasterization, and tile-based deferred HSR, where '''HSR stands for Hidden Surface Removal'''.
| |
| | |
| We will abbreviate "tile-based deferred rasterization" as '''tbd-rast''' and "tile-based deferred HSR" as '''tbd-hsr'''.
| |
| | |
| Here's a table summarizing how this maps to various GPU vendors' terminology, what GPUs fall into which category, and what each term actually means.
| |
| | |
| {|class="wikitable"
| |
| !rowspan="2"|Our terminology
| |
| |rowspan="2"|Immediate
| |
| |colspan="2" style="text-align: center" |Deferred
| |
| |-
| |
| |Tile-based deferred rasterization, abbreviated as '''tbd-rast'''
| |
| |Tile-based deferred HSR, abbreviated as '''tbd-hsr'''
| |
| |-
| |
| ![http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf ImgTec terminology]
| |
| |Immediate rendering
| |
| |Tile-based rendering (TBR)
| |
| |Tile-based deferred rendering (TBDR)
| |
| |-
| |
| ![http://infocenter.arm.com/help/topic/com.arm.doc.dui0555a/DUI0555A_mali_optimization_guide.pdf ARM terminology]
| |
| |Immediate rendering
| |
| | scope="row" colspan="2" style="text-align: center" |Interchangeably "tile-based rendering" or "tile-based deferred rendering"
| |
| |-
| |
| !Hardware
| |
| |NVIDIA Tegra, desktops
| |
| |ARM Mali, Qualcomm Adreno
| |
| |ImgTec PowerVR
| |
| |-
| |
| !Meaning
| |
| |Submitted geometry is immediately rendered; no tiling is used.
| |
| |Submitted geometry is immediately transformed and stored in per-tile lists. Rasterization is then done separately for each tile.
| |
| |Submitted geometry is immediately transformed and stored in per-tile lists. HSR is then done for each tile, yielding a list of visible fragments.
| |
| |-
| |
| !Performance implications
| |
| |Good old desktop GPU optimization
| |
| |Optimizations discussed below for deferred GPUs
| |
| |Optimizations discussed below for deferred GPUs, plus there is no need for front-to-back sorting, as HSR is efficiently handled by hardware.
| |
| |}
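To make the "per-tile lists" row of the table concrete, here is a minimal sketch of the binning step that both tbd-rast and tbd-hsr GPUs are described as doing before any per-tile work. It is an illustration only, not how any particular GPU implements it: the 16-pixel tile size, the function name <code>bin_triangles</code>, and the use of bounding boxes (which can list a triangle in a tile it does not actually cover) are all assumptions made for the example.

```python
import math

TILE_SIZE = 16  # assumed tile edge in pixels; real hardware tile sizes vary


def bin_triangles(triangles, width, height, tile_size=TILE_SIZE):
    """Append each triangle's index to the list of every tile its
    screen-space bounding box touches. Returns {(tile_x, tile_y): [indices]}."""
    tile_lists = {}
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Clamp the bounding box to the framebuffer.
        min_x = max(math.floor(min(xs)), 0)
        max_x = min(math.floor(max(xs)), width - 1)
        min_y = max(math.floor(min(ys)), 0)
        max_y = min(math.floor(max(ys)), height - 1)
        if min_x > max_x or min_y > max_y:
            continue  # entirely off-screen, lands in no tile list
        for ty in range(min_y // tile_size, max_y // tile_size + 1):
            for tx in range(min_x // tile_size, max_x // tile_size + 1):
                tile_lists.setdefault((tx, ty), []).append(index)
    return tile_lists


# Two already-transformed triangles on a 64x32 framebuffer.
triangles = [
    [(2, 2), (10, 2), (2, 10)],      # fits inside tile (0, 0)
    [(10, 10), (40, 10), (10, 20)],  # spans three tile columns and two rows
]
tile_lists = bin_triangles(triangles, 64, 32)
```

Once every tile has its list, a tbd-rast GPU would rasterize each tile from its list, while a tbd-hsr GPU would first run HSR over the list and shade only the fragments that remain visible.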
| |
| | |
| = Performance implications of "deferred" =
| |
| | |
| == Draw-calls are very expensive, can only do 50 per frame to get 60 FPS ==
| |
| | |
| ...at least on ARM Mali, as ARM said in a session at GDC. They said the reason is that this GPU does "deferred rasterization", and that this somehow makes each draw-call very expensive. We need to read the ARM documentation carefully to make sense of this and to understand to what extent it applies to other GPUs.
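To put the "50 draw-calls per frame at 60 FPS" figure in perspective, the arithmetic below works out what it would mean per draw-call. This is only a back-of-the-envelope reading: it assumes, purely for illustration, that draw-calls were the only cost in a frame, which ARM did not claim.

```python
TARGET_FPS = 60            # from the figure quoted above
DRAW_CALLS_PER_FRAME = 50  # from the figure quoted above

# Time available for one frame at the target frame rate.
frame_budget_ms = 1000.0 / TARGET_FPS

# Upper bound on the average cost of one draw-call, under the
# illustrative assumption that nothing else consumes frame time.
per_draw_call_ms = frame_budget_ms / DRAW_CALLS_PER_FRAME

draw_calls_per_second = TARGET_FPS * DRAW_CALLS_PER_FRAME

print(f"frame budget: {frame_budget_ms:.2f} ms")
print(f"per draw-call: {per_draw_call_ms:.3f} ms")
print(f"draw-calls per second: {draw_calls_per_second}")
```

In other words, the quoted budget corresponds to roughly a third of a millisecond per draw-call at most.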
| |
| | |
| === Corollary: we should batch draw-calls ===
| |
| | |
| Batching would mean grouping textures into bigger textures, which would be done with glTexSubImage2D. The idea was suggested by ARM people, so it's not crazy.
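As a rough sketch of what "grouping textures into bigger textures" involves, the code below places small rectangles into one atlas with a naive shelf packer and derives the texture coordinates each quad would then use. The packing strategy and the names <code>pack_shelves</code> and <code>uv_rect</code> are assumptions made for this example, not something ARM proposed; in real code each returned offset would be passed as the xoffset/yoffset arguments of a glTexSubImage2D upload into the atlas texture.

```python
def pack_shelves(sizes, atlas_width, atlas_height):
    """Place (w, h) rectangles left to right on horizontal shelves.
    Returns one (x, y) offset per input rectangle, in input order."""
    offsets = []
    x = y = shelf_height = 0
    for w, h in sizes:
        if w > atlas_width:
            raise ValueError("rectangle is wider than the atlas")
        if x + w > atlas_width:
            # Current shelf is full: open a new one below it.
            y += shelf_height
            x = 0
            shelf_height = 0
        if y + h > atlas_height:
            raise ValueError("atlas is full")
        offsets.append((x, y))
        x += w
        shelf_height = max(shelf_height, h)
    return offsets


def uv_rect(offset, size, atlas_width, atlas_height):
    """Normalized (u0, v0, u1, v1) of a sub-texture inside the atlas."""
    (x, y), (w, h) = offset, size
    return (x / atlas_width, y / atlas_height,
            (x + w) / atlas_width, (y + h) / atlas_height)


sizes = [(64, 64), (64, 32), (128, 64), (32, 32)]
offsets = pack_shelves(sizes, 256, 128)
uvs = [uv_rect(o, s, 256, 128) for o, s in zip(offsets, sizes)]
```

With all four images living in one texture, quads that previously needed four texture binds and four draw-calls could share a single bind and be submitted together, each using its own UV rectangle.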
| |
| | |
| == Multiple passes with FBOs introduce stalls ==
| |
| | |
| Traditional GPU wisdom says that glReadPixels is evil because it introduces stalls. Deferred GPU wisdom says that any multi-pass rendering using an FBO as an intermediate surface also introduces stalls, because it puts a barrier on how much rendering can be deferred. On the other hand, MRTs (multiple render targets) are said to be deferred-friendly.
| |
| | |
| = How bad are we currently? =
| |
| | |
| I mean throughout our gfx/layers code?
| |
| | |
| = What can we do to be better? =
| |