Platform/GFX/DeferredGPUs: Difference between revisions

|}
|}


In a '''tbd-rast''' GPU, when geometry is submitted, vertex shaders are run and the resulting triangles are clipped, but instead of proceeding further down the pipeline as they would in an immediate renderer, the triangles are only recorded in tile-specific triangle lists. The actual rasterization of the triangles in each tile is delayed until the frame needs to be resolved, hence the name: ''tile-based deferred rasterization''. Deferring rasterization until all the triangles in a given tile are known allows '''tbd-rast''' GPUs to achieve higher efficiency than immediate GPUs, if only through higher cache coherency of framebuffer accesses: in practice, the tile size is small enough that the framebuffer tile fits in cache memory, which considerably reduces framebuffer memory bandwidth usage. There are probably further gains too, although they depend on GPU specifics. For example, deferred rendering may allow GPUs to sort primitives by texture, achieving higher texture cache coherency.
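
The binning step described above can be illustrated with a minimal Python sketch. It is only a toy model, not how any particular GPU implements it: the tile size, the function name <code>bin_triangles</code> and the bounding-box overlap test are all assumptions made for illustration. Triangles are taken to be already clipped, and given as three (x, y) screen-space points.

```python
import math
from collections import defaultdict

TILE_SIZE = 16  # hypothetical tile size in pixels; real hardware values vary


def bin_triangles(triangles, width, height, tile=TILE_SIZE):
    """Record each triangle's index in the list of every tile that its
    screen-space bounding box overlaps. Nothing is rasterized here."""
    tile_lists = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Tile range covered by the bounding box, clamped to the screen.
        tx0 = max(0, math.floor(min(xs)) // tile)
        tx1 = min(max_tx, math.floor(max(xs)) // tile)
        ty0 = max(0, math.floor(min(ys)) // tile)
        ty1 = min(max_ty, math.floor(max(ys)) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                tile_lists[(tx, ty)].append(index)
    return tile_lists


# Two triangles on a 64x64 framebuffer: a small one inside tile (0, 0)
# and a larger one whose bounding box spans a 3x3 block of tiles.
example = [
    [(1, 1), (5, 1), (1, 5)],
    [(10, 10), (40, 10), (10, 40)],
]
lists = bin_triangles(example, 64, 64)
```

The bounding-box test is conservative: a tile may list a triangle that does not actually cover any of its pixels, which costs some work but never drops geometry. Rasterization would later walk <code>lists</code> one tile at a time, so that all framebuffer accesses for a tile stay within one small region.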


All of that also applies to '''tbd-hsr''' GPUs such as PowerVR's, which are similar to '''tbd-rast''' GPUs except for an additional optimization that they perform automatically: when a '''tbd-hsr''' GPU is about to start rasterizing the triangles in a given tile, it first identifies, for each fragment, which primitives may be visible at that fragment (see Section 4.4 in [http://www.imgtec.com/powervr/insider/docs/POWERVR%20Series5%20Graphics.SGX%20architecture%20guide%20for%20developers.1.0.8.External.pdf this PowerVR document]). In practice, this means that a '''tbd-hsr''' GPU is equally efficient regardless of the ordering of opaque primitives, whereas other types of GPUs perform better if opaque geometry is submitted in front-to-back order.
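
The effect of submission order can be illustrated with a small Python model that counts fragment shader invocations within one tile. This is a deliberately simplified sketch and not a description of PowerVR hardware: primitives are modelled as opaque axis-aligned rectangles <code>(x0, y0, x1, y1, z)</code> with constant depth, and both function names are hypothetical.

```python
def shade_count_immediate(prims, width, height):
    """Depth-tested rendering in submission order: every fragment that
    passes the depth test at the time it is drawn gets shaded."""
    depth = [[float("inf")] * width for _ in range(height)]
    shaded = 0
    for x0, y0, x1, y1, z in prims:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if z < depth[y][x]:
                    depth[y][x] = z
                    shaded += 1  # may be overwritten by a nearer primitive later
    return shaded


def shade_count_hsr(prims, width, height):
    """Resolve visibility for the whole tile first, then shade only the
    nearest opaque primitive at each pixel."""
    depth = [[float("inf")] * width for _ in range(height)]
    visible = [[None] * width for _ in range(height)]
    for index, (x0, y0, x1, y1, z) in enumerate(prims):
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                if z < depth[y][x]:
                    depth[y][x] = z
                    visible[y][x] = index
    # One shader invocation per covered pixel, whatever the order was.
    return sum(1 for row in visible for v in row if v is not None)


near = (0, 0, 4, 4, 1.0)  # smaller z is closer to the viewer
far = (0, 0, 4, 4, 2.0)
back_to_front = [far, near]
front_to_back = [near, far]
```

In this model, <code>shade_count_immediate</code> returns a different count for <code>back_to_front</code> than for <code>front_to_back</code>, because fragments of the far rectangle are shaded before being overwritten, while <code>shade_count_hsr</code> returns the same count for both orderings.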


= Performance implications of "deferred" =