Implementing DACS in Hardware
High-Level Motivation and Context
At HPG 2018, we introduced Deferred Adaptive Compute Shading (DACS), an adaptive undersampling method that provided excellent speedups for only minimal image quality loss.
DACS involves defining a new shading order for pixels, but shading order is really something that ought to be managed by hardware. GPU driver changes since 2018 underscore this: the DACS algorithm (at least as it was originally written) is no longer a performance win. DACS itself should therefore be implemented in hardware, so that its performance is stable and its overhead is minimized.
However, it is not obvious how to do this. In this paper, we analyze a G-buffer swizzle pattern, an on-chip irregular tiling scheme, and useful extensions to show how a hardware implementation of DACS might be achieved. We also compare against NVIDIA Variable-Rate Shading (VRS, released literally three days after DACS) and show that DACS is still a significant quality win.
Efficient Adaptive Deferred Shading with Hardware Scatter Tiles
Ian Mallett, Cem Yuksel, Larry Seiler
HPG '20 Proceedings of High Performance Graphics, 2020
(Wolfgang Straßer Best Paper Award)
Irregularly shaped scatter tile used for efficiently passing data between thread groups. See also Figure 3 in the paper.
Abstract
Adaptive shading is an effective mechanism for reducing the number of shaded pixels to a subset of the image resolution with minimal impact on final rendering quality. We present a new scheduling method based on on-chip tiles that, along with relatively minor modifications to the GPU architecture, provides efficient hardware support. As compared to software implementations on current hardware using compute shaders, our approach dramatically reduces memory bandwidth requirements, thereby significantly improving performance and energy use. We also introduce the concept of a fragment pre-shader for programmatically controlling when a fragment shader is invoked, and describe advanced techniques for utilizing our approach to further reduce the number of shaded pixels via temporal filtering, or to adjust rendering quality to maintain stable framerates.