Friday, September 11, 2009

DiagSplit: Parallel, Crack-Free, Adaptive Tessellation for Micropolygon Rendering

As hoped, we just kicked another micropolygon-related paper out the door. We've been working on a parallel algorithm for generating micropolygons via adaptive tessellation for the past year and the result of this study is an algorithm that we've called DiagSplit.

DiagSplit is an implementation of Split-Dice with two interesting modifications. First, instead of what many consider to be the "traditional Reyes" dicer that generates tensor-product UV grids of quadrilateral micropolygons, DiagSplit's dicing step is the D3D11 Tessellation stage. Thus, it produces slightly irregular meshes as output. Second, to get everything to work without creating cracks, the splitting process must sometimes split subpatches along non-isoparametric directions. In other words, the algorithm sometimes makes diagonal splits in parametric space (hence the name DiagSplit).

DiagSplit is intended for tight integration with the real-time graphics pipeline. In the short term, an implementation might do all the splitting on the CPU or within a compute shader, then ship diceable subpatches (not final triangles) over to the graphics pipeline for all the heavy lifting of dicing and surface evaluation. We've really designed DiagSplit for even tighter integration with future graphics pipelines, and we can imagine the entire adaptive splitting process being implemented in the pipeline itself with only a few extensions to D3D11. For those interested in an early read, the final draft of the paper, which will appear in SIGGRAPH Asia 2009, has been placed online here.

Paper Abstract:

We present DiagSplit, a parallel algorithm for adaptively tessellating displaced parametric surfaces into high-quality, crack-free micropolygon meshes. DiagSplit modifies the split-dice tessellation algorithm to allow splits along non-isoparametric directions in the surface's parametric domain, and uses a dicing scheme that supports unique tessellation factors for each subpatch edge. Edge tessellation factors are computed using only information local to subpatch edges. These modifications allow all subpatches generated by DiagSplit to be processed independently without introducing T-junctions or mesh cracks and without incurring the tessellation overhead of binary dicing. We demonstrate that DiagSplit produces output that is better (in terms of image quality and number of micropolygons produced) than existing parallel tessellation schemes, and as good as highly adaptive split-dice implementations that are less amenable to parallelization.
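To make the "information local to subpatch edges" idea a little more concrete, here is a minimal Python sketch. It is not the method from the paper: the function name `edge_tess_factor`, the `target_edge_px` parameter, and the choice to measure only the straight-line screen-space distance between the two edge endpoints are all assumptions made for illustration. The point it shows is that when a factor depends only on the edge itself, two subpatches sharing that edge arrive at the same value independently.

```python
import math

def edge_tess_factor(p0, p1, target_edge_px=1.0):
    """Hypothetical helper: segment count for one subpatch edge.

    p0 and p1 are the edge endpoints in screen space (pixels).
    Assumption: the edge is measured as a straight line between its
    endpoints, which ignores curvature and displacement along the edge.
    """
    # Put the endpoints in a canonical order so the result cannot depend
    # on which neighboring subpatch is asking.
    a, b = sorted((tuple(p0), tuple(p1)))
    length_px = math.dist(a, b)
    # Aim for segments of roughly target_edge_px, with at least one segment.
    return max(1, math.ceil(length_px / target_edge_px))

# Two subpatches that share an edge traverse it in opposite directions,
# yet both compute the same factor without talking to each other.
left_view = edge_tess_factor((0.0, 0.0), (3.0, 4.0))
right_view = edge_tess_factor((3.0, 4.0), (0.0, 0.0))
```

Because both neighbors agree on the number of segments along the shared edge, their diced meshes can line up there, which is the property that lets subpatches be processed independently. A real implementation would need a better length estimate than this endpoint-only measure.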

Friday, July 24, 2009

HPG09 submission: Data-parallel Rasterization of Micropolygons with Defocus and Motion Blur

For those interested, I've placed our HPG09 paper, Data-parallel Rasterization of Micropolygons with Defocus and Motion Blur, online on the Stanford Graphics Lab pages. It was surprising how tricky this problem can be, and, as is clear from the paper, there's still room for improvement in this area. Look for more micropolygon-related papers to come (we hope).

One of the major research goals at Stanford right now is the design of a real-time micropolygon rendering pipeline. There's a lot of recent and interesting work out there on implementing REYES-like algorithms on existing GPUs (see the RenderAnts folks, Anjul Patney's tessellation work, and NVIDIA's upcoming tech demos at SIGGRAPH). Our interest is not necessarily in implementing REYES; there are a lot of merits to the existing graphics pipeline. Rather, we're trying to determine how a real-time graphics pipeline such as D3D11 (as well as corresponding future GPU architectures) should evolve to efficiently accommodate micropolygon workloads. At the Beyond Programmable Shading II course at SIGGRAPH 2009, I'll be getting the chance to talk a bit about what those pipeline changes might be, and what we (and the rest of the field) have learned about building an efficient real-time micropolygon rendering pipeline. Also, in the morning session of the course I will give an extended version of last year's GPU architecture talk: From Shader Code to a Teraflop: How a GPU Core Works.

HPG09 Paper abstract: Current GPUs rasterize micropolygons (polygons approximately one pixel in size) inefficiently. We design and analyze the costs of three alternative data-parallel algorithms for rasterizing micropolygon workloads for the real-time domain. First, we demonstrate that efficient micropolygon rasterization requires parallelism across many polygons, not just within a single polygon. Second, we produce a data-parallel implementation of an existing stochastic rasterization algorithm by Pixar, which is able to produce motion blur and depth-of-field effects. Third, we provide an algorithm that leverages interleaved sampling for motion blur and camera defocus. This algorithm outperforms Pixar's algorithm when rendering objects undergoing moderate defocus or high motion and has the added benefit of predictable performance.
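For readers who have not written a rasterizer, the following Python sketch shows the basic per-polygon work the abstract is talking about: testing pixel-center sample points against a tiny triangle. It is a generic edge-function coverage test, not one of the three algorithms from the paper, and it handles neither motion blur nor defocus. The names `edge_fn`, `covered_pixels`, and `rasterize_batch` are hypothetical.

```python
import math

def edge_fn(a, b, p):
    """Signed area term: its sign says which side of edge a->b point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def covered_pixels(tri):
    """Return the (x, y) pixels whose centers lie strictly inside tri.

    tri is a sequence of three (x, y) screen-space vertices.
    Either winding order is accepted.
    """
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    x0, x1 = math.floor(min(xs)), math.floor(max(xs))
    y0, y1 = math.floor(min(ys)), math.floor(max(ys))
    hits = []
    # Only pixels inside the triangle's bounding box are candidates.
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            w0 = edge_fn(tri[0], tri[1], p)
            w1 = edge_fn(tri[1], tri[2], p)
            w2 = edge_fn(tri[2], tri[0], p)
            all_pos = w0 > 0 and w1 > 0 and w2 > 0
            all_neg = w0 < 0 and w1 < 0 and w2 < 0
            if all_pos or all_neg:
                hits.append((x, y))
    return hits

def rasterize_batch(tris):
    """Each triangle is processed independently of every other triangle."""
    return [covered_pixels(t) for t in tris]

coverage = covered_pixels(((0.0, 0.0), (2.5, 0.0), (0.0, 2.5)))
```

For a micropolygon the bounding box holds only a handful of candidate samples, so there is very little work to spread across parallel lanes within one polygon. Since each call in `rasterize_batch` is independent, the loop over polygons is the natural place to look for parallelism, which is consistent with the abstract's first point.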