Frequency requirement for shading Adaptive Image Space Shading for Motion and Defocus Blur
Karthik Vaidynathan, Robert Toth, Marco Salvi, Solomon Boulos, Aaron Lefohn
In Proceedings of High Performance Graphics 2012

We present a novel anisotropic sampling algorithm for image space shading which builds upon recent advance- ments in decoupled sampling for stochastic rasterization pipelines. First, we analyze the frequency content of a pixel in the presence of motion and defocus blur. We use this analysis to derive bounds for the spectrum of a surface defined over a two-dimensional and motion-aligned shading space. Second, we present a simple algorithm that uses the new frequency bounds to reduce the number of shaded quads and the size of decoupling cache respectively by 2X and 16X, while largely preserving image detail and minimizing additional aliasing.

tz-pyramid Space-Time Hierarchical Occlusion Culling for Micropolygon Rendering with Motion Blur
Solomon Boulos, Edward Luong, Kayvon Fatahalian, Henry Moreton and Pat Hanrahan
In Proceedings of High Performance Graphics 2010

Occlusion culling using a traditional hierarchical depth buffer, or z-pyramid, is less effective when rendering with motion blur. We present a new data structure, the tz-pyramid, that extends the traditional z-pyramid to represent scene depth values in time. This temporal information improves culling efficacy when rendering with motion blur. The tz-pyramid allows occlusion culling to adapt to the amount of scene motion, providing a balance of high efficacy with large motion and low cost in terms of depth comparisons when motion is small. Compared to a traditional z-pyramid, using the tz-pyramid for occlusion culling reduces the number of micropolygons shaded by up to 3.5x. In addition to better culling, the tz-pyramid reduces the number of depth comparisons by up to 1.4x.

Quad-Fragment Merging Reducing Shading on GPUs using Quad-Fragment Merging
Kayvon Fatahalian, Solomon Boulos, James Hegarty, Kurt Akeley, William R. Mark, Henry Moreton and Pat Hanrahan
In Proceedings of SIGGRAPH 2010
Video (Quicktime, 300MB)

Current GPUs perform a significant amount of redundant shading when surfaces are tessellated into small triangles. We address this inefficiency by augmenting the GPU pipeline to gather and merge rasterized fragments from adjacent triangles in a mesh. This approach has minimal impact on output image quality, is amenable to implementation in fixed-function hardware, and, when rendering pixel-sized triangles, requires only a small amount of buffering to reduce overall pipeline shading work by a factor of eight. We find that a fragment-shading pipeline with this optimization is competitive with the REYES pipeline approach of shading at micropolygon vertices and, in cases of complex occlusion, can perform up to two times less shading work.

DiagSplit DiagSplit: Parallel, Crack-Free, Adaptive Tessellation for Micropolygon Rendering
Matthew Fisher, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, William R. Mark and Pat Hanrahan
In Proceedings of SIGGRAPH Asia 2009
Video (Quicktime, 61MB)

We present DiagSplit, a parallel algorithm for adaptively tessellating displaced parametric surfaces into high-quality, crack-free micropolygon meshes. DiagSplit modifies the split-dice tessellation algorithm to allow splits along non-isoparametric directions in the surface's parametric domain, and uses a dicing scheme that supports unique tessellation factors for each subpatch edge. Edge tessellation factors are computed using only information local to subpatch edges. These modifications allow all subpatches generated by DiagSplit to be processed independently without introducing T-junctions or mesh cracks and without incurring the tessellation overhead of binary dicing. We demonstrate that DiagSplit produces output that is better (in terms of image quality and number of micropolygons produced) than existing parallel tessellation schemes, and as good as highly adaptive split-dice implementations that are less amenable to parallelization.

Micropolygon Rasterization Data-Parallel Rasterization of Micropolygons with Defocus and Motion Blur
Kayvon Fatahalian, Edward Luong, Solomon Boulos, Kurt Akeley, William R. Mark and Pat Hanrahan
In Proceedings of High Performance Graphics 2009
Video (Quicktime, 50MB)

Current GPUs rasterize micropolygons (polygons approximately one pixel in size) inefficiently. We design and analyze the costs of three alternative data-parallel algorithms for rasterizing micropolygon workloads for the real-time domain. First, we demonstrate that efficient micropolygon rasterization requires parallelism across many polygons, not just within a single polygon. Second, we produce a data-parallel implementation of an existing stochastic rasterization algorithm by Pixar, which is able to produce motion blur and depth-of-field effects. Third, we provide an algorithm that leverages interleaved sampling for motion blur and camera defocus. This algorithm outperforms Pixar's algorithm when rendering objects undergoing moderate defocus or high motion and has the added benefit of predictable performance.

GRAMPS graphs GRAMPS: A Programming Model for Graphics Pipelines
Jeremy Sugerman, Kayvon Fatahalian, Solomon Boulos, Kurt Akeley, and Pat Hanrahan
ACM Transactions on Graphics, Volume 29, Issue 1, January 2009

We introduce GRAMPS, a programming model that generalizes concepts from modern real-time graphics pipelines by exposing a model of execution containing both fixed-function and application-programmable processing stages that exchange data via queues. GRAMPS allows the number, type, and connectivity of these processing stages to be defined by software, permitting arbitrary processing pipelines or even processing graphs. Applications achieve high performance using GRAMPS by expressing advanced rendering algorithms as custom pipelines, then using the pipeline as a rendering engine. We describe the design of GRAMPS, then evaluate it by implementing three pipelines—Direct3D, a ray tracer, and a hybridization of the two—and running them on emulations of two different GRAMPS implementations: a traditional GPU-like architecture; and a CPU-like multi-core architecture. In our tests, our GRAMPS schedulers run our pipelines with 500 to 1500 KB of queue usage at their peaks.

SIMD Box Test Speedup Adaptive Ray Packet Reordering
Solomon Boulos, Ingo Wald and Carsten Benthin
Proceedings of IEEE Symposium on Interactive Ray Tracing 2008
Slides (PDF)

Modern high-performance ray tracers use large ray packets and SIMD instruction sets to decrease both the computational and bandwidth cost compared to a single ray implementation. Current global illumination renderers, however, are still based around single ray implementations and interfaces. The presumption is that while packets have been shown to work well for highly coherent rays, in the presence of less coherent secondary ray distributions the gains of both packet and SIMD techniques dwindle rapidly. With low enough coherence, performance can be reduced to being as slow as reasonable single ray code -- if not worse -- so the benefit of packets for a global illumination system is assumed to be next to none. With SIMD width expanding in future architectures, leaving SIMD units underutilized means a massive loss in performance compared to the maximum performance achievable. In this paper, we present a method for recovering packet and SIMD coherence for incoherent secondary ray distributions through demand-driven reordering of rays into more coherent packets. We demonstrate that the reordering overhead is outweighed by the increased coherence within a prototypical implementation in the Manta realtime ray tracer among a wide variety of ray distributions, including diffuse path tracing.

N-ary BVH Getting Rid of Packets: Efficient SIMD Single-Ray Traversal using Multi-branching BVHs
Ingo Wald, Carsten Benthin and Solomon Boulos
Proceedings of IEEE Symposium on Interactive Ray Tracing 2008

While contemporary approaches to SIMD ray tracing typically rely on traversing packets of coherent rays through a binary data structure, we instead evaluate the alternative of traversing individual rays through a bounding volume hierarchy with a branching factor of 16.Though obviously less efficient than high-performance packet techniques for primary rays, we demonstrate that for less coherent secondary ray distributions this approach is at least competitive with (and often faster than) typical packet traversal techniques.

View from shading point for normal (a) and prefiltered (b) BVH Raytracing Prefiltered Occlusion for Aggregate Geometry
Dylan Lacewell, Brent Burley, Solomon Boulos and Peter Shirley
Proceedings of IEEE Symposium on Interactive Ray Tracing 2008

We prefilter occlusion of aggregate geometry, e.g., foliage or hair, storing local occlusion as a directional opacity in each node of a bounding volume hierarchy (BVH). During intersection, we terminate rays early at BVH nodes based on ray differential, and composite the stored opacities. This makes intersection cost independent of geometric complexity for rays with large differentials, and simultaneously reduces the variance of occlusion estimates. These two algorithmic improvements result in significant performance gains for soft shadows and ambient occlusion. The prefiltered opacity data depends only on geometry, not lights, and can be computed in linear time based on assumptions about the statistics of aggregate geometry.

oRGB Gamut Visualizations oRGB: A Practical Opponent Color Space for Computer Graphics
Margarita Bratkova, Solomon Boulos, and Peter Shirley
IEEE CG&A, Volume 29, Issue 1, 2009

We present a new color model, oRGB, that is based on opponent color theory. Like HSV, it is designed specifically for computer graphics. However, it is also designed to work well for computational applications such as color transfer, where HSV falters. Despite being geared towards computation, oRGB's natural axes facilitate HSV-style color selection and manipulation. oRGB also allows for new applications such as a quantitative cool-to-warm metric, intuitive color manipulations and variations, and simple gamut mapping. This new color model strikes a balance between simplicity and the computational qualities of color spaces such as CIE L*a*b*.

Note: The version here is a pre-print in the ACM SIGGRAPH format.
TRaX Core TRaX: A Multi-Threaded Architecture for Real-Time Ray Tracing
Josef Spjut, Daniel Kopta, Erik Brunvand, Solomon Boulos, and Spencer Kellis
Proceedings of the IEEE Symposium on Application Specific Processors, SASP 2008

Ray tracing is a technique used for generating highly realistic computer graphics images. In this paper, we explore the design of a simple but extremely parallel, multi-threaded, multi-core processor architecture that performs real-time ray tracing. Our architecture, called TRaX for Threaded Ray eXecution, consists of a set of thread states that include commonly used functional units for each thread and share large functional units through a programmable interconnect to maximize utilization. The memory system takes ad- vantage of the application's read-only access to the scene database and write-only access to the frame buffer output to provide efficient data delivery with a relatively simple structure. Preliminary results indicate that a multi-core version of the architecture running at a modest speed of 500 MHz already provides real-time ray traced images for scenes of a complexity found in video games. We also explore the architectural impact of a ray tracer that uses procedural (computed) textures rather than image-based (look-up) textures to trade computation for reduced memory bandwidth.
RTSL Velvet Sphere RTSL: a Ray Tracing Shading Language
Steven G. Parker, Solomon Boulos, James Bigler, and Austin Robinson.
Proceedings of IEEE Symposium on Interactive Ray Tracing 2007

We present a new domain-specific programming language suitable for extending both interactive and non-interactive ray tracing systems. This language, called ``ray tracing shading language'' (RTSL), builds on the GLSL language that is a part of the OpenGL specification and familiar to GPU programmers. This language allows a programmer to implement new cameras, primitives, textures, lights, and materials that can be used in multiple rendering systems. RTSL presents a single-ray interface that is easy to program for novice programmers. Through an advanced compiler, packet-based SIMD-optimized code can be generated that is performance competitive with hand-optimized code.
Pulli Edit Interactive Editing and Modeling of Bidirectional Texture Functions
Jan Kautz, Solomon Boulos, and Fredo Durand.
Proceedings of ACM SIGGRAPH 2007
Video (DivX 6) | Code

While measured Bidirectional Texture Functions (BTF) enable impressive realism in material appearance, they offer little control, which limits their use for content creation. In this work, we inter- actively manipulate BTFs and create new BTFs from flat textures. We present an out-of-core approach to manage the size of BTFs and introduce new editing operations that modify the appearance of a material. These tools achieve their full potential when selectively applied to subsets of the BTF through the use of new selection operators. We further analyze the use of our editing operators for the modification of important visual characteristics such as highlights, roughness, and fuzziness. Results compare favorably to the direct alteration of micro-geometry and reflectances of synthetic reference data.
Conference Scene (DRT) Packet-based Whitted and Distribution Ray Tracing
Solomon Boulos, David Edwards, J. Dylan Lacewell, Joe Kniss, Jan Kautz, Ingo Wald and Peter Shirley
Proceedings of Graphics Interface 2007
Presentation (PDF)

Much progress has been made toward interactive ray tracing, but most research has focused specifically on ray casting. A common approach is to use "packets" of rays to amortize cost across sets of rays. Whether "packets" can be used to speed up the cost of reflection and refraction rays is unclear. The issue is complicated since such rays do not share common origins and often have less directional coherence than viewing and shadow rays. Since the primary advantage of ray tracing over rasterization is the computation of global effects, such as accurate reflection and refraction, this lack of knowledge should be corrected. We are also interested in exploring whether distribution ray tracing, due to its stochastic properties, further erodes the effectiveness of techniques used to accelerate ray casting. This paper addresses the question of whether packet-based ray tracing algorithms can be effectively used for more than visibility computation. We show that by choosing an appropriate data structure and a suitable packet assembly algorithm we can extend the idea of "packets" from ray casting to Whitted-style and distribution ray tracing, while maintaining efficiency.

Note: a previous version of this work is contained in a technical report (see below).
Fairy Forest Ray Tracing Deformable Scenes using Dynamic Bounding Volume Hierarchies
Ingo Wald, Solomon Boulos and Peter Shirley
ACM Transaction on Graphics, Volume 26, Issue 1, January 2007

The most significant deficiency of most of today's interactive ray tracers is that they are restricted to static walkthroughs. This restriction is due to the static nature of the acceleration structures used. While the best reported frame-rates for static geometric models have been achieved using carefully constructed kd-trees, this paper shows that bounding volume hierarchies (BVHs) can be used to efficiently ray trace large static models.

More importantly, the BVH can be used to ray trace deformable models (sets of triangles whose positions change over time) with little loss of performance. A variety of efficiency techniques are used to achieve this performance, but three algorithmic changes to the typical BVH algorithm are mainly responsible. First, the BVH is built using a variant of the "surface area heuristic" conventionally used to build kd-trees. Second, the topology of the BVH is not changed over time so that only the bounding volumes need be re-fit from frame to frame. Third, and most importantly, packets of rays are traced together through the BVH using a novel integrated packet-frustum traversal scheme. This traversal scheme elegantly combines the advantages of both packet traversal and frustum traversal, and allows for rapid hierarchy descent for packets that hit bounding volumes, as well as rapid exits for packets that miss. A BVH-based ray tracing system using these techniques is shown to achieve performance for deformable models comparable to that previously available only for static models.
Hogum Image Synthesis using Adjoint Photons
R. Keith Morley, Solomon Boulos, Jared Johnson, Dave Edwards, Peter Shirley, Michael Ashikhmin and Simon Premoze
Proceedings of Graphics Interface 2006
Presentation (PPT)

The most straightforward image synthesis algorithm is to follow photon-like particles from luminaires through the environment. These particles scatter or are absorbed when they interact with a surface or a volume. They contribute to the image if and when they strike a sensor. Such an algorithm implicitly solves the light trans- port equation. Alternatively, adjoint photons can be traced from the sensor to the luminaires to produce the same image. This "adjoint photon" tracing algorithm is described, and its strengths and weaknesses are discussed, as well as details needed to make adjoint photon tracing practical.
Transparent Boeing 777 An Application of Scalable Massive Model Interaction using Shared-Memory Systems
Abe Stephens, Solomon Boulos, James Bigler, Ingo Wald, and Steven G. Parker
Proceedings of the 7th Eurographics Symposium on Parallel Graphics and Visualization, May 2006

During the end-to-end digital design of a commercial airliner, a massive amount of geometric data is produced. This data can be used for inspection or maintenance throughout the life of the aircraft. Massive model interactive ray tracing can provide maintenance personnel with the capability to easily visualize the entire aircraft at once. This paper describes the design of the renderer used to demonstrate the feasibility of integrating interactive ray tracing in a commercial aircraft inspection and maintenance scenario. We describe the feasibility demonstration involving actual personnel performing real-world tasks and the scalable architecture of the parallel shared memory renderer.
Anisotropic Spheres The Halfway Vector Disk for BRDF Modeling
Dave Edwards, Solomon Boulos, Jared Johnson, Peter Shirley, Michael Ashkikhmin, Michael Stark and Chris Wyman
ACM Transactions on Graphics Volume 25 Issue 1, Jan 2006

We present a mathematical framework for enforcing energy conservation in a BRDF by specifying halfway vector distributions in simple two-dimensional domains. Energy-conserving BRDFs can produce plausible rendered images with accurate reflectance behavior, especially near grazing angles. Using our framework, we create an empirical BRDF that allows easy specification of diffuse, specular, and retroreflective materials. We also present a second BRDF model that is useful for data fitting; although it does not preserve energy, it uses the same halfway vector domain as the first model. We show that this data-fitting BRDF can be used to match measured data extremely well using only a small set of parameters. We believe that this is an improvement over table-based lookups and factored versions of BRDF data.

Lucy image Memory Sharing for Interactive Ray Tracing on Clusters
David E. DeMarle, Christiaan P. Gribble, Solomon Boulos and Steven G. Parker
Parallel Computing, Vol. 31, No. 2, pp. 221--242. 2005.

We present recent results in the application of distributed shared memory to image parallel ray tracing on clusters. Image parallel rendering is traditionally limited to scenes that are small enough to be replicated in the memory of each node, because any processor may require access to any piece of the scene. We solve this problem by making all of a cluster's memory available through software distributed shared memory layers. With gigabit ethernet connections, this mechanism is sufficiently fast for interactive rendering of multi-gigabyte datasets. Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.

Technical Reports
Adaptively subdivided production scene Packet-based Ray Tracing of Catmull-Clark Subdivision Surfaces
Carsten Benthin, Solomon Boulos, Dylan Lacewell, and Ingo Wald
Updated version of Technical Report, SCI Institute, University of Utah, No UUSCI-2007-011, 2007

Efficient ray tracing of subdivision surfaces is an important problem in production rendering, and for interactive applications in the near future. The current hardware trends for both CPUs and GPUs suggest that compute power is outpacing bandwidth. Despite this, current approaches for ray tracing subdivision surfaces favor geometry caches or full pre-tessellation. We demonstrate that directly ray tracing subdivision surfaces using ray packets uses much less bandwidth, while still providing amortization benefits. Our proposed method performs competitively with pre-tessellation even on current hardware, outperforms a single-ray implementation by up to 16x and Pixar's PRMan 13.0 geometry caching by up to 23.1x.
Densely tessellated blade model SIMD Ray Stream Tracing
Ingo Wald, Christiaan Gribble, Solomon Boulos and Andrew Kensler
Technical Report, SCI Institute, University of Utah, No UUSCI-2007-012, 2007

Achieving high performance on modern CPUs requires efficient utilization of SIMD units. Doing so requires that algorithms are able to take full advantage of the SIMD width offered and to not waste SIMD instructions on low utilization cases. Ray tracers exploit SIMD extensions through packet tracing. This re-casts the ray tracing algorithm into a SIMD framework, but high SIMD efficiency is only achieved for moderately complex scenes, and highly coherent packets. In this paper, we present a stream programming oriented traversal algorithm that processes streams of rays in SIMD fashion; the algorithm is motivated by breadth-first ray traversal and implicitly re-orders streams of rays on the fly by removing deactivated rays after each traversal step using a stream compaction step. This improves SIMD efficiency in the presence of complex scenes and diverging packets, and is, in particular, designed for potential wider-than-four SIMD architectures with scatter/gather support.
An ashtray rendered with distribution ray tracing Interactive Distribution Ray Tracing
Solomon Boulos, David Edwards, J. Dylan Lacewell, Joe Kniss, Jan Kautz, Ingo Wald and Peter Shirley
Technical Report, SCI Institute, University of Utah, No UUSCI-2006-022, 2006
Movie (Quicktime)  | SIGGRAPH 2006 Course Presentation (PDF)

Distribution ray tracing uses multiple samples per pixel to produce antialiased images that include soft shad- ows, glossy reflection, motion blur, and depth-of-field. The two main potential barriers to making distribution ray tracing interactive are that many rays might be required, and that those rays are not coherent enough to derive efficiency from tracing them in packets. A new interleaved sampling approach based on the Sudoku puzzle is used to minimize the number of rays per pixel. An empirical demonstration is used to show that there is still enough coherence in the rays to allow for a per-ray cost near that of a traditional ray tracer. In addition, a demonstration is provided that participating media can also be handled interactively in a distribution ray tracer.
IA Example Geometric and Arithmetic Culling Methods for Entire Ray Packets
Solomon Boulos, Ingo Wald and Peter Shirley
Technical Report, School of Computing, University of Utah, No UUCS-06-10, 2006

Recent interactive ray tracing performance has been mainly derived from the use of ray packets. Larger ray packets allow for significant amortization of both computations and memory accesses; however, the majority of primitives are still intersected by each ray in a packet. This paper discusses several methods to cull entire ray packets against common primitives (box, triangle, and sphere) that allows an arbitrary number of rays to be tested by a single test. This provides cheap "all miss" or "all hit" tests and may substantially improve the performance of an interactive ray tracer. The paper surveys current methods, provides details on three particular approaches using interval arithmetic, bounding planes, and corner rays, describes how the respective bounding primitives can be easily and efficiently constructed, and points out the relation among the different fundamental concepts.
Course Notes
EG07Star State of the Art in Ray Tracing Animated Scenes
Ingo Wald, William R. Mark, Johannes Gunther, Solomon Boulos, Thiago Ize, Warren Hunt, Steven G Parker, and Peter Shirley
Eurographics 2007 State of the Art Reports

Ray tracing has long been a method of choice for off-line rendering, but traditionally was too slow for interactive use. With faster hardware and algorithmic improvements this has recently changed, and real-time ray tracing is finally within reach. However, real-time capability also opens up new problems that do not exist in an off-line environment. In particular real-time ray tracing offers the opportunity to interactively ray trace moving/animated scene content. This presents a challenge to the data structures that have been developed for ray tracing over the past few decades. Spatial data structures crucial for fast ray tracing must be rebuilt or updated as the scene changes, and this can become a bottleneck for the speed of ray tracing. This bottleneck has received much recent attention by researchers that has resulted in a multitude of different algorithms, data structures, and strategies for handling animated scenes. The effectiveness of techniques for ray tracing dynamic scenes vary dramatically depending on details such as scene complexity, model structure, type of motion, and the coherency of the rays. Consequently, there is so far no approach that is best in all cases, and determining the best technique for a particular problem can be a challenge. In this STAR, we aim to survey the different approaches to ray tracing animated scenes, discussing their strengths and weaknesses, and their relationship to other approaches. The overall goal is to help the reader choose the best approach depending on the situation, and to expose promising areas where there is potential for algorithmic improvements.
Ray-BVH Intersection Notes on Efficient Ray Tracing
Solomon Boulos
SIGGRAPH 2005 Course: Introduction to Real-Time Ray Tracing

This little set of notes was largely ignored but represented what I had learned about making a ray tracer fast circa summer 2005. A lot of stuff has changed due to ray packets, but for single ray code this is still a good piece of information. Consequently, Eric Haines found it lying around on a SIGGRAPH DVD and we talked about it in the Ray Tracing News.