"Light Makes Right"
October 27, 1989
Volume 2, Number 8
Compiled by
All contents are copyright (c) 1989, all rights reserved by the individual authors
Archive locations: anonymous FTP at
ftp://ftp-graphics.stanford.edu/pub/Graphics/RTNews/,
wuarchive.wustl.edu:/graphics/graphics/RTNews, and many others.
You may also want to check out the Ray Tracing News issue guide and the Mother of all Ray Tracing Pages.
back to contents
Over the years I have learnt a variety of tricks to increase the performance and image quality of my ray tracer. It's almost a cliche that today's successful trick is tomorrow's established technique. Photorealistic computer graphics is, after all, concerned with figuring out shortcuts and approximations for rendering various physical phenomena, i.e. tricks.
For whatever reason, many of the tricks mentioned here are not common knowledge. Some have been published (and sometimes overlooked), some have been discussed informally and have never made it into research papers, and others seem to have appeared out of nowhere. It's most likely that there are tricks that are commonly known that have not percolated over to me yet.
When possible, I have tried to give appropriate references or attributions; if not attributed, the ideas are my own (I think!). My apologies if I have overlooked anyone. Only references that do not appear in the book's "Ray Tracing Bibliography" are included at the end of this article. For more general rendering hacks, see [Whitted85], which originally inspired me to attempt to pass on some ideas from my bag of tricks.
back to contents
back to contents
While at the University of Utah, John Peterson and Tom Malley actually implemented Whitted/Rubin, Kay/Kajiya, and an octree scheme, and found that all three schemes were within 10-20% of each other speedwise. In an informal survey at SIGGRAPH '88, the BV Hierarchy, Octree, Grid and 5D schemes all had about the same number of users (all the 5D users were from Apollo; on the other hand, 5D is the new kid on the block).
There are a number of techniques I have found to be generally useful for all efficiency schemes.
1) When shadow testing, keep the opaque object (if any) which shadowed each light for each ray tree node. Try this object immediately during the next shadow test at that ray tree node. Odds are that whatever shadowed your last intersection point will shadow again. If the object is hit you can immediately stop testing because the light is not seen. This was first published in [Haines86].
2) When shadow testing, save transparent objects for later intersection. Only if no opaque object is hit should the transparent objects be tested. The idea here is to avoid doing work on transparent filters when in fact the light does not reach the surface.
3) Don't calculate the normal for each intersection. Get the normal only after all intersection calculations are done and the closest object for each node is known. After all, each ray can have only one intersection point and one normal. Saving intermediate results is worthwhile for some intersection calculations, which are then used if the object is actually hit. This idea was first mentioned in [Whitted85]. Similarly, other calculations about the surface can be delayed, such as (u,v) location, etc.
4) One other idea (which I have not tested) is sorting each intersection list by various criteria. Most efficiency schemes have in common the idea of lists of objects to be tested. For a given list, the order of testing is important. For example, all else being equal, if a list contained a spline surface and a polygon, I would want to test the polygon first since it is usually a quicker intersection test. Given an opaque object and a bounding box in a list, I probably want to test the opaque object first when doing shadow testing, since I want to find any intersection as soon as possible. If two polygons are on the list, I probably want to test the larger one first, as it is more likely to cast a shadow or give me a maximum depth (see next section). There are many variations on this theme and at this point little work has been done on these possibilities.
back to contents
I have found a number of tricks to speed up hierarchy traversal, most of which are simple to implement. Some of the ideas can also be useful for other efficiency schemes.
1) Keep track of the closest intersection distance. Whenever an object is hit, keep its distance as the maximum distance to search. During further intersection testing use this distance to cut short the intersection calculations: if an object or bounding box is beyond the maximum distance, no further testing of it needs to be done. Note that for shadow testing the distance to the light provides an initial maximum.
2) When building a ray intersection tree, keep the ray tree which was previously built. For each ray tree node, intersect the object in the old ray tree, then proceed to intersect the bounding box/object tree. By intersecting the old object first you can usually obtain a good maximum distance immediately, which can then be used to aid trick #1.
3) When shooting rays from a surface (e.g. reflection, refraction, or shadow rays), get the initial list of objects to intersect from the bounding volume hierarchy. For example, a ray beginning on a sphere must hit the sphere's bounding volume, so include all other objects in this bounding volume in the immediate test list. The bounding volume which is the parent of the sphere's bounding volume must also automatically be hit, and its other children should automatically be added to the test list, and so on up the object tree. Note also that this list can be calculated once for any object, and so could be created and kept around under a least-recently-used storage scheme. Another advantage of this scheme is that nearby neighbors of the object are tested for shadowing first. These neighbors are more likely to cast a shadow on the point than any random object. I first saw this trick used in Weghorst and Hooper's ray tracer at Cornell's Program of Computer Graphics.
4) Similar to trick #3, the idea is simply to do the same list making process for the eye position. Check if the eye position is inside the topmost node of the hierarchy. If it is, check the children which are boxes. Continue to check and unwrap until you are left with a list of objects to intersect. Again, the idea is to avoid wasting time shooting a ray against boxes which you know must be hit.
For light sources, since the farthest endpoint of the ray is also known, it can also be used to open some boxes early on. The tradeoff here, however, is that for shadow testing we want to find any intersection we can. Wasting time opening boxes near the light or ray origin might be better spent trying to find an intersection as fast as possible.
5) An improvement to trick #3 is also to use trick #4 to open more boxes initially. You work up the hierarchy opening all parent boxes; any children of the parent (except the original one, of course) are then tested against the ray position. However, this can be done only when the trick is done on the fly, since the ray's origin will change.
Kay & Kajiya's hierarchy scheme [Kay86] is about the best overall traversal method. However, Jeff Goldsmith and others note that if you do use Kay & Kajiya's heapsort on bounding volumes in order to get the closest, don't bother to do it for illumination rays. In shadowing, you don't care about the closest intersection, but just whether anything blocks the light.
back to contents
In practice, each octree voxel notes whether it is a parent of further voxels or is a leaf and contains a list of objects to hit. If it is a parent, it stores a list of 8 pointers to its subordinate octrees, with null pointers meaning that the subordinate octree is empty; otherwise, a list of objects is used.
One problem with building octrees is deciding when enough is enough. You want to subdivide an octree voxel if the number of objects is too many, but you may find that these further subdivisions do not gain you anything. Olin Lathrop has an interesting method for octree subdivision. First, the biggest win is to subdivide on the fly. Never subdivide anything until you find there is a demand for it (this same idea was used by Arvo and Kirk [Arvo87] in their 5D efficiency scheme). His subdivision criteria are, in order of precedence:
1) Do not subdivide if subdivision generation limit is hit.
2) Do not subdivide if a voxel contains less than X objects (These first two criteria were first proposed in [Glassner84]). Olin uses X=1.
3) Do not subdivide if less than N rays passed through this voxel, but did not hit anything. Olin uses N=4.
4) Do not subdivide if M*K >= N, where M is the number of rays that passed through this voxel that did hit something, and K is a parameter you chose. Olin uses K=2, but suspects it should be higher. This step seeks to avoid subdividing a voxel that may be large, but has a good history of producing real intersections anyway. Keep in mind that for every ray that did hit something, there are probably shadow test rays that did not hit anything. This can distort the statistics, and make a voxel appear less "tight" than it really is, hence the need for larger values of K.
Another possible criterion is to base the subdivision generation limit (criterion 1) on the number of objects in the octree. If you had, say, 6 objects and 5 of them are clustered tightly together, you may find your octree reaching its maximum depth without the subordinate octrees actually splitting up the 5 objects. These octree voxels are useless, costing extra time and memory. They could be avoid by setting the limit based on the total number of objects. I use something along the lines of the depth limit being equal to log sub 4 of (number of objects).
Andrew Glassner has a better method to avoid this problem. When you subdivide a voxel, look at its children. If only one child is non-empty, replace the original voxel with its non-null child. Do this recursively until the subdivision criterion is satisfied. He does this in his spacetime ray tracer, and the speedup can be large. Note that this scheme means adding a field to the octree structure to identify what level in the hierarchy it represents.
An idea to speed octree traversal was first mentioned to me by Andrew Glassner and later by Mike Kaplan. The idea is to place a pointer on each face of each octree voxel. If a voxel's face is next to a larger or same size voxel, a pointer to this neighbor is stored. If the voxel face's neighbors are smaller, then the face pointer is set to point at the bordering voxel of the same size (which is these neighbors' common parent). If there are no neighbors (i.e. the face is on the exterior), a null pointer is stored.
When a ray exits a voxel, the voxel face is accessed and the next voxel found directly. This voxel may have to be descended, but this trick saves having to descend the octree from the top.
Mike Kaplan independently arrived at a similar method, in which he stores quadtrees at the faces so that he can immediately access the next voxel and avoid any descent altogether.
back to contents
back to contents
Peterson's subdivision criterion is to use a bounding box around each quad generated, subdividing until the box is smaller than a certain number of pixels. A drawback of this method is that it does not elegantly handle objects that are part of the scene yet do not appear in the viewing frustum (e.g. if the teapot is only seen in a mirror, we cannot get a good sense of how much to subdivide it). Snyder and Barr [Snyder87] have some good recommendations on this process, and they use the change in the tangent vector between the quad's points as a subdivision criterion. This article also points out other pitfalls of tessellation and of rendering polygons with a normal at each vertex.
If adaptive techniques are used, one problem to guard against is cracking. Say there are two adjacent quadrilaterals, and one has been subdivided into four smaller quads. Something must be done along the seam between the two large quadrilaterals, as normally the subdivision point between the two common vertices will not lie on the large, undivided quad. If rendered from some angle, there will be a noticeable crack between the large quad and the two neighboring smaller quads.
back to contents
back to contents
[Press88] Press, William H. et al, Numerical Recipes in C, Cambridge University Press, Cambridge, 1988.
back to contents