*Ray Casting.*Determining the intersection of a ray with the first object in the scene that it meets. This involves acceleration structures, intersection calculations, transformations, etc. And*Ray Generation.*Deciding, based on optics, statistics, probability, sampling theory, surface characteristics, etc. what rays to generate. Some of the calculations required to generate these rays are fairly deterministic (e.g. reflection, refraction), some are regular even if not determined specifically by the geometry (e.g. camera aperture), and some are completely statistical in nature (e.g. diffuse and semi-specular surface parameters).

*Height field Acceleration.*Develop height field representations and ray casting algorithms that are efficient and very high performance.*Surface Simulation.*Implement as a height field a microfacet-like surface that can be shaded by simply casting and following reflected rays. (No diffuse or semi-specular surface characteristics.) Attempt to match renderings of this surface with those of the model microfacet-shaded surface.

*Optimized triangle intersection.*Starting with the algorithm described in "Fast, Minimum Storage Ray/Triangle Intersection", I implemented intersection code that is optimized for a single cell in a height field. Such a cell comprises two triangles that share an edge. This shared edge is used as "edge2" in the algorithm, allowing much duplication in computation to be avoided. Furthermore the intersection code's flow of control is organized as an expanding tree, allowing computation for each triangle to be performed only as needed. (But requiring some duplication of code to implement all the paths.)*Object coordinate intersection.*The "walking" of the height field almost has to be done in object coordinates, as is the bounding box test. The triangle intersection code now also operates in object coordinates. No precomputation is done, and only the provided height field values are stored, so setup is very fast. Normals are computed only when an intersection is detected.*Fast above/below rejection.*Even the accelerated quad intersection test is expensive, so it makes sense to do trivial rejection of cells when possible. When "MINMAX" is defined, the code compiles to a version that computes entry and exit height values for the ray as it is traced. If the ray's height is increasing, each cell first tests if the ray's entry height is above the maximum for the cell, or the ray's exit height is below the minimum for the cell. Alternately, if the ray's height is decreasing, each cell first tests if the entry ray height is below the minimum for the cell, or the ray's exit height is above the maximum for the cell. Separate code loops are implemented for the increasing height and decreasing height cases, to minimize the code length within the loops. Rather than storing a minimum and maximum value for each cell, which would triple the storage requirement for the height field, two 2-bit indexes are stored for each cell. (Thus the storage increase is 25% for a single byte.) Since the height field is contiguous, the ray must intersect a triangle to go from above to below the height field surface. So an additional optimization could be made to reduce the min/max test to a single test per cell. I have not done this, though.*Tiled storage.*Storing large arrays in the obvious 2D manner can result in poor memory coherence when tracing a path. Tiling is a well-known solution to this problem. When "TILED" is defined, the code compiles to a version that does simple bit arithmetic on all cell X and Y addresses, allowing 2D locality of data reference. I made no attempt to align these 2D tiles with cache lines, however.*Height field replication.*I also modified the heightfield code and the the API module to allow a single height field to be replicated in X and Y. In RIB the replication factor for a dimension replaces the text string "nonperiodic" in the PatchMesh primitive. Thus extremely large tiled fields can be defined with the data of a single tile. The data size in a dimension that is replicated must be a power of 2, however.

Setup (sec) | Intersect (sec) | Total (sec) | Code (lines) | |
---|---|---|---|---|

Original | 35 | 1179 | 1213.9 | 75 |

HW2 | 16 | 18 | 33.9 | 450 |

Quad Test | 4 | 10 | 13.8 | 750 |

Min/Max | 4 | 6 | 10.3 | 1175 |

Tiled | 4 | 6 | 10.3 | 1200 |

The hemispheres cover roughly 90% of the area in this tight packing. If smaller hemispheres are inserted into the interstitial spaces, the coverage can be increased to roughly 95%:

All the subsequent renderings were done using this interstitial packing, though it likely has a very small effect on the results. Because unmodified versions of LRT cannot handle height field replication, I ran the benchmark tests using the following 1024x1024 height field. Note that the single height field defines a 4-tile by 4-tile surface:

```
Surface "shinymetal" "Ka" 0 "Ks" .25 "Kr" 0 "roughness"
```

*rough*

The microfacet renderings all replace the flat metalic surface with a replicated height field based on the tile described above. The tile replication is 256x256 in each image, resulting in over 100 billion triangles. The surface material is specified as purely reflective

`Surface "shinymetal" "Ka" 0 "Ks" 0 "Kr" 1 `

```
No lights
are included, so the images simply show the reflection of the small colored
spheres off the microfacet surface. To match variations in the roughness of the
traditional images, the microfacet surfaces are scaled vertically, resulting in
highly warped hemispheres.
```The first image comparison is at low roughness (high specularity) and is a
fairly good match:

Traditional Rendering
Microfacet Rendering
(Roughness = 0.01)
(Vertical Scale = 0.1)
(4 Samples / Pixel)
(256 Samples / Pixel)

The second image comparison is at higher roughness, hence less sharp, and
here the match isn't as good. Note that even with 256 samples per pixel the
Microfacet rendering is undersampled in terms of shading.

Traditional Rendering
Microfacet Rendering
(Roughness = 0.1)
(Vertical Scale = 0.3)
(4 Samples / Pixel)
(256 Samples / Pixel)

## Conclusion

I am happy with the results of tuning and extending the
height field capability of LRT. The performance increase of optimized
intersection, reduced initialization and storage, and min/max testing are
significant, and came with reasonable code complexity. I was surprised that
tiling the memory access had no positive effect on performance, but I also
didn't investigate this very carefully. It is possible that careful alignment
with cache lines would make a difference.
The image matches are close enough to demonstrate the feasibility of modeling
surface reflectance as pure geometry, though there is certainly room for
improvement here. While it is instructive to see that a straightforward ray
tracer can handle huge surface complexity (100's of billions of triangles
served) it is more instructive to see just how much complexity is required to
brute force shading this way. At 256 samples per pixel the shading is just
adequate, compared to 4 samples (and it could be one) for the typical shading
approach. It sure pays to do this stuff right!

Thus, regarding the motivation of this project, I'm convinced that brute
force geometric rendering (emphasizing Ray Casting over Ray Generation), while
perhaps attractive in a simplistic hardware sense, is almost certainly a losing
proposition for even the simplest of scenes. Looks like statistics, probability,
and all the rest of the theory are here to stay.

## LRT Code