Assignment 2 Lazy KD-Tree
Sean Rosenbaum
Date submitted: 11 PM Apr 25 2006
Code emailed: 10:50 PM Apr 25 2006
Approach
Initially none of the primitives are refined and no nodes exist other than a placeholder for the root storing the indices of the primitives it contains and a flag specifying its type and status. When the intersection routine is called, it traverses down the tree, building nodes along the paths it tests for intersection. When a child node meeting the termination criteria is formed, each of the primitives it contains are checked if they are intersectable. Any primitive that must be refined, is refined to the next level. If after the primitive is refined, there is only one refined, intersectable primitive in its place, the original primitive is replaced with this. If that sole refined primitive is still not intersectable, I refine that and follow the same procedure. Otherwise, if multiple refined primitives are recovered, I replace the original primitive with a Lazy KD-Tree storing those primitives. To be precise, replacement is global - the tree's reference to the primitive is replaced by this refinement. Mailboxes are used for both aggregates as well as fully refined primitives. As you can see, this algorithm delays refinement for as long as possible by incrementally refining primitives when they are in a terminal node (and building a new tree for them), and by lazy evaluation of the tree.
In order to make effective use of system and cache memory, I track the depth of nodes possibly needing to be expanded during the intersection routine. This way depth is only stored for a subset of the nodes needing to be expanded during a given intersect call. The same could have been done with each node's bounds, but I opted to store an array of them in the tree to avoid having to add a number of conditional statements within the intersect routine. Ultimately, this is probably not all that important, though it is potentially another small optimization. Within each tree node, I store a number of fields, including its status (leaf, interior, expandable), indices of the primitives (if a leaf that may be expanded), number of bad refines, below and above child indices, and the remaining fields used in the original KD-Tree. All but one of these fields required more memory be used. The new status field and number of bad refines were added to the union used by the flags variable, among others. They occupy the third and fourth least significant bits. This requires the maximum allowed bad refines be three since at most two can be stored in the node and the third count is computed during the expansion. For expandable nodes, the indices of the primitives was stuffed in one of the other unions. The below child index is a bad apple: since nodes are not expanded in order, you do not know in advance where each child is located. There are a variety of ways to deal with this. First, the entire tree can be pre-allocated and a function defined to map each position to some node entry. Alternatively, a table can be kept to map a node's consecutive index to its child; however, when we access the table there is additional overhead that may make things worse despite the better cache layout of the nodes. I opted to include the child index in the node, making it a total of twelve bytes (assuming ints and floats are four bytes) and unfortunately less than ideal for optimal cache performance, yet fairly compact nonetheless. Given more time, the table would have been worth experimenting with.
I decided not to further optimize the tree based on the camera's configuration because I wanted the tree to perform well in general. In particular, a good tree probably depends a lot on the materials in the scene and the positioning of the lights relative to them, so I was not convinced that biasing the cost function in favor of the camera was worthwhile. Moreover, my cost function remains unchanged so that when collections of unrefined primitives are to be expanded on they are treated as equals. Given the absence of information, there are not many other options. Then by spawning new trees for their refinements, the lazy tree operates as the original KD-Tree would with the benefits of lazy evaluation when dealing with that geometry (assuming these primitives are fully refined).
Final Images Rendered with my implementation of lz-kdtree.cpp
The following images were rendered on my Centrino 1.6 Ghz, 1GB laptop.
...After comparing my times to those of my classmates, I am convinced this system should never be used again for ray tracing.
killeroos-view1.pbrt (Killeroos visible)
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
430.4 |
0.2 |
0.046% |
total time (secs) |
738.0 |
656.7 |
88.9% |
Num of nodes made |
2.752M |
1.357M |
49.3% |
Triangle ray intersections |
673.3k |
827.7k |
122.9% |
killeroos-view2.pbrt (Killeroos invisible)
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
430.6 |
0.2 |
0.046% |
total time (secs) |
689.5 |
263.8 |
38.3% |
Num of nodes made |
2.752M |
47 |
0.002% |
Triangle ray intersections |
758.4k |
776.1k |
102.3% |
killeroos-view3.pbrt (close-up)
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
430.7 |
0.2 |
0.046% |
total time (secs) |
772.8 |
525.1 |
67.9% |
Num of nodes made |
2.752M |
373.0k |
13.6% |
Triangle ray intersections |
644.2k |
1.018M |
158.0% |
plants-view1.pbrt
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
1982.6 |
0.8 |
0.040% |
total time (secs) |
12372.7 |
16460.2 |
133.04% |
Num of nodes made |
14.886M |
4.044M |
27.2% |
Triangle ray intersections |
20.257M |
27.669M |
136.6% |
plants-view2.pbrt
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
1987.8 |
0.8 |
0.040% |
total time (secs) |
19667.8 |
29186.6 |
148.40% |
Num of nodes made |
14.886M |
5.171M |
34.7% |
Triangle ray intersections |
25.945M |
31.251M |
120.5% |