Assignment 2 Lazy KD-Tree
Julie Tung
Date submitted: 26 Apr 2006
Code emailed: 26 Apr 2006
General Strategy
The general strategy of my implementation is that at the beginning, I don't really build any of the kd-tree (I build only to depth 1, which builds one interior node and two lazy nodes). Then, as it goes into rendering mode, when the Intersect routine gets called, every time it hits a lazy node, it runs the function that either splits that node into two interior nodes or creates a leaf node, and then it goes back to the regular Intersect routine, which can handle both interior and leaf nodes.
Storage strategies
A lot of temporary storage is needed (to store edges and the prims0 and prims1 arrays) for the splitting that occurs during the rendering stage now, because it's building the kd-tree lazily. At first I just allocated the exact size that I needed for each of these data structures, used them in the split, and then freed them. However, since this was done for every time a lazy node was split into an interior node, this became extremely time-wasting. Instead, I created global temporary data structures, first initializing them to a particular size. Before each use, I check to make sure the size is sufficient; if so, I proceed, if not, then I reallocate the structures and delete the old ones. Using this scheme, the structures are reallocated very few times, and we just keep reusing the same space, although we're frequently holding more space than we need, it's worth it to avoid the malloc/free costs each time. Also, since the data structures are global, all the kd-trees can share this same temporary storage. This temporary storage optimization saved my rendering time about 20% on the killeroos-1 image.
For the KdAccelNode, I was able to keep the storage for the struct to 8 bytes by doing the following: 1. Since we can no longer position one child next to the parent node, instead position the two children next to each other, so that you only have to keep the index for one child, and the other's position is that + 1. 2. In the flags variable, which distinguishes between interior and leaf nodes, I marked lazy nodes with the same flag as the leaf (having 1's in the last 2 bits). Then to distinguish between leaf and lazy nodes, I used the least significant bit in the pointer that points to the MailboxPrims in the leaf case or the extra info in the lazy node case. Since pointers are byte-aligned, the last two bits don't matter. So every time that I access the variable as a pointer, I mask out those last two bits, and when I need to distinguish between the two node types, I check that last bit. For the lazy node, all the storage is contained in dynamically allocated storage pointed to by the info pointer, which contains the BBox (bounding box), number of badRefines, and the indices of the primitives contained in the lazy node. I don't store the depth of the lazy node, because I just keep track of the depth while doing the traversal, and pass the current depth to the subroutine that handles deciding whether or not to make a leaf node.
Lazy refinement
My implementation also lazily refines the primitives by not fully refining in the beginning. When it needs to intersect a particular primitive, it refines it one step and creates a new kd-tree from that somewhat-refined list of primitives, so that nodes in the top-level kd-trees contain kd-trees as primitives themselves.
Results
On killeroos-view1, the two implementations perform roughly the same, since the lazy implementation doesn't really save much, because all the primitives end up getting used and built into the kd-tree, because everything is visible from the camera. In killeroos-view2 and 3 however, the lazy implementation saves a significant amount of time because we get to skip out on a significant portion of the work, as so many primitives are never intersected with rays.
In the plants scenes however, the lazy implementation performs significantly worse. This is likely due to a few factors--most primitives are probably visible from the camera, so that we don't get much savings by not building them into the tree; we lost quite a bit of the locality (like parent next to child) by building incrementally; we also incur somewhat significant cost in allocating and deallocating storage for the lazy nodes, which we don't need to do at all in the original implementation.
Final Images Rendered with my implementation of Lazy kd-tree
killeroos-view1.pbrt (Killeroos visible)
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
19.8 |
0.0 |
0% |
total time (secs) |
30.9 |
28.0 |
90.6% |
Num of nodes made |
2.754M |
1359.9k |
49.4% |
Triangle ray intersections |
673.2k |
826.8k |
122.8% |
"killeroos-view2.pbrt (Killeroos invisible)"
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
19.1 |
0.0 |
0% |
total time (secs) |
28.2 |
9.9 |
35.1% |
Num of nodes made |
2.754M |
47 |
.002% |
Triangle ray intersections |
758.4k |
776.1k |
102.3% |
"killeroos-view3.pbrt (close-up)"
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
20.1 |
0.0 |
0% |
total time (secs) |
32.2 |
20.6 |
63.9% |
Num of nodes made |
2.754M |
373.2k |
13.5% |
Triangle ray intersections |
644.2k |
1.018M |
158.1% |
"plants-view1.pbrt"
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
49.9 |
0.1 |
0.2% |
total time (secs) |
493.8 |
844.6 |
171% |
Num of nodes made |
14.838M |
4.044M |
27% |
Triangle ray intersections |
20.264M |
27.614M |
136.3% |
"plants-view2.pbrt"
|
KD Tree |
Lazy KD Tree |
Ratio |
build time (secs) |
49.8 |
1.2 |
2.4% |
total time (secs) |
784.3 |
1472.1 |
187.7% |
Num of nodes made |
14.838M |
5.157M |
34.8% |
Triangle ray intersections |
25.894M |
31.217M |
120.5% |