Assignment 2 Lazy KD-Tree

Kurt Berglund

Date submitted: 26 Apr 2006

Code emailed: 26 Apr 2006

Description of implementation approach and comments

Final Approach

In my final approach I tried to hold off on doing any work until necessary. To implement this strategy I first modified the KdAccelNode data structure to represent a lazy node. I took an extra bit from nPrims to represent this lazy state. Interior nodes stayed the same, but for leaf nodes, this bit is 0 if it is not lazy. A lazy node then looks like a leaf node but with this bit set. A lazy node also stores a pointer to a data structure containing the data needed to create the subsequent parts of the tree, such as a list of primitives, the current depth, number of bad refinements, etc...

To keep storage at a minimum, whenever I create an interior node, its children are always one index apart. This makes it so only the first needs to be stored, and kept my node sizes at 8 bytes – the same as for a regular KD-Tree node. In my first attempt, I expanded the size to 16 bytes since I couldn’t find a way to store two node pointers, but after talking with Kayvon in office hours, he gave me the tip on how to just store one node index.

With this data structure, I then modified the original KD-Tree code to do the following. The constructor, rather than building any part of the tree, simply creates a lazy node at the root, supplying it with the necessary data to build the tree (primitives, depth, etc…). Then, when a ray traversal occurs, it starts at the root and works its way down as always. However, when it encounters a lazy node, it looks up the data necessary to construct it, and then creates a non-lazy node. This creation function then runs the standard KD-Tree build algorithm to find the split plane, and itself creates two lazy nodes as children. This process then continues until an intersection is found.

After watching the times for the various operations, I realized that refinement can be a fairly time consuming operation. Therefore, I hold off on refinement as long as possible. When a new lazy KD-Tree is created, it doesn’t do any refinement of the passed in primitives. When a leaf node is constructed that contains a non-intersectable primitive, this primitive is replaced by a lazy KD-Tree composed of the results of a call to Refine on that primitive. I chose not to fully refine, since it could be possible that only one of the refinements would end up being intersected against.

Optimization Experiments

Lazy Tree Building

I tried a couple of tree building variations to get runtimes down, but unfortunately didn’t experience many performance improvements (and usually saw regressions). The first thing I tried, after a suggestion from Kayvon in office hours, was to expand more than just one lazy node at a time. Every time my system ran in to a lazy node, it then tried to expand the tree a certain depth from that node. I tried expanding as a percentage of the maxDepth, but at 10% and 20% (or 2 if it was larger), I didn’t see any improvements, and actually saw the times decrease. This largely seemed due to an increase in the number of nodes created. At a fixed expansion depth of 3, I created 2.739M nodes on killeroos-1, while the final approach only made 1.722M nodes.

Since it appeared I was making nodes that didn’t help in the end, I also tried expanding in the direction that the ray would traverse, rather than just creating two lazy nodes and returning. I had hoped this would save some allocation costs, but performance ended up being more or less the same. Since the code I had to do this became much more complicated, I rolled back to the original version. I also tried expanding the entire tree once I hit a certain depth (i.e. if the remaining depth is 4, do a full expansion), but I also didn’t see gains from this.

In all the above cases, I also experimented with allowing the recursive expansion to generate leaf nodes or not. The reason being refinement can be time consuming, so I only wanted to have it occur when the ray needed to intersect with the geometry in the primitive. However, neither choice made much of a performance gain against my final implementation.

Memory Management

Cutting from 16 bytes to 8 bytes for the KD-Tree node data structure helped some with the ecosystem scenes, but largely I didn’t see any big gains from this (not like the 20% improvement discussed in the book).

I also investigated whether the construction data allocated with each lazy node was a performance problem, since for a KD-Tree, there could potentially be many of these allocated structures around. To do so I removed them altogether, and created the data they stored on demand. I did so by generating the bounding box for a KD-Tree node as I traversed the tree, as well as the list of primitives within its bounds on the fly by doing bounding box overlap checks. This data was then passed to the lazy node expansion routines. However, the cost of doing bounding box overlap checks was too much in the end (since for many of the given scenes, there are many primitives in a KD-Tree), and I stuck with storing a pointer to construction data with each lazy node.

Cost Function

Since rather than building a KD-Tree of intersectable objects, mine was building one of primitives that may need to be refined later, the default intersection cost estimates were off. I experimented a bit with the cost function to try to get a better construction. For each non intersectable primitive, I estimated its cost as – iSectCost * lzKdTreeDepthEstimate * traversalCost. Since each non intersectable primitive would be turned in to a lazy KD-Tree, this seemed like a more reasonable cost approximation. I didn’t see much of a gain with my basic implementation on the killeroo scenes though and so stuck with the original formula. I think the extra cost of having to lookup each object’s cost may have helped slow things down as well. If I would have spent more time on this, I would have liked to add some sort of “RefinePrimEstimate” to the Primitive class, which would return a guess at how many primitives a refinement would cause. This may have helped to better guess at the cost. Also, I think adding a build time parameter to the estimate may also have led to better results, but both of these ideas I left to future work.

Final Images Rendered with my implementation of lz-kdtree.cpp

killeroos-view1.pbrt (Killeroos visible)

KD Tree

Lazy KD Tree

Ratio

build time (secs)

16.9

0.0

0%

total time (secs)

26.6

24.9

93.6%

Num of nodes made

2.754M

1.722M

62.5%

Triangle ray intersections

673.2k

826.8k

122.8%

"killeroos-view2.pbrt (Killeroos invisible)"

KD Tree

Lazy KD Tree

Ratio

build time (secs)

16.9

0.0

0%

total time (secs)

24.9

8.4

33.7%

Num of nodes made

2.754M

58

0.002%

Triangle ray intersections

758.4k

776.1k

102%

"killeroos-view3.pbrt (close-up)"

KD Tree

Lazy KD Tree

Ratio

build time (secs)

16.9

0.0

0%

total time (secs)

27.2

16.8

61.76%

Num of nodes made

2.754M

441.3k

16.02%

Triangle ray intersections

644.2k

1.018M

158.03%

"plants-view1.pbrt"

KD Tree

Lazy KD Tree

Ratio

build time (secs)

42.9

0.0

0.0%

total time (secs)

422.9

609.7

144.17%

Num of nodes made

14.838M

4.538M

30.58%

Triangle ray intersections

20.264M

27.626M

136.33%

"plants-view2.pbrt"

KD Tree

Lazy KD Tree

Ratio

build time (secs)

43.1

0.0

0.0%

total time (secs)

663.6

1039.6

156.66%

Num of nodes made

14.838M

5.826M

39.26%

Triangle ray intersections

25.887M

31.190M

120.49%