Assignment 2 - Lazy KD-Tree

LeeHendrickson

Date submitted: 25 Apr 2006

Code emailed: 25 Apr 2006

Description of implementation approach and comments

There are two main concepts behind a lazy kd-tree:

A node should not be created until absolutely necessary. That is, delay until the last possible second any work that needs be done to create a node.
Refining an aggregate primitive is normally an expensive operation and should not be carried out until it is known for certain that intersection tests have to be carried out on the refined primitives.

Of the two concepts above, the easier to implement is (2) and that is where work began. However, the very first step was to refactor the current code base. There is a overwhelming amount of code duplication between the Intersect and IntersectP routines, so these two functions were collapsed into a single subroutine with Intersect and IntersectP acting as wrappers around them. Likewise, within the routine the work of performing intersections was refactored to handle single and multiple primitives within the same block of code. These two refactorings allowed the work on (1) to go much more smoothly as no code needed to be duplicated and there was a single "flow" of execution (code-wise) through the kd-tree.

Phase 1

With those changes in place the implementation of (2) was straightforward. Now, instead of a direct intersection, the primitive was queried as to whether it could be intersected. A non-intersectable primitive was refined, and those refined primitives placed into a new kd-tree structure. This new kd-tree replaced the previously aggregrate primitive in the first kd-tree. In this way a kd-tree of kd-trees is built, with the sub trees only built when necessary and themselves lazily evaluated.

A purposeful design decision was to refine aggregrate primitives only one "step", with no use of Primitive::FullyRefine. This was to allow the lazy kd-tree to be as general as possible and allow for the existence of an aggregate of aggregates where only some sub-aggregates may need to be further refined.

Phase 2

Next, nodes needed to be lazily created themselves and the decision was made to create nodes one at a time. That is, when a tree is first constructed only the root node is fully created, its two children are marked as "pending". Only when one of the children must be traversed by a ray is it actually constructed. Since it is not known when a child might be traversed (indeed, even if it will ever be traversed), or in what order they might be traversed, when a parent node is created all the data that is necessary to create both children is stored in a look-aside data structure. Thus, when a pending node must be created the data necessary to carry out that construction is looked up, the node then created and the data freed. The ray can then intersect that node, possibly descending into further children as necessary, each child built as it is needed. In this way the tree is built on demand and in the limit the lazy tree approaches the full tree the standard implementation builds.

Performance wise, the lazy implementation suffers from having to manage the necessary temporary data structures. Since the standard implementation builds the entire tree upfront all the data can be computed and kept on the stack during the recursive build process. This is not the case in the lazy implementation and thus a non-trivial amount of time is spent allocating and performing accounting on memory. However, by using data structures held in the top-level tree the nodes themselves can be kept to 8 bytes, the only necessary addition a "pending" flag which is kept in the same memory as the primitive pointers in a manner similar to the "flags" flag. Furthermore, the intersect routine itself is more complex, as "pending" nodes must be detected and the routine left in order to build those nodes, resulting in a less memory and code coherent intersection function.

Additional Optimizations

One shortcoming with the current kd-tree implementation is that it treats all intersections as having the same cost. This is certainly not the case with a lazy tree, within which certain primitives can be basic triangles, spheres, etc. and others can be unrefined aggregates which incur the cost of refining and tree construction if intersected. The cost function that decides where to split a node should take into account these disparate costs. In the lazy implementation a refined primitive that can be intersected is given the default cost passed to the tree constructor, but a non-intersectable primitive (normally an aggregate that must be refined) is given a cost equal to twice that of an intersectable primitive (a multiplicative factor arrived at empirically). The cost function is modified to take into account per-primitive costs and make its splitting decision accordingly.

Further optimizations were more micro than algorithmic in scope. Floating point factors are only computed once, STL containers are accessed as minimally as necessary, memory is allocated in aligned blocks, etc.

Final Images Rendered with my implementation of lz-kdtree.cpp

"killeroos-view1.pbrt (Killeroos visible)"

killeroos-view1

	KD Tree	Lazy KD Tree	Ratio
build time (secs)	18.97	0.0016	0.0084%
total time (secs)	37.67	31.4	83.36%
Num of nodes made	2.752M	279.1k	10.14%
Triangle ray intersections	673.2k	856.0k	127.15%

"killeroos-view2.pbrt (Killeroos invisible)"

killeroos-view2

	KD Tree	Lazy KD Tree	Ratio
build time (secs)	18.98	0.0016	0.0084%
total time (secs)	34.68	16.3	47.0%
Num of nodes made	2.752M	28	0.0001%
Triangle ray intersections	758.4k	776.1k	102.33%

"killeroos-view3.pbrt (close-up)"

killeroos-view3

	KD Tree	Lazy KD Tree	Ratio
build time (secs)	18.96	0.0016	0.0084%
total time (secs)	39.96	27.9	69.82%
Num of nodes made	2.752M	327.3k	11.89%
Triangle ray intersections	644.1k	644.6k	100.08%

"plants-view1.pbrt"

plants-view1

	KD Tree	Lazy KD Tree	Ratio
build time (secs)	52.26	0.144	0.28%
total time (secs)	706.06	1102.1	156.09%
Num of nodes made	14.862M	10.574M	71.15%
Triangle ray intersections	20.291M	27.757M	136.80%

"plants-view2.pbrt"

plants-view2

	KD Tree	Lazy KD Tree	Ratio
build time (secs)	52.48	0.144	0.27%
total time (secs)	1154.78	1926.2	166.8%
Num of nodes made	14.862M	8.89M	59.81%
Triangle ray intersections	25.829M	30.177M	116.83%