CS348b2007

Assignment 2

Assignment2Graeb.zip

Implementation

Lazy Primitive Refinement

I realized that, when I needed to perform an intersection test upon a non-intersectable primitive, I could not simply refine it and spit any new primitives which result into the existing Kd-Tree. Problems would arise because other nodes may contain this primitive as well. If this 1 primitive becomes 8000, it would be very difficult to update all the other nodes and let them know about their new contents. This would likely involve allocating new arrays a good deal of copying.

Therefore, since Kd-Trees are primitives as well, I decided that upon intersecting a leaf node containing an unrefined primitive, I would replace that primitive with a Kd-Tree that contains the primitive. In the constructor of each new Kd-Tree I conduct the actual refinement. I look at my list of primitives and if this list contains one primitive which is non-intersectable then I refine until I have more than one primitive or the one primitive I have can be intersected. By not fully refining a primitive the first time I strike it, I support lazy refinement of primitives with multiple levels of containment, such as ShapeSets. By refining until I have at least one intersectable primitive, I avoid creating a Kd-Tree which immediately contains another Kd-Tree, as when a LoopSubdiv refines to a single TriangleMesh before finally refining to individual Triangles.

Lazy Kd-Tree Building

Upon creation, a Kd-Tree sets all all the data it needs and creates a single, unfinished, node. When a ray attempts to intersect any unfinished node, the node is analyzed and turned into a leaf, or turned into an internal node, split into two unfinished nodes. This continues, completing nodes only when they are intersected, until a collision within the tree can be confirmed or denied.

I decided not to "fill out" the Kd-Tree upon initial implementation. What if the first few steps force an out-of-sight primitive to refine to a huge number of primitives? What if we take 8 steps and we will never intersect 4 of those? Enforcing mandatory initial steps seemed like a fairly arbitrary decision, one which would decrease simplicity and grace. The Kd-Tree is refined only as needed.

Weight

With lazy refinement a LoopSubdiv is a single primitive yet it will explode into 8000 when refined. Building a tree thinking that a LoopSubdiv primitive is just as computationally dangerous as a single triangle would be a terrible mistake. Therefore, I added a virtual Weight() function to Primitive which returned 1 by default. For LoopSubdiv, Weight() is overloaded to return the number of triangles created upon full refinement through some extremely fast math. TriangleMeshes return the number of contained triangles, ShapeSets and KdTreeAccels return the combined weights of their contained primitives.

When deciding the cost of a split these weights are used, rather than the number of primitives, which should lead to an identical series of splits whether leading up to a fully refined, or not yet refined, Killeroo.

Data for Unfinished Nodes

BuildTree passed a fair amount of state data along as it created each node. It also passed along a number of temporary data structures that each iteration of the function would overwrite. In converting BuildTree to BuildNode, I had to figure out how to handle this.

vector<BBox> allPrimBounds; and  // bounding boxes of all primitives, saved
BoundEdge *edges[3]; // pre-allocated space for edge info

These I made members of KdTreeAccel. The disadvantage of keeping these available at all times is that they will never be deleted. However, allPrimBounds would only cease to be used if every single leaf were made, which is quite unlikely. If edges[3] were not kept around each call of BuildNode would have to recreate it. This would be an undesirable delay.

Unique state data needed by each node was stored in a new object I created, UnfinishedNodeData. Its members were:

u_int *primNums; // array containing index (in mailboxPrims) for each prim in this node
u_int nPrims; // number primitives in primNums
BBox nodeBounds; // bounds of this node
int depth; // number of splits we can do from here
int badRefines; // number of consecutive refines that scored (slightly) worse than parent
int weight; // sum weight of contained primitives

Most variables are necessary, as this data is specific to each unfinished node and must be remembered somewhere. weight could be recalculated but I thought the benefits of storing it outweighed the cost of constantly recalculating it.

I could not think of any clever way to allocate the memory necessary for UnfinishedNodeData; it is simply managed via new and delete. Since it is deleted at arbitrary times I did not use the MemoryArena. Since pre-allocating space would require 2^maxDepth space, I decided against that as well. I did do a clever thing upon creating a parent's two children. While one child's UnfinishedNodeData must be allocated new, the other child is simply passed the UnfinishedNodeData that the parent had formerly used. Thus, there is only 1 new per 2 unfinished nodes created.

KdAccelNode

I really did not want to increase KdAccelNode beyond 8 bytes. After a good amount of thinking, I kept it small and made it only 1/230 worse, which is a very small number. To mark a node as unfinished, I nearly used an extra flag bit. However, this would have removed a third bit of precision from the splits in internal nodes and leaf nodes' maximum capacity would be 229 less. I decided, instead, to use the flag value 0xFFFFFFFF as the indicator that the node was unused. Internal nodes can never have this value because the last two bits of the flag will contain at least one 0. Leaf nodes now hold only 1 less than before.

Since nodes created lazily, I could no longer guarantee that the "below" node would immediately follow the parent and pondered adding an additional pointer for it. I realized I could get away with the same single pointer because both children are always created simultaneously. By positioning the below child and above child in sequence, I could point only to the first one and know the other was in the next possible location. Thus, KdAccelNode remained 8 bytes.

Results

I set a timer from the time pbrt was instantiated to the time it completed. This is the number used for "Total build time".

With this time, I found the non-lazy implementation had 134 seconds of overhead which were not accounted for in the combined build and render times.

I wasn't positive how to calculate exact build or render times for my lazy Kd-Tree implementation, I only had the total execution time and the render time, which now included Kd-Tree buliding. I assumed the time spent doing pure rendering (traversing the Kd-Tree and computing samples) was identical to the non-lazy implementation. Any time lost due to switching between pure render and tree building I'm counting as build time. I also assumed an identical 134 seconds of overhead. The formula I used was:

lazy_build_time = total_execution_time - (overhead_time + non_lazy_render_time)

View: 1

pbrt K-D Tree

my Lazy K-D Tree

Ratio (lazy/non-lazy reference)

build time (secs)

580

413

0.71

total time (secs)

1095

929

0.85

nodes made

9.76M

3.58M

0.37

Triangle ray intersections

2231k

2345k

1.05

View: 2

pbrt K-D Tree

my Lazy K-D Tree

Ratio (lazy/non-lazy reference)

build time (secs)

585

302

0.52

total time (secs)

1155

872

0.75

nodes made

9.77M

1.89M

0.19

Triangle ray intersections

4.84M

5.37M

1.11

View: 3

pbrt K-D Tree

my Lazy K-D Tree

Ratio (lazy/non-lazy reference)

build time (secs)

582

98

0.17

total time (secs)

1032

548

0.53

nodes made

9.77M

0.14M

0.01

Triangle ray intersections

4.86M

6.49M

1.34

View: 3 Refined

pbrt K-D Tree

my Lazy K-D Tree

Ratio (lazy/non-lazy reference)

build time (secs)

-

238

-

total time (secs)

-

689

-

nodes made

-

0.36M

-

Triangle ray intersections

-

6.78M

-

* My computer could not complete a render with the original implementation due to memory demands

last edited 2007-05-02 06:41:49 by MichaelGraeb