Project 2 FAQ
CS348B Spring 99
Prof: Frank Crow
TA: Elena Vileshina

------------------------------------------------------------------------

Q:

My program makes incorrect pictures when I compile with high
optimization flags like -O3 or -O4, but works fine with no optimization
or at -O1. Why, and what should I do?

A:

I don't know if it's A) real memory bugs that are only exposed when
the compiler tries hard to optimize, B) precision problems that are
only exposed because the compiler changes the order of operations or
C) the compiler actually is corrupting perfectly good code. I'm
inclined to guess B) if you're just missing some polygons or A) if you
have massive problems. I'll admit C) is always a possibility.

You should document this situation in your README file. Your main
timings and images should be produced with the correct but slower
code. If you also want to include one timing with the faster code to
show us the magnitude of the difference, that's fine.

To those of you chasing mysterious bugs - make sure you're *not*
optimizing yet, so that you can rule out compiler weirdness as the
cause. Turn on the compiler's optimization flags only after
everything else works.

------------------------------------------------------------------------
Q:

Does top work on the raptors yet?
 

A:

Yup!

------------------------------------------------------------------------
Q:

Can I run my timings on the raptors?

A:

No. Your timings must be on the firebirds. Also, don't forget to
submit your images as well as your README and your code.
 

Q:

How do I measure "pre-computation time" ?

A:

The easiest thing to do is to add a command-line option that tells
your program to quit right after the precomputation. You then run it
twice: once with that flag set, to get the precomputation time alone,
and once all the way through. Your rendering time is then the
difference between the total system+user times reported by "time" for
the two runs.
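A minimal sketch of such a flag, using a made-up name "-preonly" (your
own option name and argument handling will differ):

```cpp
#include <cstring>

// Hypothetical "-preonly" flag: when present, the renderer exits right
// after its precomputation phase, so "time" measures precomputation alone.
bool preprocessOnly(int argc, char** argv) {
    for (int i = 1; i < argc; i++)
        if (std::strcmp(argv[i], "-preonly") == 0)
            return true;
    return false;
}
```

Then "time myrt -preonly scene.out" gives the precomputation time,
"time myrt scene.out" gives the total, and the rendering time is the
difference of the two system+user sums.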
 

Q:

What if there's a heavy load on the machine when I'm doing my final
timings?

A:

The time used by your program is the sum of the system + user time. If
the total elapsed time is much more than that, feel free to explicitly
reiterate the actual time used by your program in your README. This
would be in addition to, not instead of, the output of the time
command.
 
 

Q:

Actually, there are a bunch of totally degenerate triangles in
shell_easy.out. Can I just ignore them?

A:

Yes. It's fine (even wise) to catch degenerate triangles in your
preprocessing phase and toss them out. Also, I put a new file
shell_new.out into the proj2 directory, which you can use in place of
shell_easy.out if you wish.
 

------------------------------------------------------------------------

Q:

My program crashes for no apparent reason in the shell_easy scene,
even though I think my interpolation code is correct.

A:

Be warned that there is a degenerate triangle in shell_easy which has
the same second and third vertices. If you're having trouble with
this scene, it could be that your intersection routines assume that
all vertices are distinct.

So if you're having unexplained crashes on shell_easy, you could
change the "0" on line 1714 of shell_easy.out to something like ".01".
There will be no visible difference in your resulting image file.

Thanks to Gerald Cheong for pointing this out.
 

------------------------------------------------------------------------

Q:

Can you be more specific about the timings we need to get a decent
grade? (Variant: Can you be more specific about the timings we need
to have the fastest raytracer in the universe?)

A:

Here are the numbers from the top ten raytracers from last year, in
seconds.

test1   test2   test3
-----   -----   -----
   17      42      37
   41      43      42
   15     153      44
   27      88      46
   28      71      61
   25      87      57
   21     137      65
   23     120      69
   30     135      54
   29     146      59

Here's the criteria we used last year for test2 (and test3):

time (seconds)    points deducted
--------------    ---------------
      < 120              0
      < 200              1
      < 300              2
      < 600              3
     < 1000              4
     < 2000              5
     < 3000              6
     < 5000              7
    < 10000              8
    > 10000             10

We reserve the right to change the exact timings to points mapping,
for instance if the range this year is considerably different from
last year. Treat this as a guide, not a guarantee!

------------------------------------------------------------------------

Q:

What's a good way to figure out whether a triangle is inside a voxel?

A:

There are a whole lot of algorithms out there. Many of them are
related to polygon clipping - it's a similar problem to clipping
triangles to a viewing volume. Essentially, you need to intersect the
triangle edges against each of the six faces. You can make it quite
fast since you know your voxel cube is axis-aligned. Some of the
algorithms are highly optimized (e.g. Sutherland-Hodgman clipping,
discussed in FvDFH pp. 124-127). If you're even more motivated, you can
take advantage of the fact that you don't actually need the
intersection points, just a yes/no answer.

While it's nice if your preprocessing code is O(n log n) instead of
O(n^2), keep in mind that you're graded on the speed of walking your
hierarchies, not constructing them.
 

Q:

What's a good way to figure out whether a sphere is inside a voxel?

A:

It's easy to find the point inside the volume of a voxel that is
closest to the sphere origin. Do one dimension at a time, i.e. x, y,
and z separately. For example, the closest point's x coordinate is
either the minimum x of the voxel, the maximum x of the voxel, or the
sphere origin's x, whichever is nearest.

Once you've computed this closest point, you need to compare the
distance between that point and the sphere origin to the sphere
radius. There's an intersection if:

(distance to the closest cube point <= sphere radius &&
distance to the farthest cube point >= sphere radius)

You need to check against the farthest point so that you don't think
there's an intersection when the voxel is totally enclosed by a large
sphere. You can find it using a similar algorithm to the way you found
the closest point.
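The closest/farthest-point test above can be sketched as follows
(struct and function names are made up for illustration):

```cpp
#include <algorithm>
#include <cmath>

struct Vec3  { double x, y, z; };
struct Voxel { Vec3 lo, hi; };   // axis-aligned min and max corners

static double clampd(double v, double lo, double hi) {
    return std::max(lo, std::min(v, hi));
}

// Squared distance from the sphere origin to the closest voxel point:
// clamp each coordinate of the origin to the voxel's slab in that axis.
static double closestDist2(const Voxel& v, const Vec3& c) {
    double dx = c.x - clampd(c.x, v.lo.x, v.hi.x);
    double dy = c.y - clampd(c.y, v.lo.y, v.hi.y);
    double dz = c.z - clampd(c.z, v.lo.z, v.hi.z);
    return dx*dx + dy*dy + dz*dz;
}

// Squared distance to the farthest voxel corner: per axis, take whichever
// slab bound is farther from the sphere origin.
static double farthestDist2(const Voxel& v, const Vec3& c) {
    double dx = std::max(std::fabs(c.x - v.lo.x), std::fabs(c.x - v.hi.x));
    double dy = std::max(std::fabs(c.y - v.lo.y), std::fabs(c.y - v.hi.y));
    double dz = std::max(std::fabs(c.z - v.lo.z), std::fabs(c.z - v.hi.z));
    return dx*dx + dy*dy + dz*dz;
}

// True if the sphere's surface passes through the voxel: the closest voxel
// point is inside the sphere AND the farthest corner is outside it.
bool sphereTouchesVoxel(const Voxel& v, const Vec3& center, double radius) {
    double r2 = radius * radius;
    return closestDist2(v, center) <= r2 && farthestDist2(v, center) >= r2;
}
```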
 

Alternatively, you could simply find a bounding box for the sphere and
then check the bounding box faces against the (axis-aligned) voxel
faces as above. This is slightly less accurate than the direct sphere
test: you wouldn't miss any spheres, but you'd sometimes keep a sphere
as a candidate object that the more precise test would have thrown
out. The method above is pretty easy, though, so maybe you should just
use that.
 
 

------------------------------------------------------------------------
 

Q:

How do I use barycentric coordinates for interpolating?

A:

It's

P = P1 + s2 (P2 - P1) + s3 (P3 - P1)

This is exactly the same as what you need for Badouel's Graphics Gem.
In his notation, it's

N = N0 + a (N1-N0) + b (N2-N0)

Note that in the Badouel text he gives

N = (1 - (a+b)) N0 + a N1 + b N2

which turns into the above equation with a little bit of algebra.
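A quick sketch of the two equivalent forms, using a made-up Vec3 type:

```cpp
struct Vec3 { double x, y, z; };

static Vec3 add(Vec3 a, Vec3 b)     { return Vec3{a.x+b.x, a.y+b.y, a.z+b.z}; }
static Vec3 sub(Vec3 a, Vec3 b)     { return Vec3{a.x-b.x, a.y-b.y, a.z-b.z}; }
static Vec3 scale(Vec3 a, double s) { return Vec3{a.x*s, a.y*s, a.z*s}; }

// Badouel's form: N = N0 + a (N1 - N0) + b (N2 - N0)
Vec3 interpNormal(Vec3 N0, Vec3 N1, Vec3 N2, double a, double b) {
    return add(N0, add(scale(sub(N1, N0), a), scale(sub(N2, N0), b)));
}

// Weighted form: N = (1 - (a+b)) N0 + a N1 + b N2 -- algebraically the same.
Vec3 interpNormalWeighted(Vec3 N0, Vec3 N1, Vec3 N2, double a, double b) {
    return add(scale(N0, 1.0 - (a + b)), add(scale(N1, a), scale(N2, b)));
}
```

The same two functions work for interpolating normals, colors, or any
other per-vertex quantity; just remember to renormalize when the
result is supposed to be a unit vector.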
 
 

Q:

I know we have to normalize the interpolated normal, what about the
interpolated diffColor, ambColor ? I guess not, but just want to
confirm with you.

A:

You are correct. There's no need to normalize diffColor, ambColor,
etc.: they're 3-channel color quantities, not directions. You do need
to normalize the normals since they're assumed to be vectors of unit
length.

------------------------------------------------------------------------

Q:

If I don't specify an output file, I'd like to start up the GUI as
usual and draw the newly rendered scene into the canvas. However, I
get a "can't open display" error if I call "Render" and then
"LiftOff."

A:

Consider specifying an input file to be the equivalent of the
LoadScene button, and the output file to be the equivalent of the
Render button. So if they specify both then you don't put up a window
at all. Never call LiftOff; just render, saveppm, and exit. If they
specify just the input file, you do LoadScene before LiftOff but do
*not* do a Render yet - that's now under the control of the
interactive user and won't happen until they hit that button.
 
 

Q:

I can't get top to work on the raptors.

A:

Remember, you're supposed to run your final timings on the firebirds,
not the raptors. This info is just in case you'd like to check memory
usage on the raptors anyway.

Short answer: on both the raptors and the firebirds you can get the
total memory usage in KB by running
getsize <your process name>

Long answer:

The leland folks don't want to enable the Irix version because of
security concerns, and their standard version won't run under Irix6.4
yet. The long workaround (which is encapsulated in the getsize script,
which I just wrote and put into /usr/class/cs348b/bin/) is to do

ps -el | grep <your process name>

You'll see something like

raptor2:~> ps -el | grep myrt2
b0 R 2718 25795 16119 0 20 20 * 428:288 - pts/0 0:01 myrt2

Multiply the number before the colon by 4 to get the total memory
usage of your program in kilobytes (since ps uses the units of page
size).

Also, occasionally top will flake out on the firebirds if the machine
gets confused. This problem will go away if you wait long enough, or
you can just use the fabulous new getsize command.
 
 

Q:

Should I switch from C++ to C for HW2 because of the overhead?

A:

No. Use your language of comfort.

You need to change the asymptotic behavior of your program, e.g. from
12 hours to 30 seconds for a big scene. The C vs. C++ difference is
negligible compared to this!

*After* you've dealt with & debugged all the major stuff, you might
choose to turn a few intensively used functions into inlines as you
tune inner loops.
 

------------------------------------------------------------------------

Q:

What are some strategies for automatic bounding volume calculation?
The Glassner book doesn't really cover it. [Asked during problem
session]

A:

Even simple schemes will give you some reasonable performance
benefits. For instance, a *really* simple approach is to just start
dicing up your polygon mesh objects into chunks of, say, 10 polygons,
and test that those chunks are "good" by some metric - for
example, that their total bounding box is within a certain size. If the
chunk is bad, throw away one or two polygons and try again.

For some more ideas, take a look at the archived online version of
Eric Haines' Ray Tracing News. The issues from the late 80's have some
extremely relevant discussions of exactly what you're supposed to be
thinking about: what's the best algorithmic way to speed up your
raytracer. Lots of discussions about the merits of various schemes and
options for hybridizing them, lots of stuff about the automatic
bounding volume generation problem. See for example the forward and
backward threads from
http://www.povray.org/rtn/rtnews1b.html#art7
http://www.povray.org/rtn/rtnews3a.html#art6

Plus it's a great way to peek in on the thought processes of
some of the major graphics researchers at work! The level of
discussion is of very high caliber. Some of this stuff clearly made it
into the Glassner book, but some of it didn't. Reading through the
entire archive back to front is certainly not the worst thing you
could do with your time :)
 

------------------------------------------------------------------------

Q:

Help! I can't get pixie or gdb or purify to work.

A:

Here's my current understanding: you can run pixie and purify only on
the firebirds, and you'll have to tweak your Makefiles. You can't run
gdb at all. My previous email about purify was incomplete.

To run pixie/prof, change the following things in your Makefile
(presumably you'll have either gcc or g++ but not both):

"gcc" --> "cc -32"
"g++" --> "CC -32"
"-lxsupport" --> "-lxsupport32".

If you're low on disk space you might need to use the /tmp directory
since pixie generates many large files. Try something like this:

% pixie -directory /tmp/bar example.iris4d
% setenv LD_LIBRARY_PATH /tmp/bar
% /tmp/bar/example.iris4d.pixie
% prof -pixie /tmp/bar/example.iris4d /tmp/bar/example.iris4d.Counts > profiledata

To run purify, make the same Makefile changes as above. For your
purify target in your Makefile you'll want something like this:

$(TARGET).pure: $(OBJS) Makefile
	purify -cache-dir=/tmp cc -32 $(DEBUG) -o $@ $(OBJS) \
	  -L$(BASEDIR)/lib/$(HOSTTYPE) \
	  $(CS348B_LIBS) -lxsupport32 -lXm -lXt -lX11 -lm
 

Sorry that this is so convoluted. The SGI cluster is clearly not at
the top of the list for the leland system administration types...

[More details for those who might care: I made a new version of the
xsupport libraries that's in the old format (-32) instead of the new
format (-n32). It's in the same directory as the old one, called
libxsupport32.a. Supposedly you can run "gcc -V2.7.2.2" to get
old-style libraries, but I couldn't get this to work.]

------------------------------------------------------------------------
 

Q:

How does the addition of per-vertex normals affect the ray/plane intersect
routine?

A:

It doesn't affect the _intersection_ at all. The visible polygons
are exactly the same ones as before. However, once you've found which
polygon you hit, the normal you return might be different.

Your intersect routine uses the face's geometric normal. Don't even think
about the per-vertex normals until you've decided which face you hit.
 
 

Q:

How about sample "reasonable" rendering times?

A:

There's one set of numbers in the help notes handout. Here's
another set, from someone else's code, just to give you another sample
point. Remember, don't panic if you don't exactly match these numbers!

Test: Unoptimized Optimized
----- ----------- ---------
test1: 19.3 hours 33 seconds
test2: 2.1 hours 13 minutes
test3: 3.9 hours 2 min, 27 seconds
 
 
 

Q:

For PP2, it says we can ignore refraction. Can we tear out our whole
transmitted ray code as well? That is, do we need to trace rays into
transparent objects in the test scenes for PP2?

A:

You could, but you might want to keep it around for project3. Consider
leaving it accessible through an option button on your UI panel, so
that you can toggle between "Proj2 specs" and "Proj1 quality". Or if
you're very concerned about speed, you could #ifdef TRANSPARENCY.

But to specifically answer your questions, none of the testcases will
have any material with a ktran greater than 0.
 
 

Q:

Are we required to fix minor errors in PP1 (my background for test1 was
washed out)?

A:

No, you're not required to fix minor errors in PP1, unless they prevent
us from verifying that you're raytracing to the required depth, etc.
We are not doing a side-by-side comparison with snoop on this one.
Send me mail directly if you want specific feedback about which errors
are minor and which are major.
 
 
 

Q:

Can we trade off a little in image quality to increase speed? For
instance, is subsampling allowed?

A:

No. You have to trace a ray per pixel.
 
 

Q:

Would it be good or bad to fine-tune (perhaps manually) bounding
volumes for the test scenes, and have the program read the bounding
volume data from a file at run time? What I'm getting at is the
difference between (1) precomputation that cannot be reproduced by
running the program, and (2) the precomputation phase of the program
itself, i.e., everything the program does before entering the main
ray-tracing loop. Any insight appreciated!

A:

Bad. Don't do anything manually. You should let your program run
automatically. As the assignment states, you shouldn't optimize for
these specific testcases. Your raytracer should work the same on
scenes you haven't seen. Your raytracer can reorganize the bounding
volumes all it wants, but it should be automatic.
 
 

Q:

So here's the question I bet everybody's asking:
If I pursue acceleration strategy X (BSP Trees in this case), will I have
a chance at achieving decent timing runs?
 

A:

Decent? Sure. I don't know the times of any raytracer that did BSP
trees, but I'm sure that it would at least reduce the problem to a
matter of minutes (say, under 10), rather than hours, and that would
be enough to get most of the points on the project. Who knows, maybe
it will crush all other raytracers. :-)

Most of the 50 points for speed will be based on reducing the asymptotic
order of your program from O(N) per ray (where N is the number of polygons)
to something smaller, approximating O(log(N)). So if you get below
10 minutes or so per scene, you should be getting at least 40 of the
50 points (roughly). The last 10 points are to reward people who
put in a lot of effort (or get lucky) to get the constant factor low,
not to punish those who tried other ideas.
 
 
 

Q:

I want to read more about the <foo> technique, cited as reference <bar>
in the Arvo chapter from the Glassner book. How can I get it?

A:

First look in the Math/CS library. It's a great resource - one of the
most complete university libraries anywhere. You can either go in
person or check the online catalog (see http://www-sul.stanford.edu
for more info on hours and the catalog, called Socrates).

*If and only if* you have already searched the library and were unable
to find a copy of an article, then send me mail and I'll see if
somebody in the graphics group has it, in which case I can make a
copy.
 
 

Q:
 

My octree node includes a list of object pointers which
point to the objects in the SceneIO structure.

However, in order to save the octree information to a file,
I need to give each object an id so that I can restore the
octree structure later.

So I am going to give each object an id based on the sequence
of the objects in the SceneIO object list. My question is:
will the sequence of the object list always be the same?
 

A:

Assuming the file hasn't changed, the object tree will always
be the same.
 

Q:

How does the image change if you use per-vertex normals?

A:

It would look smooth, rather than faceted. For meshes with per-vertex
normals, you should have a hard time telling where the individual
polygons are (except for at the silhouette edges).
 

Two extra scenes, shell.out and shell_new.out, have been created in
/usr/class/cs348b/proj2/, along with sample raytraced images. These
scenes are not official testcases; you don't need to submit pictures
of them. They are only there to help people test their per-vertex code.
 
 

Q:

In order to include the bounding structure, I added a field in the
scene_io.h file to both the spheres and the polygons. However, it
seems like the program has problems reading in the file properly,
because the read_scene function mallocs using the old
sizeof(SphereIO), etc...

How can I get around this? I mean, am I supposed to recompile
scene_io.c using the new structures and then link it in? Any clues
on how this would be done? (I was trying to do it without much
success.)

A:

For this project, you'll probably want to create your own data
structures. For example, you'll probably want to be able to
break apart polysets into the individual polygons. You can base
your data structures on the scene_IO if you choose. But when you
read in the file, traverse the tree, copy it to your own data
structures, and then delete the scene_IO tree. This way you'll
have complete control over your data structures, and it will be
more modular/portable.

Alternatively, I suppose you could just build a data structure that
stores the additional information, and a pointer to the scene_io
objects. But you're probably going to find it difficult to write
an efficient raytracer without creating your own data structures
of some sort.

We encouraged you to use the standard data structures in P1 since your
data structure choices should be dictated by the specific approach you
take in P2.

Q:

My triangle edges are too noticeable. Any guesses on what's wrong with
my interpolation code?

A:
 

Make sure you normalize your interpolated normals! If you don't, your
normal will have length less than 1. If you then raise this number to
a power when computing the specular component, it will turn the
specular dark on the insides of the triangles.
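To see why this matters: an interpolated normal of length 0.9 scales a
50th-power specular term by 0.9^50, which is about 0.005 - plenty to
darken triangle interiors. A minimal normalization helper (made-up
Vec3 type):

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

// An interpolated normal is a blend of unit vectors, so its length is
// generally less than 1. Renormalize before using it for lighting.
Vec3 normalize(Vec3 n) {
    double len = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
    return Vec3{n.x/len, n.y/len, n.z/len};
}
```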
 

Q:

I've really looked through my code very carefully and modified
anything I can think of, but it still runs slow.

I wonder if the <insert your hypothesis here - for example "dynamic
hash table scheme for finding the next voxel"> is the bottleneck.

A:

It could be. I don't know for sure. A good way to find out is
to profile your program. To do that, run:

pixie myrender
myrender.pixie <arguments>    <-- this will write out a huge file
prof -pixie myrender          <-- this will analyze the file, and print
                                  stats about where the program spends
                                  most of its time
 

Q:

I'm confused about one thing. When we add a new subdivision
plane and it cuts a sphere into two - what do I do? Polys are
easy to split - but should the sphere just straddle both sides of
the plane? I'm also a little confused as to why we need to split
polys when the plane cuts them - it seems we would be doing the same
as leaving one poly.

A:

This is related to the octree problem shown in the diagram on the top
of p.226 of Glassner, where you have an object that touches the current
volume, but when you compute the intersection, the actual intersection
with the ray is back beyond the current volume. Say you get a time
of intersection t=3, but your current BSP node only covers t=1 to 2.
You still have to check other nodes (between t=2 and t=3) to make
sure that no other object blocks the ray first.

So you could solve this by not cutting polygons or spheres at all,
and just using a mailbox technique (as described for octrees).
That way, you would only compute an intersection with an object
once per ray.
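The mailbox idea can be sketched like this (all names are made up, and
the real geometric test is replaced by a stored value plus a counter so
the sketch is self-contained):

```cpp
// Mailboxing: tag each object with the id of the last ray tested against
// it, so an object straddling several voxels/BSP nodes is intersected at
// most once per ray.
struct Object {
    int    lastRayId   = -1;   // the "mailbox"
    double cachedT     = -1.0; // result of the last intersection test
    double shapeT      = -1.0; // stand-in for the real geometric result
    int    timesTested = 0;    // counts how often the expensive test runs
};

double mailboxedIntersect(Object& obj, int rayId) {
    if (obj.lastRayId == rayId)
        return obj.cachedT;    // same ray, another voxel: reuse the result
    obj.timesTested++;         // the expensive ray/object test goes here
    obj.lastRayId = rayId;
    obj.cachedT = obj.shapeT;  // a real tracer would compute t here
    return obj.cachedT;
}
```

Remember the caveat from the octree discussion: a cached hit at t=3
found from a node covering t=1 to 2 still has to be compared against
objects in the nodes between t=2 and t=3.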
 
 

------------------------------------------------------------------------

This FAQ is mainly based on actual questions sent to TAs in previous
years. Thanks to last year's TA Lucas Pereira for the answers, and
last year's students for the questions. Thanks in advance to you for
your questions, which are added to the top as they roll in. New questions
are added to the top of the document with timestamps, so that it's
easy to find the new stuff.
-Tamara

Since the programming assignment did not change, this document
is an extremely useful source of information. Continuing the tradition,
we will update the document with your questions. Thanks to last year's TA for
the answers and for providing help.
 Elena
 

Copyright 1998 Marc Levoy

Updated by lena@graphics