Marc Levoy
Computer Science Department
Stanford University
Click on a picture to see an uncompressed .tif file.
Click on the string (where present) below the picture to examine my original
experimental record, which describes in detail how the image was computed. For
the first image, clicking on the string runs an interactive demo (on SGIs
only).
![]()
Inventor viewer showing cyl0_tex.iv
This image shows a test scene consisting of several cubes and spheres, some
textured, all enclosed in a black-walled box. Due to this enclosing box, if
you invoke the demo (by clicking on the string below the picture), you won't
see anything initially. To get inside the enclosure, zoom in by dragging down
with the left and middle button depressed at the same time (in the usual
Inventor way). The brown plane shows the position and orientation of the
flatland light field. Also indicated is the line of camera positions (red
line) and the focal line (yellow line). Although these are indicated as
straight lines in this visualization, the actual light field is a circle around
the center of the scene. I should fix this aspect of the visualization.
Here is the resulting flatland light field, shown in the form of a
sinugram. This is precisely the sort of data that would be produced by
one revolution of a non-spiral computed tomography (CT) scanner, except that
these projections are opaque rather than transmissive.
For a scene consisting of a single point, i.e. an impulse scene, the
sinugram of the resulting flatland light field is a vertical line as shown in
cell a1. Cell b1 shows an unfiltered backprojection of this sinugram, which
exhibits the characteristic 1/r falloff (see the closeup to its right). Cell
b4 shows a filtered backprojection, yielding the correct reconstruction of the
original point (see the closeup to its right). The backprojection filter is
shown in cell b6 (magnified and plotted in the 2nd from lower-left closeup),
and the filtered sinugram is shown in cell a4. Cells b2 and b3 were used to
verify that all backprojections are properly aligned, in this case that they
cross in the exact center of the scene. Problems in alignment have been shown
to severely degrade reconstructions.
Returning to the original opaque scene, the left image shows filtered
backprojections of each quadrant of the circle followed by the synthetic
camera. Below these is the sum of these four backprojections (cell c5) and its
absolute value (cell c6). In this example, all objects in the scene are
textured. In other images in this document, some are untextured. For
reference, the second and third images show unfiltered (cell d5) and filtered
(cell c5) backprojection of a completely untextured (second image) and
partially textured (third image) scene.
Backprojection of an opaque scene cannot correctly reconstruct the
scene; any attempt to do so produces artifacts. Features on the convex
hull of objects, where at least 180 degrees of rays are unoccluded, are
reconstructed well, although not perfectly (see discussion of color pollution
in epi_erase. Features not on the convex hull fare
worse. In particular, look at the weak left wall of the red cube and the weak
upper-left wall of the tilted cyan cube. The latter is interesting; the shape
of these "ripples" are directly attributable to the extent of the non-occluded
arc of camera views available to reconstruct this wall. The narrower the arc,
the narrower the baseline (in the sense of a shape-from-stereo algorithm), and
the thinner the ripples. Also interesting is the central cube. The top and
side walls are non-occluded over about 90 degrees, unbroken in the case of the
left and right walls, broken into two sections in the case of the top wall.
All three walls are reasonably well reconstructed. The bottom wall are
non-occluded over only 45 degrees, and the reconstruction is much worse. I
despair of finding an operator that can extract these rippled walls. For
sample results, see epi_newgrads2. For an
interesting interpretation of these artifacts as visibility events, see epi_badgrads.
This final example shows that sinugrams with excessively high frequences (these
also contain aliasing) lead to reconstructions with severe artifacts.
See below for Dwight Nishimura's suggestion of
smoother backprojection filters.
Conclusion:
Given a well textured scene imaged to form an alias-free light field, features
that lie on the convex hull of the scene, or that are visible over a arc,
broken or unbroken, of about 45 degrees or more, can be reconstructed
adequately. These visibility requirements are more likely to be satisfied in
3D than in flatland.
Applying a gradient operator to a filtered backprojection yields a sort of
occupancy mask for the scene. This image compares two different gradient
operators. The operator used to compute cell g8 consists of convolving the
filtered backprojection (cell c5) by a horizontal quadrature filter pair (a
gaussian and its derivative) (see
epi_4in1orthoflatbricks_thickpostgrad), summing the absolute value of the
two quadrature components, convolving the backprojection by a vertical
quadrature filter pair, summing those two, taking the max over the three color
components (R,G, and B), and taking the sum over the two directions (horizontal
and vertical). I have tried various combinations of sums and maxes; this
scheme seems best. Note that summing the absolute value of the quadrature
mirror pair is close to the formal definition of the magnitude of the response
of a quadrature mirror pair of filters, which is the sqrt of the sum of the
squared responses. The operator used to compute cell q8 consists of applying
only horizontal filters to the filtered backprojection of the front and back
quadrants of the circle (cells c1 and c3) and only vertical filters to the
backprojection of the left and right quadrants (cells c2 and c4), rather than
first summing the projections (to form cell c5). The former solution (running
on cell c5) produces a cleaner mask (cell g8), as seen in the closeups at
right.
This image was designed to emphasize the streak artifacts that arise
from running a gradient operator on a scene containing occlusions. In cells g6
and g7, the scene is untextured; in cells g9 and g10, some objects are
textured. Cells g6 and g7 (and cells g9 and g10) differ in the choice of the
operator. The resulting occupancy masks show many streaks. In other
experiments, I tried irregular textures (see epi_noise,2). Those also lead to poor
occupancy masks.
Given a 1D sinusoidal texture and a sinusoidal filter of the same frequency, a
quadrature mirror pair suffices, at least in theory, to detect the texture
regardless of its phase. The quality of this detection can be degraded
by several factors:
Looking at one quadrant of a circular light field (see cell c1 in epi_newgrads2), the operations involved in taking the
gradient of each row of a filtered backprojection is seen to be equivalent to
applying a horizontal gradient operator to the image from a camera focused at
the depth corresponding to that row. This means that my shape-from-light-field
operator is equivalent to a popular shape-from-focus operator, except
that in my case the aperture of the camera is enormous - 90 degrees of arc!
Taking this analogy further, applying my operator to the backprojection of a
full-surround light field (cell c5) is equivalent to shape-from-focus with a
360-degree aperture, the ultimate fish-eye lens! My introduction of a
backprojection formulation facilitates this extension of shape-from-focus
algorithms and is a real (i.e. publishable) contribution.
Another interesting relationship is between my operators and
shape-from-epipolar-volumes. Looking for a strong horizontal gradient
on a particular row of a backprojection is equivalent to looking for a pattern
of parallel stripes of a particular orientation in line space. What it detects
is an object surface parallel to the voxel row. Looking for a gradient in
another direction in a backprojection is equivalent to looking for a pattern of
converging stripes, something I have not seen in the literature. What this
detects is an oblique surface. Analyzing the epipolar volume has one advantage
over my pipeline; by taking the absolute value of the derivative along a
stripe, it can make itself immune to stripes that start out as valleys, then
become peaks. This occurs in specular highlights. However, if surfaces are
diffuse, valleys don't become peaks along their length, so the absolute value
can be delayed until later, as it is in my pipeline. For another comparison to
shape-from-epipolar volumes, see epi_reproject2.
By the way, it is interesting to note that the streaks in these occupancy masks
each connect two extrema of the objects in the scene, i.e. occlusion events.
Thus, the set of all such streaks is essentially a discrete representation of
the visibility aspect graph.
Finally, the only alternative to gradient operators that I can think of is
looking for color differences (assuming a distinguishable background).
However, this only works on the line hull of the scene. It is essentially a
shape-from-silhouettes algorithm such as has been proposed by
Richard Szeliski. See epi_silhouette for an experiment
along these lines.
See below for another idea, a
visibility-dependent gradient operator.
Conclusion:
Given a textured scene and a well-matched gradient operator, an adequate
occupancy mask can be obtained for features whose reconstruction is not ruined
by too much occlusion. The match between texture and gradient operator is
delicate.
If each backprojected ray is attenuated by its passage through the occupancy
mask (visibility masking), and color is deposited in the voxels encountered by
the ray only to the extent those voxels are opaque (occupancy masking), the
result is akin to casting a colored light through a semi-transparent scene. It
is also akin to the shadow pass of a volume rendering algorithm with
shadows. Products of such a backprojection include a light strength buffer
a.k.a. shadow map (cell l1), a buffer indicating how long each camera
ray was visible, a.k.a. the incremental erasure mask (cell k1),
a.k.a. visibility mask, indicating how long each pixel in the sinugram
(i.e. each ray) was visible (i.e. how long before it was terminated), and an
occupancy-masked and visibility-masked colored cross section (cell n2).
By comparison, cell i2 shows a colored cross section with occupancy masking but
no visibility masking, i.e. color is multiplied by occupancy, but rays are not
terminated. Note that the polluting effect of the white sphere on the other
objects in the scene, visible in cell i2, is removed from the cubes in cell n2.
This removal of color pollution is exactly what we want to see. However,
nowhere in this example is any attempt made to use occupancy masking to improve
the occupancy mask itself. This is done in the next example.
Since the colored cross section produced by occupancy-masked backprojection is
free from pollution by unrelated objects in the scene, an occupancy mask
generated by taking the gradient of this cross section should be cleaner than
the original occupany mask. This image shows two iterations of such a
pipeline. The improvement is definite but not dramatic; the first iteration,
looks good, but everything starts disintegrating in the second. Furthermore,
the walls start thinning almost as fast as the interiors.
This pipeline is similar to a shape-from-epipolar-volumes with erasure
algorithm Leonard McMillan once described to me in which, after he detected a
left-leaning stripe from the epipolar volume (equivalent in my formulation
to deciding that a voxel near the front of the scene is occupied), he erased
that stripe from the epipolar volume (equivalent to terminating the
backprojection of all rays passing through that voxel) before looking for
less-inclined stripes (equivalent to backprojecting the remaining rays to the
next voxel row and rerunning my gradient operator). I don't do this; I rerun
the gradient operator only once, after the occupancy-masked backprojection is
done. In this sense, his algorithm is like Gauss-Seidel iteration and mine is
like Jacobi iteration.
There are four problems that are potentially addressed by occupancy-masked and,
more importantly, by visibility-masked backprojection, i.e. by ray erasure:
Conclusion:
For scenes that contain opaque objects, volume rendering is the appropriate
method of backprojection, at least once an occupancy mask is available. In
fact, it is the only way to map colors from a sinugram onto the objects in an
extracted scene. However, its utility for actually improving the occupancy
mask is only slight.
This image shows several cycles of backprojection / forward projection
iteration on untextured and textured scenes. Each backprojection produces a
new colored cross section (see epi_color_correct3 for an
example sequence of cross sections), and each forward projection produces a new
sinugram. Unfiltered backprojection is used. The backprojections and
forward projections both employ volume rendering with an occupancy mask
produced from a filtered backprojection using the gradient operator already
described. A correction step consists of computing a difference cross section
by backprojecting the difference between the most recent forward projection and
the original sinugram, then adding this difference cross section to the most
recent cross section to obtain a new cross section. Note, however, that no
attempt is made to improve the occupancy mask. In this respect, the iteration
is not useful to my goal of reconstructing a 3D shape. I address this
deficiency below.
By the way, this iteration is akin to the Algebraic Reconstruction Technique
(ART) used to reconstruct CT data, but it is not equivalent to it. This
experiment shows that, in keeping with results in the ART literature, and
despite the use of volume rendering instead of (additive) integration, the
iteration converges, as evidenced by the close match of the final forward
projection to the original sinugram. I have tried many variations of this
iteration, most notably using filtered backprojection, which usually converges
faster but sometimes introduces artifacts due to the presence of occlusions in
the scene.
Here is the result of an early attempt to improve the occupancy mask on each
iteration. In this example, I compute unmasked (by the occupancy mask)
backprojections of the difference between the most recent forward projection
and the original sinugram, add this difference backprojection to the original
backprojection, and reapply the gradient operator to this sum. The resulting
sequence of occupancy masks and colored cross sections are shown from left to
right. The original unmasked backprojection is shown in cell c6. and a later,
modified, unmasked backprojection is shown in cell m6. By the way, this light
field happens to be planar rather than cylindrical, and the backprojections are
filtered rather than unfiltered. In addition to recomputing the mask after
each backprojection, I gradually increase overall opacity, which is
perhaps akin to decreasing temperature in a simulated annealing algorithm.
Comparing cell q8 to cell i8, the weak parts of the mask definitely strengthen
on each iteration. To evaluate the efficacy of iterating, in cell i3 I crank
up the opacity on the first forward projection. This doesn't work as well, as
evidenced by the lack of opacity of the middle cube, which produces the middle
near-vertical gray stripe in cell i3. On the other hand, note that each
iteration step introduces noise into the mask, as evidenced by the brown fuzz
in cell t3. This sensitivity to noise is also endemic to ART, according to the
literature.
Here is the result of a much later attempt to improve the mask, one that shows
improved robustness with respect to noise. In this example, new colored cross
sections are computed (as in epi_color_correct5) using unfiltered
backprojection, then the ratio of the new cross section to the old cross
section is computed. A blurred version of this ratio is shown in cells s5 and
dd5. Then the filtered but unmasked backprojection of each original sinugram
is multiplied by this ratio, as shown in cells u9 and ff9, and a new mask is
computed from scratch, yielding cells w10 and hh10. Compared to the earlier
experiment above, the judicious use of unfiltered backprojections is probably
responsible for reducing the noise, and a cleaner algebraic formulation
(evident only by looking at the Tcl code) removes the need for many arbitrary
scaling parameters, but the overall result is not dramatically better.
To help understand the limitations of my iterative approach, I tried
backprojecting the input sinugram using a perfect occupancy mask (cell
b5) computed synthetically from the Inventor file. The resulting colored cross
section is shown in cell l5. The imperfections are due partially to the
approximations inherent in volume rendering (e.g. resampling and digital
compositing) and partially to the presence of occlusions in the scene. I then
tried forward projecting this cross section, producing imperfect sinugrams m1
and m3. To help me distinguish imperfections due to occlusions from those due
to volume rendering, I also tried forward projecting a perfect colored cross
section (cell a5), producing sinugrams m5 and m7. Imperfections in these
results are due solely to volume rendering.
In addition to these causes of error, noise in the occupancy mask can cause
self-shadowing noise in forward projections (see epi_color_correct,2). Blurring
the mask helps. A shadow bias might also help. This reminds me of similar
problems in shadowed volume rendering.
The artifacts of iterative reconstruction fall into four classes:
Conclusion #1:
Backprojection and forward projection suffer from artifacts due to scene
occlusions, imperfect resampling, and the digital compositing approximation of
volume rendering. In addition, textured or thick occupany masks lead to
texture interference or self-shadowing during projection. These artifacts make
iterative techniques perform poorly.
Conclusion #2:
I have not yet discovered a method for employing iteration to improve the
*geometric* character of an occupancy mask degraded by the presence of
occlusions. I have only succeeded in making it more opaque. I suspect that a
higher-level representation of the geometry is necessary, for example by
reconstructing a polygonal (i.e. piecewise linear) representation of the scene
and then applying Taubin-like smoothing to it.
Discussion:
I'm not sure I've conclusively proven this second point. I haven't tried
applying very large scale smoothing, or dilation / erosion, to the masks.
However, given the other artifacts listed in conclusion #1, I'm not sure it's
worth the effort.
Many researchers have described the irradiance vector field, but nobody has
ever made a picture of one. Well, here it is. This image shows the Line
Integal Convolution (LIC) of a vector light field produced by placing a circle
of point light sources around my flatland scene. Actually, what this image
shows is a vector visibility field, not a light field, but in fact these
are identical; the scalar irradiance at each point in space becomes the
quantitative visibility of each point, and the irradiance vector at
each point becomes a sort of direction of maximum visibility. This
image includes some backprojections, occupancy masks, visibility images (which
look like shadow maps, as expected), and the X,Y magnitudes and direction of
the vector light field.
I have developed two reasonably successful but completely different pipelines
for extracting a polyline contour from my backprojections of light fields. In
this first example, I perform filtered occupancy-masked backprojection to
produce a shadow image (cell f6). Ignoring the occupancy mask, I perform a
thresholding operation on the shadow image. After some experimentation, I
found that the most successful shadow image was obtained by computing at each
voxel the maximum light strength (over any view) that reaches the voxel, rather
than the total light strength at that voxel as in volume rendering. In
retrospect, this pipeline isn't as strange as it might seem. I am merely
computing the L-infinity norm of visibility at each voxel; if it was
seen from any camera, then it is unoccupied. This pipeline has two problems.
First, as with any approach that relies on maxes, it may not be robust. The
slightest hole in the occupancy mask will result in a slash through the
object's geometry. Second, the use of a visibility threshold means that
the choice of threshold level is critical; slight changes in level move the
extracted contour in space.
In this second example, I work from the occupancy mask (cell b1) rather than
the shadow image. First, I convert the ridges of the occupancy mask into zero
crossings (cell c1), then extract zero crossings to produce a set of contours
(cell b2). My justification for this pipeline is that ridge-following
algorithms are notoriously unrobust (although maybe I should try one anyway),
but zero-crossing algorithms are very robust (as shown in Brian
Curless's thesis). To my knowledge, in order to convert ridges into zero
crossings, one must know the direction perpendicular to the ridge
locally. Amazingly, this is precisely the information given by the vector
light field! So, I perform the conversion by computing the gradient of the
(blurred) occupancy mask in the direction given by (i.e. steered by) a
(blurred) vector light field. The pipeline contains one more step. Since the
gradient image is full of zeros, extracting its zero crossings produces many
garbage contours (cell b2). I mask these contours using a thresholded version
of the occupancy mask. This eliminates all but the desired zero crossings
(cell c2). The closeups at bottom show the contour from one cube superimposed
on a synthetically generated (gold standard) slice of the scene.
In general, this pipeline is superior to the first one because it is not
sensitive to the selection of a threshold level (it is not particularly
sensitive to the threshold used during the masking step); the extracted
contours are always correctly placed in space. However, since the steered
gradient operator depends on the vector light field, which becomes unreliable
at saddle points and in densely occluded areas, the pipeline may be fragile.
Aside from this fragility, there will of course be failures wherever the
occupancy mask is of poor quality, i.e. in regions of poor visibility or
untextured objects.
Conclusion:
Given an adequate occupancy mask, a good vector light field and, consequently,
a good set of contours, can be derived from it.
Smoother backprojection filters (his idea).
Jain's book suggests several backprojection filters that produce less moire in
the backprojection. (I noticed such moire only if I severely contrast-enhanced
my backprojections of points.) These filters might, however, blur the
backprojection of all features slightly. It doesn't seem like this
will make much of a difference to me either way.
Visibility-dependent gradient operator (my idea).
Given a first guess at an occupancy mask, suppose I computed for each voxel the
set of directions over which it is visible. From this, I could determine the
likely (occlusion-degraded) shape of the backprojection of any (textured) wall
that might be present at the voxel. I would not, however, know its direction,
so I would have to look for all directions. Could I use such an operator to
detect the rippled backprojections of semi-occluded walls? If I successfully
detected such a wall, I could use this information in an iterative algorithm,
by recomputing the same visibility information at each voxel and looping.
Filtered backprojection of light fields
![]()
(not in epi_record.html),
epi_back_untex,
epi_back_tex
![]()
epi_4in1orthoflatgrid_aliasing
Gradient operators and occupancy masks
Why is it hard to generate a good occupancy mask?
These factors cannot be completely eliminated, and they conspire to produce
occupancy masks whose features are either textured, or thick, or both. These
artifacts severely degrade the performance of iterative reconstruction
algorithms, as discussed below.
Relationship to other vision algorithms
Occupancy-masked and visibility-masked backprojection
Why doesn't ray erasure lead to better masks?
On the other hand, ray erasure introduces a real danger. In McMillan's
formulation, if he is too quick to detect and erase a stripe, he may miss a
stronger but slightly less inclined stripe. This is equivalent in my
formulation to the self-shadowing problem (see epi_forward_cross2 below). Moreover, since the
stripe he detects becomes his reconstruction of the scene, he is in danger of
spatially shifting features from their correct positions. I take great care to
avoid this (see epi_contour3 and epi_masked_contour).
Iterative backprojection / forward projection
Can anything be done to save iterative reconstruction algorithms?
The vector light (visibility) field
Cell a1: original epipolar image, slab 0 (front)
Cell j9: filtered backprojection of original, sum of all slabs
Cell d6: masked filtered backprojection using mask in f7
Cell f7: occupancy mask derived from j9 in the usual way
Cell f1: shadow image from masked backprojection of slab 0 (front)
Cell e1: visibility image from same backprojection
Cell k5: unnormalized X-component of vector field from same
Cell l5: unnormalized Y-component
Cell f5: sum of shadow images
Cell g5: magnitude of vector field (sqrt of X^2 + Y^2)
Cell k10: normalized X-component (normalized by magnitude)
Cell l10: normalized Y-component
Cell k11: LIC visualization of normalized (unblurred) vector field
Cell l10: gradient of occupancy mask along (blurred) vector field lines
Extracting geometry from backprojections
September 3 meeting with Dwight Nishimura
Printed down to here for Shape From Light Fields notebook,
September 4, 1997
To do
© 1997
Marc Levoy
Last update:
March 4, 2003 10:14:44 PM
levoy@cs.stanford.edu