Shape from Light Fields Using Filtered Backprojection

Summary and discussion

Marc Levoy
Computer Science Department
Stanford University


Click on a picture to see an uncompressed .tif file.
Click on the string (where present) below the picture to examine my original experimental record, which describes in detail how the image was computed. For the first image, clicking on the string runs an interactive demo (on SGIs only).


Flatland light fields


Inventor viewer showing cyl0_tex.iv

This image shows a test scene consisting of several cubes and spheres, some textured, all enclosed in a black-walled box. Due to this enclosing box, if you invoke the demo (by clicking on the string below the picture), you won't see anything initially. To get inside the enclosure, zoom in by dragging down with the left and middle button depressed at the same time (in the usual Inventor way). The brown plane shows the position and orientation of the flatland light field. Also indicated is the line of camera positions (red line) and the focal line (yellow line). Although these are indicated as straight lines in this visualization, the actual light field is a circle around the center of the scene. I should fix this aspect of the visualization.

Here is the resulting flatland light field, shown in the form of a sinugram. This is precisely the sort of data that would be produced by one revolution of a non-spiral computed tomography (CT) scanner, except that these projections are opaque rather than transmissive.


Filtered backprojection of light fields


epi_4in1point_convolve4

For a scene consisting of a single point, i.e. an impulse scene, the sinugram of the resulting flatland light field is a vertical line as shown in cell a1. Cell b1 shows an unfiltered backprojection of this sinugram, which exhibits the characteristic 1/r falloff (see the closeup to its right). Cell b4 shows a filtered backprojection, yielding the correct reconstruction of the original point (see the closeup to its right). The backprojection filter is shown in cell b6 (magnified and plotted in the 2nd from lower-left closeup), and the filtered sinugram is shown in cell a4. Cells b2 and b3 were used to verify that all backprojections are properly aligned, in this case that they cross in the exact center of the scene. Problems in alignment have been shown to severely degrade reconstructions.


(not in epi_record.html), epi_back_untex, epi_back_tex

Returning to the original opaque scene, the left image shows filtered backprojections of each quadrant of the circle followed by the synthetic camera. Below these is the sum of these four backprojections (cell c5) and its absolute value (cell c6). In this example, all objects in the scene are textured. In other images in this document, some are untextured. For reference, the second and third images show unfiltered (cell d5) and filtered (cell c5) backprojection of a completely untextured (second image) and partially textured (third image) scene.

Backprojection of an opaque scene cannot correctly reconstruct the scene; any attempt to do so produces artifacts. Features on the convex hull of objects, where at least 180 degrees of rays are unoccluded, are reconstructed well, although not perfectly (see discussion of color pollution in epi_erase. Features not on the convex hull fare worse. In particular, look at the weak left wall of the red cube and the weak upper-left wall of the tilted cyan cube. The latter is interesting; the shape of these "ripples" are directly attributable to the extent of the non-occluded arc of camera views available to reconstruct this wall. The narrower the arc, the narrower the baseline (in the sense of a shape-from-stereo algorithm), and the thinner the ripples. Also interesting is the central cube. The top and side walls are non-occluded over about 90 degrees, unbroken in the case of the left and right walls, broken into two sections in the case of the top wall. All three walls are reasonably well reconstructed. The bottom wall are non-occluded over only 45 degrees, and the reconstruction is much worse. I despair of finding an operator that can extract these rippled walls. For sample results, see epi_newgrads2. For an interesting interpretation of these artifacts as visibility events, see epi_badgrads.


epi_4in1orthoflatgrid_aliasing

This final example shows that sinugrams with excessively high frequences (these also contain aliasing) lead to reconstructions with severe artifacts.

See below for Dwight Nishimura's suggestion of smoother backprojection filters.

Conclusion: Given a well textured scene imaged to form an alias-free light field, features that lie on the convex hull of the scene, or that are visible over a arc, broken or unbroken, of about 45 degrees or more, can be reconstructed adequately. These visibility requirements are more likely to be satisfied in 3D than in flatland.


Gradient operators and occupancy masks


epi_newgrads2

Applying a gradient operator to a filtered backprojection yields a sort of occupancy mask for the scene. This image compares two different gradient operators. The operator used to compute cell g8 consists of convolving the filtered backprojection (cell c5) by a horizontal quadrature filter pair (a gaussian and its derivative) (see epi_4in1orthoflatbricks_thickpostgrad), summing the absolute value of the two quadrature components, convolving the backprojection by a vertical quadrature filter pair, summing those two, taking the max over the three color components (R,G, and B), and taking the sum over the two directions (horizontal and vertical). I have tried various combinations of sums and maxes; this scheme seems best. Note that summing the absolute value of the quadrature mirror pair is close to the formal definition of the magnitude of the response of a quadrature mirror pair of filters, which is the sqrt of the sum of the squared responses. The operator used to compute cell q8 consists of applying only horizontal filters to the filtered backprojection of the front and back quadrants of the circle (cells c1 and c3) and only vertical filters to the backprojection of the left and right quadrants (cells c2 and c4), rather than first summing the projections (to form cell c5). The former solution (running on cell c5) produces a cleaner mask (cell g8), as seen in the closeups at right.


epi_badgrads

This image was designed to emphasize the streak artifacts that arise from running a gradient operator on a scene containing occlusions. In cells g6 and g7, the scene is untextured; in cells g9 and g10, some objects are textured. Cells g6 and g7 (and cells g9 and g10) differ in the choice of the operator. The resulting occupancy masks show many streaks. In other experiments, I tried irregular textures (see epi_noise,2). Those also lead to poor occupancy masks.

Why is it hard to generate a good occupancy mask?

Given a 1D sinusoidal texture and a sinusoidal filter of the same frequency, a quadrature mirror pair suffices, at least in theory, to detect the texture regardless of its phase. The quality of this detection can be degraded by several factors:

These factors cannot be completely eliminated, and they conspire to produce occupancy masks whose features are either textured, or thick, or both. These artifacts severely degrade the performance of iterative reconstruction algorithms, as discussed
below.

Relationship to other vision algorithms

Looking at one quadrant of a circular light field (see cell c1 in epi_newgrads2), the operations involved in taking the gradient of each row of a filtered backprojection is seen to be equivalent to applying a horizontal gradient operator to the image from a camera focused at the depth corresponding to that row. This means that my shape-from-light-field operator is equivalent to a popular shape-from-focus operator, except that in my case the aperture of the camera is enormous - 90 degrees of arc! Taking this analogy further, applying my operator to the backprojection of a full-surround light field (cell c5) is equivalent to shape-from-focus with a 360-degree aperture, the ultimate fish-eye lens! My introduction of a backprojection formulation facilitates this extension of shape-from-focus algorithms and is a real (i.e. publishable) contribution.

Another interesting relationship is between my operators and shape-from-epipolar-volumes. Looking for a strong horizontal gradient on a particular row of a backprojection is equivalent to looking for a pattern of parallel stripes of a particular orientation in line space. What it detects is an object surface parallel to the voxel row. Looking for a gradient in another direction in a backprojection is equivalent to looking for a pattern of converging stripes, something I have not seen in the literature. What this detects is an oblique surface. Analyzing the epipolar volume has one advantage over my pipeline; by taking the absolute value of the derivative along a stripe, it can make itself immune to stripes that start out as valleys, then become peaks. This occurs in specular highlights. However, if surfaces are diffuse, valleys don't become peaks along their length, so the absolute value can be delayed until later, as it is in my pipeline. For another comparison to shape-from-epipolar volumes, see epi_reproject2.

By the way, it is interesting to note that the streaks in these occupancy masks each connect two extrema of the objects in the scene, i.e. occlusion events. Thus, the set of all such streaks is essentially a discrete representation of the visibility aspect graph.

Finally, the only alternative to gradient operators that I can think of is looking for color differences (assuming a distinguishable background). However, this only works on the line hull of the scene. It is essentially a shape-from-silhouettes algorithm such as has been proposed by Richard Szeliski. See epi_silhouette for an experiment along these lines.

See below for another idea, a visibility-dependent gradient operator.

Conclusion: Given a textured scene and a well-matched gradient operator, an adequate occupancy mask can be obtained for features whose reconstruction is not ruined by too much occlusion. The match between texture and gradient operator is delicate.


Occupancy-masked and visibility-masked backprojection


epi_erase

If each backprojected ray is attenuated by its passage through the occupancy mask (visibility masking), and color is deposited in the voxels encountered by the ray only to the extent those voxels are opaque (occupancy masking), the result is akin to casting a colored light through a semi-transparent scene. It is also akin to the shadow pass of a volume rendering algorithm with shadows. Products of such a backprojection include a light strength buffer a.k.a. shadow map (cell l1), a buffer indicating how long each camera ray was visible, a.k.a. the incremental erasure mask (cell k1), a.k.a. visibility mask, indicating how long each pixel in the sinugram (i.e. each ray) was visible (i.e. how long before it was terminated), and an occupancy-masked and visibility-masked colored cross section (cell n2). By comparison, cell i2 shows a colored cross section with occupancy masking but no visibility masking, i.e. color is multiplied by occupancy, but rays are not terminated. Note that the polluting effect of the white sphere on the other objects in the scene, visible in cell i2, is removed from the cubes in cell n2. This removal of color pollution is exactly what we want to see. However, nowhere in this example is any attempt made to use occupancy masking to improve the occupancy mask itself. This is done in the next example.


epi_reproject2

Since the colored cross section produced by occupancy-masked backprojection is free from pollution by unrelated objects in the scene, an occupancy mask generated by taking the gradient of this cross section should be cleaner than the original occupany mask. This image shows two iterations of such a pipeline. The improvement is definite but not dramatic; the first iteration, looks good, but everything starts disintegrating in the second. Furthermore, the walls start thinning almost as fast as the interiors.

This pipeline is similar to a shape-from-epipolar-volumes with erasure algorithm Leonard McMillan once described to me in which, after he detected a left-leaning stripe from the epipolar volume (equivalent in my formulation to deciding that a voxel near the front of the scene is occupied), he erased that stripe from the epipolar volume (equivalent to terminating the backprojection of all rays passing through that voxel) before looking for less-inclined stripes (equivalent to backprojecting the remaining rays to the next voxel row and rerunning my gradient operator). I don't do this; I rerun the gradient operator only once, after the occupancy-masked backprojection is done. In this sense, his algorithm is like Gauss-Seidel iteration and mine is like Jacobi iteration.

Why doesn't ray erasure lead to better masks?

There are four problems that are potentially addressed by occupancy-masked and, more importantly, by visibility-masked backprojection, i.e. by ray erasure:

On the other hand, ray erasure introduces a real danger. In McMillan's formulation, if he is too quick to detect and erase a stripe, he may miss a stronger but slightly less inclined stripe. This is equivalent in my formulation to the self-shadowing problem (see epi_forward_cross2 below). Moreover, since the stripe he detects becomes his reconstruction of the scene, he is in danger of spatially shifting features from their correct positions. I take great care to avoid this (see epi_contour3 and epi_masked_contour).

Conclusion: For scenes that contain opaque objects, volume rendering is the appropriate method of backprojection, at least once an occupancy mask is available. In fact, it is the only way to map colors from a sinugram onto the objects in an extracted scene. However, its utility for actually improving the occupancy mask is only slight.


Iterative backprojection / forward projection


epi_color_correct5

This image shows several cycles of backprojection / forward projection iteration on untextured and textured scenes. Each backprojection produces a new colored cross section (see epi_color_correct3 for an example sequence of cross sections), and each forward projection produces a new sinugram. Unfiltered backprojection is used. The backprojections and forward projections both employ volume rendering with an occupancy mask produced from a filtered backprojection using the gradient operator already described. A correction step consists of computing a difference cross section by backprojecting the difference between the most recent forward projection and the original sinugram, then adding this difference cross section to the most recent cross section to obtain a new cross section. Note, however, that no attempt is made to improve the occupancy mask. In this respect, the iteration is not useful to my goal of reconstructing a 3D shape. I address this deficiency below.

By the way, this iteration is akin to the Algebraic Reconstruction Technique (ART) used to reconstruct CT data, but it is not equivalent to it. This experiment shows that, in keeping with results in the ART literature, and despite the use of volume rendering instead of (additive) integration, the iteration converges, as evidenced by the close match of the final forward projection to the original sinugram. I have tried many variations of this iteration, most notably using filtered backprojection, which usually converges faster but sometimes introduces artifacts due to the presence of occlusions in the scene.


epi_corrector

Here is the result of an early attempt to improve the occupancy mask on each iteration. In this example, I compute unmasked (by the occupancy mask) backprojections of the difference between the most recent forward projection and the original sinugram, add this difference backprojection to the original backprojection, and reapply the gradient operator to this sum. The resulting sequence of occupancy masks and colored cross sections are shown from left to right. The original unmasked backprojection is shown in cell c6. and a later, modified, unmasked backprojection is shown in cell m6. By the way, this light field happens to be planar rather than cylindrical, and the backprojections are filtered rather than unfiltered. In addition to recomputing the mask after each backprojection, I gradually increase overall opacity, which is perhaps akin to decreasing temperature in a simulated annealing algorithm. Comparing cell q8 to cell i8, the weak parts of the mask definitely strengthen on each iteration. To evaluate the efficacy of iterating, in cell i3 I crank up the opacity on the first forward projection. This doesn't work as well, as evidenced by the lack of opacity of the middle cube, which produces the middle near-vertical gray stripe in cell i3. On the other hand, note that each iteration step introduces noise into the mask, as evidenced by the brown fuzz in cell t3. This sensitivity to noise is also endemic to ART, according to the literature.


epi_corrector5

Here is the result of a much later attempt to improve the mask, one that shows improved robustness with respect to noise. In this example, new colored cross sections are computed (as in epi_color_correct5) using unfiltered backprojection, then the ratio of the new cross section to the old cross section is computed. A blurred version of this ratio is shown in cells s5 and dd5. Then the filtered but unmasked backprojection of each original sinugram is multiplied by this ratio, as shown in cells u9 and ff9, and a new mask is computed from scratch, yielding cells w10 and hh10. Compared to the earlier experiment above, the judicious use of unfiltered backprojections is probably responsible for reducing the noise, and a cleaner algebraic formulation (evident only by looking at the Tcl code) removes the need for many arbitrary scaling parameters, but the overall result is not dramatically better.


epi_forward_cross2

To help understand the limitations of my iterative approach, I tried backprojecting the input sinugram using a perfect occupancy mask (cell b5) computed synthetically from the Inventor file. The resulting colored cross section is shown in cell l5. The imperfections are due partially to the approximations inherent in volume rendering (e.g. resampling and digital compositing) and partially to the presence of occlusions in the scene. I then tried forward projecting this cross section, producing imperfect sinugrams m1 and m3. To help me distinguish imperfections due to occlusions from those due to volume rendering, I also tried forward projecting a perfect colored cross section (cell a5), producing sinugrams m5 and m7. Imperfections in these results are due solely to volume rendering.

In addition to these causes of error, noise in the occupancy mask can cause self-shadowing noise in forward projections (see epi_color_correct,2). Blurring the mask helps. A shadow bias might also help. This reminds me of similar problems in shadowed volume rendering.

Can anything be done to save iterative reconstruction algorithms?

The artifacts of iterative reconstruction fall into four classes:

Conclusion #1: Backprojection and forward projection suffer from artifacts due to scene occlusions, imperfect resampling, and the digital compositing approximation of volume rendering. In addition, textured or thick occupany masks lead to texture interference or self-shadowing during projection. These artifacts make iterative techniques perform poorly.

Conclusion #2: I have not yet discovered a method for employing iteration to improve the *geometric* character of an occupancy mask degraded by the presence of occlusions. I have only succeeded in making it more opaque. I suspect that a higher-level representation of the geometry is necessary, for example by reconstructing a polygonal (i.e. piecewise linear) representation of the scene and then applying Taubin-like smoothing to it.

Discussion: I'm not sure I've conclusively proven this second point. I haven't tried applying very large scale smoothing, or dilation / erosion, to the masks. However, given the other artifacts listed in conclusion #1, I'm not sure it's worth the effort.


The vector light (visibility) field


epi_vector_field2

Many researchers have described the irradiance vector field, but nobody has ever made a picture of one. Well, here it is. This image shows the Line Integal Convolution (LIC) of a vector light field produced by placing a circle of point light sources around my flatland scene. Actually, what this image shows is a vector visibility field, not a light field, but in fact these are identical; the scalar irradiance at each point in space becomes the quantitative visibility of each point, and the irradiance vector at each point becomes a sort of direction of maximum visibility. This image includes some backprojections, occupancy masks, visibility images (which look like shadow maps, as expected), and the X,Y magnitudes and direction of the vector light field.

Cell a1:	original epipolar image, slab 0 (front)
Cell j9:	filtered backprojection of original, sum of all slabs
Cell d6:	masked filtered backprojection using mask in f7
Cell f7:	occupancy mask derived from j9 in the usual way
Cell f1:	shadow image from masked backprojection of slab 0 (front)
Cell e1:	visibility image from same backprojection
Cell k5:	unnormalized X-component of vector field from same
Cell l5:	unnormalized Y-component
Cell f5:	sum of shadow images
Cell g5:	magnitude of vector field (sqrt of X^2 + Y^2)
Cell k10:	normalized X-component (normalized by magnitude)
Cell l10:	normalized Y-component
Cell k11:	LIC visualization of normalized (unblurred) vector field
Cell l10:	gradient of occupancy mask along (blurred) vector field lines


Extracting geometry from backprojections


epi_contour3

I have developed two reasonably successful but completely different pipelines for extracting a polyline contour from my backprojections of light fields. In this first example, I perform filtered occupancy-masked backprojection to produce a shadow image (cell f6). Ignoring the occupancy mask, I perform a thresholding operation on the shadow image. After some experimentation, I found that the most successful shadow image was obtained by computing at each voxel the maximum light strength (over any view) that reaches the voxel, rather than the total light strength at that voxel as in volume rendering. In retrospect, this pipeline isn't as strange as it might seem. I am merely computing the L-infinity norm of visibility at each voxel; if it was seen from any camera, then it is unoccupied. This pipeline has two problems. First, as with any approach that relies on maxes, it may not be robust. The slightest hole in the occupancy mask will result in a slash through the object's geometry. Second, the use of a visibility threshold means that the choice of threshold level is critical; slight changes in level move the extracted contour in space.


epi_masked_contour

In this second example, I work from the occupancy mask (cell b1) rather than the shadow image. First, I convert the ridges of the occupancy mask into zero crossings (cell c1), then extract zero crossings to produce a set of contours (cell b2). My justification for this pipeline is that ridge-following algorithms are notoriously unrobust (although maybe I should try one anyway), but zero-crossing algorithms are very robust (as shown in Brian Curless's thesis). To my knowledge, in order to convert ridges into zero crossings, one must know the direction perpendicular to the ridge locally. Amazingly, this is precisely the information given by the vector light field! So, I perform the conversion by computing the gradient of the (blurred) occupancy mask in the direction given by (i.e. steered by) a (blurred) vector light field. The pipeline contains one more step. Since the gradient image is full of zeros, extracting its zero crossings produces many garbage contours (cell b2). I mask these contours using a thresholded version of the occupancy mask. This eliminates all but the desired zero crossings (cell c2). The closeups at bottom show the contour from one cube superimposed on a synthetically generated (gold standard) slice of the scene.

In general, this pipeline is superior to the first one because it is not sensitive to the selection of a threshold level (it is not particularly sensitive to the threshold used during the masking step); the extracted contours are always correctly placed in space. However, since the steered gradient operator depends on the vector light field, which becomes unreliable at saddle points and in densely occluded areas, the pipeline may be fragile. Aside from this fragility, there will of course be failures wherever the occupancy mask is of poor quality, i.e. in regions of poor visibility or untextured objects.

Conclusion: Given an adequate occupancy mask, a good vector light field and, consequently, a good set of contours, can be derived from it.


September 3 meeting with Dwight Nishimura

Smoother backprojection filters (his idea). Jain's book suggests several backprojection filters that produce less moire in the backprojection. (I noticed such moire only if I severely contrast-enhanced my backprojections of points.) These filters might, however, blur the backprojection of all features slightly. It doesn't seem like this will make much of a difference to me either way.

Visibility-dependent gradient operator (my idea). Given a first guess at an occupancy mask, suppose I computed for each voxel the set of directions over which it is visible. From this, I could determine the likely (occlusion-degraded) shape of the backprojection of any (textured) wall that might be present at the voxel. I would not, however, know its direction, so I would have to look for all directions. Could I use such an operator to detect the rippled backprojections of semi-occluded walls? If I successfully detected such a wall, I could use this information in an iterative algorithm, by recomputing the same visibility information at each voxel and looping.


Printed down to here for Shape From Light Fields notebook, September 4, 1997

To do


© 1997 Marc Levoy
Last update: March 4, 2003 10:14:44 PM
levoy@cs.stanford.edu