List of Publications

The corresponding presentations for some of the papers can be found here.


Conferences


Reconstructing Occluded Surfaces using Synthetic Apertures: Stereo, Focus and Robust Measures
Vaibhav Vaish, Richard Szeliski, Larry Zitnick, Sing Bing Kang, Marc Levoy.
Proc. CVPR 2006.
Poster
Most algorithms for 3D reconstruction from images use cost functions based on SSD, which assume that the surfaces being reconstructed are visible to all cameras. This makes it difficult to reconstruct objects that are partially occluded. Recently, researchers working with large camera arrays have shown it is possible to "see through" occlusions using a technique called synthetic aperture focusing. This suggests that we can design alternative cost functions that are robust to occlusions using synthetic apertures. Our paper explores this design space. We compare classical shape from stereo with shape from synthetic aperture focus. We also describe two variants of multi-view stereo based on color medians and entropy that increase robustness to occlusions. We present an experimental comparison of these cost functions on complex light fields, measuring their accuracy against the amount of occlusion.
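To make the contrast between cost functions concrete, here is a minimal sketch in Python. It is not the paper's formulation: the function names, the use of grayscale values in place of color, the median-absolute-deviation form of the median cost, and the histogram bin width are all assumptions chosen for illustration. Each cost scores the samples that a set of cameras observes for one pixel under one depth hypothesis; lower means more consistent.

```python
import math
from collections import Counter
from statistics import mean, median


def variance_cost(samples):
    """SSD-style cost: mean squared deviation from the mean (sensitive to outliers)."""
    m = mean(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)


def median_cost(samples):
    """Median-based cost (assumed form): median absolute deviation from the median."""
    med = median(samples)
    return median([abs(s - med) for s in samples])


def entropy_cost(samples, bin_width=16):
    """Entropy-based cost (assumed form): Shannon entropy of a coarse intensity histogram."""
    counts = Counter(int(s) // bin_width for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical samples from 7 cameras for one pixel.
# Correct depth: 5 cameras agree, 2 see a dark occluder instead.
correct_depth = [100, 101, 99, 100, 20, 25, 100]
# Wrong depth: the cameras see unrelated scene points.
wrong_depth = [100, 40, 180, 75, 20, 220, 140]

for name, cost in [("variance", variance_cost),
                   ("median", median_cost),
                   ("entropy", entropy_cost)]:
    print(name, cost(correct_depth), cost(wrong_depth))
```

In this toy case the two occluded samples inflate the variance at the correct depth considerably, while the median-based and entropy-based costs stay low because a majority of cameras still agree.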

Synthetic Aperture Focusing using a Shear-Warp Factorization of the Viewing Transform
Vaibhav Vaish, Gaurav Garg, Eino-Ville Talvala, Emilio Antunez, Bennett Wilburn, Mark Horowitz, Marc Levoy.
Proc. Workshop on Advanced 3D Imaging for Safety and Security
(in conjunction with CVPR 2005)
Oral presentation
Synthetic aperture focusing consists of warping and adding together the images in a 4D light field so that objects lying on a specified surface are aligned and thus in focus, while objects lying off this surface are misaligned and hence blurred. This provides the ability to see through partial occluders such as foliage and crowds, making it a potentially powerful tool for surveillance. If the cameras lie on a plane, it has been previously shown that after an initial homography, one can move the focus through a family of planes that are parallel to the camera plane by merely shifting and adding the images. In this paper, we analyze the warps required for tilted focal planes and arbitrary camera configurations. We characterize the warps using a new rank-1 constraint that lets us focus on any plane, without having to perform a metric calibration of the cameras. We also show that there are camera configurations and families of tilted focal planes for which the warps can be factorized into an initial homography followed by shifts. This homography factorization permits these tilted focal planes to be synthesized as efficiently as frontoparallel planes. Being able to vary the focus by simply shifting and adding images is relatively simple to implement in hardware and facilitates a real-time implementation. We demonstrate this using an array of 30 video-resolution cameras; initial homographies and shifts are performed on per-camera FPGAs, and additions and a final warp are performed on 3 PCs.
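The shift-and-add idea for frontoparallel focal planes can be sketched in a few lines of Python. This is a toy under stated assumptions, not the FPGA implementation described above: cameras lie on a line, images are 1D scanlines that are assumed to be already rectified by the initial homography, and shifts are rounded to whole pixels. The names `shift_and_add`, `camera_offsets`, and `disparity` are hypothetical.

```python
def shift_and_add(images, camera_offsets, disparity):
    """Average 1D scanlines after shifting each by disparity * camera offset.

    Points whose parallax matches `disparity` (pixels per unit of camera
    offset) line up and stay sharp; everything else is spread out.
    """
    width = len(images[0])
    out = [0.0] * width
    for img, offset in zip(images, camera_offsets):
        shift = round(disparity * offset)
        for x in range(width):
            src = x + shift
            if 0 <= src < width:  # samples shifted in from outside count as 0
                out[x] += img[src]
    return [v / len(images) for v in out]


# Hypothetical scene: one bright point seen at pixel 10 by the central camera,
# with a parallax of 2 pixels per unit of camera offset.
camera_offsets = [-2, -1, 0, 1, 2]
width = 21
images = []
for offset in camera_offsets:
    scanline = [0.0] * width
    scanline[10 + 2 * offset] = 1.0
    images.append(scanline)

focused = shift_and_add(images, camera_offsets, disparity=2)  # focal plane on the point
blurred = shift_and_add(images, camera_offsets, disparity=0)  # focal plane elsewhere
```

With the matching disparity, all five copies of the point land on pixel 10; with a mismatched disparity, the same energy is spread over five separate pixels, which is the blur that lets the array see past partial occluders.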

High Performance Imaging Using Large Camera Arrays
Bennett Wilburn, Neel Joshi, Vaibhav Vaish, Eino-Ville Talvala, Emilio Antunez, Adam Barth, Andrew Adams, Marc Levoy, Mark Horowitz.
Proc. SIGGRAPH 2005
The advent of inexpensive digital image sensors, and the ability to create photographs that combine information from a number of sensed images, is changing the way we think about photography. In this paper, we describe a unique array of 100 custom video cameras that we have built, and we summarize our experiences using this array in a range of imaging applications. Our goal was to explore the capabilities of a system that would be inexpensive to produce in the future. With this in mind, we used simple cameras, lenses, and mountings, and we assumed that processing large numbers of images would eventually be easy and cheap. The applications we have explored include approximating a conventional single center of projection video camera with high performance along one or more axes, such as resolution, dynamic range, frame rate, and/or large aperture, and using multiple cameras to approximate a video camera with a large synthetic aperture. This permits us to capture a video light field, to which we can apply spatiotemporal view interpolation algorithms in order to digitally simulate time dilation and camera motion. It also permits us to create video sequences using custom non-uniform synthetic apertures.

Synthetic Aperture Confocal Imaging
Marc Levoy, Billy Chen, Vaibhav Vaish, Mark Horowitz, Ian McDowell, Mark Bolas.
Proc. SIGGRAPH 2004
Confocal microscopy is a family of imaging techniques that employ focused patterned illumination and synchronized imaging to create cross-sectional views of 3D biological specimens. In this paper, we adapt confocal imaging to large-scale scenes by replacing the optical apertures used in microscopy with arrays of real or virtual video projectors and cameras. Our prototype implementation uses a video projector, a camera, and an array of mirrors. Using this implementation, we explore confocal imaging of partially occluded environments, such as foliage, and weakly scattering environments, such as murky water. We demonstrate the ability to selectively image any plane in a partially occluded environment, and to see further through murky water than is otherwise possible. By thresholding the confocal images, we extract mattes that can be used to selectively illuminate any plane in the scene.

Using Plane + Parallax to Calibrate Dense Camera Arrays
Vaibhav Vaish, Bennett Wilburn, Neel Joshi, Marc Levoy.
Proc. CVPR 2004
Oral Presentation
A light field consists of images of a scene taken from different viewpoints. Light fields are used in computer graphics for image-based rendering and synthetic aperture photography, and in vision for recovering shape. In this paper, we describe a simple procedure to calibrate camera arrays used to capture light fields using a plane + parallax framework. Specifically, for the case when the cameras lie on a plane, we show (i) how to estimate camera positions up to an affine ambiguity, and (ii) how to reproject light field images onto a family of planes using only knowledge of planar parallax for one point in the scene. While planar parallax does not completely describe the geometry of the light field, it is adequate for the first two applications which, it turns out, do not depend on having a metric calibration of the light field. Experiments on acquired light fields indicate that our method yields better results than full metric calibration.

High Speed Video Using a Dense Camera Array
Bennett Wilburn, Neel Joshi, Vaibhav Vaish, Marc Levoy, Mark Horowitz.
Proc. CVPR 2004
Oral Presentation
We demonstrate a system for capturing multi-thousand frame-per-second (fps) video using a dense array of cheap 30fps CMOS image sensors. A benefit of using a camera array to capture high speed video is that we can scale to higher speeds by simply adding more cameras. Even at extremely high frame rates, our array architecture supports continuous streaming to disk from all of the cameras. This allows us to record unpredictable events, in which nothing occurs before the event of interest that could be used to trigger the beginning of recording. Synthesizing one high speed video sequence using images from an array of cameras requires methods to calibrate and correct those cameras' varying radiometric and geometric properties. We assume that our scene is either relatively planar or is very far away from the camera and that the images can therefore be aligned using projective transforms. We analyze the errors from this assumption and present methods to make them less visually objectionable. We also present a method to automatically color match our sensors. Finally, we demonstrate how to compensate for spatial and temporal distortions caused by the electronic rolling shutter, a common feature of low-end CMOS sensors.
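The scaling argument can be illustrated with a small Python sketch of a staggered trigger schedule. It assumes evenly spaced trigger offsets fired in camera-index order, which is a simplification for illustration and not necessarily the firing order used in the actual system; the function name and parameters are hypothetical.

```python
from fractions import Fraction


def interleave_schedule(num_cameras, base_fps, frames_per_camera):
    """Return (timestamp, camera, frame_index) tuples sorted by capture time.

    Each camera runs at base_fps; camera c is delayed by c / (base_fps * num_cameras)
    seconds, so the merged stream samples time num_cameras times more densely.
    """
    period = Fraction(1, base_fps)   # time between frames of one camera
    offset = period / num_cameras    # trigger delay between consecutive cameras
    events = []
    for cam in range(num_cameras):
        for k in range(frames_per_camera):
            events.append((cam * offset + k * period, cam, k))
    events.sort()
    return events


# Hypothetical example: 4 cameras at 30 fps give an effective 120 fps stream.
schedule = interleave_schedule(num_cameras=4, base_fps=30, frames_per_camera=2)
for timestamp, cam, k in schedule:
    print(f"t = {timestamp} s  camera {cam}  frame {k}")
```

Under this assumption the effective frame rate is simply `num_cameras * base_fps`, so adding cameras raises the rate without changing any individual sensor. The merged frames still come from different viewpoints, which is why the alignment and color-matching steps described above are needed.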

Robust Fingerprint Authentication Using Local Structural Similarity
Nalini Ratha, Vinayaka Pandit, Ruud Bolle, Vaibhav Vaish.
Proc. Workshop on Applications of Computer Vision, 2000.
Fingerprint matching is challenging as the matcher has to minimize two competing error rates: the False Accept Rate and the False Reject Rate. We propose a novel, efficient, accurate and distortion-tolerant fingerprint authentication technique based on graph representation. Using the fingerprint minutiae features, a labeled and weighted graph of minutiae is constructed for both the query fingerprint and the reference fingerprint. In the first phase, we obtain a minimum set of matched node pairs by matching their neighborhood structures. In the second phase, we include more pairs in the match by comparing distances with respect to matched pairs obtained in the first phase. An optional third phase, extending the neighborhood around each feature, is entered if we cannot arrive at a decision based on the analysis in the first two phases. The proposed algorithm has been tested with excellent results on a large private livescan database obtained with optical scanners.

Book Chapters

Synthetic Aperture Focusing Using Dense Camera Arrays
Vaibhav Vaish, Gaurav Garg, Eino-Ville Talvala, Emilio Antunez, Bennett Wilburn, Mark Horowitz, Marc Levoy.
To appear in 3D Imaging for Safety and Security, Springer Verlag.
Editors: A. Koschan, M. Pollefeys, M. Abidi.
Synthetic aperture focusing consists of warping and adding together the images in a 4D light field so that objects lying on a specified surface are aligned and thus in focus, while objects lying off this surface are misaligned and hence blurred. This provides the ability to see through partial occluders such as foliage and crowds, making it a potentially powerful tool for surveillance. In this paper, we describe the image warps required for focusing on any given focal plane, for cameras in general position without having to perform a complete metric calibration. We show that when the cameras lie on a plane, it is possible to vary the focus through families of frontoparallel and tilted focal planes by shifting the images after an initial rectification. Being able to vary the focus by simply shifting and adding images is relatively simple to implement in hardware and facilitates a real-time implementation. We demonstrate this using an array of 30 video-resolution cameras; initial homographies and shifts are performed on per-camera FPGAs, and additions and a final warp are performed on 3 PCs.
Extended version of our shear-warp factorization paper.

Patents

Position and Orientation Sensing with a Projector
Paul Beardsley, Ramesh Raskar, Vaibhav Vaish.
Serial No: 10/346,642
Mitsubishi Electric Research Laboratories

© Vaibhav Vaish
Last update: October 24, 2006 11:34:27 PM