Spacetime Stereo: A Unifying Framework for Depth from Triangulation
Abstract

Depth from triangulation has traditionally been investigated in a number of independent threads of research, with methods such as stereo, laser scanning, and coded structured light considered separately. In this work, we propose a common framework called spacetime stereo that unifies and generalizes many of these previous methods. To show the practical utility of the framework, we develop two new algorithms for depth estimation: depth from unstructured illumination change and depth estimation in dynamic scenes. Based on our analysis, we show that methods derived from the spacetime stereo framework can be used to recover depth in situations in which existing methods perform poorly.
Documents

Conference Paper
James Davis, Ravi Ramamoorthi, Szymon Rusinkiewicz. “Spacetime Stereo: A Unifying Framework for Depth from Triangulation,” IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2003. [Paper PDF] [Poster PDF] [Poster PPT]

Tech Report (added analysis of moving objects)
James Davis, Diego Nehab, Ravi Ramamoorthi, Szymon Rusinkiewicz. “Spacetime Stereo: A Unifying Framework for Depth from Triangulation,” Princeton Comp. Sci. Tech Report TR-689-04, 2004. [PDF]

Journal Paper (shortened and clarified the tech report)
James Davis, Diego Nehab, Ravi Ramamoorthi, Szymon Rusinkiewicz. “Spacetime Stereo: A Unifying Framework for Depth from Triangulation,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 27, no. 2, Feb 2005. [PDF]
Examples

Depth from unstructured lighting on static scenes

When the scene to be recovered is static and its illumination changes in an unstructured way, temporal stereo produces significantly better depth estimates than traditional spatial stereo. Temporal stereo works well even under challenging conditions, such as a white object in front of a white wall illuminated only by a hand-held flashlight. Results are shown as intensity-coded depth images.
Accurate results are possible when the temporal variation in the lighting provides sufficient information for good correspondence. Using a slowly moving light stripe and temporal processing, we obtain 0.044 mm RMS error and 0.13 mm peak noise on a test target.
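To make the idea concrete, the following is a minimal sketch of purely temporal matching on a static scene. It is not the implementation from the papers: it assumes rectified grayscale image stacks stored as NumPy arrays of shape (T, H, W), and the function name and disparity search are our own illustration. Each left-camera pixel's intensity-over-time vector is matched against candidate disparities by sum-of-squared-differences.

    import numpy as np

    def temporal_stereo(left, right, max_disp):
        # left, right: rectified image stacks of shape (T, H, W) showing a
        # static scene under varying (unstructured) illumination.
        # Returns an integer disparity map of shape (H, W).
        T, H, W = left.shape
        best_cost = np.full((H, W), np.inf)
        disparity = np.zeros((H, W), dtype=np.int32)
        for d in range(max_disp + 1):
            cost = np.full((H, W), np.inf)
            if d < W:
                # Left pixel x corresponds to right pixel x - d; compare the
                # two pixels' intensity vectors over all T frames (SSD).
                diff = left[:, :, d:] - right[:, :, :W - d]
                cost[:, d:] = np.sum(diff * diff, axis=0)
            better = cost < best_cost
            best_cost[better] = cost[better]
            disparity[better] = d
        return disparity

Because the matching vector extends through time rather than across space, textureless surfaces such as a white wall become distinguishable as long as the illumination varies from frame to frame.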
Dynamic scenes

Using higher-frequency lighting variation allows the shape of dynamic objects to be recovered as well. We create the necessary lighting with a projector that is neither synchronized nor calibrated with respect to the cameras. Note that subtle details, such as the bulging of a cheek during a smile, can be clearly seen in the resulting videos of rendered geometry.
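For moving scenes, a long temporal window is no longer valid, so the matching window must shrink in time and grow in space. Below is a hedged sketch of the resulting spacetime matching cost; the function name, the window parameters win and twin, and the (T, H, W) array layout are our own illustration, not code from the papers. Purely spatial stereo and purely temporal stereo fall out as the special cases twin = 0 and win = 0.

    import numpy as np

    def spacetime_cost(left, right, x, y, t, d, win=2, twin=1):
        # SSD matching cost for candidate disparity d at pixel (x, y) in
        # frame t, computed over a spatiotemporal window spanning
        # (2*win+1) x (2*win+1) pixels and (2*twin+1) frames.
        # Assumes the window fits inside both image stacks.
        lw = left[t - twin : t + twin + 1,
                  y - win : y + win + 1,
                  x - win : x + win + 1]
        rw = right[t - twin : t + twin + 1,
                   y - win : y + win + 1,
                   x - d - win : x - d + win + 1]
        return float(np.sum((lw - rw) ** 2))

The disparity at (x, y, t) is then the d that minimizes this cost over the search range, exactly as in the static-scene sketch above; shrinking twin trades illumination information for robustness to motion.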
Mobile robotics

We have begun to investigate using this technology for mobile robotic map building. Below we see the robot itself with the spacetime stereo range-capture equipment mounted, the robot in action, and rendered polygon meshes of the captured geometry from both the front and top views.
Related Work

Researchers at the University of Washington simultaneously published nearly identical ideas. Our work initially focused more on describing a framework for categorizing and understanding a broad set of related triangulation techniques, while theirs initially focused more on recovering the shape of dynamic scenes. You may find their paper and results interesting as well.

L. Zhang, B. Curless, S. M. Seitz. “Spacetime Stereo: Shape Recovery for Dynamic Scenes,” IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2003. [Project Web Page]