Spacetime Stereo: A Unifying Framework for Depth from Triangulation
Abstract

Depth from triangulation has traditionally been investigated in a number of independent threads of research, with methods such as stereo, laser scanning, and coded structured light considered separately. In this work, we propose a common framework called spacetime stereo that unifies and generalizes many of these previous methods. To show the practical utility of the framework, we develop two new algorithms for depth estimation: depth from unstructured illumination change and depth estimation in dynamic scenes. Based on our analysis, we show that methods derived from the spacetime stereo framework can be used to recover depth in situations in which existing methods perform poorly.
Documents

Conference Paper
James Davis, Ravi Ramamoorthi, Szymon Rusinkiewicz. “Spacetime Stereo: A Unifying Framework for Depth from Triangulation,” IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2003. [Paper PDF] [Poster PDF] [Poster PPT]

Tech Report (added analysis of moving objects)
James Davis, Diego Nehab, Ravi Ramamoorthi, Szymon Rusinkiewicz. “Spacetime Stereo: A Unifying Framework for Depth from Triangulation,” Princeton Comp. Sci. Tech Report TR-689-04, 2004. [PDF]

Journal Paper (shortened and clarified the tech report)
James Davis, Diego Nehab, Ravi Ramamoorthi, Szymon Rusinkiewicz. “Spacetime Stereo: A Unifying Framework for Depth from Triangulation,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 27, no. 2, Feb 2005. [PDF]
Examples

Depth from unstructured lighting on static scenes

When the scene to be recovered is static and its illumination changes in an unstructured way, temporal stereo produces significantly better depth estimates than traditional spatial stereo. Temporal stereo works well even under challenging conditions, such as a white object in front of a white wall illuminated only by a hand-held flashlight. Results are shown as intensity-coded depth images.
Accurate results are possible when the temporal variation in the lighting provides sufficient information for good correspondence. Using a slowly moving light stripe and temporal processing, we obtain 0.044 mm RMS error and 0.13 mm peak noise on a test target.
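To make the idea concrete, the following is a minimal sketch of purely temporal matching on a static scene. It is not the implementation from the papers: it assumes rectified grayscale image stacks stored as NumPy arrays of shape (T, H, W), and the function name and disparity search are our own illustration. Each left-camera pixel's intensity-over-time vector is matched against candidate disparities by sum-of-squared-differences.

    import numpy as np

    def temporal_stereo(left, right, max_disp):
        # left, right: rectified image stacks of shape (T, H, W) showing a
        # static scene under varying (unstructured) illumination.
        # Returns an integer disparity map of shape (H, W).
        T, H, W = left.shape
        best_cost = np.full((H, W), np.inf)
        disparity = np.zeros((H, W), dtype=np.int32)
        for d in range(max_disp + 1):
            cost = np.full((H, W), np.inf)
            if d < W:
                # Left pixel x corresponds to right pixel x - d; compare the
                # two pixels' intensity vectors over all T frames (SSD).
                diff = left[:, :, d:] - right[:, :, :W - d]
                cost[:, d:] = np.sum(diff * diff, axis=0)
            better = cost < best_cost
            best_cost[better] = cost[better]
            disparity[better] = d
        return disparity

Because the matching vector extends through time rather than across space, textureless surfaces such as a white wall become distinguishable as long as the illumination varies from frame to frame.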
Dynamic scenes

Using higher-frequency lighting variation allows the shape of dynamic objects to be recovered as well. We create the necessary lighting with a projector that is neither synchronized nor calibrated with respect to the cameras. Note that subtle details, such as the bulging of a cheek during a smile, can be clearly seen in the resulting videos of rendered geometry.
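For moving scenes, a long temporal window is no longer valid, so the matching window must shrink in time and grow in space. Below is a hedged sketch of the resulting spacetime matching cost; the function name, the window parameters win and twin, and the (T, H, W) array layout are our own illustration, not code from the papers. Purely spatial stereo and purely temporal stereo fall out as the special cases twin = 0 and win = 0.

    import numpy as np

    def spacetime_cost(left, right, x, y, t, d, win=2, twin=1):
        # SSD matching cost for candidate disparity d at pixel (x, y) in
        # frame t, computed over a spatiotemporal window spanning
        # (2*win+1) x (2*win+1) pixels and (2*twin+1) frames.
        # Assumes the window fits inside both image stacks.
        lw = left[t - twin : t + twin + 1,
                  y - win : y + win + 1,
                  x - win : x + win + 1]
        rw = right[t - twin : t + twin + 1,
                   y - win : y + win + 1,
                   x - d - win : x - d + win + 1]
        return float(np.sum((lw - rw) ** 2))

The disparity at (x, y, t) is then the d that minimizes this cost over the search range, exactly as in the static-scene sketch above; shrinking twin trades illumination information for robustness to motion.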
Mobile robotics

We have begun to investigate using this technology for mobile robotic map building. Below we see the robot itself with the spacetime stereo range-capture equipment mounted, the robot in action, and rendered polygon meshes of the captured geometry from both the front and top views.
Related Work

Researchers at the University of Washington simultaneously published nearly identical ideas. Our work initially focused more on describing a framework for categorizing and understanding a broad set of related triangulation techniques, while theirs initially focused more on recovering the shape of dynamic scenes. You may find their paper and results interesting as well.

L. Zhang, B. Curless, S. M. Seitz. “Spacetime Stereo: Shape Recovery for Dynamic Scenes,” IEEE Comp. Soc. Conf. on Computer Vision and Pattern Recognition (CVPR), 2003. [Project Web Page]