We describe the development of fully automated computer vision techniques for capturing textured 3D CAD models of urban areas directly from near-ground photographs. The scale and generality of the input image data in this problem impose significant constraints on any proposed system design. We discuss these constraints and their implications, then present a model capture system. The system includes a novel sensor which acquires high-resolution, spherical, geo-referenced images, and accompanying algorithms which extract textured geometric models of the environment observed by the sensor.
Eliminating the human in the loop is a significant challenge from both engineering and research standpoints, and the effort has led to some powerful new techniques. The tradeoff is that achieving automation and scaling requires specialized sensor instrumentation, large numbers (typically thousands) of image observations, and significant computational resources. In contrast to the prevailing view that human intervention always improves quality, we give examples of situations in which our automated system outperforms a human operator. We describe the current status of the project and show some preliminary results.
Prof. Teller now co-heads the MIT Computer Graphics Group, pursuing capture, exploration, design and simulation of human-scale objects and environments. Recent research efforts include: a project to capture a three-dimensional map of the entire MIT campus, outside and in; the acquisition of a three-dimensional "time-lapse" movie of the demolition of MIT's Building 20 and the construction of the Stata Center designed by Frank Gehry; and the Educational Fusion system for authoring, deploying, and teaching computer science concepts.