
Broad Area Colloquium for Artificial Intelligence,
Geometry, Graphics, Robotics and Vision


3D Interpretation of Dynamic and Non-Rigid Scenes

Amnon Shashua
School of Computer Science and Eng.
Hebrew University of Jerusalem, Israel

Monday, October 15, 2001, 4:15PM
Gates B01
http://robotics.stanford.edu/ba-colloquium/

Abstract

The problem of three-dimensional (3D) interpretation from multiple two-dimensional (2D) images is perhaps the oldest and most basic topic in visual interpretation. Since objects in the world are generally 3D and the images they create on our retina are 2D, there is a natural question of how a visual system might compensate for the loss of information due to the projection from 3D to 2D. This question is fundamental to a wide variety of visual tasks. For example, the images a 3D object generates while changing pose vary considerably in appearance; matching a picture of an unknown object to a library of pictures therefore requires some notion of "invariance" to pose. Other examples include ego-motion - sensing one's own 6-DOF motion from the sequence of images falling on the retina - and various applications outside the realm of visual interpretation, such as 3D model building, augmented reality, and view interpolation and synthesis.
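To make the loss of information concrete, here is a minimal sketch of the pinhole projection model (not part of the talk; the intrinsics and pose below are hypothetical, chosen only for illustration). Two 3D points at different depths along the same viewing ray project to the same 2D pixel, which is exactly the ambiguity that multi-view techniques must resolve.

    import numpy as np

    # Pinhole camera: a 3D point X maps to the homogeneous image point
    # x ~ K (R X + t); the perspective divide discards depth.
    def project(K, R, t, X):
        x_h = K @ (R @ X + t)
        return x_h[:2] / x_h[2]

    # Hypothetical intrinsics and pose, for illustration only.
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    R = np.eye(3)
    t = np.zeros(3)

    # Two points at different depths along one ray give the same pixel:
    print(project(K, R, t, np.array([0.1, 0.2, 1.0])))   # [400. 400.]
    print(project(K, R, t, np.array([0.2, 0.4, 2.0])))   # [400. 400.] again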

During the past decade, many researchers have made substantial progress toward a complete understanding of the invariances and intrinsic structures that can be recovered from image measurements alone across multiple images, and toward applying those techniques to modeling, visual recognition, and graphics.

In this talk I will describe an emerging new topic in the analysis of 3D information: techniques which extend the envelope of analysis from static/rigid scenes to dynamic/non-rigid configurations. In other words, the 3D scene features are allowed to change position in a non-rigid manner while the camera pose is changing, and the algebraic treatment of this phenomenon will be shown to be general and scalable across domains and applications. I will demonstrate these techniques as applied to multi-body segmentation, visual recognition of non-rigid bodies ("action" understanding), and multiple dynamic point configurations.
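As a toy illustration of the flavor of such algebraic treatments (a sketch under simplifying assumptions, not the talk's actual algorithms): if a point moves with constant velocity X(t) = X0 + t*V while the camera moves, each view t contributes the constraint [x_t]_x P_t (X0 + t V, 1)^T = 0, which is linear in the six unknowns (X0, V), so three or more views suffice to triangulate the moving point. The camera matrices and motion below are hypothetical.

    import numpy as np

    def skew(v):
        # Cross-product matrix: skew(a) @ b == np.cross(a, b).
        return np.array([[ 0.0, -v[2],  v[1]],
                         [ v[2],  0.0, -v[0]],
                         [-v[1],  v[0],  0.0]])

    # Ground truth for the toy scene: a point with constant 3D velocity.
    X0 = np.array([0.0, 0.0, 5.0])
    V  = np.array([0.1, -0.05, 0.2])

    rows = []
    for t in range(5):
        P = np.hstack([np.eye(3), [[0.3 * t], [0.0], [0.0]]])  # translating camera
        x = P @ np.append(X0 + t * V, 1.0)   # image of the moving point at time t
        S = skew(x) @ P                      # rank-2 block of linear constraints
        A = S[:, :3]
        rows.append(np.hstack([A, t * A, S[:, 3:4]]))  # unknowns: (X0, V, 1)

    M = np.vstack(rows)
    _, _, Vt = np.linalg.svd(M)              # null vector = smallest singular vector
    sol = Vt[-1] / Vt[-1, -1]
    print("recovered X0:", sol[:3])          # ~ [0. 0. 5.]
    print("recovered V: ", sol[3:6])         # ~ [0.1 -0.05 0.2]

With noisy measurements the same stacked system would be solved in a least-squares sense; the point of the sketch is that the constraints remain linear even though neither the scene point nor the camera is static.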

This is joint work over the course of two years with Shai Avidan, Anat Levin, Yoni Wexler and Lior Wolf.

About the Speaker

Amnon Shashua is an Associate Professor at the School of Computer Science and Engineering at the Hebrew University of Jerusalem, Israel, and is presently on sabbatical at Stanford University. He received his Ph.D. in computational neuroscience from the Massachusetts Institute of Technology (MIT) in 1993, working at the Artificial Intelligence Laboratory. His main research interests are in computer vision and computational modeling of human vision. His work includes early visual processing of saliency and grouping mechanisms, visual recognition, image synthesis for animation and graphics, and the theory of computer vision in the area of three-dimensional processing from collections of two-dimensional views.

Some of the papers covering the topic of this talk have received the Best Paper Award at ECCV 2000 and an honorable mention for the Marr Prize at ICCV 2001, both jointly with his student and coauthor Lior Wolf.


Contact: bac-coordinators@cs.stanford.edu
