Broad Area Colloquium for Artificial Intelligence,
Geometry, Graphics, Robotics and Vision
3D Interpretation of Dynamic and Non-Rigid Scenes
Amnon Shashua
School of Computer Science and Eng.
Hebrew University of Jerusalem, Israel
Monday, October 15, 2001, 4:15PM
Gates B01 http://robotics.stanford.edu/ba-colloquium/
Abstract
The problem of three-dimensional (3D) interpretation from multiple
two-dimensional (2D) images is perhaps the oldest and most basic topic
in visual interpretation. Since objects in the world are generally 3D
and the images they create on our retina are 2D, there is a natural
question of how a visual system might go about compensating for the loss
of information due to the projection from 3D to 2D. This question is
fundamental to a wide variety of visual tasks. For example, the images a
3D object generates as it changes pose vary considerably in appearance,
so matching a picture of an unknown object against a library of pictures
requires some notion of "invariance" to pose. Other examples include
ego-motion - sensing one's own 6-DOF motion from the sequence of images
falling on the retina - and various applications outside the realm of
visual interpretation, such as 3D model building, augmented reality, and
view interpolation and synthesis.
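The loss of information under projection can be made concrete with a minimal pinhole-camera sketch (illustrative only, not a method from the talk; the function name `project` and the focal length `f` are assumptions for this example): every 3D point along a viewing ray maps to the same 2D image point, so depth is irrecoverable from a single view.

```python
import numpy as np

def project(X, f=1.0):
    """Perspective (pinhole) projection of a 3D point X = (x, y, z),
    given in camera coordinates, onto the image plane at depth f:
    (x, y, z) -> (f*x/z, f*y/z)."""
    x, y, z = X
    return np.array([f * x / z, f * y / z])

# Two distinct 3D points on the same viewing ray through the camera center...
P1 = np.array([1.0, 2.0, 4.0])
P2 = 2.5 * P1  # same direction, different depth

# ...project to the identical 2D image point: the depth along the ray is lost.
assert np.allclose(project(P1), project(P2))
```

Recovering what projection discards - by combining measurements across multiple such views - is exactly the multi-view problem the abstract describes.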
During the past decade many researchers have made substantial progress
toward a complete understanding of the invariances and intrinsic
structures that can be recovered from image measurements alone across
multiple images, and toward applying those techniques to modeling,
visual recognition and graphics.
In this talk I will describe an emerging topic in the analysis of 3D
information: techniques that extend the envelope of analysis from
static, rigid scenes to dynamic, non-rigid configurations.
In other words, the 3D scene features are allowed to change positions in
a non-rigid manner while the camera pose is changing - and the algebraic
treatment of this phenomenon will be shown to be general and scalable to
various domains and applications. I will demonstrate these techniques as
applied to multi-body segmentation, visual recognition of non-rigid
bodies ("action" understanding), and multiple dynamic point
configurations.
This is joint work over the course of two years with Shai Avidan, Anat
Levin, Yoni Wexler and Lior Wolf.
About the Speaker
Amnon Shashua is an Associate Professor at the School of Computer
Science and Engineering at Hebrew University of Jerusalem, Israel, and
is presently on sabbatical at Stanford University. He received his
Ph.D. in Computational Neuroscience from the Massachusetts Institute of
Technology (MIT) in 1993, for work carried out at its Artificial
Intelligence Laboratory. His main research interests are computer vision and
computational modeling of human vision. His work includes early visual
processing of saliency and grouping mechanisms, visual recognition,
image synthesis for animation and graphics, and the theory of computer
vision in the areas of three-dimensional processing from collections of
two-dimensional views.
Some of the papers covering the topic of this talk received the Best
Paper Award at ECCV 2000 and an honorable mention for the Marr Prize at
ICCV 2001 - both jointly with his student and coauthor Lior Wolf.