The first stage can be, and has been, modeled by many researchers using the tools of linear systems analysis. We offer a novel approach to the second stage by modeling it as the process of finding a partition of the image into regions such that similarity is high within a region and low across regions. This is made precise as the 'Normalized cut' criterion, which can be optimized by solving an eigenvalue problem. The resulting eigenvectors provide a hierarchical partitioning of the image into regions ordered according to salience. Brightness, color, texture, motion similarity, proximity and good continuation can all be encoded in this framework. We show results on complex images of natural scenes that demonstrate the significant superiority of this technique over classical approaches such as those based on edge detection or Markov random fields (MRFs). Phenomena such as subjective contours emerge as side consequences.
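As a rough illustration of how a normalized cut can be obtained from an eigenvalue problem, the sketch below bipartitions a small graph. It assumes the commonly published formulation, in which a symmetric affinity matrix W with degree matrix D leads to the generalized problem (D - W) y = lambda D y and the second-smallest eigenvector is thresholded at zero; whether this matches the talk in every detail is an assumption. The function names, the Gaussian affinity on 1-D points, and the use of power iteration in place of a full eigensolver are illustrative choices, not the authors' implementation.

```python
import math

def normalized_cut_bipartition(W, iters=200):
    """Bipartition a graph given a symmetric affinity matrix W (list of lists).

    Approximates the second-smallest generalized eigenvector of
    (D - W) y = lambda * D y. Assumes at least two nodes and positive degrees.
    """
    n = len(W)
    d = [sum(row) for row in W]                  # node degrees (diagonal of D)
    s = [1.0 / math.sqrt(di) for di in d]        # entries of D^{-1/2}
    # Normalized affinity M = D^{-1/2} W D^{-1/2}. Its top eigenvector is
    # D^{1/2} 1 (eigenvalue 1); the next one gives the normalized cut.
    M = [[s[i] * W[i][j] * s[j] for j in range(n)] for i in range(n)]
    top = [math.sqrt(di) for di in d]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    def deflate(v):
        # Remove the component along the known top eigenvector.
        dot = sum(a * b for a, b in zip(v, top))
        return [a - dot * b for a, b in zip(v, top)]

    def normalize(v):
        length = math.sqrt(sum(a * a for a in v))
        return [a / length for a in v]

    # Arbitrary deterministic start vector (an illustrative choice).
    v = normalize(deflate([float(i) for i in range(n)]))
    for _ in range(iters):
        # Power iteration on (M + I) / 2, whose eigenvalues lie in [0, 1].
        Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = normalize(deflate([0.5 * (mv + vi) for mv, vi in zip(Mv, v)]))
    y = [si * vi for si, vi in zip(s, v)]        # map back: y = D^{-1/2} z
    return [yi > 0 for yi in y]

def ncut_value(W, labels):
    """Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V)."""
    n = len(W)
    cut = sum(W[i][j] for i in range(n) for j in range(n)
              if labels[i] and not labels[j])
    assoc_a = sum(W[i][j] for i in range(n) for j in range(n) if labels[i])
    assoc_b = sum(W[i][j] for i in range(n) for j in range(n) if not labels[i])
    return cut / assoc_a + cut / assoc_b

# Toy example: six 1-D "pixels" forming two tight groups, with a Gaussian
# affinity on position standing in for brightness or proximity similarity.
x = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
W = [[math.exp(-(a - b) ** 2 / 2.0) for b in x] for a in x]
labels = normalized_cut_bipartition(W)
print(labels, ncut_value(W, labels))
```

On this toy input the first three points and the last three points should receive opposite labels. Applying the same procedure recursively to each part is one way to obtain a hierarchical partition; a practical implementation would more likely use a sparse eigensolver than the dense power iteration sketched here.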
Our work on the third stage is preliminary; I shall argue on computational and psychophysical grounds that modular shape processing should be abandoned, and that grouping driven by ecological statistics is as crucial as shape cues driven by ecological optics.
This is joint work with Jianbo Shi, Serge Belongie and Thomas Leung.