Finding People and Animals in Large Collections of Images

David Forsyth

UC Berkeley


Segmentation and recognition tend to be seen as separate activities in theories of both human and machine vision. Building programs that can handle interesting applications of object recognition (for example, recovering pictures that contain particular objects from poorly structured collections) requires representations that describe objects at a reasonable level of abstraction and can help segment them from the image. Ideally, we should be able to learn these representations from images. I have built a number of programs that illustrate the issues that must be dealt with. One can tell, quite accurately, whether an image contains a naked or scantily clad person; another can tell whether an image contains a horse. These programs have been tested on large collections of pictures of diverse content, and their representations can be learned from image data.

Each program uses only a bottom-up process. I will describe our current work on using Markov chain Monte Carlo (MCMC) methods as a source of top-down information for grouping human limbs and image regions that look like clothing, and speculate on the implications for theories of human vision.

David Forsyth holds a B.Sc and M.Sc in Electrical Engineering from the University of the Witwatersrand, Johannesburg, and a D.Phil from Balliol College, Oxford. He was a Prize Fellow at Magdalen College, Oxford, for three years, and then an Assistant Professor of Computer Science at the University of Iowa. He is currently an Associate Professor of Computer Science at U.C. Berkeley. He has published papers on object recognition, on colour and shading in computer vision, and on physical simulation. His primary interest is understanding how to attack large, general object recognition. He can be contacted at

Edited by Leonidas Guibas