--------------------------------------------------------------------------------------------------------------------

                          Understanding multimedia using generative models

                                            Nebojsa Jojic
                                         Microsoft Research

--------------------------------------------------------------------------------------------------------------------

Most of the research on understanding natural signals is based on some
sort of model of the world. These models have typically been highly
specific about one aspect of the world, for instance, the appearance of
a human face, the motion type of a layer, or the spectral
characteristics of speech, while other, "non-interesting" parts of the
scene are either ignored or left to a separate integration module. The
limited flow of information and limited adaptivity of such systems make
them very brittle in realistic applications. To build more robust
understanding algorithms, models need to capture various aspects of
the data at the same time and remain fairly simple, yet still adapt to
the data.

Generative models, as defined by the machine learning community, are
flexible models that describe the data of interest through a feasible
generation process, starting only from a minimal number of parameters
and using sampling from appropriate probability distributions to
introduce variability. While the generative process itself is rarely
used directly, the descriptive power of the model is used for inference,
classification, and data manipulation.
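
As a rough illustration of this idea (not one of the models presented
in the talk), the sketch below assumes a hypothetical toy model with two
classes, each emitting a one-dimensional Gaussian observation. The
names PRIOR, MEAN, STD, sample, posterior, and classify are all
invented for the example. The generative process is written out
explicitly, but the useful part is inverting it with Bayes' rule to
infer which class most likely produced an observation.

```python
import math
import random

# Hypothetical toy model (an assumption for illustration only):
# two classes, each with a 1-D Gaussian observation model.
PRIOR = {"a": 0.5, "b": 0.5}
MEAN = {"a": -2.0, "b": 2.0}
STD = {"a": 1.0, "b": 1.0}


def sample(rng):
    """Run the generative process: draw a class, then an observation."""
    c = rng.choices(list(PRIOR), weights=list(PRIOR.values()))[0]
    x = rng.gauss(MEAN[c], STD[c])
    return c, x


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x):
    """Invert the model with Bayes' rule to get p(class | x)."""
    joint = {c: PRIOR[c] * gaussian_pdf(x, MEAN[c], STD[c]) for c in PRIOR}
    total = sum(joint.values())
    return {c: p / total for c, p in joint.items()}


def classify(x):
    """Pick the class with the highest posterior probability."""
    post = posterior(x)
    return max(post, key=post.get)
```

Calling sample(random.Random(0)) draws one (class, observation) pair,
and classify(x) labels an observation without ever running the sampler.
The same parameters serve both generation and inference.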

In this talk, I will give an overview of the generative approach to
multimedia understanding and report some of our recent results on
audio-visual tracking; multimedia clustering, search, and retrieval; and
video editing tasks such as object extraction, illumination correction,
and stabilization. This is joint work with Brendan Frey and Hagai Attias.