Video Segmentation

Feng Xie


In this project, I plan to build a system that segments foreground objects from background scenery in a video stream. In particular, the system is designed to work on images or video streams with relatively large color variance between the foreground objects and the background scenery. Examples would be an animation clip from a Disney cartoon or a lecturer in front of a blackboard.
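
One simple way to exploit a large color difference between foreground and background is to threshold each pixel's distance from a reference background color. The following is only a minimal sketch of that idea, not the algorithm this project commits to; the `Rgb` type, the `segmentByColor` function, the single reference background color, and the threshold value are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type: 8-bit RGB.
struct Rgb { std::uint8_t r, g, b; };

// Squared Euclidean distance between two colors in RGB space.
inline int colorDistSq(Rgb a, Rgb b) {
    int dr = int(a.r) - int(b.r);
    int dg = int(a.g) - int(b.g);
    int db = int(a.b) - int(b.b);
    return dr * dr + dg * dg + db * db;
}

// Returns one mask value per pixel: 255 = foreground, 0 = background.
// `background` and `threshold` are assumed inputs, not values from the proposal.
std::vector<std::uint8_t> segmentByColor(const std::vector<Rgb>& image,
                                         Rgb background, int threshold) {
    assert(threshold >= 0);
    std::vector<std::uint8_t> mask(image.size());
    const int tSq = threshold * threshold;
    for (std::size_t i = 0; i < image.size(); ++i)
        mask[i] = colorDistSq(image[i], background) > tSq ? 255 : 0;
    return mask;
}
```

A hard threshold like this gives a binary mask; a real system would probably need a soft transition near the silhouette to meet the quality goals.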

The system is designed to be used in two application domains:

a) plugged into a realtime video broadcasting receiver, to segment foreground objects (like the news anchor or the lecturer) from a fairly distinct and simple background and so enable a more interactive viewing experience.

b) plugged into a video editor, to allow interactive video editing and compositing for content creation.



There are two goals to this segmentation system: performance and quality.

The performance requirements of this system are high. It has to segment the images at a rate that will not introduce noticeable delay into the video viewing experience. Here we assume the video stream is broadcast and decoded in realtime, so the segmentation algorithm itself must also meet realtime demands. But because of the high coherence of the foreground objects in video streams (especially in the case of a lecturer in front of a blackboard), we should be able to run the segmentation at a lower frame rate than the video rate, or explore other ways of exploiting this coherence to accelerate segmentation or lower the amortized segmentation cost over multiple frames.
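
Running the segmentation at a lower rate than the video rate could be sketched as a wrapper that recomputes the mask only on every N-th frame and reuses the cached mask in between. This is a sketch under assumptions: `Frame`, `Mask`, `AmortizedSegmenter`, and the fixed `interval` are hypothetical names and choices, and the actual segmentation routine is passed in as a callback rather than specified here.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Frame = std::vector<std::uint8_t>;  // hypothetical: raw pixel bytes
using Mask  = std::vector<std::uint8_t>;  // hypothetical: one byte per pixel

class AmortizedSegmenter {
public:
    // `segment` is the (unspecified) full segmentation routine;
    // `interval` is how many video frames share one segmentation run.
    AmortizedSegmenter(std::function<Mask(const Frame&)> segment, int interval)
        : segment_(std::move(segment)), interval_(interval) {
        assert(interval_ >= 1);
    }

    // Returns the mask for this frame: freshly computed on every
    // interval-th frame, otherwise the cached mask from the last run.
    const Mask& process(const Frame& frame) {
        if (frameIndex_ % interval_ == 0) {
            cached_ = segment_(frame);
            ++runs_;
        }
        ++frameIndex_;
        return cached_;
    }

    int segmentationRuns() const { return runs_; }

private:
    std::function<Mask(const Frame&)> segment_;
    int interval_;
    long frameIndex_ = 0;
    int runs_ = 0;
    Mask cached_;
};
```

With an interval of 3, the amortized segmentation cost per displayed frame is roughly one third of a full run, at the price of a mask that can lag behind fast foreground motion.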

The quality requirements of the system are also relatively high. It would be desirable to define sharp silhouettes for the foreground objects, because both application domains this system is built for have high quality demands. In HDTV-type applications, where HDTV means more bandwidth, higher resolution, etc., a foreground sprite with noticeably fuzzy corners around the silhouette would not be acceptable.

There will be trade-offs between quality and performance; the ideal case would be a system whose performance and quality can be scaled according to the requirements of the application.



A segmentation system that can be plugged into a video decoder, to enable video segmentation and recompositing for a more interactive viewing experience, and/or into a video editor, to allow interactive video editing and compositing for content creation.


Images from a video decoder; accepted file formats: rgb, tiff.


Segmented images with alpha channels for compositing.
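
To illustrate how such an output could be used, here is a minimal sketch of compositing a segmented foreground over a new background with the standard "over" blend. It assumes 8-bit, non-premultiplied alpha and equally sized images; the `Rgba` type and the `compositeOver` function are hypothetical names, not part of the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type: 8-bit RGB plus alpha (assumed non-premultiplied).
struct Rgba { std::uint8_t r, g, b, a; };

// Blend one channel: alpha = 255 keeps the foreground, alpha = 0 keeps the
// background. The +127 rounds to nearest when dividing by 255.
inline std::uint8_t blend(std::uint8_t fg, std::uint8_t bg, std::uint8_t alpha) {
    return static_cast<std::uint8_t>((fg * alpha + bg * (255 - alpha) + 127) / 255);
}

// Composite the segmented foreground over a replacement background.
// The background's own alpha is ignored; the result is fully opaque.
std::vector<Rgba> compositeOver(const std::vector<Rgba>& fg,
                                const std::vector<Rgba>& bg) {
    assert(fg.size() == bg.size());
    std::vector<Rgba> out(fg.size());
    for (std::size_t i = 0; i < fg.size(); ++i) {
        const Rgba& f = fg[i];
        const Rgba& b = bg[i];
        out[i] = Rgba{blend(f.r, b.r, f.a), blend(f.g, b.g, f.a),
                      blend(f.b, b.b, f.a), 255};
    }
    return out;
}
```

Intermediate alpha values along the silhouette are what let the foreground sprite blend cleanly into a new background instead of showing a hard or fuzzy fringe.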

Input Data Generation

Two types of video streams will be used:
a) a video recording of an SITN lecture.
b) a clip of a Disney-type animation.


Programming will be done primarily in C++.


Potential Optimization

a) From the server, we can send down a conservative screen bounding box of the lecturer, to reduce the size of the image to be segmented.
b) From the client, temporal coherence can be used to give good estimates of the screen bounding box of the lecturer in the next frame.
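
The client-side estimate in b) could be sketched as follows: take the tight bounding box of the previous frame's foreground mask, then grow it by a safety margin and clamp it to the image. This is only one possible scheme; the `Box` type, both function names, and the fixed `margin` are assumptions, and a real system might choose the margin from the observed motion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical box type: [x0, x1) by [y0, y1) in pixel coordinates.
struct Box { int x0, y0, x1, y1; };

// Tight bounding box of all nonzero mask pixels (row-major mask).
// Returns an empty box {0, 0, 0, 0} if the mask has no foreground.
Box boundingBox(const std::vector<std::uint8_t>& mask, int width, int height) {
    assert(mask.size() == static_cast<std::size_t>(width) * height);
    Box b{width, height, 0, 0};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (mask[static_cast<std::size_t>(y) * width + x]) {
                b.x0 = std::min(b.x0, x);
                b.y0 = std::min(b.y0, y);
                b.x1 = std::max(b.x1, x + 1);
                b.y1 = std::max(b.y1, y + 1);
            }
    if (b.x1 <= b.x0) return Box{0, 0, 0, 0};  // no foreground found
    return b;
}

// Conservative estimate for the next frame: grow the previous box by
// `margin` pixels on every side and clamp it to the image bounds.
Box predictNextBox(Box prev, int margin, int width, int height) {
    return Box{std::max(prev.x0 - margin, 0), std::max(prev.y0 - margin, 0),
               std::min(prev.x1 + margin, width), std::min(prev.y1 + margin, height)};
}
```

Only the pixels inside the predicted box would then need to be segmented in the next frame, which is where the saving comes from when the lecturer occupies a small part of the screen.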