- Blob tracking
If we wish to track the entire body as a single blob, we only need to segment the diver's
body from the frames. For this initial segmentation of the diver from the background, we plan
to use our skin-color segmentation code from Homework 3, since it is readily available and
segmentation is not the main focus of our project. In addition, the divers will be wearing
swimsuits, so most of their bodies will be skin-colored (especially for male divers).
We will therefore use maximum likelihood estimation and expectation maximization to
segment divers from the background. First, we will train our system to recognize skin color.
Then we will feed our footage of divers into the system, which will give us, for every pixel
of every frame, the probability that the pixel corresponds to skin. From these probabilities,
we compute a silhouette of the figure.
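As a rough illustration of the per-pixel skin probability, the sketch below fits one Gaussian
to skin samples and one to background samples by maximum likelihood and combines them with
Bayes' rule. It is not our Homework 3 code: the single-Gaussian models, the normalized-red
feature, the 0.5 prior and threshold, and all function names are assumptions made for this
example, and the expectation maximization step (needed for mixture models) is left out.

```python
import math

def chromaticity(pixel):
    """Normalized red r = R / (R + G + B); an assumed, brightness-insensitive feature."""
    r, g, b = pixel
    total = r + g + b
    return r / total if total > 0 else 1.0 / 3.0

def fit_gaussian(samples):
    """Maximum likelihood estimate of the mean and variance of a 1D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, max(var, 1e-6)  # floor the variance to avoid division by zero

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def skin_probability(pixel, skin_model, bg_model, prior_skin=0.5):
    """Posterior probability that a pixel is skin, given the two fitted models."""
    x = chromaticity(pixel)
    p_skin = gaussian_pdf(x, *skin_model) * prior_skin
    p_bg = gaussian_pdf(x, *bg_model) * (1.0 - prior_skin)
    total = p_skin + p_bg
    return p_skin / total if total > 0 else 0.5

def silhouette(frame, skin_model, bg_model, threshold=0.5):
    """Binary mask: 1 where the skin probability exceeds the (assumed) threshold."""
    return [[1 if skin_probability(p, skin_model, bg_model) > threshold else 0
             for p in row] for row in frame]
```

Here `frame` is a list of rows of (R, G, B) tuples, and each model is the (mean, variance) pair
returned by `fit_gaussian` on the chromaticities of hand-labeled training pixels.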
- Cardboard people
This algorithm assumes that each part of the body can be approximated by a planar patch, so
we now need to segment the parts of the body separately. To do this, we define a 2D model of
the diver on the first frame, in which a rectangle encloses each part of the diver's body
separately and each rectangle shares at least one edge with another rectangle.
- Exponential Maps and Twists
In this algorithm, we use a 3D model to approximate the shape of the body segments. First,
the 3D model is projected onto the image plane of the first frame in the pose and angular
configuration specified by the viewer.
We then use Expectation-Maximization to refine our segmentation. For each body segment in
the model, we define a matte of zeros and ones that corresponds to the projection of that
portion of the model onto the image plane. We compute an estimate of how the body moves to
the next frame, and then, for each pixel and for each body segment (and the background), we
compute the probability that the pixel complies with our motion estimate. The matte is then
refined by normalizing the probabilities at each pixel location so that they sum to one.
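The matte refinement can be sketched as follows. The per-pixel normalization is the step
described above; the Gaussian likelihood of a motion residual, its sigma, and the function
names are assumptions added only to make the example self-contained.

```python
import math

def residual_likelihood(residual, sigma=1.0):
    """Assumed model: how well a pixel complies with a layer's motion estimate,
    as an unnormalized Gaussian of its motion residual."""
    return math.exp(-residual ** 2 / (2.0 * sigma ** 2))

def normalize_mattes(likelihoods):
    """likelihoods[k][y][x] is the likelihood that pixel (x, y) follows layer k,
    where the layers are the body segments plus the background.
    Returns mattes whose values sum to one at every pixel location."""
    n_layers = len(likelihoods)
    height = len(likelihoods[0])
    width = len(likelihoods[0][0])
    mattes = [[[0.0] * width for _ in range(height)] for _ in range(n_layers)]
    for y in range(height):
        for x in range(width):
            total = sum(likelihoods[k][y][x] for k in range(n_layers))
            for k in range(n_layers):
                if total > 0:
                    mattes[k][y][x] = likelihoods[k][y][x] / total
                else:
                    # No layer explains this pixel: fall back to a uniform assignment.
                    mattes[k][y][x] = 1.0 / n_layers
    return mattes
```

The refined mattes are soft (values between zero and one) even though the initial mattes are
binary, so a pixel near a joint can be shared between two neighboring segments.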
- Blob tracking
In a simple blob tracker, we track the silhouette of the diver by superimposing the
silhouettes taken at each frame onto a single image. The silhouettes are time-stamped by
giving them different intensities. We start with a black image (i.e., intensity equal to 0
everywhere). The silhouette from the current frame is given the highest intensity and is
superimposed onto the black image. The silhouette from the immediately preceding frame is
then given a slightly lower intensity and superimposed onto the image, without overwriting
the more recent silhouette. We continue in this manner until we have an image containing
several silhouettes. The gradient of this image defines the global motion vector.
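A minimal sketch of this time-stamped image and its gradient is given below. Drawing the
silhouettes from oldest to newest is equivalent to the procedure described above, since newer
silhouettes end up on top. Averaging the gradient over interior pixels to obtain a single
global vector, the unit intensity step, and the function names are assumptions of this
example.

```python
def motion_history_image(silhouettes, step=1.0):
    """silhouettes: list of binary masks, oldest first.
    Later masks overwrite earlier ones, so the newest has the highest intensity."""
    height, width = len(silhouettes[0]), len(silhouettes[0][0])
    mhi = [[0.0] * width for _ in range(height)]
    for index, mask in enumerate(silhouettes):
        intensity = (index + 1) * step
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    mhi[y][x] = intensity
    return mhi

def global_motion(mhi):
    """Mean central-difference gradient (x, y) over pixels whose four neighbors
    are all covered by a silhouette; it points from older to newer silhouettes."""
    height, width = len(mhi), len(mhi[0])
    sum_x = sum_y = 0.0
    count = 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            neighbors = (mhi[y][x - 1], mhi[y][x + 1], mhi[y - 1][x], mhi[y + 1][x])
            # Skip the background and the outer boundary, where the gradient
            # reflects the silhouette edge rather than the motion.
            if mhi[y][x] == 0 or min(neighbors) == 0:
                continue
            sum_x += (mhi[y][x + 1] - mhi[y][x - 1]) / 2.0
            sum_y += (mhi[y + 1][x] - mhi[y - 1][x]) / 2.0
            count += 1
    if count == 0:
        return (0.0, 0.0)
    return (sum_x / count, sum_y / count)
```

For a blob that moves to the right over successive frames, the x component of the returned
vector is positive and the y component is zero.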
This technique can be refined to track motion of regions of interest. We find a boundary
pixel on the most recent silhouette and travel along the boundary looking outside for a recent
unmarked silhouette. If we find one, we mark it by performing a floodfill. The algorithm
produces segmented motion masks which describe the motion of a particular portion of the
footage.
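The flood-fill step on its own might look like the sketch below. It covers only the marking
of one silhouette layer from a seed pixel, not the boundary walk that finds the seed. The
4-connectivity, the rule of filling pixels whose intensity equals that of the seed, and the
names `flood_fill` and `labels` are assumptions of this example.

```python
from collections import deque

def flood_fill(mhi, labels, seed, label):
    """Mark the 4-connected region around seed = (y, x) that shares the seed's
    intensity and is still unmarked (labels value 0). Modifies labels in place
    and returns the number of pixels marked."""
    height, width = len(mhi), len(mhi[0])
    sy, sx = seed
    if labels[sy][sx] != 0:
        return 0  # this silhouette has already been marked
    target = mhi[sy][sx]
    labels[sy][sx] = label
    filled = 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if inside and labels[ny][nx] == 0 and mhi[ny][nx] == target:
                labels[ny][nx] = label
                filled += 1
                queue.append((ny, nx))
    return filled
```

Each distinct label then identifies one segmented motion mask.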
- Cardboard people
We use the articulated motion between two frames to predict the location of each body
segment patch in the next frame. The location of each patch is then updated by applying its
planar motion to it.
The articulated motion is computed by simultaneously minimizing the total energy of the
motions of all patches, with an added articulation constraint that requires each patch to
retain its original connectivity to its neighboring patches.
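Updating a patch by applying a planar motion to it can be sketched as below. We assume the
common eight-parameter planar flow model here; the ordering of the parameters, the choice of
measuring coordinates relative to the patch center, and the function names are assumptions of
this example. The energy minimization that estimates the parameters is not shown.

```python
def planar_flow(params, x, y):
    """Flow (u, v) at patch-relative position (x, y) under an eight-parameter
    planar motion model with parameters a0..a7 (assumed ordering)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = params
    u = a0 + a1 * x + a2 * y + a6 * x * x + a7 * x * y
    v = a3 + a4 * x + a5 * y + a6 * x * y + a7 * y * y
    return u, v

def move_patch(corners, params, center):
    """Predict the corners of a patch in the next frame by displacing each
    corner (x, y) by the planar flow evaluated at that corner."""
    cx, cy = center
    moved = []
    for x, y in corners:
        u, v = planar_flow(params, x - cx, y - cy)
        moved.append((x + u, y + v))
    return moved
```

With only a0 and a3 nonzero the patch translates rigidly; a1, a2, a4 and a5 add the affine
deformation, and a6 and a7 add the remaining perspective-like terms.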
- Exponential Maps and Twists
We compute the spatiotemporal gradients between frames and then estimate the motion
using the mattes derived in the segmentation portion plus an equation based on the
properties of kinematic chains given by [Bregler, Malik].
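For reference, the exponential map that turns a twist into a rigid body motion can be
sketched as below. It uses the standard closed-form expression from screw theory and is not
the full motion-estimation equation of [Bregler, Malik]. The function names are our own, and
the rotation axis w is assumed to be either zero or of unit length.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def hat(w):
    """Skew-symmetric matrix such that hat(w) times x equals cross(w, x)."""
    return [[0.0, -w[2], w[1]],
            [w[2], 0.0, -w[0]],
            [-w[1], w[0], 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, x):
    return [sum(m[i][k] * x[k] for k in range(3)) for i in range(3)]

def twist_exp(v, w, theta):
    """Rotation R and translation t of the rigid motion generated by the twist
    (v, w) applied for an amount theta. w must be zero or a unit vector."""
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if all(abs(c) < 1e-12 for c in w):
        # Pure translation along v.
        return identity, [theta * c for c in v]
    w_hat = hat(w)
    w_hat2 = matmul(w_hat, w_hat)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    # Rodrigues' formula: R = I + sin(theta) w_hat + (1 - cos(theta)) w_hat^2
    rot = [[identity[i][j] + s * w_hat[i][j] + c * w_hat2[i][j]
            for j in range(3)] for i in range(3)]
    # t = (I - R)(w x v) + w (w . v) theta
    i_minus_r = [[identity[i][j] - rot[i][j] for j in range(3)] for i in range(3)]
    w_dot_v = sum(wi * vi for wi, vi in zip(w, v))
    trans = matvec(i_minus_r, cross(w, v))
    trans = [trans[i] + w[i] * w_dot_v * theta for i in range(3)]
    return rot, trans
```

For a revolute joint whose axis w passes through a point q, choosing v = -cross(w, q) gives a
motion that rotates the segment about that axis and leaves q fixed, which is how each joint
of a kinematic chain can be described with a single angle theta.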