Tracking Synchronized Divers

ALGORITHM

Image Segmentation

Blob tracking
If we wish to track the entire body as a single blob, we need only segment the diver's body from the background in each frame. For this initial segmentation, we plan to reuse our skin-color segmentation code from Homework 3, since it is readily available and segmentation is not the main focus of our project. In addition, the divers will be wearing swimsuits, so most of their bodies will be skin-colored (especially for male divers).

So, we will use maximum likelihood estimation and expectation maximization to segment divers from the background. First, we will train our system to recognize skin color. Then we will feed our footage of divers into the system, and for every pixel of every frame it will give us the probability that the pixel corresponds to skin. From these per-pixel probabilities, we compute a silhouette of the figure.
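
As a rough sketch of the skin-color step, assume skin color is modeled by a single Gaussian in RGB whose mean and covariance are the maximum likelihood estimates from labeled skin pixels. This is not our Homework 3 code: the function names, the single-Gaussian model, and the `rel_threshold` cutoff are all assumptions made for illustration, and the expectation maximization refinement is left out.

```python
import numpy as np

def fit_skin_model(skin_pixels):
    """Maximum-likelihood Gaussian fit to an (N, 3) array of skin colors."""
    mean = skin_pixels.mean(axis=0)
    centered = skin_pixels - mean
    cov = centered.T @ centered / len(skin_pixels)
    cov += 1e-6 * np.eye(3)  # small ridge so the covariance stays invertible
    return mean, cov

def skin_likelihood(frame, mean, cov):
    """Gaussian density of every pixel color in an (H, W, 3) frame.

    Returns the (H, W) likelihood map and the peak density of the model.
    """
    diff = frame.reshape(-1, 3).astype(float) - mean
    mahal = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    peak = 1.0 / np.sqrt((2.0 * np.pi) ** 3 * np.linalg.det(cov))
    return (peak * np.exp(-0.5 * mahal)).reshape(frame.shape[:2]), peak

def silhouette(frame, mean, cov, rel_threshold=0.01):
    """Binary silhouette: pixels whose likelihood is at least
    rel_threshold times the peak density (hypothetical cutoff)."""
    likelihood, peak = skin_likelihood(frame, mean, cov)
    return likelihood >= rel_threshold * peak
```

In use, `fit_skin_model` would be run once on training pixels, and `silhouette` would be called on every frame of the diving footage.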
Cardboard people
This algorithm assumes that each part of the body can be estimated by a planar patch, so we now need to separately segment parts of the body. To do this, we define a 2D model of the diver on the first frame in which a rectangle encloses each part of his body separately and each rectangle shares at least one edge with another rectangle.
Exponential Maps and Twists
In this algorithm, we use a 3D model to approximate the shape of body segments. First, the 3D model is projected onto the image plane of the first frame in the pose and angular configuration specified by the viewer.

Then we use Expectation-Maximization to refine our segmentation. For each body segment in the model, we define a matte consisting of zeros and ones that corresponds to the projection of that portion of the model onto the image plane. The estimate of how the body moves to the next frame is computed, and then for each pixel and for each body segment (and the background) we compute the probability that it complies with our motion estimate. The matte is then refined by normalizing the sum of all probabilities per pixel location to one.
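
The matte refinement step can be sketched as follows. We assume the probability that a pixel complies with a segment's motion estimate is a Gaussian function of an image residual under that motion. The residual, the `sigma` parameter, and both function names are assumptions for illustration, not part of the published method.

```python
import numpy as np

def motion_likelihood(residuals, sigma=1.0):
    """Unnormalized likelihood that each pixel follows each segment's motion.

    residuals: (K, H, W) array, one layer per body segment plus background,
    holding an assumed per-pixel residual under that layer's motion estimate.
    """
    return np.exp(-0.5 * (residuals / sigma) ** 2)

def refine_mattes(likelihoods, eps=1e-12):
    """Normalize (K, H, W) likelihoods so they sum to one at every pixel.

    Pixels where every layer has (near) zero likelihood fall back to a
    uniform 1/K assignment.
    """
    k = likelihoods.shape[0]
    total = likelihoods.sum(axis=0, keepdims=True)
    safe_total = np.where(total > eps, total, 1.0)
    return np.where(total > eps, likelihoods / safe_total, 1.0 / k)
```

The refined mattes are soft weights rather than the initial zeros and ones, so a pixel near a joint can be shared between two segments.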

Tracking

Blob tracking
In a simple blob tracker, we track the silhouette of the diver by superimposing silhouettes taken at each frame onto a single image. These silhouettes are time-stamped by different intensities. We start with a black image (i.e., intensity everywhere equal to 0). The silhouette from the current frame is given the highest intensity and is superimposed onto the black image. The silhouette immediately previous to that is then given a slightly lower intensity and superimposed onto the image. We continue in this manner until we have an image with several silhouettes. The gradient of this image defines the global motion vector.
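
A minimal sketch of this superposition, assuming the silhouettes arrive as boolean masks ordered oldest first. The function names are hypothetical. Averaging the gradient over interior pixels only is our own simplification, made so that the sharp edge between the silhouettes and the black background does not dominate the estimate.

```python
import numpy as np

def motion_history_image(silhouettes):
    """Superimpose (H, W) boolean silhouettes, oldest first.

    The newest silhouette gets intensity 1.0 and older ones get
    progressively lower intensities; the background stays at 0.
    """
    n = len(silhouettes)
    mhi = np.zeros(silhouettes[0].shape, dtype=float)
    for i, mask in enumerate(silhouettes):
        # Drawn oldest to newest so recent silhouettes overwrite older ones.
        mhi[mask] = (i + 1) / n
    return mhi

def global_motion(mhi):
    """Mean intensity gradient (dx, dy) over interior silhouette pixels."""
    gy, gx = np.gradient(mhi)
    fg = np.pad(mhi > 0, 1)  # pad with False so image borders count as background
    interior = (fg[1:-1, 1:-1] & fg[:-2, 1:-1] & fg[2:, 1:-1]
                & fg[1:-1, :-2] & fg[1:-1, 2:])
    valid = interior & ((gx != 0) | (gy != 0))
    if not valid.any():
        return np.zeros(2)
    return np.array([gx[valid].mean(), gy[valid].mean()])
```

Because intensity increases toward the most recent silhouette, the averaged gradient points in the direction of motion.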

This technique can be refined to track motion of regions of interest. We find a boundary pixel on the most recent silhouette and travel along the boundary looking outside for a recent unmarked silhouette. If we find one, we mark it by performing a floodfill. The algorithm produces segmented motion masks which describe the motion of a particular portion of the footage.
Cardboard people
We use the articulated motion between two frames to predict the location of each body segment patch in the next frame. The location of each patch is then updated by applying its planar motion to it.

The articulated motion is computed by minimizing the total energy of the motions of all patches simultaneously, with an added articulation constraint that each patch must retain its original connectivity to its neighboring patches.
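
To make the patch update and the articulation constraint concrete, here is a small sketch that assumes a six-parameter affine motion per patch (a simplification; the planar model itself has more parameters). The parameter layout and the function names are assumptions, and the energy minimization that would actually estimate the parameters is not shown.

```python
import numpy as np

def affine_flow(points, a):
    """Displacement (u, v) at each (x, y) row of points under affine motion.

    a = [a0, a1, a2, a3, a4, a5] with
    u = a0 + a1*x + a2*y and v = a3 + a4*x + a5*y (assumed layout).
    """
    x, y = points[:, 0], points[:, 1]
    u = a[0] + a[1] * x + a[2] * y
    v = a[3] + a[4] * x + a[5] * y
    return np.stack([u, v], axis=1)

def update_patch(corners, a):
    """Move the (4, 2) patch corners by the patch's estimated motion."""
    return corners + affine_flow(corners, a)

def articulation_penalty(joint_points, a_parent, a_child):
    """Sum of squared disagreement between two patches' motions at the
    points they share; zero when the patches stay connected."""
    d = affine_flow(joint_points, a_parent) - affine_flow(joint_points, a_child)
    return float((d ** 2).sum())
```

A term like `articulation_penalty` would be added to the motion energy for every pair of connected patches, so that a solution which tears the model apart at a joint is penalized.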
Exponential Maps and Twists
We compute the spatiotemporal gradients between frames and then estimate the motion using the mattes derived in the segmentation portion plus an equation based on the properties of kinematic chains given by [Bregler, Malik].
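
For reference, the exponential map that turns a twist into a rigid-body transform can be sketched as below. This is the standard closed form for the exponential on rigid motions, not the motion estimation equation itself. The ordering of the twist as (v, omega) and the function names are assumptions.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix such that skew(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def twist_exp(xi):
    """Map a twist xi = (v, omega), a length-6 array, to a 4x4 transform."""
    v, w = np.asarray(xi[:3], dtype=float), np.asarray(xi[3:], dtype=float)
    theta = np.linalg.norm(w)
    W = skew(w)
    if theta < 1e-8:
        # Taylor limits of the coefficients as theta -> 0
        a, b, c = 1.0, 0.5, 1.0 / 6.0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta ** 2
        c = (theta - np.sin(theta)) / theta ** 3
    R = np.eye(3) + a * W + b * (W @ W)   # Rodrigues' rotation formula
    V = np.eye(3) + b * W + c * (W @ W)   # maps v to the translation part
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T
```

A kinematic chain is then a product of such transforms, one per joint, which is what allows the pose of a limb to be written in terms of the joint angles above it.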

Disparity Calculation

Now we get to the heart of the matter: given two divers, how do we describe how synchronized their motion is? To be honest, we're still unsure as to how we're going to do this part. Given a single global orientation vector, one obvious way of comparing synchronization might be to compare their semi-parabolic trajectories by deriving a mapping from the first curve to the second curve and comparing that mapping to the identity matrix.
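
One way the trajectory comparison might look, purely as a sketch: assume both divers' trajectories are sampled at the same instants as 2D points, fit an affine map from the first to the second by least squares, and measure how far that map is from the identity. The affine model, the Frobenius norm as the score, and the function names are all assumptions, since we have not settled on a metric.

```python
import numpy as np

def fit_mapping(traj_a, traj_b):
    """Least-squares 3x3 homogeneous affine map M with traj_b ~ M applied
    to traj_a. Both trajectories are (N, 2) arrays sampled at the same times."""
    ones = np.ones((len(traj_a), 1))
    a_h = np.hstack([traj_a, ones])                      # (N, 3) homogeneous
    x, *_ = np.linalg.lstsq(a_h, traj_b, rcond=None)     # (3, 2) solution
    m = np.eye(3)
    m[:2, :] = x.T
    return m

def desync_score(traj_a, traj_b):
    """Frobenius distance of the fitted map from the identity matrix.

    Zero means the two trajectories coincide; larger means less synchronized.
    """
    return float(np.linalg.norm(fit_mapping(traj_a, traj_b) - np.eye(3)))
```

Note that this score mixes translation (in pixels or meters) with unitless scale and rotation terms, so in practice the trajectories would probably need to be normalized first.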

WORST-CASE/BEST-CASE PLANS

Worst-case
In the worst case, we hope to finish blob tracking and come up with some metric comparing the synchronization of the overall global orientation vectors of the two divers.
Best-case
In the best case, we solve the vision problem. :) In the next best case, we complete tracking and disparity measures all the way up to the highest level-of-detail (i.e., exponential maps and twists) and have some metric for comparing the synchronizations of the motions of two 3D models.