CS223B - Intro to Computer Vision Final Project

Introduction

The original project proposal discussed the application of computer vision to sports. In particular, I had proposed to extract 3D position data from screenshots of a televised sporting event. However, based on the instructor's feedback and initial experimentation, it soon became apparent that this problem would be too difficult to solve in a few weeks. The scope of the project was therefore reduced to tracking the ball during the game. Although this might sound trivial, there are many complications that make it difficult for a computer-based vision system. For example, there are often occlusions in which the ball is no longer visible and the viewer must make assumptions about its location. The hybrid ball-tracking algorithm I developed worked remarkably well on the test cases, and I present the results in this paper.

Description of the Problem

We have tracked objects with our eyes from the moment we first looked at the world. We have been doing this for many years, and so we have become quite good at it. We have watched many sporting events on television, watching as a ball was thrown back and forth to score points and win games. Some adventurous readers might even have partaken in these physical activities on occasion, and experienced the thrill of having to track an oncoming ball.

Bayes' theory postulates that we make decisions based on previous experience. This is certainly true when tracking a moving ball. When we observe a ball in motion, we can track it not because we see it perfectly at every step of the way, but because we know where it is going to go. Our knowledge of the physical properties of the ball, the nature and rules of the sport we are watching, and the conditions of the field allows us to make these decisions. This is why new viewers of a sport often have more trouble following the ball than seasoned fans. A dog does the same thing, for example. If you pretend to throw a stick, the dog will follow the phantom stick with its eyes for a second. Perhaps it gets confused when it realizes that its perception of reality does not match what its eyes are seeing. Nevertheless, it began to track the stick based on a path it had seen sticks take in the past, not because of any visual cues it was seeing (except for the moving arm, of course).
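This motion prior can be sketched as a simple constant-velocity prediction: assume the ball keeps the displacement it showed over the last two frames. The function name and the jitter threshold below are illustrative assumptions of mine, not part of the original implementation.

```python
import numpy as np

def predict_next(prev_pos, curr_pos, min_motion=2.0):
    """Constant-velocity prior: if the ball moved noticeably between the
    last two frames, assume it keeps the same displacement for the next
    frame; otherwise stay put (very small motion is likely jitter)."""
    prev = np.asarray(prev_pos, dtype=float)
    curr = np.asarray(curr_pos, dtype=float)
    v = curr - prev                      # per-frame displacement
    if np.linalg.norm(v) < min_motion:   # too small: treat as noise
        return tuple(curr)
    return tuple(curr + v)               # extrapolate one frame ahead
```

A Kalman filter would be the more principled version of this prior, but the constant-velocity form is enough to seed a search window in the next frame.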

Computers, on the other hand, do not have the luxury of relying on years of experience. Unless one can implement an extensive knowledge base covering the physics of ball movement, the rules of the game, player behavior, and so on, one must set aside Bayesian reasoning and deal only with what can be seen at the present time. Such a system can never be as good as a person, since occlusion will always yield an unpredictable state. Despite these drawbacks, however, my initial results suggest that it is possible to get reasonable results from a computer vision system.

My basic algorithm is a hybrid combination of several major computer vision technologies: edge detection, optical flow methods, and segmentation techniques. This seemed reasonable since humans use many different cues to see objects. In the section that follows, I will describe the algorithm in detail.

Algorithm and Implementation

After some testing, I came up with an algorithm that was robust enough to do well in my test cases. The outline of the algorithm is as follows:
For every frame:

  Setup:
  - Find edges in the image using a modified Canny edge detector; call the result the "edgeImage" matrix.
  - Segment the image by a mixture of Gaussians into two layers (players and ball).
  - Use the edgeImage and the player layer as masks on the ball layer to remove false hits. The final image should (in theory) have only the ball highlighted; call this the "ball mask".

  Tracking:
  - If the ball moved more than a certain amount in the last frame, assume that it moved forward by this same amount, and begin tracking there.
  - From this position, look for the ball by using optical flow on the edge images, and come up with a best estimate for the ball location.
  - Move the cursor to the nearest point on the ball mask from where the Lucas-Kanade optical flow algorithm predicted it would be, but only if that point is closer than a certain distance. If it is too far away, it is probably just random noise, so remain where the LK algorithm predicted.
  - Save the parameters you need, move to the next frame, and start again.
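The distinctive step of the tracking phase, snapping the Lucas-Kanade estimate to the nearest lit point of the ball mask only when that point is close enough, can be sketched as below. This is a minimal NumPy illustration; the function name and the distance threshold are my own assumptions, not taken from the original implementation.

```python
import numpy as np

def refine_with_ball_mask(lk_estimate, ball_mask, max_snap_dist=15.0):
    """Move the optical-flow estimate (x, y) to the nearest nonzero
    point of the boolean ball mask, but only if that point lies within
    max_snap_dist pixels; otherwise keep the LK prediction, since a
    distant mask hit is probably noise."""
    ys, xs = np.nonzero(ball_mask)
    if len(xs) == 0:
        return lk_estimate                     # empty mask: trust LK
    pts = np.stack([xs, ys], axis=1).astype(float)   # (x, y) pairs
    dists = np.linalg.norm(pts - np.asarray(lk_estimate, dtype=float),
                           axis=1)
    i = int(np.argmin(dists))
    return tuple(pts[i]) if dists[i] <= max_snap_dist else lk_estimate
```

The threshold gives the mask veto power only locally: the segmentation refines the position when it roughly agrees with the flow estimate, and is ignored when it does not.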

As you can see, I have broken up the steps of the algorithm into a setup phase and a tracking phase.

Testing and Results

I needed to test my code with actual data from a soccer match, so I taped one off television and edited a few clips to use as test cases. I did not want to pick easy clips, so I chose clips featuring the kind of typical action that can be expected at a soccer game. In addition, the quality of the videos is quite poor, yet the algorithm still performs remarkably well in most cases. I used four main scenes, each featuring different challenges, to test my algorithm.

Test 1: Initial test case. Fairly easy, with one minor occlusion.
Test 2: The free kick. Tough to track because the ball moves so quickly.
Test 3: A long sequence. Checks whether the code can track for a long amount of time.
Test 4: Deflections and occlusions. A crazy sequence where the ball bounces every which way and is hidden from view several times.


Conclusions

Overall, the ball-tracking algorithm performed reasonably well in the test cases, considering the poor quality of the video and the difficulties involved in tracking a moving ball. Further work could focus on making the system more robust. One way to do this would be to track multiple "possible" balls for several frames if uncertainties arise, then collapse these back when one true ball is found again. Another possible approach would be to use an optical flow algorithm to remove the pan of the camera and thus detect only objects which have moved. This could be used as yet another layer to try to locate the ball.
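The camera-pan idea could be sketched as follows, assuming a dense flow field is already available from some optical flow routine. Estimating the pan as the median flow vector, and the residual-motion threshold, are illustrative assumptions of mine, not part of the project's implementation.

```python
import numpy as np

def moving_object_mask(flow, residual_thresh=2.0):
    """Given a dense flow field of shape (H, W, 2), estimate the camera
    pan as the median flow vector (assuming the static background
    dominates the frame), subtract it, and flag pixels whose residual
    motion exceeds a threshold as independently moving objects."""
    pan = np.median(flow.reshape(-1, 2), axis=0)       # global pan estimate
    residual = np.linalg.norm(flow - pan, axis=2)      # motion minus pan
    return residual > residual_thresh                  # boolean object mask
```

The median makes the pan estimate robust to the small fraction of pixels covered by players and the ball; the resulting mask could then serve as the extra localization layer suggested above.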