Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision
(CS 528)

Approximation Searching of High-Dimensional Data:
A Fast Algorithm and some Applications

John Platt
May 1, 2006, 4:15PM
TCSeq 200


Several information retrieval tasks can be reduced to the problem of searching a database of high-dimensional regions to see whether any overlap a query point. There is a large literature of indexing techniques based on trees, which break down given high enough dimensional data. Approximate lookup techniques, such as locality-sensitive hashing (LSH), hold out the promise for algorithms that scale to large databases.

Over the last few years, we've created a new data structure, called redundant bit vectors (RBVs), that can effectively index high-dimensional regions. We've applied RBVs for rapid identification of audio clips and for fast retrieval of handwritten words.

We show that RBVs are a preferred method for very large databases, or when the quesries are often not in the database. Our method is 109 times faster than linear scan and 48 times faster than LSH on a data set of 239369 audio fingerprints.

About the Speaker

John Platt is a Principal Researcher and manager of the Knowledge Tools group at Microsoft Research. While at Microsoft, he has worked on high-speed learning algorithms for support vector machines, display-specific anti-aliasing (ClearType), and algorithms for text categorization. For his Caltech Ph.D., he worked on flexible models for computer graphics animation, which led to a 2005 Academy Award for Technical Achievement.


Back to the Colloquium Page