Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision
(CS 528)
Approximation Searching of High-Dimensional Data:
A Fast Algorithm and some Applications
John Platt
May 1, 2006, 4:15PM
TCSeq 200
http://graphics.stanford.edu/ba-colloquium/
Abstract
Several information retrieval tasks can be reduced to the problem of
searching a database of high-dimensional regions to see whether any
overlap a query point. There is a large literature of indexing
techniques based on trees, which break down given high enough
dimensional data. Approximate lookup techniques, such as
locality-sensitive hashing (LSH), hold out the promise for algorithms
that scale to large databases.
Over the last few years, we've created a new data structure, called
redundant bit vectors (RBVs), that can effectively index high-dimensional
regions. We've applied RBVs for rapid identification of audio clips and
for fast retrieval of handwritten words.
We show that RBVs are a preferred method for very large databases, or
when the quesries are often not in the database. Our method is 109 times
faster than linear scan and 48 times faster than LSH on a data set of
239369 audio fingerprints.
About the Speaker
John Platt is a Principal Researcher and manager of the Knowledge Tools
group at Microsoft Research. While at Microsoft, he has worked on
high-speed learning algorithms for support vector machines,
display-specific anti-aliasing (ClearType), and algorithms for text
categorization. For his Caltech Ph.D., he worked on flexible models for
computer graphics animation, which led to a 2005 Academy Award for
Technical Achievement.
Contact: bac-coordinators@cs.stanford.edu
Back to the Colloquium Page