## Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision

(CS 528)

### Second Generation Cached-Sufficient Statistics for efficient statistical queries

Andrew Moore, Carnegie Mellon University

February 7, 2005, 4:15PM

TCSeq 200

`http://graphics.stanford.edu/ba-colloquium/`

#### Abstract

This talk is about recent work on new ways to exploit preprocessed views of
data tables for tractably solving big statistical queries. We'll describe
deployments of these new algorithms in the realms of detecting killer
asteroids and unnatural disease outbreaks.

In recent years, several groups have looked at methods for pre-storing
general sufficient statistics of the data in spatial data structures such as
kd-trees and ball-trees so that statistical operations involving
aggregation, convolution and contingency tables become fast for large
datasets. In this talk we will look at two other classes of optimization
required in important statistical queries. The first involves iterating over
all spatial regions (big and small). The second involves detection of tracks
from noisy intermittent observations separated far apart in time and space.
We will also discuss the implications that have arisen from making these
operations tractable. We will focus particularly on:

- Detecting all asteroids in the solar system larger than Pittsburgh's
Cathedral of Learning (data to be collected over 2006-2010).

- Early detection of emerging diseases based on national monitoring of
health-related transactions.

Joint work with Jeremy Kubica, Ting Liu, and Daniel Neill.

#### About the Speaker

Andrew Moore is a Professor of Robotics and Computer
Science at the School of Computer Science, Carnegie Mellon University.
Andrew began his career writing video games for an obscure British personal
computer
(http://www.oric.org/index.php?page=software&fille=detail&num_log=2). He
rapidly became a thousandaire and retired to academia, where he received a
PhD from the University of Cambridge in 1991. He researched robot learning
as a postdoc working with Chris Atkeson, and then moved to CMU.

His main research interest is data mining: statistical algorithms for
finding all the potentially useful and statistically meaningful patterns in
massive sources of data. His research group, the Auton Lab
(http://www.autonlab.org), has devised several new ways of performing massive
statistical operations efficiently, in several cases accelerating the
state of the art by several orders of magnitude. Members of the Auton Lab
collaborate closely with many kinds of scientists, government agencies,
technology companies and engineers in a constant quest to identify the most
urgent unresolved questions at the border of computation
and statistics. Auton Lab algorithms are now in use in dozens of
commercial, university and government applications. Andrew serves on several
editorial boards, and in industrial, government and academic advisory roles.
In his non-work life he has no hobbies or talents of any significance.

*Contact:* `bac-coordinators@cs.stanford.edu`