In recent years, several groups have investigated methods for pre-storing general sufficient statistics of the data in spatial data structures such as kd-trees and ball-trees, so that statistical operations involving aggregation, convolution and contingency tables become fast on large datasets. In this talk we will look at two other classes of optimization required by important statistical queries. The first involves iterating over all spatial regions (big and small). The second involves detecting tracks from noisy, intermittent observations that are widely separated in time and space. We will also discuss the implications of making these operations tractable. We will focus particularly on
His main research interest is data mining: statistical algorithms for finding all the potentially useful and statistically meaningful patterns in massive sources of data. His research group, the Auton Lab (http://www.autonlab.org), has devised several new ways of performing massive statistical operations efficiently, in several cases accelerating the state of the art by several orders of magnitude. Members of the Auton Lab collaborate closely with many kinds of scientists, government agencies, technology companies and engineers in a constant quest to determine the most urgent unresolved questions at the border of computation and statistics. Auton Lab algorithms are now in use in dozens of commercial, university and government applications. Andrew serves on several editorial boards and in industrial, government and academic advisory roles. In his non-work life he has no hobbies or talents of any significance.