Taming the Giants and The Monsters: Recent Developments in Data Mining

Usama Fayyad
Microsoft Research


Knowledge Discovery in Databases (KDD) and Data Mining are concerned with the extraction of interesting structure from databases, especially large stores. Following a brief overview of this rapidly growing area of research and applications, I'll focus on data mining methods. These methods have their origins in statistics, pattern recognition, learning, visualization, databases, optimization, and parallel computing.

I'll discuss some classification and clustering methods and how they are scaled to large databases. I'll present results from our recent work demonstrating that these methods can be scaled effectively to large databases using only limited memory. I'll outline the research challenges and opportunities posed by the problem of extracting models from massive data sets. Operating under such scalability constraints poses interesting problems for how models can be built and what methods are practical. Some applications will be used to motivate and illustrate the techniques.


Usama Fayyad is a Senior Researcher at Microsoft Research (http://research.microsoft.com/~fayyad). His research interests include scaling data mining algorithms to large databases, learning algorithms, and statistical pattern recognition, especially classification and clustering. After receiving his Ph.D. from The University of Michigan, Ann Arbor in 1991, he joined the Jet Propulsion Laboratory (JPL), California Institute of Technology, where (until 1996) he headed the Machine Learning Systems Group and developed data mining systems for automated science data analysis. He received the 1994 NASA Exceptional Achievement Medal and the JPL 1993 Lew Allen Award for Excellence in Research for his work on developing data mining systems to solve challenging science analysis problems in astronomy and remote sensing. He remains affiliated with JPL as a Distinguished Visiting Scientist. He is a co-editor of Advances in Knowledge Discovery and Data Mining (AAAI/MIT Press, 1996) and is an Editor-in-Chief of the journal Data Mining and Knowledge Discovery. He was program co-chair of KDD-94 and KDD-95 (the First International Conference on Knowledge Discovery and Data Mining) and is general chair of KDD-96 and KDD-99. He co-chaired the 1997 Workshops on the role of KDD in Visualizations held at the KDD-97 and IEEE Vis-97 conferences.
Eyal Amir
Last modified: Thu Mar 18 15:51:55 PST 1999