Data Mining and The Database Backend

Usama Fayyad
Data Mining & Exploration (DMX) Group
Microsoft Research

Abstract

Data mining is about finding interesting structure in databases, especially large data stores. Since manageability and convenience dictate that data will have to live in databases, we consider it very important to understand how a database can accommodate data mining operations. I'll outline the research challenges and opportunities posed by the problem of extracting models from massive data sets. Operating under such scalability constraints poses interesting problems for how models can be built and what methods are practical. Following a brief overview of this rapidly growing area of research and applications, I'll focus on data mining methods for classification and clustering. The focus will be on how to scale some of these data-access-intensive algorithms to large databases, and in particular on how such methods could fit in with database systems. I will also cover applications of these techniques to difficult problems in traditional database systems. These problems include efficient indexing of data for nearest-neighbor (find similar) queries in high dimensions, and database and datacube compression in OLAP.

Slides: .ps (4.9Mb), .ps.zip (1.8Mb), .pdf (1.8Mb)

About the Speaker:

Usama Fayyad is a Senior Researcher at Microsoft Research (http://research.microsoft.com/~fayyad), where he heads the Data Mining & Exploration (DMX) Group. His research interests include scaling data mining algorithms to large databases, learning algorithms, and statistical pattern recognition, especially classification and clustering. At Microsoft he also works on shipping data mining capabilities in products such as Microsoft Commerce Server and Microsoft SQL Server. After receiving his Ph.D. degree from The University of Michigan, Ann Arbor in 1991, he joined the Jet Propulsion Laboratory (JPL), California Institute of Technology, where (until 1996) he headed the Machine Learning Systems Group and developed data mining systems for automated science data analysis. He received the 1994 NASA Exceptional Achievement Medal and the JPL 1993 Lew Allen Award for Excellence in Research for his work on developing data mining systems to solve challenging science analysis problems in astronomy and remote sensing. He remains affiliated with JPL as a Distinguished Visiting Scientist. He is a co-editor of Advances in Knowledge Discovery and Data Mining (MIT Press, 1996) and is an Editor-in-Chief of the journal Data Mining and Knowledge Discovery. He was program co-chair of KDD-94 and KDD-95 (the First International Conference on Knowledge Discovery and Data Mining) and is general chair of KDD-96 and KDD-99. He is a director of ACM SIGKDD and serves as Editor-in-Chief of its newsletter, SIGKDD Explorations.
bac-coordinators@cs.stanford.edu
Last modified: Tue Sep 21 18:59:46 PDT 1999