Data Mining and The Database Backend
Usama Fayyad
Data Mining & Exploration (DMX) Group
Microsoft Research
Abstract
Data Mining is about finding interesting structure in databases,
especially large data stores. Since manageability and convenience
dictate that data will have to live in databases, we consider the
problem of understanding how a database can accommodate data mining
operations very important. I'll outline the research challenges and
opportunities posed by the problem of extracting models from massive
data sets. Operating under such scalability constraints poses
interesting problems for how models can be built and what methods are
practical. Following a brief overview of this rapidly growing area
of research and applications, I'll focus on data mining
methods for classification and clustering. The focus will be on how
to scale some of these data access-intensive algorithms to large
databases, and in particular how such methods could fit in with
database systems. I will also cover applications of these techniques
to solving difficult problems in traditional database systems. These
problems include efficient indexing of data for nearest-neighbor (find
similar) queries in high dimensions, and database and datacube
compression in OLAP.
Slides: .ps (4.9 MB), .ps.zip (1.8 MB), .pdf (1.8 MB)
About the Speaker:
Usama Fayyad is a Senior Researcher at Microsoft Research
(http://research.microsoft.com/~fayyad) where he heads the Data Mining
& Exploration (DMX) Group. His research interests include scaling data
mining algorithms to large databases, learning algorithms, and
statistical pattern recognition, especially classification and
clustering. At Microsoft he also works on shipping data mining
capabilities in products such as Microsoft Commerce Server and
Microsoft SQL Server. After receiving the Ph.D. degree from The
University of Michigan, Ann Arbor in 1991, he joined the Jet
Propulsion Laboratory (JPL), California Institute
of Technology, where (until 1996) he headed the Machine Learning
Systems Group and developed data mining systems for automated science
data analysis. He received the 1994 NASA Exceptional Achievement Medal
and the JPL 1993 Lew Allen Award for Excellence in Research for his
work on developing data mining systems to solve challenging science
analysis problems in astronomy and remote sensing. He remains
affiliated with JPL as a Distinguished Visiting Scientist. He is a
co-editor of Advances in Knowledge Discovery and Data Mining (MIT
Press, 1996) and is an Editor-in-Chief of the journal Data Mining and
Knowledge Discovery. He was program co-chair of KDD-94 and KDD-95 (the
First International Conference on Knowledge Discovery and Data Mining),
was general chair of KDD-96, and is general chair of KDD-99. He is a
director of ACM SIGKDD and serves as Editor-in-Chief of its newsletter,
SIGKDD Explorations.
bac-coordinators@cs.stanford.edu
Last modified:
Tue Sep 21 18:59:46 PDT 1999