Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision

The Statistical Natural Language Processing Revolution

Eugene Charniak
Professor of Computer Science and Cognitive Science
Brown University

Wednesday, April 26, 2000
refreshments 4:05PM, talk begins 4:15PM
TCseq201, Lecture Hall B


Over the last ten years or so, the field of natural language processing (NLP) has become increasingly dominated by corpus-based methods and statistical techniques. In this research, problems are attacked by collecting statistics from a corpus (sometimes marked with correct answers, sometimes not) and then applying the statistics to new instances of the task. In this talk we give an overview of statistical techniques in a few areas of NLP, such as parsing (finding the correct phrase structure for a sentence), lexical semantics (learning meanings and other properties of words and phrases from text), and anaphora resolution (determining the intended antecedent of pronouns, and of noun phrases in general). As a general rule, corpus-based, and particularly statistical, techniques outperform hand-crafted systems, and the rate of progress in the field is quite high.

About the Speaker

Eugene Charniak is Professor of Computer Science and Cognitive Science at Brown University and past chair of the Department of Computer Science. He received his A.B. degree in Physics from the University of Chicago and a Ph.D. in Computer Science from M.I.T. He has published four books: Computational Semantics, with Yorick Wilks (1976); Artificial Intelligence Programming, with Chris Riesbeck, Drew McDermott, and James Meehan (1980, 1987); Introduction to Artificial Intelligence, with Drew McDermott (1985); and Statistical Language Learning (1993). He is a Fellow of the American Association for Artificial Intelligence and was previously a Councilor of the organization. His research has always been in the area of language understanding or technologies that relate to it. Over the last few years he has been interested in statistical techniques for language processing. In this area he has worked in the sub-areas of lexicalized parsing, pronoun reference, and lexical resource acquisition, all through statistical means.
Mon Jan 10 14:05:01 PST 2000