Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision


Processing Natural Language without Natural Language Processing


Eric Brill
Microsoft

Monday, April 28, 2003, 4:15PM
TCSeq 200
http://robotics.stanford.edu/ba-colloquium/

Abstract

Despite decades of research and development, we can still only create machines with the most rudimentary natural language processing capabilities.  One of the greatest barriers to advanced natural language processing is our inability to overcome the linguistic knowledge acquisition bottleneck.  Language appears to be extremely complex and idiosyncratic.  Over the years, there has been an ongoing debate as to how best to overcome this bottleneck: via better linguistics or more powerful machine learning.  While we have been debating, the amount of on-line text has ballooned from the ubiquitous million-word Brown corpus to close to a trillion words accessible on the Web.  Does this change everything?  We will describe recent work in a number of areas, including automatic question answering, automatic training of grammar checkers, and language modeling, where state-of-the-art accuracy is achieved using very simple methods whose power comes entirely from the plethora of text currently available to these systems.

About the Speaker

Eric Brill is a Senior Researcher in the Machine Learning and Applied Statistics Group of Microsoft Research.  His research interests include machine learning, string algorithms, natural language processing, and information retrieval.
Contact: bac-coordinators@cs.stanford.edu
