Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision
Processing Natural Language without Natural Language Processing
Eric Brill
Microsoft
Monday, April 28, 2003, 4:15 PM
TCSeq 200
http://robotics.stanford.edu/ba-colloquium/
Abstract
Despite decades of research and development, we can still only create
machines with the most rudimentary natural language processing
capabilities. One of the greatest barriers to advanced natural
language processing is our inability to overcome the linguistic
knowledge acquisition bottleneck. Language appears to be extremely
complex and idiosyncratic. Over the years, there has been an ongoing
debate as to how best to overcome this bottleneck: via better
linguistics or more powerful machine learning. While we have been
debating, the amount of on-line text has ballooned from the ubiquitous
million-word Brown corpus to close to a trillion words accessible on
the Web. Does this change everything? We will describe recent work in
a number of areas, including automatic question answering, automatic
training of grammar checkers, and language modeling, where
state-of-the-art accuracy is achieved using very simple methods whose power
comes entirely from the plethora of text currently available to these
systems.