Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision
(CS 528)

Rethinking State, Action, and Reward in Reinforcement Learning

Satinder Singh
November 7, 2005, 4:15PM
Hewlett (TCSeq) 200


Over the last decade and more, there has been rapid theoretical and empirical progress in reinforcement learning (RL) using the well-established formalisms of Markov decision processes (MDPs) and partially observable MDPs (POMDPs). At the core of these formalisms are particular formulations of the elemental notions of state, action, and reward that have served the field of RL so well. In this talk, I will describe recent progress in rethinking these basic elements to take the field beyond (PO)MDPs. In particular, I will briefly describe older work on flexible notions of actions called options, briefly describe some recent work on intrinsic rather than extrinsic rewards, and then spend the bulk of my time on recent work on predictive representations of state. I will conclude by arguing that, taken together, these advances point the way for RL to address the many challenges of building an artificial intelligence.

About the Speaker

Satinder Singh is an Associate Professor of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor. His main research interest is the old-fashioned goal of Artificial Intelligence: building autonomous agents that can learn to be broadly competent in complex, dynamic, and uncertain environments. The field of reinforcement learning (RL) has focused on this goal, and accordingly his deepest contributions are in RL.

