Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision


Planning and Programming with First-order Representations of Markov Decision Processes

Craig Boutilier
Department of Computer Science
University of Toronto

Monday, June 4, 2001, 4:15PM
TCSEQ 201
http://robotics.stanford.edu/ba-colloquium/

Abstract

Markov decision processes (MDPs) have become the de facto standard model for decision-theoretic planning problems. However, classic solution techniques relying on explicit state and action enumeration are too computationally demanding for many AI planning problems. In this talk, I describe a first-order representation, based on the situation calculus, that allows certain classes of MDPs to be represented naturally and concisely. I then describe how two classes of computational techniques can be extended to exploit this first-order structure when solving large MDPs.
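
As background (standard material, not taken from the abstract itself): an MDP is a tuple <S, A, P, R> of states, actions, transition probabilities, and rewards, and in the situation calculus the effect of actions on a relational fluent F is captured by a single successor state axiom of the form

    F(\vec{x}, do(a, s)) \equiv \gamma_F^{+}(\vec{x}, a, s) \vee \big( F(\vec{x}, s) \wedge \neg \gamma_F^{-}(\vec{x}, a, s) \big)

so the dynamics are described once per fluent, for all actions and all objects, rather than once per state.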

I first describe a dynamic programming algorithm for solving large MDPs without enumerating either the state space or the action space. Based on a decision-theoretic generalization of regression, this algorithm discovers the logical structure of the optimal value function and policy through analysis of the logical MDP representation. This should lead to effective algorithms for automated planning with first-order MDPs.
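
For orientation, the backup that classical value iteration performs state by state is the standard textbook one,

    V_{n+1}(s) = \max_{a \in A} \Big[ R(s, a) + \gamma \sum_{s' \in S} \Pr(s' \mid s, a)\, V_n(s') \Big],

and the first-order algorithm described in the talk carries out this computation symbolically: the logical description of V_n is regressed through each action to obtain a logical description of V_{n+1}, so the backup is applied to formulas rather than to individual states.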

I then describe a framework for agent programming that allows the seamless integration of explicit agent programming with decision-theoretic planning. Specifically, the DTGolog model allows one to partially specify a control program in a high-level, logical language, and provides an interpreter that, given the first-order representation of an MDP, will determine the optimal completion of that program. The DTGolog interpreter can be viewed as constructing an optimal policy subject to the constraints imposed by the partially specified program.
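
To make the "optimal completion" idea concrete, the following is a minimal, hypothetical sketch (not DTGolog itself): the program fixes some steps, and at explicit choice points the interpreter selects the branch with the highest value. The action names, rewards, and deterministic single-step model are invented for illustration; DTGolog works over a full first-order MDP with stochastic actions.

    # Toy sketch of optimal program completion (illustrative assumptions only).
    from typing import Union

    Program = Union[tuple, str]  # ("seq", p1, p2) | ("choice", p1, ...) | action name

    # Assumed one-step rewards for primitive actions in an invented domain.
    REWARD = {"deliver_coffee": 10.0, "go_to_office": -1.0, "wait": 0.0}

    def best_completion(program: Program) -> tuple[float, list[str]]:
        """Return (value, chosen action sequence) for a partially specified program."""
        if isinstance(program, str):              # primitive action: fixed by the program
            return REWARD[program], [program]
        kind = program[0]
        if kind == "seq":                         # sequencing: no choice to make
            v1, t1 = best_completion(program[1])
            v2, t2 = best_completion(program[2])
            return v1 + v2, t1 + t2
        if kind == "choice":                      # choice point: interpreter optimizes it
            options = [best_completion(p) for p in program[1:]]
            return max(options, key=lambda vt: vt[0])
        raise ValueError(f"unknown construct: {kind!r}")

    # Go to the office, then either deliver coffee or wait; the interpreter decides.
    prog = ("seq", "go_to_office", ("choice", "deliver_coffee", "wait"))
    print(best_completion(prog))  # (9.0, ['go_to_office', 'deliver_coffee'])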

Finally, I describe some new directions we are exploring to further exploit the integration of planning and programming in first-order representations of MDPs. These include the use of our first-order dynamic programming algorithm to construct optimal completions of programs, and the use of regression to compile "action models" of procedures and other program fragments that are used repeatedly.

Much of this talk describes joint work with Ray Reiter, Bob Price, Misha Soutchanski, and Sebastian Thrun. For more details, see:

http://www.cs.toronto.edu/~cebly/Papers/dtgolog.pdf.gz

http://www.cs.toronto.edu/~cebly/Papers/dtregress.ps

About the Speaker

Craig Boutilier is an Associate Professor with the Department of Computer Science at the University of Toronto. He received his Ph.D. in Computer Science from the University of Toronto in 1992, and worked as an Assistant and Associate Professor at the University of British Columbia from 1991 until his return to Toronto in 1999. Boutilier's research interests have spanned a wide range of topics, from knowledge representation, belief revision, default reasoning, and philosophical logic, to probabilistic reasoning, decision making under uncertainty, multiagent systems, and machine learning. His current research efforts focus on various aspects of decision making under uncertainty: Markov decision processes, game theory and multiagent decision processes, economic models, reinforcement learning, probabilistic inference and preference elicitation.


Contact: bac-coordinators@cs.stanford.edu
