##
Broad Area Colloquium For AI-Geometry-Graphics-Robotics-Vision

### Planning and Programming with First-order Representations
of Markov Decision Processes

Craig Boutilier

Department of Computer Science

University of Toronto

Monday, June 4, 2001, 4:15PM

TCSEQ 201

`http://robotics.stanford.edu/ba-colloquium/`

#### Abstract

Markov decision processes (MDPs) have become the de facto standard
model for decision-theoretic planning problems. However, classic
solution techniques relying on explicit state and action enumeration
are too computationally demanding for many AI planning problems. In
this talk, I describe a first-order representation--based on the
situation calculus--that allows certain classes MDPs to be represented
naturally and concisely. I then describe how two classes of
computational techniques can be extended to exploit first-order
structure when solving a large MDP.
I first describe a dynamic programming algorithm for solving large
MDPs without enumerating either state or action spaces. Based on a
decision-theoretic generalization of regression, this algorithm
discovers the logical structure of the optimal value function and
policy through analysis of the logical MDP representation. This should
lead to effective algorithms for automated planning with first-order
MDPs.

I then describe a framework for agent programming which allows the
seamless integration of explicit agent programming with
decision-theoretic planning. Specifically, the DTGolog model allows
one to partially specify a control program in a high-level, logical
language, and provides an interpreter that, given the first-order
representation of an MDP, will determine the optimal completion of
that program. The DTGolog interpreter can be viewed as constructing
an optimal policy subject to the constraints imposed by the
partially-specified program.

Finally, I describe some new directions we are exploring to further
exploit the integration of planning and programming in first-order
representations MDPs. These include the use of our first-order dynamic
programming algorithm to construct optimal completions of programs,
and the use of regression to compile "action models" of procedures and
other program fragments that are used repeatedly.

Much of this talks describes joint work with Ray Reiter, Bob Price,
Misha Soutchanski and Sebastian Thrun. For more details see:

http://www.cs.toronto.edu/~cebly/Papers/dtgolog.pdf.gz

http://www.cs.toronto.edu/~cebly/Papers/dtregress.ps

#### About the Speaker

Craig Boutilier is an Associate Professor with the Department of
Computer Science at the University of Toronto. He received his Ph.D. in
Computer Science from the University of Toronto in 1992, and worked as
an Assistant and Associate Professor at the University of British
Columbia from 1991 until his return to Toronto in 1999. Boutilier's
research interests have spanned a wide range of topics, from knowledge
representation, belief revision, default reasoning, and philosophical
logic, to probabilistic reasoning, decision making under uncertainty,
multiagent systems, and machine learning. His current research efforts
focus on various aspects of decision making under uncertainty: Markov
decision processes, game theory and multiagent decision processes,
economic models, reinforcement learning, probabilistic inference and
preference elicitation.

Contact: `bac-coordinators@cs.stanford.edu`
Back to the Colloquium Page