Guided Policy Search

Sergey Levine
Vladlen Koltun
Computer Science Department, Stanford University, Stanford, CA 94305 USA
Proceedings of the 30th International Conference on Machine Learning (ICML 2013)
Abstract

Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring many samples and frequently converging to poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynamic programming can be used to generate suitable guiding samples, and describe a regularized importance sampled policy optimization that incorporates these samples into the policy search. We evaluate the method by learning neural network controllers for planar swimming, hopping, and walking, as well as simulated 3D humanoid running.
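The core idea in the abstract is to estimate the policy's expected return from samples drawn under a guiding distribution (here, trajectories produced by differential dynamic programming), using self-normalized importance weights plus a regularizer. The sketch below illustrates that estimator in a minimal form; the function name, the arguments, and the exact regularizer are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def is_objective(logp_policy, logp_guide, returns, reg_weight=1e-4):
    """Self-normalized importance-sampled estimate of expected return.

    logp_policy: log-probabilities of the sampled trajectories under the
        current policy pi_theta (illustrative interface).
    logp_guide:  log-probabilities of the same trajectories under the
        guiding distribution q, e.g. built from DDP solutions.
    returns:     total reward of each sampled trajectory.

    The log-normalizer term acts as a simple regularizer that penalizes
    assigning low probability to all guiding samples; its exact form is
    an assumption here, not taken from the paper.
    """
    logw = logp_policy - logp_guide           # log importance weights
    logZ = np.logaddexp.reduce(logw)          # log of the normalizing constant
    w = np.exp(logw - logZ)                   # weights summing to one
    return float(np.dot(w, returns) + reg_weight * logZ)
```

In a policy search loop, this estimate (or a differentiable version of it) would be maximized over the policy parameters, with the guiding samples keeping the optimizer near high-reward trajectories.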

Paper (with appendices): [PDF]

Planar policies video: [MP4]

3D running video: [MP4]

BibTeX Citation

@inproceedings{2013-gps,
    author = {Sergey Levine and Vladlen Koltun},
    title = {Guided Policy Search},
    booktitle = {ICML '13: Proceedings of the 30th International Conference on Machine Learning},
    year = {2013},
}