Learning Complex Neural Network Policies with Trajectory Optimization

Sergey Levine [personal website]
Computer Science Department, Stanford University, Stanford, CA 94305 USA
Vladlen Koltun [personal website]
Adobe Research, San Francisco, CA 94103 USA
Proceedings of the 31st International Conference on Machine Learning (ICML 2014)
Abstract

Direct policy search methods offer the promise of automatically learning controllers for complex, high-dimensional tasks. However, prior applications of policy search often required specialized, low-dimensional policy classes, limiting their generality. In this work, we introduce a policy search algorithm that can directly learn high-dimensional, general-purpose policies, represented by neural networks. We formulate the policy search problem as an optimization over trajectory distributions, alternating between optimizing the policy to match the trajectories, and optimizing the trajectories to match the policy and minimize expected cost. Our method can learn policies for complex tasks such as bipedal push recovery and walking on uneven terrain, while outperforming prior methods.
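To make the alternating optimization concrete, the toy sketch below illustrates the structure described in the abstract. It is not the paper's algorithm: the paper optimizes trajectory distributions with a trajectory optimization method and an agreement constraint between the policy and the trajectories, whereas this sketch assumes known linear dynamics, a single deterministic trajectory, finite-difference gradient descent in place of proper trajectory optimization, and a simple quadratic penalty (with illustrative weight nu) in place of the constraint. All dimensions, costs, and network sizes are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
T, x_dim, u_dim, H = 20, 4, 2, 32  # horizon, state/action dims, hidden units (illustrative)

# Known linear dynamics for the toy problem: x' = A x + B u.
A = np.eye(x_dim) + 0.01 * rng.standard_normal((x_dim, x_dim))
B = 0.1 * rng.standard_normal((x_dim, u_dim))

# Two-layer tanh policy network u = pi(x), trained by regression in step (2).
W1, b1 = 0.1 * rng.standard_normal((H, x_dim)), np.zeros(H)
W2, b2 = 0.1 * rng.standard_normal((u_dim, H)), np.zeros(u_dim)

def policy(x):
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def rollout(U, x0):
    # Integrate the dynamics under an open-loop action sequence.
    X = [x0]
    for u in U:
        X.append(A @ X[-1] + B @ u)
    return np.array(X)

def traj_objective(U, x0, nu):
    # Trajectory cost plus a penalty keeping actions close to the policy.
    X = rollout(U, x0)
    cost = np.sum(X ** 2) + 1e-2 * np.sum(U ** 2)  # quadratic state/action cost
    match = sum(np.sum((u - policy(x)) ** 2) for x, u in zip(X[:-1], U))
    return cost + nu * match

x0 = rng.standard_normal(x_dim)
U = np.zeros((T, u_dim))
nu = 1.0  # penalty weight standing in for the paper's agreement constraint

for outer in range(10):
    # Step (1): optimize the trajectory to minimize cost and match the policy
    # (finite-difference gradient descent stands in for trajectory optimization).
    for _ in range(50):
        base = traj_objective(U, x0, nu)
        grad = np.zeros_like(U)
        for t in range(T):
            for k in range(u_dim):
                Up = U.copy()
                Up[t, k] += 1e-4
                grad[t, k] = (traj_objective(Up, x0, nu) - base) / 1e-4
        U -= 0.05 * grad

    # Step (2): optimize the policy to match the trajectory, i.e. regress the
    # network onto the optimized state-action pairs with plain SGD.
    X = rollout(U, x0)
    for _ in range(200):
        t = rng.integers(T)
        x, u_star = X[t], U[t]
        h = np.tanh(W1 @ x + b1)
        e = (W2 @ h + b2) - u_star      # prediction error
        gh = (W2.T @ e) * (1 - h ** 2)  # backprop through tanh
        W2 -= 1e-2 * np.outer(e, h)
        b2 -= 1e-2 * e
        W1 -= 1e-2 * np.outer(gh, x)
        b1 -= 1e-2 * gh

    nu *= 1.5  # tighten agreement so policy and trajectory converge

print("final trajectory objective:", traj_objective(U, x0, nu))

The structure to note is the outer loop: step (1) pulls the trajectory toward low cost while penalizing disagreement with the current policy, step (2) fits the policy network to the optimized trajectory by supervised regression, and the growing penalty weight forces the two to agree, mirroring the alternation described in the abstract.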

Paper (with appendices): [PDF]

Supplementary Videos

Push recovery training and test: [MP4]

Uneven terrain training and test: [MP4]

Additional push recoveries: [MP4]

BibTeX Citation

@inproceedings{2014-cgps,
    author = {Sergey Levine and Vladlen Koltun},
    title = {Learning Complex Neural Network Policies with Trajectory Optimization},
    booktitle = {ICML '14: Proceedings of the 31st International Conference on Machine Learning},
    year = {2014},
}