# Monte Carlo Tree Search for Policy Optimization

```bibtex
@inproceedings{Ma2019MonteCT,
  title     = {Monte Carlo Tree Search for Policy Optimization},
  author    = {Xiaobai Ma and K. Driggs-Campbell and Zongzhang Zhang and Mykel J. Kochenderfer},
  booktitle = {IJCAI},
  year      = {2019}
}
```

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo…
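The gradient-free side of this idea can be illustrated with a deliberately minimal sketch: plain random hill climbing on the policy parameters. This is not the paper's MCTS-based algorithm, just the simplest member of the gradient-free family the abstract contrasts against; `random_search` and the toy return function are illustrative names.

```python
import random

def random_search(evaluate, theta, sigma=0.5, iters=500, seed=0):
    """Gradient-free hill climbing: perturb the policy parameters with
    Gaussian noise and keep the perturbation only if the estimated
    return improves. No gradients of the return are used anywhere."""
    rng = random.Random(seed)
    best = evaluate(theta)
    for _ in range(iters):
        candidate = [t + rng.gauss(0.0, sigma) for t in theta]
        score = evaluate(candidate)
        if score > best:
            theta, best = candidate, score
    return theta, best

# Toy stand-in for an episodic return, peaked at theta = (2, -1).
ret = lambda th: -(th[0] - 2.0) ** 2 - (th[1] + 1.0) ** 2

theta, best = random_search(ret, [0.0, 0.0])
```

Methods like this are easy to parallelize and indifferent to saddle points, but, as the abstract notes, they can still stall in local optima of the return landscape.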

#### 2 Citations

Local Search for Policy Iteration in Continuous Control

- Computer Science, Mathematics
- ArXiv
- 2020

An algorithm for local, regularized policy improvement in reinforcement learning (RL) that allows model-based and model-free variants to be formulated in a single framework and introduces a form of tree search for continuous action spaces.

#### References

Showing 1-10 of 36 references

Proximal Policy Optimization Algorithms

- Computer Science
- ArXiv
- 2017

We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a "surrogate" objective…
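The clipped surrogate objective at the heart of PPO can be sketched per sample; `ppo_clip_objective` is an illustrative name, and `epsilon = 0.2` is the default reported in the paper.

```python
def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    """Per-sample PPO clipped surrogate:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    where r = pi_theta(a|s) / pi_theta_old(a|s) and A is the advantage.
    Clipping removes the incentive to move r far outside [1-eps, 1+eps]."""
    clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage a ratio of 1.5 is clipped to 1.2, so the objective stops rewarding further increases in the probability ratio.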

Evolution-Guided Policy Gradient in Reinforcement Learning

- Computer Science, Mathematics
- NeurIPS
- 2018

Evolutionary Reinforcement Learning (ERL), a hybrid algorithm that leverages the population of an EA to provide diversified data to train an RL agent, and periodically reinserts the RL agent into the EA population to inject gradient information into the EA.

Evolution Strategies as a Scalable Alternative to Reinforcement Learning

- Mathematics, Computer Science
- ArXiv
- 2017

This work explores the use of Evolution Strategies (ES), a class of black-box optimization algorithms, as an alternative to popular MDP-based RL techniques such as Q-learning and Policy Gradients, and highlights several advantages of ES as a black-box optimization technique.
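The ES update described here can be sketched on a toy objective, with perturbation scores standing in for episodic returns. All names are illustrative, and the quadratic "return" is a stand-in for a real environment rollout.

```python
import random

def es_step(theta, fitness, rng, sigma=0.1, alpha=0.05, population=200):
    """One iteration of a simple evolution strategy: sample Gaussian
    perturbations of the parameters, evaluate each perturbed policy,
    and move theta along the fitness-weighted average noise direction.
    No backpropagation through the policy is required."""
    n = len(theta)
    grad = [0.0] * n
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        score = fitness([t + sigma * e for t, e in zip(theta, eps)])
        for i in range(n):
            grad[i] += score * eps[i]
    # theta <- theta + alpha / (N * sigma) * sum_i score_i * eps_i
    return [t + alpha * g / (population * sigma) for t, g in zip(theta, grad)]

# Toy stand-in for an episodic return, peaked at theta = (3, -1).
f = lambda th: -(th[0] - 3.0) ** 2 - (th[1] + 1.0) ** 2

rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(300):
    theta = es_step(theta, f, rng)
```

Because each population member is evaluated independently, this loop parallelizes almost perfectly, which is the scalability argument the reference makes.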

Trust Region Policy Optimization

- Computer Science, Mathematics
- ICML
- 2015

A method for optimizing control policies with guaranteed monotonic improvement, obtained by making several approximations to a theoretically justified scheme, called Trust Region Policy Optimization (TRPO).

Progressive Strategies for Monte-Carlo Tree Search

- Computer Science
- 2008

Two progressive strategies for MCTS are introduced, called progressive bias and progressive unpruning, which enable the use of relatively time-expensive heuristic knowledge without speed reduction.
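Progressive bias can be sketched as an extra heuristic term in the UCT selection rule that fades as a child is visited more often. The `w * H / (n + 1)` weighting below is one common form of the idea; the function and field names are illustrative.

```python
import math

def select_child(children, parent_visits, c=1.4, w=1.0):
    """UCT child selection with a progressive-bias term: heuristic
    knowledge H(s, a) is added with weight w / (n + 1), so it dominates
    for rarely visited children and fades as visit counts grow.
    Each child is a dict with keys: value_sum, visits, heuristic."""
    def score(ch):
        n = ch["visits"]
        exploit = ch["value_sum"] / n if n else 0.0
        explore = c * math.sqrt(math.log(parent_visits) / n) if n else float("inf")
        bias = w * ch["heuristic"] / (n + 1)
        return exploit + explore + bias
    return max(children, key=score)
```

With this rule an expensive heuristic only needs to be computed once per node, which is how the strategy avoids slowing down the search.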

Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning

- Computer Science
- ArXiv
- 2017

It is shown that combining DNNs with novelty search, which was designed to encourage exploration on tasks with deceptive or sparse reward functions, can solve a high-dimensional problem on which reward-maximizing algorithms fail, expanding the scale at which GAs can operate.

High-Dimensional Continuous Control Using Generalized Advantage Estimation

- Computer Science, Mathematics
- ICLR
- 2016

This work addresses two difficulties of policy gradient methods, the large number of samples typically required and the challenge of obtaining stable, steady improvement from nonstationary data, by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias.
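The estimator itself is a discounted, lambda-weighted sum of TD residuals, A_t = sum_l (gamma * lambda)^l * delta_{t+l}; a minimal sketch (function name illustrative):

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one trajectory.
    values must have len(rewards) + 1 entries (bootstrap value last).
    lam = 0 gives low-variance one-step TD residuals; lam = 1 gives
    unbiased but high-variance Monte Carlo advantages."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

The single backward pass makes the lambda trade-off explicit: the `running` term is exactly the exponentially weighted tail of future TD residuals.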

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

- Computer Science, Mathematics
- NIPS
- 2017

This work applies trust-region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature, yielding the first scalable trust-region natural gradient method for actor-critic algorithms.

Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari

- Computer Science
- IJCAI
- 2018

It is demonstrated that even a very basic canonical ES algorithm can match or exceed the performance of traditional RL algorithms, which is likely to lead to new advances in the state of the art for solving RL problems.

Safe mutations for deep and recurrent neural networks through output gradients

- Computer Science
- GECCO
- 2018

A family of safe mutation (SM) operators is proposed that facilitates exploration without dramatically altering network behavior or requiring additional interaction with the environment; these operators greatly increase the ability of a simple genetic-algorithm-based neuroevolution method to find solutions in high-dimensional domains that require deep and/or recurrent neural networks.