Abstract
A core novelty of Alpha Zero is the interleaving of tree search and deep learning, which has proven very successful in board games like Chess, Shogi and Go. These games have a discrete action space. However, many real-world reinforcement learning domains have continuous action spaces, for example in robotic control, navigation and self-driving cars. This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action spaces. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.