Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning

EasyChair Preprint 5456
12 pages • Date: May 4, 2021

Abstract

Reward engineering, i.e., designing an incentive reward function, is a non-trivial task for training agents in complex environments. Furthermore, such a hand-crafted reward function may induce biased behavior that is far from efficient. In this paper, we focus on training agents with a binary success/failure reward function in the Half Field Offense domain. The major advantage of this approach is that the agent makes no presumptions about the environment; it follows only the original formulation of reinforcement learning. The main challenge of using such a reward function is the high sparsity of positive reward signals. To address this problem, we use a simple prediction-based exploration strategy (called Curious Exploration) along with a Return-based Memory Restoration (RMR) technique, which preferentially retains more valuable memories. The proposed method can be used to train agents in environments with fairly complex state and action spaces. This paper concentrates on a single agent learning to score goals in the domain of simulated RoboCup soccer. Experimental results show that while our baseline method completely fails to learn the task, our proposed method converges easily to nearly optimal behavior. A video of our trained agent's behavior is available at http://bit.ly/HFO_Binary_Reward.

Keyphrases: Deep Reinforcement Learning, Half Field Offense, Parameterized Action Space, Prediction-based Exploration, Replay Memory, Soccer 2D Simulation, Sparse Binary Reward
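The abstract only sketches the two mechanisms at a high level. As a rough illustration (not the authors' implementation; the class names, interfaces, and the eviction rule below are assumptions), a prediction-based curiosity bonus can be computed as the error of a learned forward model, and a return-aware replay buffer can evict low-return transitions first so that rare successful memories survive longer:

import numpy as np

class ForwardModelCuriosity:
    """Linear forward model s' ~ W [s, a]; its prediction error serves as an intrinsic reward."""
    def __init__(self, state_dim, action_dim, lr=1e-2):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def bonus(self, s, a, s_next):
        x = np.concatenate([s, a])
        err = s_next - self.W @ x              # prediction error of the forward model
        self.W += self.lr * np.outer(err, x)   # one SGD step on the squared error
        return float(np.sum(err ** 2))         # curiosity bonus added to the sparse reward

class ReturnKeepingBuffer:
    """Replay memory that, when full, drops the transition with the lowest episode return."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = []  # list of (transition, episode_return) pairs

    def add(self, transition, episode_return):
        if len(self.data) >= self.capacity:
            worst = min(range(len(self.data)), key=lambda i: self.data[i][1])
            self.data.pop(worst)  # keep valuable (high-return) memories longer
        self.data.append((transition, episode_return))

    def sample(self, batch_size, rng=np.random):
        idx = rng.choice(len(self.data), size=batch_size)
        return [self.data[i][0] for i in idx]

How RMR actually restores or weights memories is detailed in the full paper; the buffer above only captures the stated intent of remembering more valuable experience.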