Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning

EasyChair Preprint 5456

12 pages • Date: May 4, 2021

Saeed Tafazzol, Erfan Fathi, Mahdi Rezaei and Ehsan Asali

Abstract

Reward engineering, i.e., designing an incentive reward function, is a non-trivial task when training agents in complex environments. Furthermore, such an engineered reward function may induce biased behavior that is far from efficient. In this paper, we focus on training agents with a binary success/failure reward function in the Half Field Offense domain. The major advantage of this work is that the agent makes no presumptions about the environment; it follows only the original formulation of a reinforcement learning agent. The main challenge of using such a reward function is the high sparsity of positive reward signals. To address this problem, we use a simple prediction-based exploration strategy (called Curious Exploration) together with a Return-based Memory Restoration (RMR) technique, which tends to retain more valuable memories. The proposed method can be used to train agents in environments with fairly complex state and action spaces. This paper concentrates on a single agent learning to score goals in simulated RoboCup soccer. Experimental results show that while our baseline method completely fails to learn the task, our proposed method converges easily to nearly optimal behavior. A video presenting our trained agent’s behavior is available at http://bit.ly/HFO_Binary_Reward.
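The abstract names the three ingredients (a binary success/failure reward, prediction-based "curious" exploration, and a replay memory that favors valuable experiences) without specifying how they are implemented. The sketch below is one possible reading, not the authors' method. Assumptions, all hypothetical: the reward is 1.0 for a scored goal and 0.0 otherwise; curiosity is the squared prediction error of a tiny linear forward model (`LinearForwardModel`), added to the extrinsic reward with an assumed weight `eta` (`shaped_reward`); and "return-based" retention is modeled as a fixed-capacity memory (`ReturnBasedMemory`) that, when full, replaces its lowest-return transition only if the newcomer's discounted return is strictly higher. Half Field Offense's parameterized actions are simplified here to a plain numeric vector.

```python
import heapq
import random

# Hypothetical sketch, NOT the authors' implementation: every class, function and
# constant below is an illustrative assumption built from the abstract alone.


def binary_reward(scored_goal: bool) -> float:
    """Sparse binary success/failure reward (assumed values: 1.0 for a goal, else 0.0)."""
    return 1.0 if scored_goal else 0.0


class LinearForwardModel:
    """Hypothetical linear forward model predicting s' from the concatenation [s; a].

    Its squared prediction error is used as the curiosity signal (assumption)."""

    def __init__(self, state_dim: int, action_dim: int, lr: float = 0.05):
        self.lr = lr
        self.w = [[0.0] * (state_dim + action_dim) for _ in range(state_dim)]

    def predict(self, state, action):
        x = list(state) + list(action)
        return [sum(wij * xj for wij, xj in zip(row, x)) for row in self.w]

    def update(self, state, action, next_state) -> float:
        """One SGD step on the squared error; returns the error measured before the step."""
        x = list(state) + list(action)
        pred = self.predict(state, action)
        err = sum((p - t) ** 2 for p, t in zip(pred, next_state))
        for i, (p, t) in enumerate(zip(pred, next_state)):
            grad = 2.0 * (p - t)
            for j, xj in enumerate(x):
                self.w[i][j] -= self.lr * grad * xj
        return err


def shaped_reward(extrinsic: float, prediction_error: float, eta: float = 0.1) -> float:
    """Extrinsic binary reward plus a scaled curiosity bonus (eta is an assumed weight)."""
    return extrinsic + eta * prediction_error


def discounted_returns(rewards, gamma: float = 0.99):
    """Discounted return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]


class ReturnBasedMemory:
    """Hypothetical replay memory that keeps higher-return transitions.

    When full, a new transition replaces the stored one with the lowest return,
    but only if the new return is strictly higher (an assumed eviction rule)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._heap = []  # min-heap of (return, insertion_id, transition)
        self._next_id = 0

    def add_episode(self, transitions, rewards, gamma: float = 0.99):
        for tr, g in zip(transitions, discounted_returns(rewards, gamma)):
            item = (g, self._next_id, tr)  # unique id breaks ties without comparing tr
            self._next_id += 1
            if len(self._heap) < self.capacity:
                heapq.heappush(self._heap, item)
            elif g > self._heap[0][0]:
                heapq.heapreplace(self._heap, item)  # evict the lowest-return entry

    def sample(self, batch_size: int, rng=random):
        k = min(batch_size, len(self._heap))
        return [tr for _, _, tr in rng.sample(self._heap, k)]

    def __len__(self) -> int:
        return len(self._heap)
```

For example, with `capacity=3`, storing a failed two-step episode (rewards `[0.0, 0.0]`) and then a successful one (rewards `[0.0, 1.0]`) fills the memory and then evicts the oldest zero-return transition, so successful experience is not crowded out by the far more frequent failures. A min-heap keyed on return keeps each insertion at O(log n).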

Keyphrases: Deep Reinforcement Learning, Half Field Offense, Parameterized Action Space, Prediction-based Exploration, Replay Memory, Soccer 2D Simulation, Sparse Binary Reward

Links:

https://easychair.org/publications/preprint/mFMQ

BibTeX entry

BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:

@booklet{EasyChair:5456,
  author    = {Saeed Tafazzol and Erfan Fathi and Mahdi Rezaei and Ehsan Asali},
  title     = {Curious Exploration and Return-based Memory Restoration for Deep Reinforcement Learning},
  howpublished = {EasyChair Preprint 5456},
  year      = {EasyChair, 2021}}
