
Learning to Self-Modify Rewards with Implicit Gradients

EasyChair Preprint 8260, version 1

Versions: 12
10 pages
Date: June 12, 2022

Abstract

Reward shaping is a powerful technique for the efficient learning of optimal policies in sequential decision-making. However, designing auxiliary rewards that actually help the agent is challenging and often requires considerable time and effort from domain experts. In this paper, we build on the optimal-rewards methodology to adapt a given reward function. This problem can be naturally formulated as a meta-learning problem and solved in a bi-level optimization framework; however, the standard approaches used in the literature for such problems are not scalable. We therefore propose an implicit-gradient technique to solve this problem. We demonstrate the effectiveness of our method in both a) learning optimal rewards and b) adaptive reward shaping.
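As context for the bi-level formulation the abstract refers to, a minimal sketch follows, using a standard implicit-function-theorem derivation; the notation (J^ext for the original task objective, J^in_phi for the objective under the adapted reward r_phi, theta for policy parameters) is illustrative and not taken from the paper. The outer level adapts the reward parameters phi so that the policy trained under the adapted reward performs well on the original objective, while the inner level trains the policy under the adapted reward:

  \max_{\phi} \; J^{\mathrm{ext}}\!\left(\theta^{*}(\phi)\right)
  \quad \text{s.t.} \quad
  \theta^{*}(\phi) \in \arg\max_{\theta} J^{\mathrm{in}}_{\phi}(\theta).

Differentiating the inner stationarity condition \nabla_{\theta} J^{\mathrm{in}}_{\phi}(\theta^{*}(\phi)) = 0 with respect to \phi gives the implicit (meta-)gradient of the outer objective:

  \nabla_{\phi} J^{\mathrm{ext}}
  = -\,\nabla^{2}_{\phi\theta} J^{\mathrm{in}}_{\phi}(\theta^{*})
    \left[\nabla^{2}_{\theta\theta} J^{\mathrm{in}}_{\phi}(\theta^{*})\right]^{-1}
    \nabla_{\theta} J^{\mathrm{ext}}(\theta^{*}).

In this generic form, the gradient depends only on quantities at the inner solution rather than on the full inner optimization trajectory, which is the usual argument for why implicit gradients scale better than unrolled differentiation; the paper's exact formulation and estimator may differ.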

Keyphrases: Reinforcement Learning, Reward Shaping, Meta-learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:8260,
  author       = {Aiden Boyd and Shibani and Will Callaghan},
  title        = {Learning to Self-Modify Rewards with Implicit Gradients},
  howpublished = {EasyChair Preprint 8260},
  year         = {EasyChair, 2022}}