Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning
This paper was accepted at the “Human-in-the-Loop Learning Workshop” at NeurIPS 2022. Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling reward functions from human preference feedback, but they remain impractical because of the burdensome number of labels required from the human, even for relatively simple tasks. In …