Algorithm Example: Deep TAMER
A natural way for humans to guide an agent's learning is to observe its behavior and provide feedback on its actions. This translates directly into reinforcement learning by treating the human feedback as a state-action value. Deep TAMER is a prominent human-guided RL framework that leverages this idea by letting a human observer give discrete, time-stepped positive or negative feedback as the agent acts. To account for human reaction time, a credit assignment mechanism maps each feedback signal to a window of recent state-action pairs. A neural network is then trained to predict the human feedback for a given state-action pair, and the agent selects actions that maximize this predicted feedback.
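To make the two ingredients concrete, the sketch below shows the credit assignment step and a supervised update of the feedback estimator. It is a minimal, illustrative sketch rather than the exact Deep TAMER formulation: the class name `FeedbackEstimator`, the fixed uniform credit window, and the plain MSE regression loss are simplifying assumptions (the original weights samples by a feedback-delay distribution over the window).

```python
import torch
import torch.nn as nn

class FeedbackEstimator(nn.Module):
    """Predicts the human feedback value for a state-action pair."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def assign_credit(trajectory, feedback_time, feedback_value, window=(0.2, 2.0)):
    """Map one human feedback signal to the state-action pairs whose
    timestamps fall in a window before the feedback, accounting for
    human reaction time (the window bounds here are illustrative)."""
    lo, hi = feedback_time - window[1], feedback_time - window[0]
    return [(s, a, feedback_value) for (s, a, t) in trajectory if lo <= t <= hi]

def update_estimator(estimator, optimizer, labeled_batch):
    """One supervised regression step on feedback-labeled state-action pairs."""
    states = torch.stack([s for s, _, _ in labeled_batch])
    actions = torch.stack([a for _, a, _ in labeled_batch])
    targets = torch.tensor([h for _, _, h in labeled_batch], dtype=torch.float32)
    loss = nn.functional.mse_loss(estimator(states, actions), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```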
The algorithm example we provide here enhances the original Deep TAMER in several ways. First, the original Deep TAMER formulation relies on DQN, which only works with discrete action spaces. We design a continuous-action version of Deep TAMER while adopting state-of-the-art reinforcement learning implementation practices, using an actor-critic framework to handle continuous action spaces. Here, the critic is the human feedback estimator, and the actor is trained to output actions that maximize the critic's predicted feedback.
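The sketch below illustrates one way such a continuous-action policy update could look, assuming a deterministic tanh-squashed actor and a DDPG-style policy loss that maximizes the feedback estimator's prediction; the names `Actor` and `update_actor` are hypothetical, and details such as target networks, exploration noise, and replay buffers are omitted.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy for continuous actions, squashed to [-1, 1] with tanh."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

def update_actor(actor, critic, actor_optimizer, states):
    """Push the actor toward actions with higher predicted human feedback.
    The 'critic' here is the feedback estimator; the actor loss is the
    negative mean of its predictions (a DDPG-style policy update)."""
    actions = actor(states)
    loss = -critic(states, actions).mean()
    actor_optimizer.zero_grad()
    loss.backward()
    actor_optimizer.step()
    return loss.item()
```

In this setup the critic is fit by supervised regression on the credited human feedback (as in the earlier sketch) rather than by a Bellman backup, so the actor update is the only place where gradients flow through the policy.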