Whence reward
How can reward be defined in a reinforcement learning framework?
- Programming
- Coding: translate the goals of behaviour into reward values by writing a function that takes states and outputs rewards
- Human-in-the-loop: the source of reward is a person, so the reward can be non-stationary
- By example
- Mimic reward: the learner copies the reward it is given
- Inverse reinforcement learning: the learner infers which rewards the trainer must have been maximizing for the demonstrated behaviour to be optimal
- Indirect approaches, optimization
- Evolutionary optimization: given a high-level behaviour we can create a score for, the optimization searches for a reward that encourages that behaviour
- Meta RL: learning at the evolutionary level that creates better ways of learning at the individual level
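
To make two of the approaches above concrete, here is a minimal Python sketch under stated assumptions. The five-state chain, the names `coded_reward`, `greedy_policy`, `behaviour_score` and `search_reward`, and the discount of 0.9 are all hypothetical choices for illustration. `coded_reward` shows the coding approach (a function that takes a state and outputs a reward). `search_reward` stands in for the optimization approach: it exhaustively tries a handful of candidate rewards and keeps the one whose induced behaviour scores best, where a real evolutionary method would mutate and select candidates instead.

```python
from typing import Callable, List

N_STATES = 5        # states 0..4 on a line (hypothetical toy environment)
GOAL = 4            # desired behaviour: get to state 4 and stay there
ACTIONS = (-1, +1)  # step left or step right

def step(state: int, action: int) -> int:
    """Deterministic transition, clipped to the ends of the chain."""
    return min(max(state + action, 0), N_STATES - 1)

# Coding approach: the designer writes the state-to-reward mapping by hand.
def coded_reward(state: int) -> float:
    return 1.0 if state == GOAL else 0.0

def greedy_policy(reward: Callable[[int], float],
                  gamma: float = 0.9, sweeps: int = 100) -> List[int]:
    """Run value iteration, then return the greedy action for each state.
    The reward is received for the state the agent lands in."""
    def q(s: int, a: int, values: List[float]) -> float:
        nxt = step(s, a)
        return reward(nxt) + gamma * values[nxt]

    values = [0.0] * N_STATES
    for _ in range(sweeps):
        values = [max(q(s, a, values) for a in ACTIONS)
                  for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: q(s, a, values))
            for s in range(N_STATES)]

def behaviour_score(policy: List[int], start: int = 0,
                    horizon: int = 10) -> float:
    """High-level score: fraction of time steps spent in the GOAL state."""
    state, hits = start, 0
    for _ in range(horizon):
        state = step(state, policy[state])
        hits += state == GOAL
    return hits / horizon

# Optimization approach (simplified): search over candidate rewards and keep
# the one whose resulting behaviour gets the best high-level score.
def search_reward() -> int:
    """Return the state k for which 'reward 1 on entering k' scores best."""
    def make_reward(k: int) -> Callable[[int], float]:
        return lambda s: 1.0 if s == k else 0.0
    return max(range(N_STATES),
               key=lambda k: behaviour_score(greedy_policy(make_reward(k))))

if __name__ == "__main__":
    print(greedy_policy(coded_reward))  # greedy action per state
    print(search_reward())              # rewarded state chosen by the search
```

In this toy setting the search should end up rewarding the goal state, the same reward a designer would have coded by hand. The distinction is only in where the reward comes from: written directly, or found by scoring the behaviour it produces.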