Whence reward

How should reward be defined in a reinforcement learning framework?

  • Programming
  1. Coding: translate the goals of behaviour into reward values by writing a function that takes states and outputs rewards
  2. Human-in-the-loop: the source of reward is a person, which can make the reward non-stationary
  • By example
  1. Mimic reward: the learner copies the reward it is given
  2. Inverse reinforcement learning: the learner infers which rewards the trainer must have been maximizing for the demonstrated behaviour to be optimal
  • Indirect approaches (optimization)
  1. Evolutionary optimization: we create a score for the high-level behaviour, and the optimization searches for a reward that encourages that behaviour
  2. Meta RL: learning at the evolutionary level creates better ways of learning at the individual level
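
To make two of these options concrete, here is a minimal Python sketch, not a definitive implementation. It contrasts a hand-coded reward (state in, reward out) with an evolutionary-style search that only sees a high-level behaviour score and looks for a reward table that encourages that behaviour. Everything in it is an assumption made for illustration: the five-cell corridor task, the helper names (`coded_reward`, `greedy_policy`, `behaviour_score`, `evolve_reward`), the use of value iteration as the "learner", and the simple (1+1)-style mutate-and-keep search.

```python
import random

N_STATES = 5            # toy corridor, cells 0..4 (hypothetical task)
GOAL = N_STATES - 1     # rightmost cell is the goal
ACTIONS = (-1, 1)       # step left or step right


def step(state, action):
    """Deterministic transition, clipped at the corridor walls."""
    return min(max(state + action, 0), N_STATES - 1)


def coded_reward(state):
    """Coding approach: takes a state, outputs a reward."""
    return 1.0 if state == GOAL else 0.0


def greedy_policy(reward_fn, gamma=0.9, sweeps=100):
    """Run value iteration under reward_fn, then pick one greedy action per state."""
    values = [0.0] * N_STATES

    def q(s, a):
        nxt = step(s, a)
        return reward_fn(nxt) + gamma * values[nxt]

    for _ in range(sweeps):
        values = [max(q(s, a) for a in ACTIONS) for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)]


def behaviour_score(policy):
    """High-level score: fraction of non-goal start cells that reach the goal."""
    reached = 0
    for start in range(GOAL):
        s = start
        for _ in range(N_STATES):
            s = step(s, policy[s])
            if s == GOAL:
                reached += 1
                break
    return reached / GOAL


def evolve_reward(generations=200, sigma=0.5, seed=0):
    """(1+1)-style evolutionary search over a per-state reward table.

    The search never sees coded_reward; it only sees behaviour_score.
    """
    rng = random.Random(seed)
    best = [0.0] * N_STATES
    best_score = behaviour_score(greedy_policy(lambda s: best[s]))
    for _ in range(generations):
        # Mutate the current reward table with Gaussian noise.
        child = [w + rng.gauss(0.0, sigma) for w in best]
        score = behaviour_score(greedy_policy(lambda s: child[s]))
        # Keep the child if the resulting behaviour is at least as good.
        if score >= best_score:
            best, best_score = child, score
    return best, best_score


if __name__ == "__main__":
    print("policy under coded reward:", greedy_policy(coded_reward))
    table, score = evolve_reward()
    print("evolved reward table:", [round(w, 2) for w in table])
    print("behaviour score of evolved reward:", score)
```

The coded reward encodes the goal directly, while the evolved table is judged only by the behaviour it produces, so it need not resemble the coded one. A human-in-the-loop or inverse RL variant would replace `coded_reward` with rewards supplied by, or inferred from, a person.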
