17 Frontiers

17.1 General Value Functions and Auxiliary Tasks

Auxiliary tasks can help on the main task is that they may require some of the same representation as are needed on the main task.

CNN with different heads

17.2 Temporal Abstraction via Options



  • A generalization of actions
  • Starting from a finite MDP, specify a way of choosing actions until termination
  • Example: go-to-hallway, open-the-door

17.3 Observations and State

environment’s state -> observations

history of observations and actions -> summary of history -> definition of Markov state

state-update function

approximation state: the last k observations and actions

17.4 Designing Reward Signals

sparse reward

initial value function, shaping tech

17.5 Remaining Issues

  • powerful parametric function approximation
  • methods for learning features
  • scalable methods for planning with learning models
  • automating the choice of subproblems on which an agent works and which it uses to structure its developing mind.

17.6 Reinforcement Learning and the Future of Artificial Intelligence