Reinforcement Learning 17: Frontiers
17.1 General Value Functions and Auxiliary Tasks
General value functions (GVFs) extend ordinary value functions to predict arbitrary signals (cumulants), not just reward; learning many GVFs provides natural auxiliary tasks. One reason auxiliary tasks can help on the main task is that they may require some of the same representations as are needed for the main task.
A common architecture: a shared CNN body with a different output head per task.
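A minimal sketch of this idea, assuming PyTorch; the layer sizes, head names, and input shape are illustrative, not from the text:

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared CNN body with separate heads: one for the main task
    (e.g. action values) and one for auxiliary predictions. Gradients
    from every head shape the shared representation."""

    def __init__(self, n_actions: int, n_aux: int):
        super().__init__()
        self.body = nn.Sequential(                       # shared representation
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
        )
        self.q_head = nn.Linear(256, n_actions)         # main task
        self.aux_head = nn.Linear(256, n_aux)           # auxiliary predictions

    def forward(self, obs: torch.Tensor):
        z = self.body(obs)
        return self.q_head(z), self.aux_head(z)

# training would combine losses, e.g. main-task loss + weighted auxiliary loss
```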
17.2 Temporal Abstraction via Options
http://www-anw.cs.umass.edu/~barto/courses/cs687/simsek-lecture2.pdf
Options (Sutton, Precup & Singh, 1999):
- A generalization of actions: an option is a triple of an initiation set, an intra-option policy, and a termination condition.
- Starting from a finite MDP, an option specifies a way of choosing actions until termination.
- Examples: go-to-hallway, open-the-door (see the sketch after this list).
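A minimal Python sketch of the option formalism; the `env.step(state, action) -> (next_state, reward)` interface and the discount value are hypothetical, for illustration only:

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option: initiation set I, intra-option policy pi,
    and termination condition beta (stop probability per state)."""
    initiation_set: Set[int]          # states where the option may be invoked
    policy: Callable[[int], int]      # state -> primitive action
    beta: Callable[[int], float]      # state -> probability of terminating

def run_option(env, state: int, option: Option, gamma: float = 0.99):
    """Execute an option until it terminates; return the final state and
    the discounted return accumulated along the way."""
    assert state in option.initiation_set
    discount, ret = 1.0, 0.0
    while True:
        state, reward = env.step(state, option.policy(state))
        ret += discount * reward
        discount *= gamma
        if random.random() < option.beta(state):
            return state, ret
```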
17.3 Observations and State
The agent typically does not observe the environment's state directly; it receives observations.
Markov state: a summary of the history of observations and actions that makes the future independent of the rest of the history.
In practice the agent's state is maintained incrementally by a state-update function: S_{t+1} = u(S_t, A_t, O_{t+1}).
A simple approximate agent state: the last k observations and actions (sketched below).
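A minimal sketch of such a k-th order state-update function u, where the agent state is just the last k (action, observation) pairs; the deque encoding is an assumption:

```python
from collections import deque

def make_kth_order_state_update(k: int):
    """Build a state-update function u with S' = u(S, A, O'), where the
    agent state is the last k (action, observation) pairs."""
    def u(state: deque, action, observation) -> deque:
        new_state = deque(state, maxlen=k)   # copy, keep at most k entries
        new_state.append((action, observation))
        return new_state
    return u

# usage: apply u after every environment step
u = make_kth_order_state_update(k=4)
state = deque(maxlen=4)                      # empty history to start
state = u(state, action=0, observation="o1")
```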
17.4 Designing Reward Signals
Sparse rewards make learning slow, since the agent rarely receives informative feedback.
Two remedies: start from a good initial value function, or use shaping, training first on an easier, denser reward signal and gradually moving toward the true sparse one (see the sketch below).
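One standard shaping technique is potential-based shaping (Ng, Harada & Russell, 1999), which adds F(s, s') = γΦ(s') − Φ(s) to the reward and provably leaves the optimal policy unchanged. A minimal Python sketch, with an illustrative distance-based potential:

```python
def shaped_reward(r: float, s, s_next, phi, gamma: float = 0.99,
                  terminal: bool = False) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s).
    phi is any guess at a state's value, e.g. an initial value function."""
    next_potential = 0.0 if terminal else phi(s_next)
    return r + gamma * next_potential - phi(s)

# illustrative potential for a goal-reaching gridworld: negative
# Manhattan distance to the goal (a hypothetical choice, not from the text)
goal = (5, 5)
phi = lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))
print(shaped_reward(0.0, (0, 0), (0, 1), phi))  # moving toward the goal pays
```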
17.5 Remaining Issues
- powerful parametric function approximation
- methods for learning features
- scalable methods for planning with learning models
- automating the choice of the subproblems that an agent works on and uses to structure its developing mind.