Reinforcement Learning 17: Frontiers
17.1 General Value Functions and Auxiliary Tasks
General value functions (GVFs) extend ordinary value functions to predict arbitrary signals (cumulants), not just reward; learning many GVFs provides natural auxiliary tasks. One reason auxiliary tasks can help on the main task is that they may require some of the same representations as are needed for the main task.
A common architecture: a shared CNN body with a different output head per task.
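A minimal sketch of this idea, assuming PyTorch; the layer sizes, head names, and input shape are illustrative, not from the text:

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared CNN body with separate heads: one for the main task
    (e.g. action values) and one for auxiliary predictions. Gradients
    from every head shape the shared representation."""

    def __init__(self, n_actions: int, n_aux: int):
        super().__init__()
        self.body = nn.Sequential(                       # shared representation
            nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
        )
        self.q_head = nn.Linear(256, n_actions)         # main task
        self.aux_head = nn.Linear(256, n_aux)           # auxiliary predictions

    def forward(self, obs: torch.Tensor):
        z = self.body(obs)
        return self.q_head(z), self.aux_head(z)

# training would combine losses, e.g. main-task loss + weighted auxiliary loss
```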
17.2 Temporal Abstraction via Options
http://www-anw.cs.umass.edu/~barto/courses/cs687/simsek-lecture2.pdf
Options (Sutton, Precup & Singh, 1999):
- A generalization of actions: an option is a triple of an initiation set, an intra-option policy, and a termination condition.
- Starting from a finite MDP, an option specifies a way of choosing actions until termination.
- Examples: go-to-hallway, open-the-door (see the sketch after this list).
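A minimal Python sketch of the option formalism; the `env.step(state, action) -> (next_state, reward)` interface and the discount value are hypothetical, for illustration only:

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option: initiation set I, intra-option policy pi,
    and termination condition beta (stop probability per state)."""
    initiation_set: Set[int]          # states where the option may be invoked
    policy: Callable[[int], int]      # state -> primitive action
    beta: Callable[[int], float]      # state -> probability of terminating

def run_option(env, state: int, option: Option, gamma: float = 0.99):
    """Execute an option until it terminates; return the final state and
    the discounted return accumulated along the way."""
    assert state in option.initiation_set
    discount, ret = 1.0, 0.0
    while True:
        state, reward = env.step(state, option.policy(state))
        ret += discount * reward
        discount *= gamma
        if random.random() < option.beta(state):
            return state, ret
```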
17.3 Observations and State
The agent typically does not observe the environment's state directly; it receives observations.
Markov state: a summary of the history of observations and actions that makes the future independent of the rest of the history.
In practice the agent's state is maintained incrementally by a state-update function: S_{t+1} = u(S_t, A_t, O_{t+1}).
A simple approximate agent state: the last k observations and actions (sketched below).
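A minimal sketch of such a k-th order state-update function u, where the agent state is just the last k (action, observation) pairs; the deque encoding is an assumption:

```python
from collections import deque

def make_kth_order_state_update(k: int):
    """Build a state-update function u with S' = u(S, A, O'), where the
    agent state is the last k (action, observation) pairs."""
    def u(state: deque, action, observation) -> deque:
        new_state = deque(state, maxlen=k)   # copy, keep at most k entries
        new_state.append((action, observation))
        return new_state
    return u

# usage: apply u after every environment step
u = make_kth_order_state_update(k=4)
state = deque(maxlen=4)                      # empty history to start
state = u(state, action=0, observation="o1")
```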
17.4 Designing Reward Signals
Sparse rewards make learning slow, since the agent rarely receives informative feedback.
Two remedies: start from a good initial value function, or use shaping, training first on an easier, denser reward signal and gradually moving toward the true sparse one (see the sketch below).
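One standard shaping technique is potential-based shaping (Ng, Harada & Russell, 1999), which adds F(s, s') = γΦ(s') − Φ(s) to the reward and provably leaves the optimal policy unchanged. A minimal Python sketch, with an illustrative distance-based potential:

```python
def shaped_reward(r: float, s, s_next, phi, gamma: float = 0.99,
                  terminal: bool = False) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s).
    phi is any guess at a state's value, e.g. an initial value function."""
    next_potential = 0.0 if terminal else phi(s_next)
    return r + gamma * next_potential - phi(s)

# illustrative potential for a goal-reaching gridworld: negative
# Manhattan distance to the goal (a hypothetical choice, not from the text)
goal = (5, 5)
phi = lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))
print(shaped_reward(0.0, (0, 0), (0, 1), phi))  # moving toward the goal pays
```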
17.5 Remaining Issues
- powerful parametric function approximation
- methods for learning features
- scalable methods for planning with learning models
- automating the choice of the subproblems that an agent works on and uses to structure its developing mind.