1 Introduction

1.1 Reinforcement Learning

a policy, a reward signal, a value function, and, optionally, a model of the environment

model-based or model-free.

Markov decision processes (MDPs)

Bellman equations

dynamic programming

Monte Carlo methods

trial-and-error

“Law of Effect”

k-armed bandit

temporal-difference

actor-critic architecture

Q-learning
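
To make a few of the terms above concrete, the following is a minimal sketch of tabular Q-learning, a model-free temporal-difference method that learns by trial and error. Everything specific here is an assumption made for illustration and does not come from the notes: the function name `q_learning_chain`, the toy environment (a short chain of states where moving right from the second-to-last state reaches a terminal goal with reward 1), and the hyperparameter values.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical deterministic chain.

    States are 0..n_states-1; the last state is terminal.
    Actions: 0 = move left (stays put at state 0), 1 = move right.
    Reward is 1.0 on entering the terminal state, 0.0 otherwise.
    """
    rng = random.Random(seed)
    # Action-value table Q[s][a], initialised to zero.
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection (trial and error).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            # Environment step. The agent never inspects these dynamics
            # directly; it only observes (s, a, r, s_next): model-free.
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Temporal-difference update toward the bootstrapped target
            # r + gamma * max_a' Q(s', a'). Terminal values stay at 0.
            td_target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (td_target - q[s][a])
            s = s_next

    return q


if __name__ == "__main__":
    for state, values in enumerate(q_learning_chain()):
        print(state, [round(v, 3) for v in values])
```

In this toy setting the learned values for "right" should end up larger than those for "left" in every non-terminal state, so the greedy policy walks straight to the goal. The update target is a sampled form of the Bellman optimality equation, which is where the connection to MDPs and dynamic programming enters.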