Reinforcement Learning 7: n-step Bootstrapping
7.1 $n$-step TD Prediction
$n$-step TD for estimating $V \approx v_\pi$
Example 7.1: $n$-step TD Methods on the Random Walk
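The $n$-step TD update can be sketched in Python on the random walk of Example 7.1. This is a minimal illustrative sketch, not the book's pseudocode verbatim: the function name `nstep_td`, the 5-state chain, and the reward of $+1$ on the right exit are assumptions chosen so the true values are $s/6$.

```python
import random

def nstep_td(n, alpha, gamma, num_episodes, num_states=5, seed=0):
    """n-step TD prediction on a random-walk chain (illustrative sketch).

    States 1..num_states; 0 and num_states+1 are terminal.  The policy
    steps left or right with equal probability; exiting right pays +1.
    """
    rng = random.Random(seed)
    V = [0.0] * (num_states + 2)          # terminal values stay 0
    for _ in range(num_episodes):
        states = [(num_states + 1) // 2]  # start in the middle state
        rewards = [0.0]                   # rewards[t] holds R_t; R_0 unused
        T, t = float('inf'), 0
        while True:
            if t < T:
                s2 = states[t] + (1 if rng.random() < 0.5 else -1)
                states.append(s2)
                rewards.append(1.0 if s2 == num_states + 1 else 0.0)
                if s2 == 0 or s2 == num_states + 1:
                    T = t + 1
            tau = t - n + 1               # time whose estimate is updated
            if tau >= 0:
                # n-step return: discounted rewards plus a bootstrap term
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:
                    G += gamma ** n * V[states[tau + n]]
                V[states[tau]] += alpha * (G - V[states[tau]])
            if tau == T - 1:
                break
            t += 1
    return V[1:num_states + 1]
```

With $\gamma = 1$ the true values of states $1,\dots,5$ are $1/6,\dots,5/6$, so the returned estimates should approach that ramp.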
7.2 $n$-step Sarsa
$n$-step Sarsa for estimating $Q \approx q_*$, or $Q \approx q_\pi$ for a given $\pi$
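The control version follows the same update pattern with action values. Below is a minimal sketch, assuming a generic `env_step(s, a) -> (s2, r, done)` interface and an $\varepsilon$-greedy behavior/target policy; the helper names are illustrative, not from the source.

```python
import random
from collections import defaultdict

def nstep_sarsa(env_step, start, n, alpha, gamma, epsilon, num_actions,
                num_episodes, seed=0):
    """n-step Sarsa control sketch; env_step(s, a) -> (s2, r, done)."""
    rng = random.Random(seed)
    Q = defaultdict(float)

    def policy(s):                        # epsilon-greedy w.r.t. current Q
        if rng.random() < epsilon:
            return rng.randrange(num_actions)
        return max(range(num_actions), key=lambda a: Q[(s, a)])

    for _ in range(num_episodes):
        S, A, R = [start], [policy(start)], [0.0]
        T, t = float('inf'), 0
        while True:
            if t < T:
                s2, r, done = env_step(S[t], A[t])
                S.append(s2); R.append(r)
                if done:
                    T = t + 1
                else:
                    A.append(policy(s2))
            tau = t - n + 1
            if tau >= 0:
                G = sum(gamma ** (i - tau - 1) * R[i]
                        for i in range(tau + 1, min(tau + n, T) + 1))
                if tau + n < T:           # bootstrap from Q(S_{tau+n}, A_{tau+n})
                    G += gamma ** n * Q[(S[tau + n], A[tau + n])]
                Q[(S[tau], A[tau])] += alpha * (G - Q[(S[tau], A[tau])])
            if tau == T - 1:
                break
            t += 1
    return Q
```

On a toy corridor (states 0–4, reward on reaching the right end), the learned greedy action should be "right" everywhere.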
7.3 $n$-step Off-policy Learning by Importance Sampling
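Following the standard formulation, the off-policy $n$-step TD update weights the $n$-step return by the importance-sampling ratio of target policy $\pi$ to behavior policy $b$ over the actions actually taken:

$$\rho_{t:h} \doteq \prod_{k=t}^{\min(h,\,T-1)} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}, \qquad V_{t+n}(S_t) \doteq V_{t+n-1}(S_t) + \alpha\,\rho_{t:t+n-1}\big[G_{t:t+n} - V_{t+n-1}(S_t)\big].$$

For action values the ratio starts and ends one step later ($\rho_{t+1:t+n}$), since the first action $A_t$ is the one whose value is being estimated and need not be corrected for.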
7.4 *Per-reward Off-policy Methods
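The per-reward (control-variate) form moves the importance-sampling ratio inside the return, so that when $\rho_t = 0$ the estimate falls back on the current value rather than collapsing to zero; one common statement of the recursion is:

$$G_{t:h} \doteq \rho_t\big(R_{t+1} + \gamma\,G_{t+1:h}\big) + (1-\rho_t)\,V_{h-1}(S_t), \qquad \rho_t \doteq \frac{\pi(A_t \mid S_t)}{b(A_t \mid S_t)}.$$

Because $\mathbb{E}_b[\rho_t] = 1$, the added $(1-\rho_t)V_{h-1}(S_t)$ term has zero expectation and leaves the target unbiased while reducing variance.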
7.5 Off-policy Learning Without Importance Sampling: The $n$-step Tree Backup Algorithm
$n$-step Tree Backup for estimating $Q \approx q_*$, or $Q \approx q_\pi$ for a given $\pi$
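Tree backup needs no importance-sampling ratios: at each step the backup takes the expectation over the target policy's actions, following only the action actually taken down the tree. A minimal sketch for the $q_*$ case, assuming a uniform-random behavior policy and a greedy target policy (so $\pi(a\mid s)$ is 1 for the argmax action and 0 otherwise); the function and helper names are illustrative:

```python
import random
from collections import defaultdict

def tree_backup(env_step, start, n, alpha, gamma, num_actions,
                num_episodes, seed=0):
    """n-step tree backup sketch: off-policy q* learning without
    importance sampling.  Behavior is uniform random; target is greedy."""
    rng = random.Random(seed)
    Q = defaultdict(float)

    def greedy(s):
        return max(range(num_actions), key=lambda a: Q[(s, a)])

    for _ in range(num_episodes):
        S, A, R = [start], [rng.randrange(num_actions)], [0.0]
        T, t = float('inf'), 0
        while True:
            if t < T:
                s2, r, done = env_step(S[t], A[t])
                S.append(s2); R.append(r)
                if done:
                    T = t + 1
                else:
                    A.append(rng.randrange(num_actions))
            tau = t - n + 1
            if tau >= 0:
                if t + 1 >= T:
                    G = R[T]              # leaf at the episode's end
                else:                     # expectation under greedy target
                    G = R[t + 1] + gamma * max(Q[(S[t + 1], a)]
                                               for a in range(num_actions))
                # fold the return back toward tau; with a greedy target the
                # recursion keeps following A_k only while A_k is greedy,
                # otherwise it bootstraps from the greedy action's value
                for k in range(min(t, T - 1), tau, -1):
                    best = greedy(S[k])
                    if A[k] == best:
                        G = R[k] + gamma * G
                    else:
                        G = R[k] + gamma * Q[(S[k], best)]
                Q[(S[tau], A[tau])] += alpha * (G - Q[(S[tau], A[tau])])
            if tau == T - 1:
                break
            t += 1
    return Q
```

Even though the data come from a purely random policy, the greedy target values are learned: on a corridor task the greedy action should point toward the goal in every state.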
7.6 *A Unifying Algorithm: $n$-step $Q(\sigma)$
Off-policy $n$-step $Q(\sigma)$ for estimating $Q \approx q_*$, or $Q \approx q_\pi$ for a given $\pi$
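$Q(\sigma)$ interpolates step by step between sampling the taken action (Sarsa-style, $\sigma_t = 1$, corrected by $\rho_t$) and taking the full expectation over actions (tree-backup-style, $\sigma_t = 0$). Writing $\bar V_{h-1}(s) \doteq \sum_a \pi(a \mid s)\,Q_{h-1}(s, a)$, a commonly stated recursion for its return is:

$$G_{t:h} \doteq R_{t+1} + \gamma\big(\sigma_{t+1}\rho_{t+1} + (1-\sigma_{t+1})\,\pi(A_{t+1} \mid S_{t+1})\big)\big(G_{t+1:h} - Q_{h-1}(S_{t+1}, A_{t+1})\big) + \gamma\,\bar V_{h-1}(S_{t+1}).$$

Setting $\sigma_{t+1} = 1$ recovers the importance-sampling Sarsa return, while $\sigma_{t+1} = 0$ recovers the tree-backup return.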