n-step Bootstrapping (Chapter 7): Random Walk, TD(0) vs TD(n) vs MC

Cheat Sheet

Same 5-state random walk as in Sutton and Barto: every episode starts in C and moves left or right with equal probability. The reward is 1 only if the walk terminates on the right (R); terminating on the left (L) gives 0, as does every other step.
TD(0): V(S_t) ← V(S_t) + alpha [R_(t+1) + gamma V(S_(t+1)) - V(S_t)]
n-step target: G_t:t+n = sum(k=1..m) gamma^(k-1) R_(t+k) + 1[t+n < T] gamma^n V(S_(t+n)), where m = min(n, T - t) and T is the time step at which the episode terminates
MC target is the full return G_t = G_t:T (no bootstrapping); the n-step target equals it whenever n >= T - t
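
As a minimal sketch of the targets above, the Python below computes G_t:t+n from one recorded episode. The function names (n_step_target, td_update), the list layout, and the sample value estimates are assumptions made for illustration, not this page's own code. With n = 1 the function returns the TD(0) target, and with n >= T - t it returns the Monte Carlo return.

```python
def n_step_target(states, rewards, values, t, n, gamma=1.0):
    """Return G_{t:t+n} for one episode.

    states[k] is S_k (the last entry is the terminal state),
    rewards[k] is R_(k+1), so T = len(rewards).
    values maps non-terminal states to current estimates V(s).
    """
    T = len(rewards)
    m = min(n, T - t)  # number of real rewards available from step t
    g = sum(gamma ** (k - 1) * rewards[t + k - 1] for k in range(1, m + 1))
    if t + n < T:  # bootstrap only when S_(t+n) is non-terminal
        g += gamma ** n * values[states[t + n]]
    return g


def td_update(values, state, target, alpha=0.1):
    """Move V(state) a fraction alpha toward the given target."""
    values[state] += alpha * (target - values[state])


# Hypothetical episode C -> D -> E -> R with made-up value estimates.
states = ["C", "D", "E", "R"]
rewards = [0.0, 0.0, 1.0]
values = {"A": 0.1, "B": 0.2, "C": 0.3, "D": 0.4, "E": 0.6}

td0 = n_step_target(states, rewards, values, t=0, n=1)  # R_1 + V(D)
two = n_step_target(states, rewards, values, t=0, n=2)  # R_1 + R_2 + V(E)
mc = n_step_target(states, rewards, values, t=0, n=len(rewards))  # full return

td_update(values, "C", td0)  # one TD(0) update of V(C)
```

Only the target changes between the three methods; the update rule stays the same, which is why a single td_update suffices in this sketch.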

Last trajectory

Run one episode to inspect per-step n-step targets.
t | S_t | S_(t+1) | R_(t+1) | G_t:t+n
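
One way such a per-step table could be produced is sketched below in Python. It assumes a seven-position layout L, A, B, C, D, E, R with L and R terminal, and value estimates initialised to 0.5; the helper names generate_episode and trajectory_rows are hypothetical and not taken from this page.

```python
import random

STATES = ["L", "A", "B", "C", "D", "E", "R"]  # L and R are terminal


def generate_episode(rng):
    """Simulate one walk from C; return (states, rewards)."""
    i = STATES.index("C")
    states, rewards = [STATES[i]], []
    while STATES[i] not in ("L", "R"):
        i += rng.choice((-1, 1))  # left or right with equal probability
        states.append(STATES[i])
        rewards.append(1.0 if STATES[i] == "R" else 0.0)
    return states, rewards


def trajectory_rows(states, rewards, values, n, gamma=1.0):
    """Build one row (t, S_t, S_(t+1), R_(t+1), G_t:t+n) per time step."""
    T = len(rewards)
    rows = []
    for t in range(T):
        m = min(n, T - t)  # rewards actually available from step t
        g = sum(gamma ** k * rewards[t + k] for k in range(m))
        if t + n < T:  # bootstrap from V(S_(t+n)) if it is non-terminal
            g += gamma ** n * values[states[t + n]]
        rows.append((t, states[t], states[t + 1], rewards[t], g))
    return rows


rng = random.Random(0)  # fixed seed so the run is repeatable
values = {s: 0.5 for s in "ABCDE"}  # assumed initial estimates
states, rewards = generate_episode(rng)
for t, s, s_next, r, g in trajectory_rows(states, rewards, values, n=2):
    print(f"{t:>3} {s:>3} {s_next:>7} {r:>7.1f} {g:>8.3f}")
```

Rows near the end of the episode use fewer than n rewards and no bootstrap term, so their targets equal the remaining return.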

Visualization

[Figure: state values for A-E, comparing true values, TD(0), n-step TD, and Monte Carlo]
[Figure: learning curve, RMSE by episode]
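
For reference, the RMSE in the learning curve can be sketched as below. The true values of this walk are 1/6, 2/6, ..., 5/6 for A through E. Averaging the squared error uniformly over the five states is an assumption about how this page computes the metric, and the names TRUE_VALUES and rmse are illustrative.

```python
import math

# True state values of the 5-state random walk (reward 1 at the right end).
TRUE_VALUES = {s: (i + 1) / 6 for i, s in enumerate("ABCDE")}


def rmse(values):
    """Root-mean-square error of the estimates, averaged over A-E."""
    squared = [(values[s] - TRUE_VALUES[s]) ** 2 for s in TRUE_VALUES]
    return math.sqrt(sum(squared) / len(squared))


initial = {s: 0.5 for s in "ABCDE"}  # assumed starting estimates
print(round(rmse(initial), 4))  # about 0.2357
```

Calling rmse once per episode for each method's value table would give the three curves being compared.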