n-step Bootstrapping

Controls

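Episodes / batch
Auto-play speed (ms)
n-step horizon n: 4
Discount gamma: 1.00
TD(0) step size alpha: 0.10
TD(n) step size alpha: 0.10
MC step size alpha: 0.03

Readouts: number of episodes run, plus the RMSE of the TD(0), TD(n), and MC value estimates against the true values (shown as "-" before the first episode).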

Cheat Sheet

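Same 5-state random walk as Sutton and Barto. The nonterminal states are A to E, every episode starts at C, and each step moves left or right with equal probability. An episode ends at the left terminal L or the right terminal R. The only nonzero reward is 1, received on reaching R; reaching L gives 0. With gamma = 1 the true values of A to E are 1/6, 2/6, 3/6, 4/6, 5/6.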
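A minimal sketch of this environment, assuming the chain is encoded as the string "LABCDER" with L and R as terminals. The names `CHAIN`, `run_episode`, and `solve_true_values` are hypothetical, not taken from the page. The iterative evaluation confirms the 1/6 ... 5/6 true values for gamma = 1.

```python
import random

# Positions along the chain; "L" and "R" are the terminal states (encoding assumed for this sketch).
CHAIN = "LABCDER"
# True values under the random policy; only valid for gamma = 1.
TRUE_VALUES = {s: k / 6 for k, s in enumerate("ABCDE", start=1)}

def run_episode(rng):
    """Random walk from C. Returns (states, rewards) with states[t] = S_t and rewards[t] = R_(t+1)."""
    i = CHAIN.index("C")
    states, rewards = ["C"], []
    while CHAIN[i] not in "LR":
        i += rng.choice((-1, 1))                      # left or right with probability 1/2
        rewards.append(1.0 if CHAIN[i] == "R" else 0.0)
        states.append(CHAIN[i])                       # last entry is the terminal L or R
    return states, rewards

def solve_true_values(gamma=1.0, sweeps=1000):
    """Iterative policy evaluation: V(s) = 0.5 gamma V(left) + 0.5 [r_right + gamma V(right)]."""
    V = {s: 0.0 for s in CHAIN}                       # terminal values stay at 0
    for _ in range(sweeps):
        for i in range(1, 6):                         # A..E
            left, right = CHAIN[i - 1], CHAIN[i + 1]
            r_right = 1.0 if right == "R" else 0.0    # only the step into R is rewarded
            V[CHAIN[i]] = 0.5 * gamma * V[left] + 0.5 * (r_right + gamma * V[right])
    return {s: V[s] for s in "ABCDE"}
```

Usage: `run_episode(random.Random(0))` returns one trajectory in the same (S_t, R_(t+1)) layout the update rules below index into.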

TD(0): V(S_t) ← V(S_t) + alpha [R_(t+1) + gamma V(S_(t+1)) - V(S_t)]
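A sketch of a single TD(0) update on a dict of state values. The function name `td0_update` is hypothetical, and the starting values of 0.5 in the usage note are an assumed initialization. A terminal S_(t+1) is treated as having value 0.

```python
def td0_update(V, s, r, s_next, alpha=0.10, gamma=1.0, terminal=False):
    """Apply V(S_t) <- V(S_t) + alpha [R_(t+1) + gamma V(S_(t+1)) - V(S_t)] in place.

    Returns the TD error. If S_(t+1) is terminal its value is 0, so s_next is not read.
    """
    v_next = 0.0 if terminal else V[s_next]
    td_error = r + gamma * v_next - V[s]
    V[s] += alpha * td_error
    return td_error
```

With every value at 0.5, a step C -> D with reward 0 leaves V(C) unchanged (the TD error is 0). The final step E -> R with reward 1 gives an error of 0.5 and moves V(E) to 0.5 + 0.1 * 0.5 = 0.55.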

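n-step target: G_t:t+n = sum(k=1..m) gamma^(k-1) R_(t+k) + 1[t+n < T] gamma^n V(S_(t+n)), where T is the time step at which the episode terminates and m = min(n, T - t) is the number of rewards actually observed before the horizon or termination. The bootstrap term is dropped once the episode has ended. With n = 1 this is the TD(0) target.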
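A sketch that computes this target for one recorded episode. It assumes `states[t]` holds S_t (with `states[T]` terminal) and `rewards[t]` holds R_(t+1); the names `n_step_target` and `trajectory_rows` are hypothetical. The rows helper produces the same columns as the "Last trajectory" table.

```python
def n_step_target(states, rewards, t, n, V, gamma=1.0):
    """G_(t:t+n) for an episode that terminates at T = len(rewards)."""
    T = len(rewards)
    m = min(n, T - t)                                   # rewards available before horizon or end
    G = sum(gamma ** (k - 1) * rewards[t + k - 1] for k in range(1, m + 1))  # R_(t+k) = rewards[t+k-1]
    if t + n < T:                                       # S_(t+n) is non-terminal: bootstrap
        G += gamma ** n * V[states[t + n]]
    return G

def trajectory_rows(states, rewards, n, V, gamma=1.0):
    """One (t, S_t, S_(t+1), R_(t+1), G_(t:t+n)) row per time step."""
    return [(t, states[t], states[t + 1], rewards[t], n_step_target(states, rewards, t, n, V, gamma))
            for t in range(len(rewards))]
```

For the episode C, D, E, R with all values at 0.5 and gamma = 1, n = 1 gives a target of 0.5 at t = 0 (pure bootstrap on V(D)). Any n >= 3 gives 1 at every step because the horizon reaches the terminal reward.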

MC target is the full return G_t = sum(k=1..T-t) gamma^(k-1) R_(t+k), with no bootstrapping. It equals the n-step target whenever n >= T - t.
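A sketch of the full-return computation using the backward recursion G_t = R_(t+1) + gamma G_(t+1) with G_T = 0, followed by a constant-alpha, every-visit MC update. The function names are hypothetical, and every-visit updating is an assumption about how the page treats repeated states.

```python
def mc_returns(rewards, gamma=1.0):
    """Full returns G_t for t = 0..T-1, accumulated backward from G_T = 0."""
    G = 0.0
    out = [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G      # G_t = R_(t+1) + gamma G_(t+1)
        out[t] = G
    return out

def mc_update(V, states, rewards, alpha=0.03, gamma=1.0):
    """Constant-alpha MC: V(S_t) += alpha [G_t - V(S_t)] for every visit.

    zip stops at T, so the terminal state in states[T] is never updated.
    """
    for s, G in zip(states, mc_returns(rewards, gamma)):
        V[s] += alpha * (G - V[s])
```

For rewards [0, 0, 1], the returns are [1, 1, 1] with gamma = 1 and [0.25, 0.5, 1] with gamma = 0.5.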

Last trajectory

Run one episode to inspect per-step n-step targets.

t	S_t	S_(t+1)	R_(t+1)	G_t:t+n

Visualization

State values (A-E): true values vs TD(0), n-step TD, and Monte Carlo estimates

Learning curve: RMSE by episode for each method
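A sketch of the experiment behind the learning-curve panel. It uses the page's default settings (n = 4, gamma = 1, alphas 0.10 / 0.10 / 0.03) and makes three assumptions that may not match the page: all three methods learn from the same episodes, values start at 0.5, and TD(0) is run as the n = 1 case. Updates are applied in time order after each episode. Because the random-walk policy ignores V, this yields the same values as updating online. `learning_curves` and the other names are hypothetical.

```python
import math
import random

CHAIN = "LABCDER"  # "L"/"R" are terminals (assumed encoding)
TRUE_VALUES = {s: k / 6 for k, s in enumerate("ABCDE", start=1)}  # valid for gamma = 1 only

def run_episode(rng):
    i, states, rewards = 3, ["C"], []  # CHAIN[3] == "C"
    while CHAIN[i] not in "LR":
        i += rng.choice((-1, 1))
        rewards.append(1.0 if CHAIN[i] == "R" else 0.0)
        states.append(CHAIN[i])
    return states, rewards

def n_step_td_episode(V, states, rewards, n, alpha, gamma=1.0):
    """n-step TD updates for tau = 0..T-1, each using the current V."""
    T = len(rewards)
    for tau in range(T):
        m = min(n, T - tau)
        G = sum(gamma ** (k - 1) * rewards[tau + k - 1] for k in range(1, m + 1))
        if tau + n < T:                         # bootstrap from non-terminal S_(tau+n)
            G += gamma ** n * V[states[tau + n]]
        V[states[tau]] += alpha * (G - V[states[tau]])

def mc_episode(V, states, rewards, alpha, gamma=1.0):
    """Constant-alpha every-visit MC with returns accumulated backward."""
    G = 0.0
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        V[states[t]] += alpha * (G - V[states[t]])

def rmse(V):
    """Root-mean-square error over A..E against the gamma = 1 true values."""
    return math.sqrt(sum((V[s] - TRUE_VALUES[s]) ** 2 for s in "ABCDE") / 5)

def learning_curves(episodes=100, n=4, alphas=(0.10, 0.10, 0.03), gamma=1.0, seed=0):
    rng = random.Random(seed)
    values = {name: {s: 0.5 for s in "ABCDE"} for name in ("TD(0)", "TD(n)", "MC")}
    curves = {name: [] for name in values}
    for _ in range(episodes):
        states, rewards = run_episode(rng)      # one shared episode per iteration
        n_step_td_episode(values["TD(0)"], states, rewards, 1, alphas[0], gamma)
        n_step_td_episode(values["TD(n)"], states, rewards, n, alphas[1], gamma)
        mc_episode(values["MC"], states, rewards, alphas[2], gamma)
        for name, V in values.items():
            curves[name].append(rmse(V))
    return curves
```

Before any learning the RMSE is sqrt(1/18), about 0.236, because the 0.5 initialization is off by 1/3, 1/6, 0, 1/6, 1/3. A single seed gives a noisy curve, so averaging `learning_curves` over several seeds gives a smoother comparison.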