Random Walk TD

Controls

Episodes / batch

Auto-play speed (ms)

TD step size alpha 0.10

MC step size alpha 0.02

episodes: 0

TD RMSE: -

MC RMSE: -

Cheat Sheet

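Five non-terminal states A-E lie between the terminal states L and R. Every episode starts in C and moves one state left or right with equal probability. The reward is 1 on the transition into R and 0 on every other transition, and the task is undiscounted.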
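The walk described above can be sketched in a few lines of Python. This is an illustrative sketch, not the page's own code: `run_episode`, `STATES`, and `TRUE_VALUES` are hypothetical names. The true values follow from the setup, since each state's value equals the probability of ending at R, which is 1/6 through 5/6 for A through E.

```python
import random

STATES = ["A", "B", "C", "D", "E"]

# True value of each state under the uniform random policy:
# the probability of terminating at R, i.e. 1/6, 2/6, ..., 5/6.
TRUE_VALUES = {s: (i + 1) / 6 for i, s in enumerate(STATES)}


def run_episode(rng):
    """Return one episode as a list of (state, next_state, reward) transitions."""
    pos = 2  # index of the start state C
    transitions = []
    while 0 <= pos < len(STATES):
        nxt = pos + rng.choice((-1, 1))  # step left or right with equal probability
        if nxt < 0:
            next_name, reward = "L", 0.0  # left terminal, no reward
        elif nxt >= len(STATES):
            next_name, reward = "R", 1.0  # right terminal, reward 1
        else:
            next_name, reward = STATES[nxt], 0.0
        transitions.append((STATES[pos], next_name, reward))
        pos = nxt
    return transitions


episode = run_episode(random.Random(0))
for step, (state, nxt, reward) in enumerate(episode):
    print(step, state, nxt, reward)
```

Each printed row has the same columns as the "Last trajectory" table: step, state, next, reward.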

TD(0): V(S_t) <- V(S_t) + alpha [R_(t+1) + V(S_(t+1)) - V(S_t)]
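As a minimal sketch, the TD(0) rule can be applied transition by transition. The function name `td0_update`, the initial value of 0.5 for every state, and the hand-written trajectory are assumptions made for illustration. The step size 0.10 matches the default shown in the controls, and terminal states are treated as having value 0.

```python
ALPHA_TD = 0.10  # default TD step size shown in the controls


def td0_update(V, transitions, alpha=ALPHA_TD):
    """Apply the TD(0) update once per (state, next_state, reward) transition."""
    for state, next_state, reward in transitions:
        # Terminal states L and R are not stored in V; their value is 0.
        v_next = V.get(next_state, 0.0)
        V[state] += alpha * (reward + v_next - V[state])
    return V


# Assumed initialization: every non-terminal state starts at 0.5.
V = {s: 0.5 for s in "ABCDE"}

# A hand-written example trajectory C -> D -> E -> R.
trajectory = [("C", "D", 0.0), ("D", "E", 0.0), ("E", "R", 1.0)]
td0_update(V, trajectory)
print(V)
```

With this initialization only E changes on the example trajectory: the TD errors for C and D are zero, because their successors hold the same estimate, while E moves toward the terminal reward.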

MC: V(S_t) <- V(S_t) + alpha [G_t - V(S_t)]
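The Monte Carlo rule waits until the episode ends, then moves each visited state toward the observed return G_t. The sketch below assumes an every-visit variant with undiscounted returns. The page does not say whether it uses first-visit or every-visit updates, so that choice, the name `mc_update`, and the 0.5 initialization are all illustrative assumptions.

```python
ALPHA_MC = 0.02  # default MC step size shown in the controls


def mc_update(V, transitions, alpha=ALPHA_MC):
    """Every-visit constant-alpha MC update over one finished episode."""
    G = 0.0
    # Walk the episode backwards so G accumulates the undiscounted
    # return that follows each visited state.
    for state, _next_state, reward in reversed(transitions):
        G += reward
        V[state] += alpha * (G - V[state])
    return V


# Assumed initialization: every non-terminal state starts at 0.5.
V = {s: 0.5 for s in "ABCDE"}

# A hand-written example trajectory C -> D -> E -> R (return 1 from every step).
trajectory = [("C", "D", 0.0), ("D", "E", 0.0), ("E", "R", 1.0)]
mc_update(V, trajectory)
print(V)
```

Unlike TD(0), every state visited on the example trajectory moves toward the return of 1, each by alpha times its own error.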

Last trajectory

Run at least one episode to inspect visited states and terminal reward.

step	state	next	reward

Visualization

State values (A-E): true vs TD(0) vs MC

Legend: true values, TD(0), Monte Carlo

Learning curve: RMSE by episode
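One plausible way to compute the RMSE plotted here is the root of the mean squared error between the current estimates and the true values, averaged over the five non-terminal states. The page does not state its exact definition, so the averaging choice and the names `rmse` and `V_init` are assumptions.

```python
import math

STATES = "ABCDE"

# True values of A-E under the uniform random walk: 1/6, 2/6, ..., 5/6.
TRUE_VALUES = {s: (i + 1) / 6 for i, s in enumerate(STATES)}


def rmse(V, true_values=TRUE_VALUES):
    """Root-mean-squared error of V against the true values, averaged over states."""
    squared_errors = [(V[s] - true_values[s]) ** 2 for s in true_values]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Assumed starting point: every estimate at 0.5.
V_init = {s: 0.5 for s in STATES}
print(round(rmse(V_init), 4))  # about 0.2357, i.e. sqrt(1/18)
```

Recording this number after each episode for the TD and MC estimates would give the two curves of the learning-curve plot.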