Random Walk Prediction (Chapter 5): TD(0) vs MC


Cheat Sheet

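Five non-terminal states, A through E, lie in a row between the terminal states L (left) and R (right). Every episode starts in C, and at each step the walk moves left or right with equal probability. The reward is 1 on the transition into R and 0 on every other transition. With no discounting, the true value of each state is therefore the probability of ending at R: V(A), ..., V(E) = 1/6, 2/6, 3/6, 4/6, 5/6.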
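A minimal sketch of the environment, assuming a hypothetical integer encoding (L = -1, A..E = 0..4, R = 5). `generate_episode` and `true_values` are illustrative names, not taken from the page. The true values are computed here by iterative policy evaluation with gamma = 1, which should reproduce the 1/6, ..., 5/6 values stated above.

```python
import random

# Hypothetical encoding for this sketch: L = -1, A..E = 0..4, R = 5.
N_STATES = 5
START = 2  # state C


def generate_episode(rng: random.Random) -> list[tuple[int, int, int]]:
    """Return the (state, next_state, reward) transitions of one episode starting in C."""
    s = START
    transitions = []
    while 0 <= s < N_STATES:
        s_next = s + rng.choice((-1, 1))   # left or right with probability 1/2
        reward = 1 if s_next == N_STATES else 0  # reward 1 only on entering R
        transitions.append((s, s_next, reward))
        s = s_next
    return transitions


def true_values(tol: float = 1e-12) -> list[float]:
    """In-place iterative policy evaluation (gamma = 1, terminal values = 0)."""
    v = [0.0] * N_STATES
    while True:
        delta = 0.0
        for i in range(N_STATES):
            left = v[i - 1] if i > 0 else 0.0              # V(L) = 0
            right = v[i + 1] if i < N_STATES - 1 else 0.0  # V(R) = 0
            r_right = 1.0 if i == N_STATES - 1 else 0.0    # reward for stepping into R
            new = 0.5 * left + 0.5 * (r_right + right)
            delta = max(delta, abs(new - v[i]))
            v[i] = new
        if delta < tol:
            return v
```

The explicit `0 <= s < N_STATES` checks matter in Python, where an index of -1 would silently wrap around to the last element.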
TD(0): V(S_t) <- V(S_t) + alpha [R_(t+1) + V(S_(t+1)) - V(S_t)], applied after every step, undiscounted (gamma = 1), with V(terminal) = 0.
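The TD(0) rule can be sketched as an online pass over one episode's transitions. This assumes the same hypothetical encoding as above (L = -1, A..E = 0..4, R = 5); `td0_episode` is an illustrative name.

```python
def td0_episode(v: list[float], transitions: list[tuple[int, int, int]],
                alpha: float) -> list[float]:
    """Apply TD(0) updates step by step (gamma = 1, V(terminal) = 0)."""
    for s, s_next, r in transitions:
        # Terminal states (-1 or 5 in this encoding) have value 0.
        v_next = v[s_next] if 0 <= s_next < len(v) else 0.0
        v[s] += alpha * (r + v_next - v[s])  # move V(s) toward the bootstrapped target
    return v
```

For example, starting from V = 0.5 everywhere, the episode C -> D -> E -> R with alpha = 0.1 changes only V(E), to 0.55: the targets for C and D equal their current estimates, and only the final step sees a nonzero TD error.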
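MC (constant-alpha): V(S_t) <- V(S_t) + alpha [G_t - V(S_t)], applied once the episode has ended. G_t is the return following time t, which here equals the final reward (1 if the walk ends at R, 0 if it ends at L).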
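A sketch of the constant-alpha MC update, assuming every-visit updates (every occurrence of a state is moved toward its return) and the same hypothetical encoding; the page does not say whether it uses first-visit or every-visit MC. `mc_episode` is an illustrative name.

```python
def mc_episode(v: list[float], transitions: list[tuple[int, int, int]],
               alpha: float) -> list[float]:
    """Every-visit constant-alpha MC, run after the episode ends (gamma = 1)."""
    g = 0.0
    # Walk backwards so that g holds the return following each time step.
    for s, _s_next, r in reversed(transitions):
        g += r
        v[s] += alpha * (g - v[s])  # move V(s) toward the observed return
    return v
```

Unlike TD(0), every visited state moves toward the same outcome: with V = 0.5 everywhere and the episode C -> D -> E -> R, V(C), V(D), and V(E) all become 0.55 for alpha = 0.1.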

Last trajectory

Columns: step, state, next, reward. Once at least one episode has run, the table lists every transition of the most recent episode; the last row shows the terminal reward.
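A small sketch of how transitions could be turned into rows of this table, assuming the hypothetical integer encoding used above and 1-based step numbers (the page does not say which it uses). `trajectory_rows` is an illustrative name.

```python
# Hypothetical mapping from integer states to the letters used on the page.
NAMES = {-1: "L", 0: "A", 1: "B", 2: "C", 3: "D", 4: "E", 5: "R"}


def trajectory_rows(transitions: list[tuple[int, int, int]]) -> list[tuple[int, str, str, int]]:
    """Convert (state, next_state, reward) transitions into (step, state, next, reward) rows."""
    return [(step, NAMES[s], NAMES[s_next], r)
            for step, (s, s_next, r) in enumerate(transitions, start=1)]


for row in trajectory_rows([(2, 3, 0), (3, 4, 0), (4, 5, 1)]):
    print(*row)
```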

Visualization

Figure: State values (A-E), true values vs TD(0) vs Monte Carlo estimates; Learning curve, RMSE by episode.
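The learning curve can be sketched as the RMSE between the estimates and the true values 1/6, ..., 5/6 after each episode. The sketch assumes several things the page does not state: both methods learn from the same sampled episodes, all estimates start at 0.5, and the step sizes (0.1 for TD, 0.02 for MC) are illustrative. `learning_curves` and the other names are hypothetical.

```python
import math
import random

TRUE_V = [i / 6 for i in range(1, 6)]  # true values of A..E (gamma = 1)


def rmse(v: list[float]) -> float:
    """Root-mean-square error over the five non-terminal states."""
    return math.sqrt(sum((x - t) ** 2 for x, t in zip(v, TRUE_V)) / len(v))


def episode(rng: random.Random) -> list[tuple[int, int, int]]:
    """One walk from C; encoding L = -1, A..E = 0..4, R = 5 (hypothetical)."""
    s, out = 2, []
    while 0 <= s <= 4:
        s_next = s + rng.choice((-1, 1))
        out.append((s, s_next, 1 if s_next == 5 else 0))
        s = s_next
    return out


def learning_curves(n_episodes: int = 100, alpha_td: float = 0.1,
                    alpha_mc: float = 0.02, seed: int = 0):
    """Return per-episode RMSE lists for TD(0) and constant-alpha MC."""
    rng = random.Random(seed)
    v_td, v_mc = [0.5] * 5, [0.5] * 5
    td_curve, mc_curve = [], []
    for _ in range(n_episodes):
        ep = episode(rng)
        # TD(0): update online along the episode.
        for s, s_next, r in ep:
            v_next = v_td[s_next] if 0 <= s_next <= 4 else 0.0
            v_td[s] += alpha_td * (r + v_next - v_td[s])
        # Every-visit constant-alpha MC: update backwards from the return.
        g = 0.0
        for s, _, r in reversed(ep):
            g += r
            v_mc[s] += alpha_mc * (g - v_mc[s])
        td_curve.append(rmse(v_td))
        mc_curve.append(rmse(v_mc))
    return td_curve, mc_curve


td, mc = learning_curves()
print(f"after {len(td)} episodes: TD RMSE {td[-1]:.3f}, MC RMSE {mc[-1]:.3f}")
```

A single run is noisy; averaging each curve over many independent seeds gives a smoother comparison of the two methods.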