Eligibility Traces (Chapter 12): Random Walk, TD(0) vs TD(lambda)

Update Equations

delta_t = R_(t+1) + gamma V(S_(t+1)) - V(S_t)
e_t(s) = gamma lambda e_(t-1)(s) for every state s, then e_t(S_t) ← e_t(S_t) + 1 for the visited state (accumulating trace)
V(s) ← V(s) + alpha delta_t e_t(s)
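With lambda = 0, every trace except the current state's decays to zero, so TD(lambda) reduces to TD(0). With lambda = 1, traces decay only by gamma, so each TD error is credited to all earlier states in the episode and the updates look more Monte Carlo-like.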
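The three update equations can be sketched in Python as follows. This is a minimal sketch, not the code behind this page. It assumes a five-state walk (A to E as indices 0 to 4), a start in the center state, equally likely left and right moves, and a reward of +1 only when the walk exits on the right. The names `walk_step`, `td_lambda_episode`, and `td0_episode` are made up for illustration.

```python
import random

def walk_step(s, n, rng):
    """Assumed dynamics: move left or right with equal probability.
    Leaving the chain ends the episode; only the right exit pays +1."""
    s_next = s + rng.choice((-1, 1))
    terminal = s_next < 0 or s_next >= n
    reward = 1.0 if s_next >= n else 0.0
    return s_next, reward, terminal

def td_lambda_episode(V, alpha, gamma, lam, rng):
    """Run one episode of online TD(lambda) with accumulating traces.
    Updates V in place and returns the final trace vector."""
    n = len(V)
    e = [0.0] * n
    s = n // 2  # assumed start: center state
    while True:
        s_next, reward, terminal = walk_step(s, n, rng)
        v_next = 0.0 if terminal else V[s_next]
        delta = reward + gamma * v_next - V[s]   # TD error delta_t
        for i in range(n):
            e[i] *= gamma * lam                  # decay every trace
        e[s] += 1.0                              # increment visited state
        for i in range(n):
            V[i] += alpha * delta * e[i]         # update all states by their trace
        if terminal:
            return e
        s = s_next

def td0_episode(V, alpha, gamma, rng):
    """Run one episode of plain TD(0) on the same assumed walk."""
    n = len(V)
    s = n // 2
    while True:
        s_next, reward, terminal = walk_step(s, n, rng)
        v_next = 0.0 if terminal else V[s_next]
        V[s] += alpha * (reward + gamma * v_next - V[s])
        if terminal:
            return
        s = s_next
```

Running both functions with the same seed and lam = 0.0 gives identical value estimates, which is the lambda = 0 case described above.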

Last Episode Trace View

Run one episode to inspect TD errors and eligibility traces.
Columns: t, S_t, S_(t+1), R, delta, e(A..E)
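To show what the rows of such a table might contain, the sketch below replays a fixed path and records one row per step. It is an illustration under assumptions: the helper `trace_rows` is hypothetical, states A to E are indices 0 to 4, an index outside that range marks termination, and only the right exit yields reward +1.

```python
def trace_rows(path, V, alpha=0.1, gamma=1.0, lam=0.5):
    """Replay a fixed path of state indices and return one row per step:
    (t, S_t, S_(t+1), R, delta, copy of the trace vector e)."""
    n = len(V)
    V = list(V)          # work on a copy so the caller's values are untouched
    e = [0.0] * n
    rows = []
    for t in range(len(path) - 1):
        s, s_next = path[t], path[t + 1]
        terminal = s_next < 0 or s_next >= n
        reward = 1.0 if s_next >= n else 0.0     # assumed reward scheme
        v_next = 0.0 if terminal else V[s_next]
        delta = reward + gamma * v_next - V[s]
        e = [gamma * lam * x for x in e]         # decay every trace
        e[s] += 1.0                              # increment visited state
        V = [v + alpha * delta * x for v, x in zip(V, e)]
        rows.append((t, s, s_next, reward, delta, list(e)))
    return rows

# Path C -> D -> E -> right exit, all values initialised to 0.5.
for row in trace_rows([2, 3, 4, 5], [0.5] * 5):
    print(row)
```

On this path the TD error is zero until the final step, where delta = 0.5 and the traces for C, D, E are 0.25, 0.5, and 1.0, so the single nonzero error is shared among the three visited states in proportion to how recently each was visited.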

Visualization

State values (A-E): true values vs TD(0) vs TD(lambda)
Learning curve: RMSE by episode
Final eligibility traces: end of latest episode
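The RMSE plotted in the learning curve could be computed along these lines. This is a sketch under assumptions, not the page's own code. The true values 1/6 through 5/6 hold for the standard five-state random walk with gamma = 1 and reward +1 only on the right exit, a reward scheme this page does not spell out. The names `TRUE_VALUES` and `rmse` are illustrative.

```python
import math

# Assumed true values for states A..E: 1/6, 2/6, 3/6, 4/6, 5/6.
TRUE_VALUES = [i / 6 for i in range(1, 6)]

def rmse(estimates, targets=TRUE_VALUES):
    """Root-mean-square error between estimated and target state values."""
    squared = [(v - t) ** 2 for v, t in zip(estimates, targets)]
    return math.sqrt(sum(squared) / len(targets))

# Error of the usual all-0.5 initialisation, before any learning.
print(rmse([0.5] * 5))
```

Calling `rmse` once per episode on each method's current value estimates gives one point per curve.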