episodes: 0
RMSE TD(0): -
RMSE TD(lambda): -
delta_t = R_(t+1) + gamma V(S_(t+1)) - V(S_t)
e_t(s) = gamma lambda e_(t-1)(s) for every state s, then add 1 to the trace of the visited state: e_t(S_t) ← e_t(S_t) + 1
V(s) ← V(s) + alpha delta_t e_t(s)
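With lambda = 0, every trace decays to zero before the visited state is incremented, so only V(S_t) is updated and TD(lambda) reduces to TD(0). With lambda = 1, traces decay only by gamma, so each TD error is credited to all states visited earlier in the episode and the updates look more Monte Carlo-like.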
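
The three update rules above can be sketched in Python as follows. This is a minimal illustration, not the code behind this page: it assumes a five-state random walk over A..E that starts in C, moves left or right with equal probability, pays a reward of 1 only when leaving past E, and treats the value of a terminal state as 0. The names STATES and run_episode, and the default values of alpha, gamma and lam, are hypothetical.

```python
import random

STATES = ["A", "B", "C", "D", "E"]  # assumed state set, inferred from the e(A..E) column


def run_episode(V, alpha=0.1, gamma=1.0, lam=0.8, rng=None):
    """Run one episode of TD(lambda) with incremented traces; updates V in place.

    Returns one log row per step: (t, S_t, S_(t+1), R, delta, traces).
    """
    if rng is None:
        rng = random.Random(0)
    e = [0.0] * len(STATES)   # eligibility traces, one per state
    s = 2                     # assumed start state: C
    t = 0
    log = []
    while True:
        s_next = s + rng.choice((-1, 1))
        terminal = s_next < 0 or s_next >= len(STATES)
        r = 1.0 if s_next >= len(STATES) else 0.0   # assumed reward scheme
        v_next = 0.0 if terminal else V[s_next]

        # delta_t = R_(t+1) + gamma V(S_(t+1)) - V(S_t)
        delta = r + gamma * v_next - V[s]

        # decay every trace, then increment the visited state
        e = [gamma * lam * x for x in e]
        e[s] += 1.0

        # V(s) <- V(s) + alpha delta_t e_t(s), for every state s
        for i in range(len(STATES)):
            V[i] += alpha * delta * e[i]

        log.append((t, STATES[s], None if terminal else STATES[s_next],
                    r, delta, list(e)))
        if terminal:
            return log
        s = s_next
        t += 1
```

Calling run_episode([0.5] * 5, lam=0.0) leaves exactly one non-zero trace per step, which is the TD(0) case. Raising lam toward 1 keeps earlier states eligible for longer, so a late reward changes more of V in a single step.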
Run one episode to inspect TD errors and eligibility traces.
| t | S_t | S_(t+1) | R | delta | e(A..E) |