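Click a cell on the grid to inspect that state’s Bellman backup. In Sutton’s gridworld example,
every action taken in A gives +10 and teleports the agent to A′, every action taken in B gives +5 and teleports it to B′,
hitting the wall gives −1 and leaves the state unchanged, and every other move gives 0.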
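The dynamics above fit in one small step function. This is a minimal Python sketch, not the demo's own code: the 5×5 grid size and the coordinates of A, A′, B and B′ are assumptions taken from the layout in Sutton & Barto's figure, and `step` and `MOVES` are hypothetical names.

```python
# Hypothetical layout (assumption): 5x5 grid, (row, col), row 0 at the top,
# with A, A', B, B' placed as in Sutton & Barto's gridworld figure.
N = 5
A, A_PRIME = (0, 1), (4, 1)
B, B_PRIME = (0, 3), (2, 3)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for one deterministic move."""
    if state == A:              # any action from A: +10, teleport to A'
        return A_PRIME, 10.0
    if state == B:              # any action from B: +5, teleport to B'
        return B_PRIME, 5.0
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0      # ordinary move inside the grid: reward 0
    return state, -1.0          # bumped the wall: -1, state unchanged


print(step((0, 1), "left"))     # ((4, 1), 10.0)
print(step((0, 0), "up"))       # ((0, 0), -1.0)
```

Because the teleport checks come before the wall check, an action taken in A or B never incurs the −1 penalty, even when it points off the grid.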
The convergence curve on the right plots max Δ, the largest change in any state’s value, for each sweep.
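A sketch of how such a curve can be produced with synchronous iterative policy evaluation. Several assumptions are not stated in the text: the policy is equiprobable random, γ = 0.9, sweeps stop once max Δ falls below θ = 1e-6, and the grid uses the same hypothetical 5×5 layout as Sutton & Barto's figure. The names `evaluate` and `deltas` are illustrative.

```python
N, GAMMA = 5, 0.9                      # assumed grid size and discount
A, A_PRIME = (0, 1), (4, 1)            # hypothetical coordinates
B, B_PRIME = (0, 3), (2, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def step(state, move):
    """Return (next_state, reward) for one deterministic move."""
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    r, c = state[0] + move[0], state[1] + move[1]
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return state, -1.0                 # off-grid: -1, stay put


def evaluate(theta=1e-6, max_sweeps=10_000):
    """Evaluate the equiprobable random policy, one full sweep at a time.

    Returns the value table and the max |V_new(s) - V_old(s)| of every sweep.
    """
    V = [[0.0] * N for _ in range(N)]
    deltas = []
    for _ in range(max_sweeps):
        new_V = [[0.0] * N for _ in range(N)]
        for r in range(N):
            for c in range(N):
                total = 0.0
                for move in MOVES:     # each action has probability 1/4
                    (nr, nc), reward = step((r, c), move)
                    total += 0.25 * (reward + GAMMA * V[nr][nc])
                new_V[r][c] = total
        delta = max(abs(new_V[r][c] - V[r][c])
                    for r in range(N) for c in range(N))
        deltas.append(delta)           # one point on the convergence curve
        V = new_V
        if delta < theta:
            break
    return V, deltas


V, deltas = evaluate()
print(len(deltas), round(V[0][1], 1), round(V[0][3], 1))
```

Since each synchronous sweep is a γ-contraction, max Δ shrinks by at least a factor of γ per sweep, which is why the curve falls off roughly geometrically. Under these assumptions the values of A and B should land near the 8.8 and 5.3 that Sutton & Barto report for the random policy.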
Backup breakdown (selected state)
action | next state | reward | backup term
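One way to compute each row of this breakdown is sketched below. It is a hedged illustration, not the demo's implementation: it assumes the backup term for an action is r + γ·V(s′) with γ = 0.9, and that the V(s) update averages the terms as the equiprobable random policy would (a value-iteration demo would take the maximum instead). The grid layout and the names `backup_breakdown` and `step` are hypothetical.

```python
N, GAMMA = 5, 0.9                      # assumed grid size and discount
A, A_PRIME = (0, 1), (4, 1)            # hypothetical coordinates
B, B_PRIME = (0, 3), (2, 3)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Return (next_state, reward) for one deterministic move."""
    if state == A:
        return A_PRIME, 10.0
    if state == B:
        return B_PRIME, 5.0
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < N and 0 <= c < N:
        return (r, c), 0.0
    return state, -1.0


def backup_breakdown(state, V):
    """Rows of (action, next state, reward, backup term) plus the V(s) update."""
    rows = []
    for action in MOVES:
        nxt, reward = step(state, action)
        term = reward + GAMMA * V[nxt[0]][nxt[1]]   # r + gamma * V(s')
        rows.append((action, nxt, reward, term))
    update = sum(row[3] for row in rows) / len(rows)  # random-policy average
    return rows, update


V = [[0.0] * N for _ in range(N)]      # values before the first sweep
rows, update = backup_breakdown((0, 0), V)
for action, nxt, reward, term in rows:
    print(f"{action:>5} | {nxt} | {reward:+.1f} | {term:+.2f}")
print("V(s) update:", update)          # V(s) update: -0.5
```

From the top-left corner with all values at zero, "up" and "left" hit the wall (−1 each) while "down" and "right" give 0, so the averaged update is −2/4 = −0.5.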