Windy Gridworld (Chapter 5): SARSA Control

Environment

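The grid is 7x10 (7 rows, 10 columns), with cells written as (row, column). The start is S=(3,0) and the goal is G=(3,7). The actions are up, down, left, and right, and each step gives a reward of -1. Wind pushes the agent upward by a per-column strength, listed left to right: [0,0,0,1,1,1,2,2,1,0].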
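The environment described above can be sketched as a single transition function. This is a minimal sketch under stated assumptions, not the page's own implementation: row 0 is taken to be the top row (so "upward" means a smaller row index), the wind strength is read from the column the agent is leaving, and moves that would leave the grid are clipped to the border. The names `step`, `ROWS`, `COLS`, `WIND`, and `ACTIONS` are hypothetical.

```python
ROWS, COLS = 7, 10
START, GOAL = (3, 0), (3, 7)
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # upward push per column

# Row/column deltas; assumes row 0 is the top of the grid.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply one action plus wind; return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    # Assumption: wind strength comes from the column being left.
    new_row = row + d_row - WIND[col]
    new_col = col + d_col
    # Assumption: positions outside the grid are clipped to the border.
    new_row = min(max(new_row, 0), ROWS - 1)
    new_col = min(max(new_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    return next_state, -1, next_state == GOAL
```

For instance, moving right from (3,3) lands in (2,4) under these assumptions, because column 3 has wind strength 1.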
SARSA: Q(s,a) <- Q(s,a) + alpha [r + gamma Q(s',a') - Q(s,a)]
Q-learning: Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)]
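The two update rules above differ only in the bootstrap term: SARSA uses the value of the action actually chosen in s', while Q-learning uses the maximum over actions in s'. A minimal sketch follows, assuming Q is a dictionary keyed by (state, action) that defaults to 0. The function names and the `done` flag (which treats the value of a terminal state as 0) are assumptions for illustration, not part of the page.

```python
from collections import defaultdict

ACTIONS = ["up", "down", "left", "right"]


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma, done=False):
    """On-policy update: bootstrap from the action actually taken in s_next."""
    next_value = 0.0 if done else Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma, done=False):
    """Off-policy update: bootstrap from the best action value in s_next."""
    next_value = 0.0 if done else max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (r + gamma * next_value - Q[(s, a)])


# Hypothetical usage: the alpha and gamma values here are illustrative only.
Q = defaultdict(float)
sarsa_update(Q, (3, 0), "right", -1, (3, 1), "up", alpha=0.5, gamma=1.0)
```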

Last trajectory

Run at least one episode to inspect the path to goal.
step | state | action | next state

Visualization

Grid policy: greedy arrows plus the last path. Legend: low value, mid value, high value, last trajectory.
Learning curve: steps per episode.
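The greedy arrows in the grid policy view can be approximated in text form by picking, for each cell, the action with the highest Q value. This is a rough sketch only: it assumes Q is a plain dictionary keyed by ((row, col), action) with missing entries treated as 0, and the name `greedy_arrows` and the arrow characters are hypothetical. Ties resolve to the first action in the dictionary order, here "up".

```python
ROWS, COLS = 7, 10
ARROWS = {"up": "^", "down": "v", "left": "<", "right": ">"}


def greedy_arrows(Q, goal=(3, 7)):
    """Return one text line per grid row showing the greedy action per cell."""
    lines = []
    for row in range(ROWS):
        cells = []
        for col in range(COLS):
            if (row, col) == goal:
                cells.append("G")
                continue
            # Highest-valued action for this cell; unseen pairs count as 0.
            best = max(ARROWS, key=lambda a: Q.get(((row, col), a), 0.0))
            cells.append(ARROWS[best])
        lines.append(" ".join(cells))
    return lines


# Hypothetical usage with a single learned entry.
for line in greedy_arrows({((3, 0), "right"): 1.0}):
    print(line)
```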