Planning and Learning with Tabular Methods
Chapter 8
Dyna-Q: Planning vs No Planning
Cheat Sheet
Both agents act in the same maze from start S to goal G.
Q-learning updates only from real transitions, while Dyna-Q also uses model-simulated planning updates.
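The difference between the two agents can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the maze layout, the hyperparameters (`alpha`, `gamma`, `epsilon`), and the names `step` and `run` are all assumptions made for the example. Setting `n_planning=0` gives plain Q-learning; any positive value adds that many model-simulated updates after each real step, which is the Dyna-Q idea.

```python
import random

# Hypothetical 4x6 maze (not the one shown on this page).
ROWS, COLS = 4, 6
START, GOAL = (0, 0), (3, 5)
WALLS = {(1, 2), (2, 2), (2, 4)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Deterministic move; hitting a wall or the edge leaves the agent in place."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        r, c = state
    reward = 1.0 if (r, c) == GOAL else 0.0
    return (r, c), reward

def run(n_planning, episodes=30, alpha=0.1, gamma=0.95, epsilon=0.1,
        seed=0, max_steps=5000):
    """Return steps taken per episode. n_planning=0 is plain Q-learning."""
    rng = random.Random(seed)
    q = {}      # (state, action_index) -> estimated value
    model = {}  # (state, action_index) -> (next_state, reward) seen in real experience

    def greedy(s):
        # Greedy action with random tie-breaking.
        vals = [q.get((s, a), 0.0) for a in range(4)]
        best = max(vals)
        return rng.choice([a for a, v in enumerate(vals) if v == best])

    def update(s, a, r, s2):
        # One-step Q-learning update, used for real and simulated transitions alike.
        target = r + gamma * max(q.get((s2, b), 0.0) for b in range(4))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (target - old)

    steps_per_episode = []
    for _ in range(episodes):
        s, steps = START, 0
        while s != GOAL and steps < max_steps:
            a = rng.randrange(4) if rng.random() < epsilon else greedy(s)
            s2, r = step(s, ACTIONS[a])
            update(s, a, r, s2)        # learning from the real transition
            model[(s, a)] = (s2, r)    # remember it in the model
            for _ in range(n_planning):
                # Planning: replay a previously observed state-action pair.
                ps, pa = rng.choice(list(model))
                ps2, pr = model[(ps, pa)]
                update(ps, pa, pr, ps2)
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode
```

Calling `run(0)` and `run(10)` and comparing the returned lists episode by episode gives the same kind of steps-per-episode comparison the demo reports. Planning typically shortens the early episodes, though the exact numbers depend on the seed and the maze.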