- Prediction and Control with Function Approximation (Coursera)
- Dynamic Programming (Sutton & Barto)
- Finite Markov Decision Processes (Sutton & Barto)
- Multi-armed Bandits (Sutton & Barto)
- Temporal Difference Learning (Sutton & Barto)
- Monte Carlo Methods (Sutton & Barto)
- The Last Marble
- Policy Improvement Theorem (Sutton & Barto)
- Importance Sampling in Off-Policy Reinforcement Learning Algorithms