On-policy Monte Carlo Control

In the previous section, we used the assumption of exploring starts (ES) to design a Monte Carlo control method called MCES.

Consider the following MDP with two states, B and C, with one action in state B and two actions in state C, and discount factor γ = 1.

Monte Carlo prediction is a reinforcement learning method that estimates the value function of a policy by averaging the returns observed over multiple episodes.
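The averaging step in Monte Carlo prediction can be sketched in a few lines of Python. This is a minimal first-visit variant, not necessarily the exact formulation used in these notes. The function name `first_visit_mc_prediction`, the episode format (lists of `(state, reward)` pairs), and the sample episodes with their rewards are all assumptions made up for illustration; they are not the transitions or rewards of the MDP described above.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) by averaging first-visit returns over episodes.

    episodes: list of episodes; each episode is a list of (state, reward)
    pairs, where reward is the reward received after leaving that state.
    (This episode format is an assumption made for this sketch.)
    """
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)

    for episode in episodes:
        g = 0.0
        first_visit_return = {}
        # Walk the episode backwards, accumulating the discounted return.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            # Overwriting while moving backwards leaves the return
            # that follows the earliest visit to each state.
            first_visit_return[state] = g
        for state, g_first in first_visit_return.items():
            returns_sum[state] += g_first
            returns_count[state] += 1

    # The value estimate is the sample mean of the recorded returns.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


# Hypothetical episodes with made-up rewards (not the MDP from the text).
episodes = [
    [("B", 0.0), ("C", 1.0)],
    [("B", 0.0), ("C", -1.0)],
    [("C", 1.0)],
]
values = first_visit_mc_prediction(episodes, gamma=1.0)
print(values)
```

With γ = 1 the return from a state is simply the sum of the rewards that follow it, so each estimate is a plain average of those sums across the episodes in which the state appears.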