Q-learning
Unlike the SARSA algorithm, Q-learning uses the current knowledge of the action-value function (Q-function) to update the target policy greedily:
Unlike the SARSA algorithm, Q-learning uses the current knowledge of the action-value function (Q-function) to update the target policy greedily: