Hi, thank you for taking the time to read this article and provide feedback. The equation the article uses for Policy Evaluation takes a max over all actions and, inside that, a summation over next states, so V(s) is computed as the maximum utility (value) achievable from state s. There are alternative ways of doing this, such as evaluating V(s) under the current policy's own actions and repeating the update until the values converge.
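To make the difference concrete, here is a minimal sketch of both updates. It is not the article's code: the two-state MDP, the transition table `P`, the discount `GAMMA`, and the function names are all made up for illustration. `max_backup` takes the max over actions (the same form as the Bellman optimality backup), while `evaluate_policy` only uses the action the given policy picks.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Expected return of taking action a in state s: the summation over next states."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def max_backup(theta=1e-8):
    """Repeat V(s) <- max over actions of q_value until the largest change is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(q_value(V, s, a) for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def evaluate_policy(policy, theta=1e-8):
    """Repeat V(s) <- q_value(s, policy[s]) until convergence (no max involved)."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = q_value(V, s, policy[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


if __name__ == "__main__":
    print("max over actions:     ", max_backup())
    print("following policy 0, 0:", evaluate_policy({0: 0, 1: 0}))
    print("following policy 1, 1:", evaluate_policy({0: 1, 1: 1}))
```

In this toy setup the two approaches only agree when the policy being followed is already the best one. For any other policy, the max-based update gives values at least as large as the policy-following one in every state.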