Raghuveer Bhandarkar

1 min read · Jun 29, 2020

Hi, thank you for taking the time to read this article and provide feedback. If we look at the equation for Policy Evaluation, it has a max over all actions and, within that, a summation over next states. So we compute V(s) as the maximum utility (value) achievable from state s. There are also alternative ways of achieving this, such as following the policy until the values converge.
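To make the two options concrete, here is a minimal sketch rather than the article's actual code. The tiny two-state MDP, the discount factor, and the names `P`, `GAMMA`, `backup`, `max_backup_values`, and `follow_policy_values` are all assumptions invented for illustration. The first function applies the max over actions around a sum over next states; the second keeps a policy fixed and repeats the same sum until the values stop changing.

```python
# Hypothetical 2-state, 2-action MDP, made up purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 2.0)]},
}
GAMMA = 0.5  # assumed discount factor


def backup(V, s, a):
    """Summation over next states for one (state, action) pair."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def max_backup_values(tol=1e-10):
    """V(s) = max over actions of the sum over next states, repeated to convergence."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: max(backup(V, s, a) for a in P[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


def follow_policy_values(policy, tol=1e-10):
    """Same update without the max: the action is dictated by a fixed policy."""
    V = {s: 0.0 for s in P}
    while True:
        new_V = {s: backup(V, s, policy[s]) for s in P}
        if max(abs(new_V[s] - V[s]) for s in P) < tol:
            return new_V
        V = new_V


best = max_backup_values()
fixed = follow_policy_values({0: 1, 1: 0})  # an arbitrary, non-optimal policy
print(best, fixed)
```

In this toy setup the max-based update can never give a state a lower value than following a fixed policy does, which is the sense in which V(s) is the maximum utility possible.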
