Raghuveer Bhandarkar
1 min read · Jun 29, 2020


Hi, thank you for taking the time to read this article and provide feedback. If we look at the equation for Policy Evaluation in the article, it takes a max over all actions and, inside that, a summation over next states. So we compute V(s) as the maximum utility (value) achievable from state s. There are also alternative ways of achieving this, such as following a fixed policy and updating its values until they converge.
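To make the two options concrete, here is a minimal Python sketch. It is not the code from the article: the two-state MDP, the discount factor, and the names `P`, `GAMMA`, `q_value`, `max_backup`, `policy_backup`, and `iterate` are all assumptions made up for illustration. The max-over-actions update is the form usually called the Bellman optimality backup, and the second function evaluates one fixed policy instead.

```python
# Hypothetical 2-state MDP, invented purely for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # assumed discount factor


def q_value(V, s, a):
    """Summation over next states for one (state, action) pair."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def max_backup(V):
    """One sweep using the max over all actions in every state."""
    return {s: max(q_value(V, s, a) for a in P[s]) for s in P}


def policy_backup(V, policy):
    """One sweep that follows a fixed policy instead of taking the max."""
    return {s: q_value(V, s, policy[s]) for s in P}


def iterate(backup, tol=1e-10, max_sweeps=10_000):
    """Repeat a backup until the largest change in V falls below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_sweeps):
        V_new = backup(V)
        if max(abs(V_new[s] - V[s]) for s in P) < tol:
            return V_new
        V = V_new
    return V


# Values from the max-over-actions update.
V_max = iterate(max_backup)

# Values from following one fixed policy until convergence.
fixed_policy = {0: "go", 1: "stay"}
V_pi = iterate(lambda V: policy_backup(V, fixed_policy))
```

In this toy example the fixed policy happens to be the best one, so both approaches settle on the same values. With a worse policy, the values from `policy_backup` would be lower than or equal to those from `max_backup` in every state.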



Written by Raghuveer Bhandarkar

Machine Learning, Architecture, Georgia Tech Alumni.
