TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial…

Reinforcement Learning

Policy Iteration in RL: A Step-by-Step Illustration

Raghuveer Bhandarkar
Published in TDS Archive
7 min read · Mar 25, 2020

Source: Image by Annalise Batista from Pixabay

Overview of the game:

Rewards (Positive and Negative):

A quick review of the ‘Policy Iteration’ algorithm:

What is a Policy?

Step 1:

Step 2:

Policy Evaluation¹
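
For reference, the policy evaluation step is usually written as the iterative Bellman expectation update below. This is the standard textbook form, and the symbols are assumptions about notation rather than taken from this article: $\pi(a \mid s)$ is the probability of action $a$ in state $s$, $p(s', r \mid s, a)$ is the transition model, and $\gamma$ is the discount factor.

```latex
v_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_k(s') \right]
```

The update is repeated for all states until the largest change between $v_k$ and $v_{k+1}$ falls below a small threshold.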

Step 3:

Policy Improvement¹
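
For reference, the policy improvement step is usually written as a greedy choice with respect to the current value function. This is the standard textbook form, with notation assumed rather than taken from this article: $v_\pi$ is the value function of the current policy $\pi$, and $\pi'$ is the improved policy.

```latex
\pi'(s) = \arg\max_{a} \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_\pi(s') \right]
```

If $\pi'$ equals $\pi$ in every state, the policy is stable and the algorithm stops.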

Policy Iterations:
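
To make the alternation between evaluation and improvement concrete, here is a minimal sketch in Python. It runs on a hypothetical 3-state MDP with two actions, North and South. The transition probabilities, rewards, discount factor, and function names are all illustrative assumptions and are not the values of the game described in this article.

```python
import numpy as np

ACTIONS = ["North", "South"]

# Hypothetical transition model P[a, s, s'] (NOT the article's matrices).
P = np.array([
    # Action North
    [[0.2, 0.8, 0.0],
     [0.0, 0.2, 0.8],
     [0.0, 0.0, 1.0]],
    # Action South
    [[1.0, 0.0, 0.0],
     [0.8, 0.2, 0.0],
     [0.0, 0.0, 1.0]],
])

# Hypothetical expected immediate reward R[s, a]; state 2 is absorbing.
R = np.array([[-1.0, -1.0],
              [5.0, -1.0],
              [0.0, 0.0]])

GAMMA = 0.9  # assumed discount factor


def action_values(v, P, R, gamma):
    """q[s, a] = r(s, a) + gamma * sum over s' of p(s' | s, a) * v(s')."""
    return R + gamma * np.einsum("ast,t->sa", P, v)


def evaluate_policy(policy, P, R, gamma, theta=1e-8):
    """Iterative policy evaluation for a deterministic policy."""
    n_states = P.shape[1]
    v = np.zeros(n_states)
    while True:
        q = action_values(v, P, R, gamma)
        v_new = q[np.arange(n_states), policy]
        if np.max(np.abs(v_new - v)) < theta:
            return v_new
        v = v_new


def improve_policy(v, P, R, gamma):
    """Greedy policy with respect to the value function v."""
    return np.argmax(action_values(v, P, R, gamma), axis=1)


def policy_iteration(P, R, gamma, initial_policy):
    """Alternate evaluation and improvement until the policy is stable."""
    policy = np.array(initial_policy)
    iterations = 0
    while True:
        v = evaluate_policy(policy, P, R, gamma)
        new_policy = improve_policy(v, P, R, gamma)
        iterations += 1
        if np.array_equal(new_policy, policy):
            return policy, v, iterations
        policy = new_policy


# Start from an arbitrary policy: always go South.
policy, v, n_iter = policy_iteration(P, R, GAMMA, initial_policy=[1, 1, 1])
print([ACTIONS[a] for a in policy], np.round(v, 3), n_iter)
```

The sketch uses synchronous sweeps for evaluation and breaks ties in favor of the first action. Both are design choices of this example, not requirements of the algorithm.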

State transition diagram:

State Transition Diagram

Transition probability matrix:

Transition Probability Matrix for Action North
Transition Probability Matrix for Action South

Policy Iteration algorithm:

Initial random policy:

First iteration:

I Iteration: Policy Improvement

Second iteration:

II Iteration: Policy Evaluation
II Iteration: Policy Improvement

Third iteration:

III Iteration: Policy Evaluation
III Iteration: Policy Improvement

Conclusion:

References:

Published in TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

Written by Raghuveer Bhandarkar

Machine Learning, Architecture, Georgia Tech Alumni.