Solution
The solution to the MDP generated by any solver can be extracted in the form of a policy function. The policy function takes as arguments the state variables in the same order defined for the transition and reward functions.
To obtain the policy from the solution to the mdp defined above, we call
policy = getpolicy(mdp, solution)
To query the optimal action to take at the state stateq = (12, "no"), we pass the state to the policy function as follows.
actionq = policy(stateq...)
Above, actionq takes on the value "E". This action makes sense: we are to the west of the midpoint goal for the problem, so moving east brings us closer to the goal. Note that we used the trailing ellipsis ... (Julia's splat operator) to expand the tuple stateq into separate arguments. An equally valid function call is
actionq = policy(12, "no")
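The equivalence of the two call forms can be checked without a solver. The sketch below uses toypolicy, a hypothetical stand-in that is not the function returned by getpolicy. It assumes a goal at position 50 and ignores the second state variable, purely to show how the splat works.

```julia
# Hypothetical stand-in for a solved policy (NOT what getpolicy returns).
# Assumption: the goal sits at position 50 on a one-dimensional track.
function toypolicy(x::Int, flag::String)
    # The second state variable is ignored in this toy example.
    return x < 50 ? "E" : "W"
end

stateq = (12, "no")

# Splatting expands the tuple into positional arguments...
a1 = toypolicy(stateq...)
# ...so it is the same call as passing each state variable explicitly.
a2 = toypolicy(12, "no")

@assert a1 == a2
```

The splatted form is convenient when states are stored or generated as tuples, since the call site does not need to know how many state variables the problem has.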
For online policies, like those generated by MCTS, the policy generally improves as it receives more queries.