Parallel Value Iteration
To reap the benefits of Julia’s parallel computing framework for value iteration, we need a few more steps. The main issue we have to get around is making our code available to every process we add. We’ll skip an in-depth explanation and go straight to what we can do.
We consider a quick and dirty example of running the exact same code as in the MDP with T(s, a) type transition on PLite’s parallel value iteration solver. First, we wrap our existing code under the module ExampleModule (you can name it whatever you want), and save it under the file name ExampleModule.jl. As our naming scheme suggests, the module and file should share the same name. Below is what should be saved to the file.
module ExampleModule

export
    mdp,
    solver,
    solve,
    getpolicy

using PLite

# constants
const MinX = 0
const MaxX = 100
const StepX = 20

# mdp definition
mdp = MDP()

statevariable!(mdp, "x", MinX, MaxX)  # continuous
statevariable!(mdp, "goal", ["no", "yes"])  # discrete

actionvariable!(mdp, "move", ["W", "E", "stop"])  # discrete

transition!(mdp,
    ["x", "goal", "move"],
    function mytransition(x::Float64, goal::AbstractString, move::AbstractString)
        function isgoal(x::Float64)
            if abs(x - MaxX / 2) < StepX
                return "yes"
            else
                return "no"
            end
        end

        if isgoal(x) == "yes" && goal == "yes"
            return [([x, isgoal(x)], 1.0)]
        end

        if move == "E"
            if x >= MaxX
                return [
                    ([x, isgoal(x)], 0.9),
                    ([x - StepX, isgoal(x - StepX)], 0.1)]
            elseif x <= MinX
                return [
                    ([x, isgoal(x)], 0.2),
                    ([x + StepX, isgoal(x + StepX)], 0.8)]
            else
                return [
                    ([x, isgoal(x)], 0.1),
                    ([x - StepX, isgoal(x - StepX)], 0.1),
                    ([x + StepX, isgoal(x + StepX)], 0.8)]
            end
        elseif move == "W"
            if x >= MaxX
                return [
                    ([x, isgoal(x)], 0.1),
                    ([x - StepX, isgoal(x - StepX)], 0.9)]
            elseif x <= MinX
                return [
                    ([x, isgoal(x)], 0.9),
                    ([x + StepX, isgoal(x + StepX)], 0.1)]
            else
                return [
                    ([x, isgoal(x)], 0.1),
                    ([x - StepX, isgoal(x - StepX)], 0.8),
                    ([x + StepX, isgoal(x + StepX)], 0.1)]
            end
        elseif move == "stop"
            return [([x, isgoal(x)], 1.0)]
        end
    end
)

reward!(mdp,
    ["x", "goal", "move"],
    function myreward(x::Float64, goal::AbstractString, move::AbstractString)
        if goal == "yes" && move == "stop"
            return 1
        else
            return 0
        end
    end
)

# solver options
solver = ParallelValueIteration()
discretize_statevariable!(solver, "x", StepX)

end
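For intuition about why value iteration is a good fit for parallel execution, here is a minimal sketch of synchronous value iteration on a toy chain problem, written in plain Julia 1.x. It is not PLite's implementation: the problem and the names `nextstate`, `reward`, `backup`, and `valueiteration` are hypothetical and chosen only for illustration. The point is that each state's update reads only the previous value vector, so the updates within one sweep are independent of each other.

```julia
# Toy deterministic chain (hypothetical): states 1..N, actions -1 (left), 0 (stay), +1 (right).
const N = 6
const GOAL = 4
const DISCOUNT = 0.9

nextstate(s, a) = clamp(s + a, 1, N)
reward(s, a) = (s == GOAL && a == 0) ? 1.0 : 0.0

# Bellman backup for a single state; it only reads the previous value vector v.
backup(s, v) = maximum(reward(s, a) + DISCOUNT * v[nextstate(s, a)] for a in (-1, 0, 1))

function valueiteration(; tol=1e-6, maxiter=10_000)
    v = zeros(N)
    for iter in 1:maxiter
        # The backups in one sweep are independent, so this map is the part
        # that a parallel solver could split across processes.
        vnew = map(s -> backup(s, v), 1:N)
        residual = maximum(abs.(vnew .- v))
        v = vnew
        residual < tol && break
    end
    return v
end

v = valueiteration()
```

In this sketch the goal state ends up with the highest value, and the values of the other states decay with their distance from it.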
On top of the keyword arguments available to SerialValueIteration, ParallelValueIteration has an additional nthreads keyword argument. The default value is CPU_CORES / 2.
CPU_CORES is a Julia standard library constant that is set to the number of CPU cores in your system. This count usually includes virtual cores (e.g., on Intel processors with hyper-threading), so we divide by two to estimate the number of physical cores. Nothing breaks if you ask for more threads than that. But because the same physical cores end up doing the same amount of work, there won’t be an increase in efficiency. In fact, a greater number of threads may add overhead from the extra processes. As such, we recommend using as many threads as there are physical cores on the machine. In the case of the parallel solver, we can define
solver = ParallelValueIteration(
    tol=1e-6,
    maxiter=10000,
    discount=0.999,
    verbose=false,
    nthreads=10)
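If you would rather compute the thread count than hard-code it, the halving rule described above can be sketched as follows. This is written for Julia 1.x, where the logical CPU count is exposed as `Sys.CPU_THREADS` (the successor of `CPU_CORES`). The helper `physicalcores` is a hypothetical name, not part of PLite, and halving assumes two hardware threads per physical core, which does not hold on every machine.

```julia
# Hypothetical helper (not part of PLite): estimate the number of physical cores
# by halving the logical CPU count, never going below one.
physicalcores(logical::Integer) = max(1, div(logical, 2))

# Sys.CPU_THREADS is the Julia 1.x name for the logical CPU count.
nthreads = physicalcores(Sys.CPU_THREADS)
```

The resulting `nthreads` can then be passed as the `nthreads` keyword argument shown above.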
As in the serial solver, PLite needs a definition of the discretization scheme.
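To make the discretization concrete, the sketch below enumerates the grid that a step of StepX would produce over the range of x. It assumes an evenly spaced grid that includes both endpoints. PLite's internal scheme may differ, so treat this as an illustration of the state count rather than a description of the library.

```julia
# Constants copied from ExampleModule.
const MinX = 0
const MaxX = 100
const StepX = 20

# Assumed grid: evenly spaced points from MinX to MaxX, endpoints included.
xgrid = collect(MinX:StepX:MaxX)      # [0, 20, 40, 60, 80, 100]
goalvalues = ["no", "yes"]

# Every (x, goal) pair is one discrete state.
states = [(x, g) for x in xgrid for g in goalvalues]
nstates = length(states)
```

Under this assumption the example has 6 grid points for x and 2 values for goal, or 12 discrete states in total.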
Notice that there are two modifications to the code being wrapped (in addition to putting it in ExampleModule and using ParallelValueIteration):
- we removed the solve bit that generated the solution
- we added the export keyword that makes the objects and functions available to the user (either on the console or in the Jupyter notebook)
export
    mdp,
    solver,
    solve,
    getpolicy
On the console or Jupyter notebook, we then input the following.
const NThreads = div(CPU_CORES, 2) # integer division, so no int(...) conversion is needed
addprocs(NThreads - 1) # -1 to account for existing process
using ExampleModule
# generate results
solution = solve(mdp, solver)
Notice that we add the desired number of processes before loading the module. This order of evaluation makes the code in ExampleModule available to all processes. We then call solve on the mdp and solver to obtain the solution.
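The same "add processes first, then load code" pattern can be tried without PLite. The following is a minimal sketch in Julia 1.x, where the multi-processing functions live in the `Distributed` standard library. The function `square` is a hypothetical stand-in for the code inside ExampleModule, and the worker count of 2 is only an illustrative value.

```julia
using Distributed  # standard library in Julia 1.x

# Add the workers first, so that code defined afterwards can reach them.
nworkers_wanted = 2   # illustrative value; the text above uses NThreads - 1
addprocs(nworkers_wanted)

# @everywhere evaluates the definition on the main process and on every worker.
@everywhere square(x) = x * x   # hypothetical stand-in for module code

# Every worker now knows `square`, so pmap can distribute the calls.
results = pmap(square, 1:4)
```

Had `square` been defined only on the main process before the workers existed, the workers would have had no definition to call, which is the code availability issue mentioned at the start of this section.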