Using Spatial Hints to Improve Policy Reuse in a Reinforcement Learning Agent
By Bruno da Silva
We study the problem of knowledge reuse by a reinforcement learning agent.
We are interested in how an agent can exploit policies that were learned in
the past to learn a new task more efficiently in the present. Our approach
is to elicit spatial hints from an expert suggesting the world states in
which each existing policy should be most relevant to the new task. By
weighing these hints against its own exploration of the domain, the agent
can identify the portions of existing policies that are beneficial to the
new task, thereby learning a new policy more efficiently. We call our
approach Spatial Hints Policy Reuse (SHPR). Experiments demonstrate the
effectiveness of our method and its robustness to inaccurate hints.