Alan K. Mackworth's Publications


Using Spatial Hints to Improve Policy Reuse in a Reinforcement Learning Agent

Bruno da Silva and Alan K. Mackworth. Using Spatial Hints to Improve Policy Reuse in a Reinforcement Learning Agent. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2010, pp. 317–324, Toronto, Canada, May 2010.

Download

[PDF] (316.9 kB)

Abstract

We study the problem of knowledge reuse by a reinforcement learning agent. We are interested in how an agent can exploit policies that were learned in the past to learn a new task more efficiently in the present. Our approach is to elicit spatial hints from an expert suggesting the world states in which each existing policy should be more relevant to the new task. By using these hints with domain exploration, the agent is able to detect those portions of existing policies that are beneficial to the new task, therefore learning a new policy more efficiently. We call our approach Spatial Hints Policy Reuse (SHPR). Experiments demonstrate the effectiveness and robustness of our method. Our results encourage further study investigating how much more efficacy can be gained from the elicitation of very simple advice from humans.
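To make the idea of hint-guided reuse concrete, the following is a minimal illustrative sketch of how spatial hints might bias exploration in a Q-learning agent: inside an expert-designated region of the state space, the agent sometimes follows a previously learned policy instead of exploring from scratch. The environment interface, the hint representation, and the reuse probability are all assumptions made for illustration; this is one plausible reading of the core idea, not the SHPR algorithm as defined in the paper.

    # Illustrative sketch only: not the authors' SHPR implementation.
    # Assumes a simple environment with reset() -> state and
    # step(action) -> (next_state, reward, done).
    import random
    from collections import defaultdict

    class SpatialHint:
        """An expert-provided region of the state space in which a past
        policy is suggested to be relevant to the new task (illustrative)."""
        def __init__(self, policy, region):
            self.policy = policy        # dict: state -> action, learned on an old task
            self.region = set(region)   # states where the expert suggests reusing it

        def covers(self, state):
            return state in self.region

    def learn_with_hints(env, hints, actions, episodes=500,
                         alpha=0.1, gamma=0.95, epsilon=0.1, reuse_prob=0.5):
        """Q-learning on the new task; inside a hinted region the agent
        sometimes reuses the suggested old policy's action."""
        Q = defaultdict(float)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                hint = next((h for h in hints if h.covers(state)), None)
                if hint is not None and state in hint.policy and random.random() < reuse_prob:
                    action = hint.policy[state]                 # reuse past policy
                elif random.random() < epsilon:
                    action = random.choice(actions)             # explore
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])  # exploit
                next_state, reward, done = env.step(action)
                best_next = max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

In this reading, the hint only changes how the agent explores; the value function for the new task is still learned from the new task's own rewards, so unhelpful portions of an old policy can be overridden as learning proceeds.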

BibTeX

@InProceedings{BrunoAAMAS2010,
  author    = {Bruno da Silva and Alan K. Mackworth},
  title     = {Using Spatial Hints to Improve Policy Reuse in a Reinforcement Learning Agent},
  booktitle = {Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2010},
  year      = {2010},
  month     = {May},
  address   = {Toronto, Canada},
  pages     = {317--324},
  abstract  = {We study the problem of knowledge reuse by a reinforcement learning agent. We are interested in how an agent can exploit policies that were learned in the past to learn a new task more efficiently in the present. Our approach is to elicit spatial hints from an expert suggesting the world states in which each existing policy should be more relevant to the new task. By using these hints with domain exploration, the agent is able to detect those portions of existing policies that are beneficial to the new task, therefore learning a new policy more efficiently. We call our approach Spatial Hints Policy Reuse (SHPR). Experiments demonstrate the effectiveness and robustness of our method. Our results encourage further study investigating how much more efficacy can be gained from the elicitation of very simple advice from humans.},
  bib2html_pubtype = {Refereed Conference Proceeding},
  bib2html_rescat  = {},
}

Generated by bib2html.pl (written by Patrick Riley) on Wed Apr 23, 2014 19:08:35