Class Q10_Controller

java.lang.Object
  extended by Q10_Controller

public class Q10_Controller
extends java.lang.Object

This applet demonstrates Q-learning for a particular grid world problem. It isn't designed to be general or reusable.

Copyright (C) 2003-2006 David Poole.

This program provides the Q-learning code. The GUI is in Q_GUI.java, the controller code is in Q_Controller.java, and the environment simulation is in Q_Env.java.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.


Field Summary
 boolean alphaFixed
           
 double discount
           
 double[][][] qvalues
          The Q values: Q[xpos,ypos,action]
 boolean tracing
           
 int[][][] visits
          The number of times the agent has been at (xpos,ypos) and done action
 
Method Summary
 void doreset(double initVal)
          Resets the Q-values: sets all of the Q-values to initVal and all of the visit counts to 0.
 void dostep(int action)
          Does one step: carries out the action.
 void dostep(int action, double newdiscount, double alphaFieldValue)
          Does one step: carries out the action and sets the discount and the alpha value.
 void doSteps(int count, double greedyProb, double newdiscount, double alphaFieldValue)
          Does count steps; whether each step is greedy or random is determined by greedyProb.
 double value(int xval, int yval)
          Determines the value of a location: the maximum, over all actions, of the Q-value at that location.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

qvalues

public double[][][] qvalues
The Q values: Q[xpos,ypos,action]


visits

public int[][][] visits
The number of times the agent has been at (xpos,ypos) and done action


discount

public double discount

alphaFixed

public boolean alphaFixed

tracing

public boolean tracing

Method Detail

doreset

public void doreset(double initVal)
Resets the Q-values: sets all of the Q-values to initVal and all of the visit counts to 0.

Parameters:
initVal - the initial value to set all values to
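
The reset described above could be sketched as follows. This is a minimal illustration, not the applet's actual code: the class name ResetSketch and the static helper reset are hypothetical, and the arrays are passed in explicitly rather than read from the qvalues and visits fields.

```java
// Hypothetical helper for illustration only; not part of Q10_Controller.
final class ResetSketch {
    /** Sets every Q-value to initVal and every visit count to 0. */
    static void reset(double[][][] qvalues, int[][][] visits, double initVal) {
        for (double[][] column : qvalues) {
            for (double[] actions : column) {
                java.util.Arrays.fill(actions, initVal);
            }
        }
        for (int[][] column : visits) {
            for (int[] actions : column) {
                java.util.Arrays.fill(actions, 0);
            }
        }
    }
}
```

Both arrays are indexed as [xpos][ypos][action], matching the field descriptions.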

dostep

public void dostep(int action,
                   double newdiscount,
                   double alphaFieldValue)
Does one step: carries out the action and sets the discount and the alpha value.

Parameters:
action - the action that the agent does
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
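
This page does not spell out how the discount and alpha enter the update, so the following is only a sketch of the standard Q-learning update rule, not the applet's confirmed implementation. The class QUpdateSketch and both of its methods are hypothetical. In particular, falling back to a step size of 1/visits when alphaFixed is false is an assumption made for illustration.

```java
// Hypothetical helper for illustration only; not part of Q10_Controller.
final class QUpdateSketch {
    /**
     * Standard Q-learning update:
     * Q(s,a) <- Q(s,a) + alpha * (reward + discount * max_a' Q(s',a') - Q(s,a)),
     * where nextValue stands for max_a' Q(s',a').
     */
    static double update(double oldQ, double reward, double discount,
                         double nextValue, double alpha) {
        return oldQ + alpha * (reward + discount * nextValue - oldQ);
    }

    /**
     * ASSUMPTION: with a fixed alpha the supplied value is used as is;
     * otherwise the step size decays as 1/visits (visits must be >= 1).
     */
    static double stepSize(boolean alphaFixed, double alphaFieldValue, int visits) {
        return alphaFixed ? alphaFieldValue : 1.0 / visits;
    }
}
```

Here nextValue plays the role of what value(xval, yval) returns for the position reached after the action.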

dostep

public void dostep(int action)
Does one step: carries out the action.

Parameters:
action - the action that the agent does

value

public double value(int xval,
                    int yval)
Determines the value of a location: the maximum, over all actions, of the Q-value at that location.

Parameters:
xval - the x-coordinate
yval - the y-coordinate
Returns:
the value of the (xval,yval) position
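
The maximum over actions can be sketched as below. The class ValueSketch is hypothetical and takes the Q-value array as a parameter, whereas the real method presumably reads the qvalues field.

```java
// Hypothetical helper for illustration only; not part of Q10_Controller.
final class ValueSketch {
    /** Returns the largest Q-value over all actions at position (xval, yval). */
    static double value(double[][][] qvalues, int xval, int yval) {
        double best = Double.NEGATIVE_INFINITY;
        for (double q : qvalues[xval][yval]) {
            best = Math.max(best, q);
        }
        return best;
    }
}
```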

doSteps

public void doSteps(int count,
                    double greedyProb,
                    double newdiscount,
                    double alphaFieldValue)
Does count steps; whether each step is greedy or random is determined by greedyProb.

Parameters:
count - the number of steps to do
greedyProb - the probability that a step is chosen greedily
newdiscount - the discount to use
alphaFieldValue - the alpha value to use
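
One way to read the greedyProb parameter is as follows: with probability greedyProb the step takes an action with the highest Q-value, and otherwise it takes an action chosen uniformly at random. The sketch below illustrates that choice under these assumptions. The class GreedyChoiceSketch, its method names, and the tie-breaking rule (lowest index wins) are hypothetical and not taken from the applet.

```java
// Hypothetical helper for illustration only; not part of Q10_Controller.
final class GreedyChoiceSketch {
    /** Index of the largest entry; ties go to the lowest index (an assumption). */
    static int greedyAction(double[] actionValues) {
        int best = 0;
        for (int a = 1; a < actionValues.length; a++) {
            if (actionValues[a] > actionValues[best]) {
                best = a;
            }
        }
        return best;
    }

    /** Greedy with probability greedyProb, otherwise uniformly random. */
    static int chooseAction(double[] actionValues, double greedyProb,
                            java.util.Random rng) {
        if (rng.nextDouble() < greedyProb) {
            return greedyAction(actionValues);
        }
        return rng.nextInt(actionValues.length);
    }
}
```

A loop that calls chooseAction count times and passes each result to dostep would then correspond to the behavior described for doSteps.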