CPSC422 Spring 2012
Assignment 4

Due: 10:00am, Monday 6 February 2012.

You are encouraged to discuss this assignment and collaborate with your classmates, as long as (a) you list the people with whom you discussed the assignment and (b) you give your own answers to the questions. Feel free to ask questions on the Vista bulletin board.

Question 1

Consider the model-based reinforcement learner at: http://artint.info/demos/rl/sGameModel.html, and the controller in the dostep method of SGameModelController.java.

  1. The current applet uses the empirical proportion as the probability. Does adding pseudo-counts improve the performance? Can you explain why or why not?

  2. Suppose we wanted to measure the performance in terms of the number of asynchronous value iteration updates. Does doing more than one update per step improve the performance of the learner in terms of number of updates? (The current applet records the number of actions taken, not the number of updates. You do not need to change the code to do this question, but you need to report in terms of number of updates, not number of steps.)

Solution:

  1. To make it use pseudo-counts, we change the line 139 to

     qs1a1+= (transitions[s1][a1][s2]+1.0)*(rewards[s1][a1]+discount*value(s2))
              /(STATE_SIZE+(double)count_actions_in_state[s1][a1]);
    

    And them we can remove the surrounding conditional. This works much worse because the prior does not reflect how the world works: not every transition is possible. It takes a long time to overcome this prior.

  2. To check this, I compared 100000 steps at 1 update per step, with 50000 steps at 2 updates per step and 20000 steps at 5 updates per step (each of which does the same number of updates).

    The 100000 steps at 1 update per step, was much better with approximately 9000-12000 rewards (considering the 10 and 90 percentile of the rewards).

    The 50000 steps at 2 updates per step, was much worse with rewards of 2600-3900.

    20000 steps at 5 updates per step was again much worse (and much more variable) with rewards of -2000 to 200.

    So if experiences are cheap, it is better to use our time collecting experiences than updating Q-values.

Question 2

To determine the best prediction for a single Boolean random variable on test data, assume that the data cases are generated stochastically according to some true parameter $p_0$. Try the following a number of time (say 1000).

For each of the optimality criteria – sum of absolute values, sum of squares, and likelihood (or entropy) – which of the following gives a lower error on the test set:

the mode

$n_1/(n_0+n_1)$

if $n_1=0$, use $0.001$, if $n_0=0$, use $0.999$, else use $n_1/(n_0+n_1)$. (Try this for different numbers when the counts are zero.)

$(n_1+1)/(n_0+n_1+2)$

$(n_1+\alpha )/(n_0+n_1+2\alpha )$ for different values of $\alpha >0$

another predictor that is a function of $n_0$ and $n_1$.

Solution:

The python file http://cs.ubc.ca/~poole/cs422/2012/as4/testPredictionSol.py produces the data needed. This gives the sum of errors for 1,000,000 test cases (1000 test cases for 1000 different values for $p_0$).

For sum of absolute values, the mode is best, whenever $k>2$. When $k=2$ the empirical average is the same as the mode. Any pseudo-count makes the prediction worse.

For the sum of squares error, Laplace smoothing (a pseudo count of 1) works best, although the pseudo-count doesn’t make much difference for more than 20 training examples (only changes the 3rd decimal point). Just removing the 0,1 predictions is not as good as using pseudo-counts.

For the entropy, Laplace smoothing (a pseudo-count of 1) works best. The mode always has infinite entropy and the empirical average has infinite error for small training set sizes. Just removing the 0,1 predictions avoids the infinite entropy, but isn’t as good as adding pseudo counts.

Question 3

How long did this assignment take? What did you learn? Was it reasonable?

This should have takes a couple of hours. You should have learned about the role of training and test sets, and the role of pseudo-counts.