# CPSC422 Spring 2012 Assignment 4

Due: 10:00am, Monday 6 February 2012.

You are encouraged to discuss this assignment and collaborate with your classmates, as long as (a) you list the people with whom you discussed the assignment and (b) you give your own answers to the questions. Feel free to ask questions on the Vista bulletin board.

# Question 1

Consider the model-based reinforcement learner at: http://artint.info/demos/rl/sGameModel.html, and the controller in the dostep method of SGameModelController.java.

1. The current applet uses the empirical proportion as the probability. Does adding pseudo-counts improve the performance? Can you explain why or why not?

2. Suppose we wanted to measure the performance in terms of the number of asynchronous value iteration updates. Does doing more than one update per step improve the performance of the learner in terms of number of updates? (The current applet records the number of actions taken, not the number of updates. You do not need to change the code to do this question, but you need to report in terms of number of updates, not number of steps.)
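As a reminder of what pseudo-counts do to a probability estimate, here is a minimal Python sketch. It is not taken from SGameModelController.java; the function name `transition_estimate`, the state labels, and the uniform fallback when there is no data are all assumptions made for illustration.

```python
from collections import Counter

def transition_estimate(counts, next_states, pseudo_count=0.0):
    """Estimate P(s' | s, a) for one (s, a) pair from observed counts.

    counts: mapping from next state to the number of times it was observed
    next_states: all candidate next states (hypothetical labels)
    pseudo_count: prior count added to every candidate; 0 gives the
        empirical proportion
    """
    total = sum(counts.get(s, 0) for s in next_states)
    total += pseudo_count * len(next_states)
    if total == 0:
        # No data and no prior counts: fall back to uniform (an assumption).
        return {s: 1.0 / len(next_states) for s in next_states}
    return {s: (counts.get(s, 0) + pseudo_count) / total for s in next_states}

# After a single observation, the empirical proportion rules out every
# unseen outcome, while a pseudo-count of 1 keeps it possible.
observed = Counter({"s1": 1})
states = ["s1", "s2"]
empirical = transition_estimate(observed, states)
smoothed = transition_estimate(observed, states, pseudo_count=1.0)
```

With one observation of `s1`, `empirical` assigns probability 0 to `s2`, whereas `smoothed` assigns it 1/3. Whether that difference helps the learner is what part 1 asks you to investigate.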

# Question 2

To determine the best prediction for a single Boolean random variable on test data, assume that the data cases are generated stochastically according to some true parameter $p_0$. Try the following a number of times (say 1000).

• Choose a number $p_0 \in [0, 1]$ randomly. This is going to be the “ground truth”.

• Generate $k$ training examples (try various values for $k$, some small, say 5, 10, 20, and some large, say 1,000) by sampling with probability $p_0$.

• From the training set, generate $n_0$, the number of false instances in the training set, and $n_1$, the number of true instances.

• Generate a test set that contains many (say 1000) test cases using the same parameter $p_0$.

For each of the optimality criteria – sum of absolute values, sum of squares, and likelihood (or entropy) – which of the following gives a lower error on the test set:

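• the mode

• if $n_1 = 0$ (no true instances), use 0.001; if $n_0 = 0$ (no false instances), use 0.999; else use $n_1/(n_0+n_1)$. (Try this for different numbers when the counts are zero.)

• $(n_1+\alpha)/(n_0+n_1+2\alpha)$ for different values of $\alpha > 0$

• another predictor that is a function of $n_0$ and $n_1$.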
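One possible way to organise the experiment is sketched below in Python. All function names are hypothetical, the ground truth is assumed to be drawn uniformly from [0, 1], the mode breaks ties toward false, and the likelihood criterion is reported as negative log-likelihood in bits. Only two predictors are shown (the mode and a pseudo-count estimate with parameter `alpha`); the others would follow the same `predictor(n0, n1)` signature.

```python
import math
import random

def errors(q, t0, t1):
    """Score prediction q = P(true) on a test set with t0 false and t1 true cases.

    Returns (sum of absolute errors, sum of squared errors,
    negative log-likelihood in bits).
    """
    abs_err = t1 * (1 - q) + t0 * q
    sq_err = t1 * (1 - q) ** 2 + t0 * q ** 2
    if (t1 > 0 and q == 0) or (t0 > 0 and q == 1):
        # A test case that occurred was given probability 0.
        log_loss = math.inf
    else:
        log_loss = 0.0
        if t1:
            log_loss -= t1 * math.log2(q)
        if t0:
            log_loss -= t0 * math.log2(1 - q)
    return abs_err, sq_err, log_loss

def run_trial(k, predictor, rng, test_size=1000):
    """One trial: pick a ground truth, train on k cases, score on a test set."""
    p0 = rng.random()                               # ground-truth parameter
    n1 = sum(rng.random() < p0 for _ in range(k))   # true training instances
    n0 = k - n1                                     # false training instances
    q = predictor(n0, n1)
    t1 = sum(rng.random() < p0 for _ in range(test_size))
    return errors(q, test_size - t1, t1)

def average_errors(k, predictor, trials=1000, seed=0):
    """Average the three error measures over many independent trials."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(trials):
        for i, e in enumerate(run_trial(k, predictor, rng)):
            totals[i] += e
    return tuple(t / trials for t in totals)

def mode(n0, n1):
    """Predict the majority class; ties go to false (an arbitrary choice)."""
    return 1.0 if n1 > n0 else 0.0

def pseudo_count(alpha):
    """Build a predictor (n1 + alpha) / (n0 + n1 + 2 * alpha), for alpha > 0."""
    def predict(n0, n1):
        return (n1 + alpha) / (n0 + n1 + 2 * alpha)
    return predict

if __name__ == "__main__":
    for k in (5, 10, 20, 1000):
        print(k, average_errors(k, mode, trials=200),
              average_errors(k, pseudo_count(1.0), trials=200))
```

Fixing the seed makes runs repeatable, so different predictors can be compared on the same sequence of random draws for a given `k`. Note that the mode predicts exactly 0 or 1, so its log-likelihood error becomes infinite as soon as one test case disagrees with it.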

# Question 3

How long did this assignment take? What did you learn? Was it reasonable?