
Practice Final Exam

April 2005

Note that this practice exam emphasises parts that are not in the textbook. The final exam will cover all parts of the course.

Suppose our Q-learning agent, with fixed alpha and discount gamma, was in state 34, did action 7, received reward 3, and ended up in state 65. Which value(s) get updated? Give an expression for the new value. (Be as specific as possible.)
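As an illustration, here is a minimal Python sketch of the standard Q-learning update. It assumes a dictionary-based Q-table whose entries default to 0 and ten actions. The names `q_update`, `Q` and `actions`, and the values chosen for alpha, gamma and Q[65, 2], are hypothetical and made up for the example. Only the entry for the state-action pair just tried changes: Q[34,7] <- Q[34,7] + alpha * (3 + gamma * max_a' Q[65,a'] - Q[34,7]).

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """One Q-learning step: only Q[(s, a)] is modified."""
    # Best estimated value obtainable from the next state.
    best_next = max(Q[(s_next, b)] for b in actions)
    # Move Q[(s, a)] a fraction alpha toward the target r + gamma * best_next.
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical setup: 10 actions, all Q-values 0 except one in state 65.
Q = defaultdict(float)
actions = range(10)
Q[(65, 2)] = 10.0
q_update(Q, s=34, a=7, r=3, s_next=65, actions=actions, alpha=0.5, gamma=0.9)
print(Q[(34, 7)])  # prints 6.0
```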

In temporal difference learning (e.g., Q-learning), we can compute the average of a sequence of k values incrementally by setting alpha = 1/k when the k-th value arrives. Explain why it may be advantageous to keep alpha fixed in the context of reinforcement learning.
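The sketch below contrasts the two step sizes on hypothetical data whose underlying value jumps halfway through. This mimics reinforcement learning, where the values being averaged keep changing while the policy and the Q estimates are still being learned. With alpha = 1/k, the estimate is the exact mean of everything seen so far. With a fixed alpha, recent values get more weight, so the estimate can track the change. The function name `running_estimate` is hypothetical.

```python
def running_estimate(values, alpha=None):
    """Incremental estimate v <- v + a * (x - v).

    With alpha=None the step size is 1/k for the k-th value, which yields
    the arithmetic mean; with a fixed alpha, recent values weigh more.
    """
    v = 0.0
    for k, x in enumerate(values, start=1):
        a = 1.0 / k if alpha is None else alpha
        v += a * (x - v)
    return v

# Hypothetical non-stationary data: the true value jumps from 0 to 10.
values = [0.0] * 50 + [10.0] * 50
print(running_estimate(values))             # mean of all 100 values (about 5)
print(running_estimate(values, alpha=0.1))  # close to the recent value 10
```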

Explain what happens in reinforcement learning if the agent always chooses the action that maximizes the Q-value. Suggest two ways to force the agent to explore.
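Two common exploration strategies can be sketched as follows. The first is epsilon-greedy selection, which acts randomly with probability epsilon. The second is softmax (Boltzmann) selection, which picks actions with probability proportional to exp(Q/temperature). The function names and the example Q-values are hypothetical. These are illustrations of the general techniques, not a prescribed answer.

```python
import math
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax_action(q_values, temperature, rng=random):
    """Pick action a with probability proportional to exp(Q[a] / temperature)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights, k=1)[0]

rng = random.Random(0)
q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three actions
picks = [epsilon_greedy(q, 0.2, rng) for _ in range(1000)]
print(picks.count(1) / len(picks))  # mostly, but not always, the greedy action 1
```

A high temperature makes softmax close to uniform, and a low temperature makes it close to greedy.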

In MDPs and reinforcement learning, explain why we often discount future rewards.
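One concrete effect of discounting is shown in this small numeric sketch. A reward of 1 at every step forever has an infinite undiscounted total. With discount gamma < 1, its value is bounded by 1/(1 - gamma), so values of infinite-horizon behaviours stay finite and comparable. The function name and the reward sequence are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a (finite prefix of a) reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# With gamma = 0.9 the bound is 1 / (1 - 0.9) = 10; 200 steps already get very close.
print(discounted_return([1.0] * 200, gamma=0.9))
```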

What is the main difference between asynchronous value iteration and standard value iteration? Why does asynchronous value iteration often work better than standard value iteration?
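The sketch below compares the two variants on a hypothetical chain MDP. States 0..n-1 are arranged in a line, state n-1 is terminal, and a reward of 1 is received on entering it. Each other state has the actions "right" and "stay". Standard value iteration computes every new value from the previous sweep's values. The asynchronous version updates states in place, so each update can use values already improved in the same sweep. The backwards update order (from the goal toward state 0) is an assumption chosen to show the benefit; a poor order would help less.

```python
def make_backup(n, gamma):
    """Bellman backup for the hypothetical chain MDP described above."""
    def backup(V, s):
        right = (1.0 if s + 1 == n - 1 else 0.0) + gamma * V[s + 1]
        stay = 0.0 + gamma * V[s]
        return max(right, stay)
    return backup

def value_iteration(n, gamma, asynchronous, tol=1e-9):
    backup = make_backup(n, gamma)
    V = [0.0] * n  # V[n-1] is terminal and stays 0
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        if asynchronous:
            # In place: each update immediately uses the freshest values.
            for s in reversed(range(n - 1)):
                new = backup(V, s)
                delta = max(delta, abs(new - V[s]))
                V[s] = new
        else:
            # Standard: all new values are computed from the previous sweep's V.
            new_V = V[:]
            for s in range(n - 1):
                new_V[s] = backup(V, s)
                delta = max(delta, abs(new_V[s] - V[s]))
            V = new_V
        if delta <= tol:
            return V, sweeps

V_sync, sync_sweeps = value_iteration(10, 0.9, asynchronous=False)
V_async, async_sweeps = value_iteration(10, 0.9, asynchronous=True)
print(sync_sweeps, async_sweeps)
```

Both versions reach the same values. Here the standard version needs one sweep per state for the reward to propagate back, while the in-place version needs far fewer sweeps and only one array.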

In learning under uncertainty, when is the EM algorithm used? What is the E-step? What is the M-step?
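EM is typically used when some variables are hidden or some values are missing in the data. As an illustration, here is a minimal sketch on a hypothetical two-coin mixture. Each trial is 10 tosses of one of two coins, the identity of the coin is the hidden variable, and each coin is assumed equally likely a priori. The E-step computes, for each trial, the probability that coin A was used given the current parameters. The M-step re-estimates each coin's heads probability from these expected counts. The function name, data and initial values are hypothetical.

```python
def em_two_coins(heads, n, theta_a, theta_b, iterations=20):
    """EM for a hypothetical two-coin mixture (prior 1/2 for each coin).

    heads[i] is the number of heads in trial i of n tosses.
    """
    for _ in range(iterations):
        # E-step: expected (soft) assignment of each trial to coin A.
        resp = []
        for h in heads:
            like_a = theta_a ** h * (1 - theta_a) ** (n - h)
            like_b = theta_b ** h * (1 - theta_b) ** (n - h)
            resp.append(like_a / (like_a + like_b))
        # M-step: maximum-likelihood estimates from the expected counts.
        theta_a = sum(r * h for r, h in zip(resp, heads)) / sum(r * n for r in resp)
        theta_b = (sum((1 - r) * h for r, h in zip(resp, heads))
                   / sum((1 - r) * n for r in resp))
    return theta_a, theta_b

heads = [5, 9, 8, 4, 7]  # hypothetical data, 10 tosses per trial
theta_a, theta_b = em_two_coins(heads, 10, theta_a=0.6, theta_b=0.5)
print(theta_a, theta_b)
```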

Why don't we use the empirical frequency when learning probabilities from data? That is, if we observe n occurrences of A out of m cases, why shouldn't we just use n/m as our probability estimate?
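The problem with n/m is easiest to see with few cases. It can give probabilities of exactly 0 or 1, which no later evidence can change through Bayes' rule. One common remedy, sketched below, adds pseudocounts (prior counts) to the observed counts; with a pseudocount of 1 this is Laplace smoothing. The function name and parameter names are hypothetical.

```python
def estimate(n, m, pseudocount=1.0, num_values=2):
    """Probability estimate for A with pseudocounts.

    n: occurrences of A out of m cases;
    num_values: number of values of A's variable (2 for a Boolean).
    """
    return (n + pseudocount) / (m + pseudocount * num_values)

print(3 / 3, estimate(3, 3))  # empirical frequency says A is certain
print(0 / 3, estimate(0, 3))  # empirical frequency says A is impossible
```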

Suppose that in a decision network, the decision variable run has parents look and see, and that we are using variable elimination to find the optimal policy. After eliminating all of the other variables, we have the following factor:

look	see	run	value
true	true	yes	23
true	true	no	8
true	false	yes	37
true	false	no	56
false	true	yes	28
false	true	no	12
false	false	yes	18
false	false	no	22

What is the resulting factor after eliminating run? (Hint: you do not sum out run as it is a decision variable).
What is the optimal decision function for run?
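Eliminating a decision variable means maximizing over it for each assignment to its parents, and recording which value achieved the maximum. The following sketch applies this to the factor above. The dictionary layout and the function name `max_out` are hypothetical choices for the illustration.

```python
# Factor from the question: (look, see, run) -> value.
factor = {
    (True, True, "yes"): 23, (True, True, "no"): 8,
    (True, False, "yes"): 37, (True, False, "no"): 56,
    (False, True, "yes"): 28, (False, True, "no"): 12,
    (False, False, "yes"): 18, (False, False, "no"): 22,
}

def max_out(factor, values=("yes", "no")):
    """Eliminate the last (decision) variable by maximizing, not summing.

    Returns the new factor on the parents and the decision function that
    records which value achieved the maximum for each parent assignment.
    """
    new_factor, policy = {}, {}
    parents = {key[:-1] for key in factor}
    for p in sorted(parents):
        best = max(values, key=lambda v: factor[p + (v,)])
        new_factor[p] = factor[p + (best,)]
        policy[p] = best
    return new_factor, policy

new_factor, policy = max_out(factor)
for p in sorted(new_factor, reverse=True):
    print(p, new_factor[p], policy[p])
```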

Suppose that, in a decision network, there were arcs from random variables "contaminated specimen" and "positive test" to the decision variable "discard sample". Sally solved the decision network and discovered that there was a unique optimal policy:

contaminated specimen	positive test	discard sample
true	true	yes
true	false	no
false	true	yes
false	false	no

What can you say about the value of information in this case? [You will only get full marks for precise statements written in full sentences.]
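A useful first step is to check mechanically which parents the optimal decision function actually uses. The sketch below tests whether changing only one parent can ever change the decision in Sally's policy. The function name `depends_on` is hypothetical, and the check only reads the table; relating the result to the value of information is the point of the question.

```python
# Sally's optimal policy: (contaminated specimen, positive test) -> discard sample.
policy = {
    (True, True): "yes", (True, False): "no",
    (False, True): "yes", (False, False): "no",
}

def depends_on(policy, index):
    """True if changing only the parent at position `index` can change the decision."""
    for key, decision in policy.items():
        for other, other_decision in policy.items():
            same_elsewhere = all(
                k == o for i, (k, o) in enumerate(zip(key, other)) if i != index
            )
            if same_elsewhere and decision != other_decision:
                return True
    return False

print(depends_on(policy, 0))  # contaminated specimen
print(depends_on(policy, 1))  # positive test
```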

In a nuclear research submarine, a sensor measures the temperature of the reactor core. An alarm is triggered (A=true) if the sensor reading is abnormally high (S=true), indicating that the core is overheating (C=true). The alarm and/or the sensor can be defective (A_ok=false, S_ok=false), which can cause them to malfunction. The alarm system can be modelled by the following belief network (all variables are Boolean):

(b) Suppose we add a second, identical sensor to the system and trigger the alarm when either of the sensors reads a high temperature. The two sensors break and fail independently. Give the corresponding extended belief network. (Draw the graph and specify any new conditional probabilities.)
(c) When an alarm is observed, a decision is made whether to shut down the reactor. Shutting down the reactor has a cost c_s associated with it (independent of whether the core was overheating), while not shutting down an overheated core incurs a cost c_m much higher than c_s. Draw the decision network modelling this decision problem for the original system (i.e., only one sensor). Specify any new tables that need to be defined (you should use the parameters c_s and c_m where appropriate in the tables). You can assume that the utility is the negative of cost.
(d) For the decision network in part (c), suppose you need to compute the optimal policy and the value of the optimal policy using variable elimination. Show, for one legal elimination ordering, which variables are eliminated and which factors are created. [Just give the variables in the factors, not the tables of numbers.]
(e) How can the decision function(s) and their expected value be extracted from the elimination in part (d)?
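The exam's figure is not reproduced here, so the following sketch assumes a structure read off the text. C and S_ok are parents of S. S and A_ok are parents of A. A is the only parent of the decision, here given the hypothetical name Shutdown. The utility depends on C and Shutdown. The sketch tracks only factor scopes, which is all part (d) asks for. Random variables that are not parents of the decision are summed out first; then the decision is maximized out, recording the decision function; finally its parent A is summed out, leaving a single number (the expected value). The function name `eliminate` is hypothetical.

```python
# Assumed structure (figure not reproduced): C, S_ok -> S; S, A_ok -> A;
# A -> Shutdown (decision); utility U(C, Shutdown).
factors = [
    frozenset({"C"}),               # P(C)
    frozenset({"S_ok"}),            # P(S_ok)
    frozenset({"A_ok"}),            # P(A_ok)
    frozenset({"S", "C", "S_ok"}),  # P(S | C, S_ok)
    frozenset({"A", "S", "A_ok"}),  # P(A | S, A_ok)
    frozenset({"C", "Shutdown"}),   # U(C, Shutdown)
]

def eliminate(factors, var):
    """Combine all factors mentioning var; the new factor's scope drops var.

    The scope is the same whether var is summed out (random variable) or
    maximized out (decision variable); only the tables would differ.
    """
    touching = [f for f in factors if var in f]
    rest = [f for f in factors if var not in f]
    new_scope = frozenset().union(*touching) - {var}
    return rest + [new_scope], new_scope

order = ["C", "S_ok", "S", "A_ok", "Shutdown", "A"]
steps = []
for var in order:
    factors, scope = eliminate(factors, var)
    steps.append((var, sorted(scope)))
    print(f"eliminate {var:8} -> new factor on {sorted(scope)}")
```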

Suppose you have a job at a company that is building online teaching tools. Because you have taken more than one AI course and have done a number of assignments that applied different techniques to the same problem, your boss wants your opinion on various options the company is considering.

They are planning on building an intelligent tutoring system for teaching elementary physics (e.g., mechanics and electro-magnetism). One of the things that the system will need to do is to diagnose errors that a student may be making.

For each of the following, answer the explicit questions and use proper English. Answering parts that were not asked, or giving more than one answer when only one is asked for, will annoy the boss and result in reduced marks on the exam. The boss also doesn't like jargon, so please use straightforward English.

The boss has heard of consistency-based diagnosis, abductive diagnosis and belief networks, but wants to know what they involve in the context of building an intelligent tutoring system for teaching elementary physics.

Explain what knowledge (about physics and about students) consistency-based diagnosis requires.
Explain what knowledge (about physics and about students) abductive diagnosis requires.
What is the main advantage of using abductive diagnosis over consistency-based diagnosis in this domain?
What is the main advantage of consistency-based diagnosis over abductive diagnosis in this domain?
Explain what knowledge (about physics and about students) a belief network model requires.
What is the main advantage of using belief networks (over using abductive diagnosis or consistency-based diagnosis) in this domain?
What is the main advantage of using abductive diagnosis or consistency-based diagnosis compared to using belief networks in this domain?

Suppose you are given the following scenario. There are four possible diseases a particular patient may have: p, q, r and s. p causes spots. q causes spots. Fever could be caused by one (or more) of q, r or s. The patient has spots and fever.

(a) Show how to represent this theory using Horn clauses.
(b) Show how to diagnose this patient using abduction. Show clearly the query and the resulting answer(s).
(c) Suppose also that p and s cannot occur together. Show how that changes your theory from part (a). Show how to diagnose the patient using abduction with the new theory. Show clearly the query and the resulting answer(s).
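Here is a small brute-force sketch of propositional abduction. It assumes the assumables are exactly the four diseases and the theory is the Horn clauses spots <- p, spots <- q, fever <- q, fever <- r, fever <- s, with the integrity constraint false <- p & s added for the last part. The search enumerates sets of assumables by increasing size. It keeps those that are consistent (do not derive false) and derive every atom of the query. It skips supersets of explanations already found, so only minimal explanations are returned. It is meant for illustration only and is not an efficient proof procedure; all names are hypothetical.

```python
from itertools import combinations

ASSUMABLES = ["p", "q", "r", "s"]
CLAUSES = [  # (head, body) pairs: head <- body
    ("spots", ["p"]), ("spots", ["q"]),
    ("fever", ["q"]), ("fever", ["r"]), ("fever", ["s"]),
]
CONSTRAINED = CLAUSES + [("false", ["p", "s"])]  # p and s cannot occur together

def consequences(facts, clauses):
    """Forward chaining: all atoms derivable from facts with the clauses."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

def explanations(query, clauses, assumables):
    """Minimal consistent sets of assumables that derive every query atom."""
    found = []
    for size in range(len(assumables) + 1):
        for combo in combinations(assumables, size):
            hyp = set(combo)
            if any(e <= hyp for e in found):
                continue  # superset of a known explanation: not minimal
            derived = consequences(hyp, clauses)
            if "false" in derived:
                continue  # violates an integrity constraint
            if all(q in derived for q in query):
                found.append(hyp)
    return found

print(explanations(["spots", "fever"], CLAUSES, ASSUMABLES))
print(explanations(["spots", "fever"], CONSTRAINED, ASSUMABLES))
```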

CPSC 422
