9.3 Sequential Decisions

9.3.3 Variable Elimination for Decision Networks

Fortunately, an agent does not have to enumerate all of the policies; variable elimination (VE) can be adapted to find an optimal policy. The idea is first to consider the last decision, find an optimal decision for each value of its parents, and produce a factor of these maximum values. This results in a new decision network, with one less decision, that can be solved recursively.

1:procedure VE_DN(DN):
2:      Inputs
3:            DN a decision network       
4:      Output
5:            An optimal policy and its expected utility
6:      Local
7:            DFs: a set of decision functions, initially empty
8:            Fs: a set of factors       
9:      Remove all variables that are not ancestors of the utility node
10:      Create a factor in Fs for each conditional probability
11:      Create a factor in Fs for the utility
12:      while there are decision nodes remaining do
13:            Sum out each random variable that is not a parent of a decision node
14:            Let D be the last decision remaining
15:             D is only in a factor F(D,V1,Vk) where V1Vk are parents of D
16:            Add maxDF to Fs.
17:            Add argmaxDF to DFs.       
18:      Sum out all remaining random variables
19:      Return DFs and the product of remaining factors
Figure 9.13: Variable elimination for decision networks

Figure 9.13 shows how to use variable elimination for decision networks. Essentially, it computes the expected utility of an optimal decision. It eliminates the random variables that are not parents of a decision node by summing them out according to some elimination ordering. The ordering of the random variables being eliminated does not affect correctness and so it can be chosen for efficiency.

After eliminating all of the random variables that are not parents of a decision node, in a no-forgetting decision network, there must be one decision variable D that is in a factor F where all of the variables, other than D, in F are parents of D. This decision D the last decision in the ordering of decisions.

To eliminate that decision node, VE_DN chooses the values for the decision that result in the maximum utility. This maximization creates a new factor on the remaining variables and a decision function for the decision variable being eliminated. This decision function created by maximizing is one of decision functions in an optimal policy.

Example 9.19.

In Example 9.13, there are three initial factors representing P(Weather), P(ForecastWeather), and u(Weather,Umbrella). First, it eliminates Weather by multiplying all three factors and summing out Weather, giving a factor on Forecast and Umbrella,

Forecast Umbrella Value
sunny take_it 12. 95
sunny leave_it 49. 0
cloudy take_it 8. 05
cloudy leave_it 14. 0
rainy take_it 14. 0
rainy leave_it 7. 0

To maximize over Umbrella, for each value of Forecast, VE_DN selects the value of Umbrella that maximizes the value of the factor. For example, when the forecast is sunny, the agent should leave the umbrella at home for a value of 49.0.

VE_DN constructs an optimal decision function for Umbrella by selecting a value of Umbrella that results in the maximum value for each value of Forecast:

Forecast Umbrella
sunny leave_it
cloudy leave_it
rainy take_it

It also creates a new factor that contains the maximal value for each value of Forecast:

Forecast Value
sunny 49.0
cloudy 14.0
rainy 14.0

It now sums out Forecast from this factor, which gives the value 77.0. This is the expected value of the optimal policy.

Example 9.20.

Consider Example 9.15. Before summing out any variables it has the following factors:

MeaningFactorP(Tampering)f0(Tampering)P(Fire)f1(Fire)P(AlarmTampering,Fire)f2(Tampering,Fire,Alarm)P(SmokeFire)f3(Fire,Smoke)P(LeavingAlarm)f4(Alarm,Leaving)P(ReportLeaving)f5(Leaving,Report)P(See_smokeCheck_smoke,Smoke)f6(Smoke,See_smoke,Check_smoke)u(Fire,Check_smoke,Call)f7(Fire,Check_smoke,Call)

The expected utility is the product of the probability and the utility, as long as the appropriate actions are chosen.

VE_DN sums out the random variables that are not parents of a decision node. Thus, it sums out Tampering, Fire, Alarm, Smoke, and Leaving. After these have been eliminated, there is a single factor, part of which (to two decimal places) is:

Report See_smoke Check_smoke Call Value
true true yes yes -1. 33
true true yes no -29. 30
true true no yes 0
true true no no 0
true false yes yes -4. 86
true false yes no -3. 68

From this factor, an optimal decision function can be created for Call by selecting a value for Call that maximizes Value for each assignment to Report, See_smoke, and Check_smoke.

Consider the case when Report=true, See_smoke=true, and Check_smoke=yes. The maximum of -1.33 and -29.3 is -1.33, so for this case, the optimal action is Call=yes with value -1.33. This maximization is repeated for the other values of Report, See_smoke and Check_smoke.

An optimal decision function for Call is

Report See_smoke Check_smoke Call
true true yes yes
true true no yes
true false yes no

The value for Call when Report=true, See_smoke=true and Check_smoke=no is arbitrary. It does not matter what the agent plans to do in this situation, because the situation never arises. The algorithm does not need to treat this as a special case.

The factor resulting from maximizing Call contains the maximum values for each combination of Report, See_smoke, and Check_smoke:

Report See_smoke Check_smoke Value
true true yes -1. 33
true true no 0
true false yes -3. 68

Summing out See_smoke gives the factor

Report Check_smoke Value
true yes -5. 01
true no -5. 65
false yes -23. 77
false no -17. 58

Maximizing Check_smoke for each value of Report gives the decision function

Report Check_smoke
true yes
false no

and the factor

Report Value
true -5. 01
false -17. 58

Summing out Report gives the expected utility of -22.60 (taking into account rounding errors).

Thus, the policy returned can be seen as the rules

check_smokereport.
callsee_smoke.
callreport¬check_smoke¬see_smoke.

The last of these rules is never used because the agent following the optimal policy does check for smoke if there is a report. It remains in the policy because VE_DN has not determined an optimal policy for Check_smoke when it is optimizing Call.

Note also that, in this case, even though checking for smoke has an immediate negative reward, checking for smoke is worthwhile because the information obtained is valuable.

The following example shows how the factor containing a decision variable can contain a subset of its parents when the VE algorithm optimizes the decision.

Example 9.21.

Consider Example 9.13, but with an extra arc from Weather to Umbrella. That is, the agent gets to observe both the weather and the forecast. In this case, there are no random variables to sum out, and the factor that contains the decision node and a subset of its parents is the original utility factor. It can then maximize Umbrella, giving the decision function and the factor:

Weather Umbrella
norain leave_it
rain take_it
Weather Value
norain 100
rain 70

Note that the forecast is irrelevant to the decision. Knowing the forecast does not give the agent any useful information. Summing out Forecast gives a factor where all of the values are 1.

Summing out Weather, where P(Weather=norain)=0.7, gives the expected utility 0.7*100+0.3*70=91.