9.2 One-Off Decisions

9.2.1 Single-Stage Decision Networks

A decision tree is a state-based representation where each path from a root to a leaf corresponds to a state. It is, however, often more natural and more efficient to represent and reason directly in terms of features, represented as variables.

A single-stage decision network is an extension of a belief network with three kinds of nodes:

  • Decision nodes, drawn as rectangles, represent decision variables. The agent gets to choose a value for each decision variable. Where there are multiple decision variables, we assume there is a total ordering of the decision nodes, and the decision nodes before a decision node D in the total ordering are the parents of D.

  • Chance nodes, drawn as ovals, represent random variables. These are the same as the nodes in a belief network. Each chance node has an associated domain and a conditional probability of the variable, given its parents. As in a belief network, the parents of a chance node represent conditional dependence: a variable is independent of its non-descendants, given its parents. In a decision network, both chance nodes and decision nodes can be parents of a chance node.

  • A utility node, drawn as a diamond, represents the utility. The parents of the utility node are the variables on which the utility depends. Both chance nodes and decision nodes can be parents of the utility node.

Each chance variable and each decision variable has a domain. There is no domain for the utility node. Whereas the chance nodes represent random variables and the decision nodes represent decision variables, there is no utility variable.

Associated with a decision network is a conditional probability for each chance node given its parents (as in a belief network) and a utility as a function of the utility node’s parents. In the specification of the network, there are no functions associated with a decision (although the algorithm will construct a function).

Which_way Accident Value
short true 0.2
short false 0.8
long true 0.01
long false 0.99
Wear_pads Which_way Accident Utility
true short true 35
true short false 95
true long true 30
true long false 75
false short true 3
false short false 100
false long true 0
false long false 80
Figure 9.7: Single-stage decision network for the delivery robot
Example 9.10.

Figure 9.7 gives a decision network representation of Example 9.6. There are two decisions to be made: which way to go and whether to wear padding. Whether the agent has an accident only depends on which way they go. The utility depends on all three variables.

This network requires two factors: a factor representing the conditional probability, P(AccidentWhich_way), and a factor representing the utility as a function of Which_way, Accident, and Wear_pads. Tables for these factors are shown in Figure 9.7.

A policy for a single-stage decision network is an assignment of a value to each decision variable. Each policy has an expected utility. An optimal policy is a policy whose expected utility is maximal. That is, it is a policy such that no other policy has a higher expected utility. The value of a decision network is the expected utility of an optimal policy for the network.

1:procedure VE_SSDN(DN)
2:      Inputs
3:            DN a single stage decision network       
4:      Output
5:            An optimal policy and the expected utility of that policy.
6:      Prune all nodes that are not ancestors of the utility node.
7:      Sum out all chance nodes.
8:      – at this stage there is a single factor F that was derived from utility
9:      Let v be the maximum value in F
10:      Let d be an assignment that gives the maximum value
11:      return d,v
Figure 9.8: Variable elimination for a single-stage decision network

Figure 9.8 shows how variable elimination is used to find an optimal policy in a single-stage decision network. After pruning irrelevant nodes and summing out all random variables, there will be a single factor that represents the expected utility for each combination of decision variables. This factor does not have to be a factor on all of the decision variables; however, those decision variables that are not included are not relevant to the decision.

Example 9.11.

Consider running VE_SSDN on the decision network of Figure 9.7. No nodes are able to be pruned, so it sums out the only random variable, Accident. To do this, it multiplies both factors because they both contain Accident, and sums out Accident, giving the following factor:

Wear_pads Which_way Value
true short 0.2*35+0.8*95=83
true long 0.01*30+0.99*75=74.55
false short 0.2*3+0.8*100=80.6
false long 0.01*0+0.99*80=79.2

Thus, the policy with the maximum value – the optimal policy – is to take the short way and wear pads, with an expected utility of 83.