foundations of computational agents
A decision tree is a state-based representation where each path from a root to a leaf corresponds to a state. It is, however, often more natural and more efficient to represent and reason directly in terms of features, represented as variables.
A single-stage decision network is an extension of a belief network with three kinds of nodes:
Decision nodes, drawn as rectangles, represent decision variables. The agent gets to choose a value for each decision variable. Where there are multiple decision variables, we assume there is a total ordering of the decision nodes, and the decision nodes before a decision node $D$ in the total ordering are the parents of $D$.
Chance nodes, drawn as ovals, represent random variables. These are the same as the nodes in a belief network. Each chance node has an associated domain and a conditional probability of the variable, given its parents. As in a belief network, the parents of a chance node represent conditional dependence: a variable is independent of its non-descendants, given its parents. In a decision network, both chance nodes and decision nodes can be parents of a chance node.
A utility node, drawn as a diamond, represents the utility. The parents of the utility node are the variables on which the utility depends. Both chance nodes and decision nodes can be parents of the utility node.
Each chance variable and each decision variable has a domain. There is no domain for the utility node: whereas the chance nodes represent random variables and the decision nodes represent decision variables, there is no utility variable; the utility node specifies a function of its parents, not a variable.
Associated with a decision network is a conditional probability for each chance node given its parents (as in a belief network) and a utility as a function of the utility node’s parents. In the specification of the network, there are no functions associated with a decision (although the algorithm will construct a function).
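This specification could be mirrored directly in a small data structure. The sketch below is illustrative only (the class and field names are assumptions, not the book's code): one table per chance node, one table for the utility, and no function attached to the decisions.

```python
from dataclasses import dataclass

@dataclass
class SingleStageDecisionNetwork:
    """Illustrative container for a single-stage decision network."""
    decisions: dict        # decision variable -> list of possible values
    chance_tables: dict    # chance variable -> {(parent values..., value): probability}
    utility_parents: tuple # the variables the utility node depends on
    utility_table: dict    # assignment to utility_parents -> utility value
```

Note that the decisions carry only domains; a policy (a value for each decision variable) is constructed by an algorithm, not stored in the network.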
$Which\_way$ | $Accident$ | Value
---|---|---
$short$ | $true$ | 0.2
$short$ | $false$ | 0.8
$long$ | $true$ | 0.01
$long$ | $false$ | 0.99
$Wear\_pads$ | $Which\_way$ | $Accident$ | Utility
---|---|---|---
$true$ | $short$ | $true$ | 35
$true$ | $short$ | $false$ | 95
$true$ | $long$ | $true$ | 30
$true$ | $long$ | $false$ | 75
$false$ | $short$ | $true$ | 3
$false$ | $short$ | $false$ | 100
$false$ | $long$ | $true$ | 0
$false$ | $long$ | $false$ | 80
Figure 9.7 gives a decision network representation of Example 9.6. There are two decisions to be made: which way to go and whether to wear padding. Whether the agent has an accident only depends on which way they go. The utility depends on all three variables.
This network requires two factors: a factor representing the conditional probability, $P(Accident \mid Which\_way)$, and a factor representing the utility as a function of $Which\_way$, $Accident$, and $Wear\_pads$. Tables for these factors are shown in Figure 9.7.
A policy for a single-stage decision network is an assignment of a value to each decision variable. Each policy has an expected utility. An optimal policy is a policy whose expected utility is maximal. That is, it is a policy such that no other policy has a higher expected utility. The value of a decision network is the expected utility of an optimal policy for the network.
Figure 9.8 shows how variable elimination is used to find an optimal policy in a single-stage decision network. After pruning irrelevant nodes and summing out all random variables, there will be a single factor that represents the expected utility for each combination of decision variables. This factor does not have to be a factor on all of the decision variables; however, those decision variables that are not included are not relevant to the decision.
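For the single-stage case, this procedure can also be sketched by plain enumeration rather than general factor manipulation: for each assignment to the decision variables, sum out the chance variables to get an expected utility, and keep the best. The function name and interfaces below are assumptions for illustration, not the book's $VE\_SSDN$ code.

```python
from itertools import product

def optimal_policy(decision_vars, chance_prob, util):
    """Enumerate decision assignments; return the one with maximal
    expected utility, summing out the chance variables for each.

    decision_vars: dict mapping each decision variable to its domain.
    chance_prob: function from a decision assignment to a list of
        (probability, chance outcome) pairs.
    util: function from (decision assignment, chance outcome) to a number.
    """
    names = list(decision_vars)
    best, best_eu = None, float("-inf")
    for values in product(*(decision_vars[n] for n in names)):
        assignment = dict(zip(names, values))
        # Expected utility of this policy: sum over chance outcomes.
        eu = sum(p * util(assignment, c) for p, c in chance_prob(assignment))
        if eu > best_eu:
            best, best_eu = assignment, eu
    return best, best_eu
```

Enumeration is exponential in the number of decision variables, which is why variable elimination's factor-based bookkeeping matters for larger networks; for a single-stage network with few decisions, the two approaches coincide in effect.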
Consider running $VE\_SSDN$ on the decision network of Figure 9.7. No nodes can be pruned, so it sums out the only random variable, $Accident$. To do this, it multiplies both factors, because both contain $Accident$, and then sums out $Accident$, giving the following factor:
$Wear\_pads$ | $Which\_way$ | Value
---|---|---
$true$ | $short$ | $0.2 \times 35 + 0.8 \times 95 = 83$
$true$ | $long$ | $0.01 \times 30 + 0.99 \times 75 = 74.55$
$false$ | $short$ | $0.2 \times 3 + 0.8 \times 100 = 80.6$
$false$ | $long$ | $0.01 \times 0 + 0.99 \times 80 = 79.2$
Thus, the policy with the maximum value – the optimal policy – is to take the short way and wear pads, with an expected utility of 83.
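The arithmetic above can be reproduced directly from the two tables of Figure 9.7; a short check in Python (variable names are illustrative), summing out $Accident$ for each assignment to the decision variables:

```python
# P(Accident = true | Which_way), from the conditional probability table.
p_accident = {"short": 0.2, "long": 0.01}

# Utility as a function of (Wear_pads, Which_way, Accident).
utility = {
    (True, "short", True): 35,  (True, "short", False): 95,
    (True, "long", True): 30,   (True, "long", False): 75,
    (False, "short", True): 3,  (False, "short", False): 100,
    (False, "long", True): 0,   (False, "long", False): 80,
}

# Sum out Accident: expected utility for each decision assignment.
expected = {
    (pads, way): p_accident[way] * utility[(pads, way, True)]
                 + (1 - p_accident[way]) * utility[(pads, way, False)]
    for pads in (True, False)
    for way in ("short", "long")
}

# The optimal policy: wear pads and take the short way (expected utility 83).
best = max(expected, key=expected.get)
```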