10.3 Learning Belief Networks

10.3.1 Learning the Probabilities

The simplest case occurs when a learning agent is given the structure of the model and all the variables have been observed. The agent must learn the conditional probabilities, $P(X_i \mid parents(X_i))$, for each variable $X_i$. Learning the conditional probabilities is an instance of supervised learning, where $X_i$ is the target feature, and the parents of $X_i$ are the input features.

For cases with few parents, each conditional probability can be learned separately using the training examples and prior knowledge, such as pseudocounts.
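The supervised-learning view can be sketched in a few lines of Python. The sketch below assumes a hypothetical encoding that is not part of the text: the structure is a dictionary from each variable to its list of parents (here the structure of Figure 10.7), and each example is a dictionary from variable names to Boolean values. The function name `supervised_datasets` is likewise only illustrative.

```python
def supervised_datasets(parents, examples):
    """For each variable X, build a list of (inputs, target) pairs.

    inputs is the tuple of values of X's parents in an example,
    and target is the value of X in that same example.
    """
    return {
        var: [(tuple(ex[p] for p in pars), ex[var]) for ex in examples]
        for var, pars in parents.items()
    }

# Assumed encoding of the structure: A and B are parents of E; E is the parent of C and D.
parents = {"A": [], "B": [], "E": ["A", "B"], "C": ["E"], "D": ["E"]}

# Three illustrative, fully observed examples.
t, f = True, False
examples = [
    dict(A=t, B=f, C=t, D=t, E=f),
    dict(A=f, B=t, C=t, D=t, E=t),
    dict(A=t, B=t, C=f, D=t, E=f),
]

datasets = supervised_datasets(parents, examples)
# datasets["E"] pairs each (A, B) assignment with the observed value of E;
# datasets["A"] has empty inputs because A has no parents.
```

Each of the resulting datasets can then be handed to a separate learner, which is what makes the conditional probabilities learnable independently of one another.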

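[Figure panels. Model: a belief network in which A and B are the parents of E, and E is the parent of C and D. Data: examples over the variables A, B, C, D, E, including (t, f, t, t, f), (f, t, t, t, t), and (t, t, f, t, f). Probabilities: $P(A)$, $P(B)$, $P(E \mid A, B)$, $P(C \mid E)$, $P(D \mid E)$.]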
Figure 10.7: From the model and the data, learn the probabilities
Example 10.11.

Figure 10.7 shows a typical example. We are given the model and the data, and we must infer the probabilities.

For example, one of the elements of $P(E \mid A, B)$ is

$$P(E=t \mid A=t \wedge B=f) = \frac{n_1 + c_1}{n_0 + n_1 + c_0 + c_1}$$

where $n_1$ is the number of cases where $E=t \wedge A=t \wedge B=f$, and $c_1 \geq 0$ is the corresponding pseudocount that is provided before any data is observed. Similarly, $n_0$ is the number of cases where $E=f \wedge A=t \wedge B=f$, and $c_0 \geq 0$ is the corresponding pseudocount.
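The counting estimate above can be sketched as follows for a Boolean target variable. This is a minimal illustration, not a definitive implementation: the function name `learn_conditional`, the dictionary encoding of examples, and the default pseudocounts of 1 are all assumptions made for the sketch.

```python
from collections import Counter

def learn_conditional(examples, target, parents, c1=1.0, c0=1.0):
    """Estimate P(target=True | parents) for each observed parent assignment.

    n1 and n0 count the examples in which the target is true or false for a
    given assignment to the parents; c1 and c0 are the pseudocounts
    (assumed here to be the same for every parent assignment).
    """
    counts = Counter()
    for ex in examples:
        context = tuple(ex[p] for p in parents)
        counts[(context, ex[target])] += 1

    probs = {}
    for context in {ctx for ctx, _ in counts}:
        n1 = counts[(context, True)]
        n0 = counts[(context, False)]
        probs[context] = (n1 + c1) / (n0 + n1 + c0 + c1)
    return probs

# Three illustrative, fully observed examples over A, B, C, D, E.
t, f = True, False
examples = [
    dict(A=t, B=f, C=t, D=t, E=f),
    dict(A=f, B=t, C=t, D=t, E=t),
    dict(A=t, B=t, C=f, D=t, E=f),
]

p_e = learn_conditional(examples, "E", ["A", "B"])
# For A=t, B=f: n1 = 0 and n0 = 1, so the estimate is (0 + 1) / (1 + 0 + 1 + 1) = 1/3.
```

Parent assignments that never occur in the data, such as A=f, B=f here, do not appear in the returned dictionary. Under the formula their estimate would be determined by the pseudocounts alone, $c_1 / (c_0 + c_1)$.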

If a variable has many parents, using counts and pseudocounts can suffer from overfitting. Overfitting is most severe when there are few examples for some of the combinations of the parent variables. In that case, the supervised learning techniques of Chapter 7 could be used. Decision trees can be used for arbitrary discrete variables. Logistic regression and neural networks can represent a conditional probability of a binary variable given its parents. For non-binary discrete variables, indicator variables may be used.
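As a rough illustration of the alternative, the sketch below represents the conditional probability of a Boolean variable given its parents with logistic regression, trained by stochastic gradient ascent on the log-likelihood. It uses one weight per parent plus a bias, so the number of parameters grows linearly with the number of parents rather than exponentially as in a full table. The function names, learning rate, and number of steps are assumptions chosen for the sketch, not values from the text.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def features(assignment, parents):
    """Encode a parent assignment as a 0/1 vector with a leading bias input of 1."""
    return [1.0] + [1.0 if assignment[p] else 0.0 for p in parents]

def fit_logistic(examples, target, parents, steps=2000, lr=0.1):
    """Fit weights w so that P(target=True | parents) = sigmoid(w . x)."""
    w = [0.0] * (len(parents) + 1)
    for _ in range(steps):
        for ex in examples:
            x = features(ex, parents)
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            err = (1.0 if ex[target] else 0.0) - pred
            # Gradient of the log-likelihood for this example is err * x.
            for i, xi in enumerate(x):
                w[i] += lr * err * xi
    return w

def predict(w, parents, assignment):
    """Return the modeled P(target=True | assignment to the parents)."""
    x = features(assignment, parents)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Three illustrative examples; only the values of A, B, and E matter here.
t, f = True, False
examples = [
    dict(A=t, B=f, E=f),
    dict(A=f, B=t, E=t),
    dict(A=t, B=t, E=f),
]

w = fit_logistic(examples, "E", ["A", "B"])
p_unseen = predict(w, ["A", "B"], dict(A=f, B=f))  # assignment absent from the data
```

Unlike the table of counts, the fitted model gives a prediction for parent assignments that never appear in the data, because the weights are shared across assignments. No regularization is included in this sketch, so with so few examples the learned probabilities can still be extreme.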