Once we have the complete
joint probability distribution (JPD), we can answer all possible inference
queries by marginalization (summing out over irrelevant variables), as
illustrated above. However, the JPD has size $O(2^n)$, where $n$ is the
number of nodes, and we have assumed each node can have 2
states. Hence summing over the JPD takes exponential time.
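As a concrete illustration of the brute-force approach, here is a small Python/NumPy sketch that builds the full joint table for the water sprinkler network and answers queries by summation. The CPT numbers are assumed to match the tables given earlier in the tutorial.

```python
import numpy as np
from itertools import product

# CPTs for the water sprinkler network, with states indexed 0 = False, 1 = True.
# The numbers are assumed to match the tables given earlier in the tutorial.
P_C = np.array([0.5, 0.5])                    # P(C)
P_S_given_C = np.array([[0.5, 0.5],           # P(S|C), rows indexed by C
                        [0.9, 0.1]])
P_R_given_C = np.array([[0.8, 0.2],           # P(R|C), rows indexed by C
                        [0.2, 0.8]])
P_W_given_SR = np.zeros((2, 2, 2))            # P(W|S,R), axes ordered (S, R, W)
P_W_given_SR[0, 0] = [1.0, 0.0]
P_W_given_SR[1, 0] = [0.1, 0.9]
P_W_given_SR[0, 1] = [0.1, 0.9]
P_W_given_SR[1, 1] = [0.01, 0.99]

# Build the full joint P(C,S,R,W): 2^4 = 16 entries (2^n in general).
joint = np.zeros((2, 2, 2, 2))
for c, s, r, w in product(range(2), repeat=4):
    joint[c, s, r, w] = (P_C[c] * P_S_given_C[c, s] *
                         P_R_given_C[c, r] * P_W_given_SR[s, r, w])

# Any query can now be answered by summing out the irrelevant variables.
p_w1 = joint[:, :, :, 1].sum()                   # P(W=1), ~0.647 with these numbers
p_s1_given_w1 = joint[:, 1, :, 1].sum() / p_w1   # P(S=1|W=1), ~0.430
print(p_w1, p_s1_given_w1)
```

The joint table has only $2^4 = 16$ entries here, but for $n$ binary nodes it has $2^n$ entries, which is exactly the exponential cost we want to avoid.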
However, we can exploit the factored nature of the JPD to compute marginals more efficiently. The key idea is to "push sums in" as far as possible when summing (marginalizing) out irrelevant terms, e.g., for the water sprinkler network,

$$
\begin{aligned}
P(W=w) &= \sum_c \sum_s \sum_r P(C=c)\,P(S=s|C=c)\,P(R=r|C=c)\,P(W=w|S=s,R=r) \\
       &= \sum_c P(C=c) \sum_s P(S=s|C=c) \sum_r P(R=r|C=c)\,P(W=w|S=s,R=r)
\end{aligned}
$$

Notice that, as we perform the innermost sums, we create new terms, which need to be summed over in turn, e.g.,

$$
P(W=w) = \sum_c P(C=c) \sum_s P(S=s|C=c)\, T_1(c,w,s)
$$

where

$$
T_1(c,w,s) = \sum_r P(R=r|C=c)\,P(W=w|S=s,R=r).
$$

Continuing in this way,

$$
P(W=w) = \sum_c P(C=c)\, T_2(c,w)
$$

where

$$
T_2(c,w) = \sum_s P(S=s|C=c)\, T_1(c,w,s).
$$
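The same query can be computed without ever materializing the joint by forming the intermediate terms explicitly. A minimal sketch, reusing the CPT arrays from the snippet above, with numpy.einsum doing each multiply-and-sum step:

```python
# Push the sums in, eliminating R, then S, then C (reusing the CPTs above).
# T1(c,w,s) = sum_r P(R=r|C=c) P(W=w|S=s,R=r)
T1 = np.einsum('cr,srw->cws', P_R_given_C, P_W_given_SR)

# T2(c,w) = sum_s P(S=s|C=c) T1(c,w,s)
T2 = np.einsum('cs,cws->cw', P_S_given_C, T1)

# P(W=w) = sum_c P(C=c) T2(c,w)
P_W = np.einsum('c,cw->w', P_C, T2)
print(P_W[1])   # same answer as before (~0.647), without building the joint
```

The largest table created is $T_1$, with $2^3$ entries, rather than the $2^4$-entry joint; on larger networks the gap is far more dramatic.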
A convenient way to handle the book-keeping arising from all these intermediate terms is by using the bucket elimination algorithm.
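Here is a toy sketch of that bookkeeping, under some simplifying assumptions: variable names are single letters (so einsum subscripts can be derived from them), there is no evidence handling, and the factor representation (a list of variables paired with a NumPy table) is illustrative rather than taken from any particular library. Each factor is dropped into the bucket of its earliest variable in the elimination ordering; buckets are processed in that order, and each newly created term is dropped into a later bucket.

```python
import numpy as np

def multiply(f, g):
    """Pointwise product of two factors; a factor is a (variables, table) pair."""
    fv, ft = f
    gv, gt = g
    out = fv + [v for v in gv if v not in fv]
    sub = lambda vs: ''.join(v.lower() for v in vs)   # single-letter names assumed
    return out, np.einsum(f'{sub(fv)},{sub(gv)}->{sub(out)}', ft, gt)

def sum_out(f, var):
    """Marginalize one variable out of a factor."""
    fv, ft = f
    return [v for v in fv if v != var], ft.sum(axis=fv.index(var))

def bucket_elimination(factors, order):
    """Sum out the variables in `order`; variables not in `order` are kept."""
    buckets = {v: [] for v in order}
    done = []                                 # factors with nothing left to eliminate

    def place(f):
        pos = [order.index(v) for v in f[0] if v in order]
        (buckets[order[min(pos)]] if pos else done).append(f)

    for f in factors:
        place(f)
    for v in order:                           # process buckets in elimination order
        if not buckets[v]:
            continue
        prod = buckets[v][0]
        for f in buckets[v][1:]:
            prod = multiply(prod, f)
        place(sum_out(prod, v))               # new term goes into a later bucket
    result = done[0]
    for f in done[1:]:
        result = multiply(result, f)
    return result

# P(W) for the sprinkler network (reusing the CPT arrays defined earlier),
# eliminating R, then S, then C.
factors = [(['C'], P_C), (['C', 'S'], P_S_given_C),
           (['C', 'R'], P_R_given_C), (['S', 'R', 'W'], P_W_given_SR)]
print(bucket_elimination(factors, ['R', 'S', 'C']))   # (['W'], array([~0.353, ~0.647]))
```

With the ordering (R, S, C) this reproduces the $T_1$ and $T_2$ terms from the derivation above.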
The amount of work we perform when computing a marginal is bounded by the size of the largest term that we encounter. Choosing a summation (elimination) ordering to minimize this is NP-hard, although greedy algorithms work well in practice (see Kjaerulff, 1990).
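One popular greedy heuristic is min-fill: repeatedly eliminate the node whose removal would require the fewest fill-in edges. A sketch, assuming the (moral, undirected) graph is given as a dictionary mapping each node to the set of its neighbours:

```python
def min_fill_order(adj):
    """Greedy elimination ordering for an undirected graph: repeatedly pick
    the node whose elimination adds the fewest fill-in edges."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy

    def fill_cost(v):
        nbrs = list(adj[v])
        return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
                   if nbrs[j] not in adj[nbrs[i]])

    order = []
    while adj:
        v = min(adj, key=fill_cost)
        order.append(v)
        nbrs = list(adj[v])
        for i in range(len(nbrs)):             # add the fill-in edges: connect
            for j in range(i + 1, len(nbrs)):  # all of v's remaining neighbours
                adj[nbrs[i]].add(nbrs[j])
                adj[nbrs[j]].add(nbrs[i])
        for u in nbrs:                         # then remove v from the graph
            adj[u].discard(v)
        del adj[v]
    return order

# Moral graph of the sprinkler network (S and R are married because of W).
moral = {'C': {'S', 'R'}, 'S': {'C', 'R', 'W'},
         'R': {'C', 'S', 'W'}, 'W': {'S', 'R'}}
print(min_fill_order(moral))   # ['C', 'S', 'R', 'W'] -- needs no fill-in edges
```

For this graph the greedy ordering happens to need no fill-in edges at all; by contrast, the ordering (R, S, C) implicit in the derivation above adds the C-W edge, as discussed in the triangulation paragraph below.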
If we wish to compute several marginals at the same time, we can use Dynamic Programming to avoid repeated computation. If the underlying undirected graph of the BN is acyclic (i.e. a tree), we can use a local message passing algorithm due to Pearl (see Peot and Shachter, 1991). In the case of DBNs with the structure shown above (which is acyclic), Pearl's algorithm is equivalent to the forwards-backwards algorithm for HMMs, or to the Kalman-RTS smoother for LDSs (see Ghahramani, 1997).
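To make the HMM connection concrete, here is a minimal NumPy sketch of the forwards-backwards recursions for a discrete HMM. The transition, emission and initial-state numbers are made up for illustration, and the code omits the rescaling that a practical implementation needs to avoid underflow on long sequences.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Smoothed marginals P(X_t | y_1..T) for a discrete HMM.
    pi[i] = P(X_1=i), A[i,j] = P(X_{t+1}=j | X_t=i),
    B[i,k] = P(Y_t=k | X_t=i), obs = observed symbol indices."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                        # combine and normalize
    return gamma / gamma.sum(axis=1, keepdims=True)

# A made-up 2-state, 2-symbol HMM and a short observation sequence.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])
print(forward_backward(pi, A, B, [0, 0, 1]))    # one row of marginals per time step
```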
If the BN has undirected cycles (as in the sprinkler example), we need to first create an auxiliary data-structure, called a join tree, and then apply the DP algorithm to the join tree (see Jensen, 1996 and Shafer and Lauritzen, 1996). The nodes of a join tree are subsets of nodes of the original BN, and satisfy the following property: if x is a member of join tree nodes i and j, then x must be a member of every node on the path between i and j. This property ensures that local propagation of information leads to global consistency.
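This property is often called the running intersection property. Below is a small sketch that checks it for a candidate join tree; for the sprinkler network, one valid join tree consists of the two cliques {C,S,R} and {S,R,W} joined by a single edge (their intersection {S,R} is the separator).

```python
def has_running_intersection(clusters, tree_edges):
    """Check the join tree property: for every variable x, the clusters
    that contain x must form a connected subtree.
    clusters: list of sets; tree_edges: list of (i, j) index pairs."""
    adj = {i: set() for i in range(len(clusters))}
    for i, j in tree_edges:
        adj[i].add(j)
        adj[j].add(i)
    for x in set().union(*clusters):
        holding = {i for i, c in enumerate(clusters) if x in c}
        # Depth-first search restricted to clusters that contain x.
        start = next(iter(holding))
        seen, stack = {start}, [start]
        while stack:
            i = stack.pop()
            for j in adj[i]:
                if j in holding and j not in seen:
                    seen.add(j)
                    stack.append(j)
        if seen != holding:          # some cluster containing x was unreachable
            return False
    return True

# Join tree for the sprinkler network: cliques CSR and SRW, separator {S, R}.
cliques = [{'C', 'S', 'R'}, {'S', 'R', 'W'}]
print(has_running_intersection(cliques, [(0, 1)]))   # True
```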
We can create the join tree by moralizing and
triangulating the BN graph (see Kjaerulff, 1990), and then finding the maximum
cliques. Triangulation ensures that there are no chordless cycles of
length greater than three. Intuitively, triangulation connects together
those nodes that
feature in a common term when summing out; in the example above,
we connect C to W and get the clique CWS, corresponding to the
intermediate term $T_1(c,w,s)$. The process of moralization connects
together unmarried parents who share a common child: this ensures there
will be a clique that contains terms of the form
$P(X \mid \mathrm{Pa}(X))$, i.e., each node together with its parents.
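The moralization step is easy to sketch in Python, assuming the DAG is given as a dictionary mapping each node to the list of its parents (every node must appear as a key):

```python
from itertools import combinations

def moralize(parents):
    """Moral graph of a DAG: connect ('marry') all parents of each node,
    then drop edge directions.  `parents` maps each node to its parents."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:                         # original edges, made undirected
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(ps, 2):     # marry co-parents
            adj[a].add(b)
            adj[b].add(a)
    return adj

# Sprinkler network: C is a parent of S and R, which are parents of W.
dag = {'C': [], 'S': ['C'], 'R': ['C'], 'W': ['S', 'R']}
print(moralize(dag)['S'])   # {'C', 'R', 'W'}: S and R are now married via W
```

For the sprinkler network the moral graph happens to be chordal already, with maximal cliques {C,S,R} and {S,R,W}; these are exactly the join tree nodes used in the running-intersection sketch above.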