Belief Network Inference

## Belief Network Inference

The task of probabilistic inference is to determine the posterior probability of a variable or variables given some observations. In this section we outline a simple algorithm for belief net inference called variable elimination (VE) [33][34] or bucket elimination for belief assessment (BEBA) [11], that is based on the ideas of nonlinear dynamic programming [2]4 and is closely related to SPI [28]. This is a query oriented algorithm that exploits the conditional independence inherent in the network structure for efficient inference, similar to how clique tree propagation exploits the structure [20][18].

Suppose we observe variables E1,...,Es have corresponding values o1...os. We want to determine the posterior probability of variable X, the query variable, given evidence E1=o1 &...&Es=os:

P(X|E1=o1 &...&Es=os)= (P(X &E1=o1 &...&Es=os))/(P(E1=o1 &...&Es=os))
The denominator, P(E1=o1 &...&Es=os), is a normalizing factor:
P(E1=o1 &...&Es=os) = SUMv in dom(X) P(X=v &E1=o1 &...&Es=os)
The problem of probabilistic inference can thus be reduced to the problem of computing the probability of conjunctions.

Let Y = {Y1,...,Yk} be the non-query, non-observed variables (i.e., Y = {X1,...,Xn}-{X}-{E1,...,Es}). To compute the marginal distribution, we sum out the Yi's:

 P(X&E1=o1 &...&Es=os) = SUMYk···SUMY1 P(X1,...,Xn){E1=o1 &...&Es=os} = SUMYk···SUMY1 PRODi = 1n P(Xi|piXi){E1=o1 &...&Es=os}
where the subscripted probabilities mean that the associated variables are assigned the corresponding values.

Thus probabilistic inference reduces to the problem of summing out variables from a product of functions. To solve this efficiently we use the distribution law that we learned in high school: to compute a sum of products such as xy+xz efficiently, we distribute out the common factors (which here is x) which results in x(y+z). This is the essence of the VE algorithm. We call the elements multiplied together "factors" because of the use of the term in mathematics. Initially the factors represent the conditional probability tables, but the intermediate factors are just functions on variables that are created by adding and multiplying factors.

A factor on variables V1,...,Vd is a representation of a function from dom(V1) ×...×dom(Vd) into the real numbers.

Suppose that the Yi's are ordered according to some elimination ordering. We sum out the variables one at at time.

To sum out a variable Yi from a product, we distribute all of the factors that don't involve Yi out of the sum. Suppose f1,...,fk are some functions of the variables that are multiplied together (initially these are the conditional probabilities), then

SUMYi f1...fk = f1...fm SUMYi fm+1...fk
where f1...fm are those functions that don't involve Yi, and fm+1...fk are those that do involve Yi. We explicitly construct a representation for the new function SUMYi fm+1...fk, and continue summing out the remaining variables. After all the Yi's have been summed out, the result is a function on X that is proportional to X's posterior distribution.
 To compute P(X|E1=o1 &...&Es=os) Let F be the factors obtained from the original conditional probabilities. 1. Replace each f in F that involves some Ei with f{E1=o1 ,..., Es=os}. 2. While there is a factor involving a non-query variable Select non-query variable Y to eliminate Set F= eliminate(Y,F). 3. Return renormalize(F)
 Procedure eliminate(Y,F): Partition F into {f1,..., fm} that don't contain Y and {fm+1,..., fr} that do contain Y Compute f = SUMY fm+1×t...×tfr Return {f1,...,fm,f}
 Procedure renormalize({f1,...,fr}): Compute f = f1×t...×tfr Compute c = SUMX f % c is normalizing constant Return f/c % divide each element of f by c

The tabular VE algorithm

In the tabular implementation of the VE algorithm (Figure *), a function of d discrete variables V1,...,Vd, is represented as a d-dimensional table (which can be implemented, for example, as a d-dimensional array, as a tree of depth d, or, as in our implementation, as a 1-dimensional array based on a lexicographic ordering on the variables). If f is such a table, let variables(f) = {V1,...,Vd}. We sometimes write f as f[V1,...,Vd] to make the variables explicit. f is said to involve Vi if Vi in variables(f).

There are three primitive operations on tables: setting variables, forming the product of tables, and summing a variable from a table.

Definition. Suppose C is a set of variables, c is an assignment C = v, and f is a factor on variables X. Let Y = X-C, let Z = X intersect C, and let Z = v' be the assignment of values to Z that assigns the same values to elements of Z as c does. Define set(f,c) be the factor on Y given by:

set(f,c)(Y) = f(Y,Z=v').
That is, set(f,c) is a function of Y, the variables of f that are not in c, that is like f, but with some values already assigned. Note that, as a special case of this, if c doesn't involve any variable in f then set(f,c)=f.

Example. Consider the factor f(A,B,C,D,E) defined by the table of Figure *. Some examples of the value of this function are f(a,b,c,d,e) = 0.55, and f(`~`a,b,c,d,`~`e) = 1-0.08 = 0.92. set(f,`~`a&b &e) is a function of C and D defined by the table:

 C D value c d 0.08 c `~`d 0.08 `~`c d 0.025 `~`c `~`d 0.5

Definition. The product of tables f1 and f2, written f1×tf2 is a table on the union of the variables in f1 and f2 (i.e., variables(f1×tf2) = variables(f1) union variables(f2)) defined by:

(f1×tf2)(X,Y,Z) = f1(X,Y)f2(Y,Z)
where Y is variables(f1) intersect variables(f2), X is variables(f1)- variables(f2), and Z is variables(f2)- variables(f1).

Note that ×t is associative and commutative.

To construct the product of tables, fm+1×t···×tfk, we union all of the variables in fm+1...fk, say these are X1,...,Xr. Then we construct an r-dimensional table so there is an entry in the table for each combination v1,...,vr where vi in dom(Xi). The value for the entry corresponding to v1,...,vr is obtained by multiplying the values obtained from each fi applied to the projection of v1,...,vr onto the variables of fi.

Definition. The summing out of variable Y from table f, written SUMY f is the table with variables Z = variables(f)-{Y} such that5

(SUMY f)(Z) = SUMvi in dom(Y) f(Z&Y=vi)
where dom(Y) = {v1,...,vs}.

Thus, to sum out Y, we reduce the dimensionality of the table by one (removing the Y dimension), the values in the resulting table are obtained by adding the values of the table for each value of Y.

Example. Consider eliminating B from the factors of Example * (representing the belief network of Figure *), where all of the variables are Boolean. The factors that contain B, namely those factors that represent P(E|ABCD) and P(B|YZ), are removed from the set of factors. We construct a factor f1(A,B,C,D,E,Y,Z)=P(E|A,B,C,D) ×tP(B|Y,Z), thus, for example,

 f1(a,b,c,d,e,y,z) = P(e|a&b&c&d) P(b|y&z) f1(`~`a,b,c,d,e,y,z) = P(e|`~`a&b&c&d) P(b|y&z) f1(`~`a,`~`b,c,d,e,y,z) = P(e|`~`a&`~`b&c&d) P(`~`b|y&z) f1(a,`~`b,c,d,e,y,z) = P(e|a&`~`b&c&d) P(`~`b|y&z)
and similarly for the other values of A...Z. We then need to sum out B from f1, producing f2(A,C,D,E,Y,Z) where, for example,
f2(a,c,d,e,y,z) = f1(a,b,c,d,e,y,z)+f1(a,`~`b,c,d,e,y,z).
f2 is then added to the set of factors. Note that the construction of f1 is for exposition only; we don't necessarily have to construct a table for it explicitly.

David Poole and Nevin Lianwen Zhang,Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.

 Belief Network Inference