Exploiting Contextual Independence In Probabilistic Inference -- Belief Network Inference

Belief Network Inference

The task of probabilistic inference is to determine the posterior probability of a variable or variables given some observations. In this section we outline a simple algorithm for belief net inference called variable elimination (VE) [33][34], or bucket elimination for belief assessment (BEBA) [11], which is based on the ideas of nonlinear dynamic programming [2]⁴ and is closely related to SPI [28]. This is a query-oriented algorithm that exploits the conditional independence inherent in the network structure for efficient inference, similar to how clique tree propagation exploits the structure [20][18].

Suppose we observe that variables E₁,...,E_s have corresponding values o₁,...,o_s. We want to determine the posterior probability of variable X, the query variable, given evidence E₁=o₁ &...&E_s=o_s:

P(X|E₁=o₁ &...&E_s=o_s)= (P(X &E₁=o₁ &...&E_s=o_s))/(P(E₁=o₁ &...&E_s=o_s))

The denominator, P(E₁=o₁ &...&E_s=o_s), is a normalizing factor:

P(E₁=o₁ &...&E_s=o_s) = SUM_{v in dom(X)} P(X=v &E₁=o₁ &...&E_s=o_s)

The problem of probabilistic inference can thus be reduced to the problem of computing the probability of conjunctions.
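The reduction can be made concrete with a minimal Python sketch. It assumes the probabilities of the conjunctions P(X=v & evidence) have already been computed for every v in dom(X); the function name and the numbers are hypothetical, chosen only for illustration.

```python
def posterior_from_joint(joint):
    """Turn P(X=v & evidence), for every v in dom(X), into P(X=v | evidence)."""
    # The denominator P(evidence) is the sum of the conjunctions over dom(X).
    evidence_prob = sum(joint.values())
    if evidence_prob == 0:
        raise ValueError("the evidence has probability zero")
    return {v: p / evidence_prob for v, p in joint.items()}

# Hypothetical values of P(X=v & evidence) for a Boolean query variable X.
joint = {"true": 0.03, "false": 0.09}
posterior = posterior_from_joint(joint)
```

Only the conjunctions are needed as input; the normalizing factor falls out of them by summation.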

Let Y = {Y₁,...,Y_k} be the non-query, non-observed variables (i.e., Y = {X₁,...,X_n}-{X}-{E₁,...,E_s}). To compute the marginal distribution, we sum out the Y_i's:

P(X&E₁=o₁ &...&E_s=o_s)

= SUM_{Y_k}···SUM_{Y₁} P(X₁,...,X_n)_{{E₁=o₁ &...&E_s=o_s}}

= SUM_{Y_k}···SUM_Y₁ PROD_{i = 1}ⁿ P(X_i|pi_{X_i})_{{E₁=o₁ &...&E_s=o_s}}

where the subscripted probabilities mean that the associated variables are assigned the corresponding values.

Thus probabilistic inference reduces to the problem of summing out variables from a product of functions. To solve this efficiently we use the distributive law that we learned in high school: to compute a sum of products such as xy+xz efficiently, we distribute out the common factor (here x), which results in x(y+z). This is the essence of the VE algorithm. We call the elements multiplied together "factors" because of the use of the term in mathematics. Initially the factors represent the conditional probability tables, but the intermediate factors are just functions on variables that are created by adding and multiplying factors.

A factor on variables V₁,...,V_d is a representation of a function from dom(V₁) ×...×dom(V_d) into the real numbers.

Suppose that the Y_i's are ordered according to some elimination ordering. We sum out the variables one at a time.

To sum out a variable Y_i from a product, we distribute all of the factors that don't involve Y_i out of the sum. Suppose f₁,...,f_k are some functions of the variables that are multiplied together (initially these are the conditional probabilities), then

SUM_{Y_i} f₁...f_k = f₁...f_m SUM_{Y_i} f_{m+1}...f_k

where f₁...f_m are those functions that don't involve Y_i, and f_{m+1}...f_k are those that do involve Y_i. We explicitly construct a representation for the new function SUM_{Y_i} f_{m+1}...f_k, and continue summing out the remaining variables. After all the Y_i's have been summed out, the result is a function on X that is proportional to X's posterior distribution.
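As a small illustration of this identity, the following Python sketch computes P(C) in a chain A -> B -> C of Boolean variables in two ways: by summing the full product over all assignments, and by moving the factor that does not involve A outside the sum over A. The network and all of its numbers are assumptions made up for this example.

```python
from itertools import product

# Hypothetical conditional probability tables for the chain A -> B -> C.
p_a = {True: 0.3, False: 0.7}
p_b_given_a = {(True, True): 0.9, (False, True): 0.1,    # key: (b, a)
               (True, False): 0.2, (False, False): 0.8}
p_c_given_b = {(True, True): 0.6, (False, True): 0.4,    # key: (c, b)
               (True, False): 0.5, (False, False): 0.5}

def naive(c):
    # Sum the full product over every joint assignment to A and B.
    return sum(p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]
               for a, b in product([True, False], repeat=2))

def distributed(c):
    # P(c|b) does not involve A, so it is distributed out of the sum over A.
    return sum(p_c_given_b[(c, b)]
               * sum(p_a[a] * p_b_given_a[(b, a)] for a in [True, False])
               for b in [True, False])
```

Both functions return the same value; the second builds the intermediate function of B once and reuses it, which is where the savings come from on larger networks.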

To compute P(X|E₁=o₁ &...&E_s=o_s)
    Let F be the factors obtained from the original conditional probabilities.
    Replace each f in F that involves some E_i with f_{{E₁=o₁ ,..., E_s=o_s}}.
    While there is a factor involving a non-query variable
        Select non-query variable Y to eliminate
        Set F = eliminate(Y,F).
    Return renormalize(F)

Procedure eliminate(Y,F):
    Partition F into
        {f₁,..., f_m} that don't contain Y and
        {f_{m+1},..., f_r} that do contain Y
    Compute f = SUM_Y f_{m+1} ×_t ... ×_t f_r
    Return {f₁,...,f_m,f}

Procedure renormalize({f₁,...,f_r}):
    Compute f = f₁ ×_t ... ×_t f_r
    Compute c = SUM_X f        % c is normalizing constant
    Return f/c                 % divide each element of f by c

The tabular VE algorithm
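The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described in the paper: a factor is stored as a pair (tuple of variable names, dictionary from value tuples to numbers), the elimination ordering is supplied by the caller and must list every non-query, non-observed variable, and the example network A -> B -> C with its numbers is hypothetical.

```python
from itertools import product

def restrict(factor, evidence):
    """Fix the observed variables of a factor to their observed values."""
    variables, table = factor
    keep = [i for i, v in enumerate(variables) if v not in evidence]
    new_table = {}
    for values, p in table.items():
        if all(evidence.get(v, x) == x for v, x in zip(variables, values)):
            new_table[tuple(values[i] for i in keep)] = p
    return tuple(variables[i] for i in keep), new_table

def multiply(f1, f2, domains):
    """Product of two factors, defined on the union of their variables."""
    (vars1, t1), (vars2, t2) = f1, f2
    variables = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    for values in product(*(domains[v] for v in variables)):
        row = dict(zip(variables, values))
        table[values] = (t1[tuple(row[v] for v in vars1)]
                         * t2[tuple(row[v] for v in vars2)])
    return variables, table

def multiply_all(factors, domains):
    result = ((), {(): 1.0})          # the constant factor 1
    for f in factors:
        result = multiply(result, f, domains)
    return result

def sum_out(var, factor):
    """Add up the entries of a factor over all values of var."""
    variables, table = factor
    i = variables.index(var)
    new_table = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + p
    return variables[:i] + variables[i + 1:], new_table

def eliminate(var, factors, domains):
    without = [f for f in factors if var not in f[0]]
    with_var = [f for f in factors if var in f[0]]
    return without + [sum_out(var, multiply_all(with_var, domains))]

def variable_elimination(query, evidence, cpts, domains, ordering):
    factors = [restrict(f, evidence) for f in cpts]
    for var in ordering:
        if var != query and any(var in f[0] for f in factors):
            factors = eliminate(var, factors, domains)
    _, table = multiply_all(factors, domains)   # a factor on the query only
    c = sum(table.values())                     # normalizing constant
    return {values[0]: p / c for values, p in table.items()}

# Hypothetical network A -> B -> C; query A after observing C = True.
domains = {"A": [True, False], "B": [True, False], "C": [True, False]}
cpts = [
    (("A",), {(True,): 0.3, (False,): 0.7}),
    (("B", "A"), {(True, True): 0.9, (False, True): 0.1,
                  (True, False): 0.2, (False, False): 0.8}),
    (("C", "B"), {(True, True): 0.6, (False, True): 0.4,
                  (True, False): 0.5, (False, False): 0.5}),
]
posterior = variable_elimination("A", {"C": True}, cpts, domains, ["B"])
```

Taking the ordering as an argument keeps the sketch short; choosing a good ordering is a separate problem that this example does not address.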

In the tabular implementation of the VE algorithm (Figure *), a function of d discrete variables V₁,...,V_d is represented as a d-dimensional table (which can be implemented, for example, as a d-dimensional array, as a tree of depth d, or, as in our implementation, as a 1-dimensional array based on a lexicographic ordering on the variables). If f is such a table, let variables(f) = {V₁,...,V_d}. We sometimes write f as f[V₁,...,V_d] to make the variables explicit. f is said to involve V_i if V_i in variables(f).
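One way to lay out a d-dimensional table in a 1-dimensional array is sketched below in Python. The text does not spell out the exact ordering used, so the choice that the last variable varies fastest (row-major order) is an assumption, as are the function name and the example domains.

```python
def flat_index(values, variables, domains):
    """Position of an assignment in a 1-D array laid out lexicographically.

    Assumes the last variable in `variables` varies fastest.
    """
    index = 0
    for var, value in zip(variables, values):
        index = index * len(domains[var]) + domains[var].index(value)
    return index

# Hypothetical Boolean domains, written in the notation of the text.
domains = {"A": ["a", "~a"], "B": ["b", "~b"], "C": ["c", "~c"]}
variables = ["A", "B", "C"]
first = flat_index(("a", "b", "c"), variables, domains)
middle = flat_index(("~a", "b", "c"), variables, domains)
last = flat_index(("~a", "~b", "~c"), variables, domains)
```

With this layout a table on d Boolean variables occupies positions 0 through 2^d - 1, and all entries that share a value for the first variable are contiguous.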

There are three primitive operations on tables: setting variables, forming the product of tables, and summing a variable from a table.

Definition. Suppose C is a set of variables, c is an assignment C = v, and f is a factor on variables X. Let Y = X-C, let Z = X intersect C, and let Z = v' be the assignment of values to Z that assigns the same values to elements of Z as c does. Define set(f,c) to be the factor on Y given by:

set(f,c)(Y) = f(Y,Z=v').

That is, set(f,c) is a function of Y, the variables of f that are not in c, that is like f, but with some values already assigned. Note that, as a special case of this, if c doesn't involve any variable in f then set(f,c)=f.
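The definition can be sketched in Python as follows, assuming a factor is stored as a pair (tuple of variable names, dictionary from value tuples to numbers). The name set_values is used instead of set to avoid shadowing the Python built-in; the factor and its numbers are hypothetical.

```python
def set_values(factor, assignment):
    """Return set(f, c): f with the variables assigned in c fixed to c's values."""
    variables, table = factor
    kept = [i for i, v in enumerate(variables) if v not in assignment]
    result = {}
    for values, number in table.items():
        # Keep only the rows that agree with the assignment on shared variables.
        consistent = all(assignment[v] == x
                         for v, x in zip(variables, values) if v in assignment)
        if consistent:
            result[tuple(values[i] for i in kept)] = number
    return tuple(variables[i] for i in kept), result

# Hypothetical factor on A and B.
f = (("A", "B"), {("a", "b"): 0.1, ("a", "~b"): 0.9,
                  ("~a", "b"): 0.4, ("~a", "~b"): 0.6})
g = set_values(f, {"A": "~a", "C": "c"})   # C is not a variable of f
unchanged = set_values(f, {"C": "c"})      # the special case noted above
```

Here g is a factor on B alone, and unchanged equals f because the assignment involves no variable of f.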

Example. Consider the factor f(A,B,C,D,E) defined by the table of Figure *. Some examples of the value of this function are f(a,b,c,d,e) = 0.55, and f(~a,b,c,d,~e) = 1-0.08 = 0.92. set(f,~a&b &e) is a function of C and D defined by the table:

C	D	value
c	d	0.08
c	~d	0.08
~c	d	0.025
~c	~d	0.5

Definition. The product of tables f₁ and f₂, written f₁×_tf₂ is a table on the union of the variables in f₁ and f₂ (i.e., variables(f₁×_tf₂) = variables(f₁) union variables(f₂)) defined by:

(f₁×_tf₂)(X,Y,Z) = f₁(X,Y)f₂(Y,Z)

where Y is variables(f₁) intersect variables(f₂), X is variables(f₁)- variables(f₂), and Z is variables(f₂)- variables(f₁).

Note that ×_t is associative and commutative.

To construct the product of tables, f_{m+1} ×_t ··· ×_t f_k, we take the union of all of the variables in f_{m+1},...,f_k; say these are X₁,...,X_r. Then we construct an r-dimensional table so there is an entry in the table for each combination v₁,...,v_r where v_i in dom(X_i). The value for the entry corresponding to v₁,...,v_r is obtained by multiplying the values obtained from each f_i applied to the projection of v₁,...,v_r onto the variables of f_i.
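A minimal Python sketch of the product of two tables follows, again assuming a factor is a pair (tuple of variable names, dictionary from value tuples to numbers). The function name, the domains argument, and the numbers are hypothetical.

```python
from itertools import product

def multiply(f1, f2, domains):
    """Product of two tables, defined on the union of their variables."""
    (vars1, t1), (vars2, t2) = f1, f2
    variables = vars1 + tuple(v for v in vars2 if v not in vars1)
    table = {}
    # One entry for each combination of values of the union of the variables.
    for values in product(*(domains[v] for v in variables)):
        row = dict(zip(variables, values))
        # Project the combination onto each factor's own variables.
        table[values] = (t1[tuple(row[v] for v in vars1)]
                         * t2[tuple(row[v] for v in vars2)])
    return variables, table

# Hypothetical factors f1(A,B) and f2(B,C) over 0/1-valued variables.
domains = {"A": [0, 1], "B": [0, 1], "C": [0, 1]}
f1 = (("A", "B"), {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.8})
f2 = (("B", "C"), {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.6, (1, 1): 0.4})
g = multiply(f1, f2, domains)
```

The result g is a table on A, B and C with eight entries; for instance the entry for A=1, B=1, C=0 is f1(1,1) times f2(1,0).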

Definition. The summing out of variable Y from table f, written SUM_Y f is the table with variables Z = variables(f)-{Y} such that⁵

(SUM_Y f)(Z) = SUM_{v_i in dom(Y)} f(Z&Y=v_i)

where dom(Y) = {v₁,...,v_s}.

Thus, to sum out Y, we reduce the dimensionality of the table by one (removing the Y dimension); the values in the resulting table are obtained by adding the values of the table for each value of Y.
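Summing out can be sketched in Python along the same lines, assuming a factor is a pair (tuple of variable names, dictionary from value tuples to numbers). The function name and the numbers are hypothetical.

```python
def sum_out(var, factor):
    """Remove var from a table by adding the entries over all of its values."""
    variables, table = factor
    i = variables.index(var)
    new_table = {}
    for values, number in table.items():
        key = values[:i] + values[i + 1:]     # drop the dimension of var
        new_table[key] = new_table.get(key, 0.0) + number
    return variables[:i] + variables[i + 1:], new_table

# Hypothetical factor on A and B over 0/1-valued variables.
f = (("A", "B"), {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4})
g = sum_out("B", f)
```

The result g is a table on A alone, with one entry per value of A.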

Example. Consider eliminating B from the factors of Example * (representing the belief network of Figure *), where all of the variables are Boolean. The factors that contain B, namely those factors that represent P(E|A,B,C,D) and P(B|Y,Z), are removed from the set of factors. We construct a factor f₁(A,B,C,D,E,Y,Z) = P(E|A,B,C,D) ×_t P(B|Y,Z); thus, for example,

f₁(a,b,c,d,e,y,z) = P(e|a&b&c&d) P(b|y&z)

f₁(~a,b,c,d,e,y,z) = P(e|~a&b&c&d) P(b|y&z)

f₁(~a,~b,c,d,e,y,z) = P(e|~a&~b&c&d) P(~b|y&z)

f₁(a,~b,c,d,e,y,z) = P(e|a&~b&c&d) P(~b|y&z)

and similarly for the other values of A...Z. We then need to sum out B from f₁, producing f₂(A,C,D,E,Y,Z) where, for example,

f₂(a,c,d,e,y,z) = f₁(a,b,c,d,e,y,z)+f₁(a,~b,c,d,e,y,z).

f₂ is then added to the set of factors. Note that the construction of f₁ is for exposition only; we don't necessarily have to construct a table for it explicitly.

David Poole and Nevin Lianwen Zhang, Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.
