Belief Network Inference |
Suppose we observe variables E_{1},...,E_{s} have corresponding values o_{1}...o_{s}. We want to determine the posterior probability of variable X, the query variable, given evidence E_{1}=o_{1} &...&E_{s}=o_{s}:
P(X|E_{1}=o_{1} &...&E_{s}=o_{s})= (P(X &E_{1}=o_{1} &...&E_{s}=o_{s}))/(P(E_{1}=o_{1} &...&E_{s}=o_{s}))The denominator, P(E_{1}=o_{1} &...&E_{s}=o_{s}), is a normalizing factor:
P(E_{1}=o_{1} &...&E_{s}=o_{s}) = SUM_{v in dom(X)} P(X=v &E_{1}=o_{1} &...&E_{s}=o_{s})The problem of probabilistic inference can thus be reduced to the problem of computing the probability of conjunctions.
Let Y = {Y_{1},...,Y_{k}} be the non-query, non-observed variables (i.e., Y = {X_{1},...,X_{n}}-{X}-{E_{1},...,E_{s}}). To compute the marginal distribution, we sum out the Y_{i}'s:
where the subscripted probabilities mean that the associated variables are assigned the corresponding values.
P(X&E_{1}=o_{1} &...&E_{s}=o_{s}) = SUM_{Yk}···SUM_{Y1} P(X_{1},...,X_{n})_{{E1=o1 &...&Es=os}} = SUM_{Yk}···SUM_{Y1} PROD_{i = 1}^{n} P(X_{i}|pi_{Xi})_{{E1=o1 &...&Es=os}}
Thus probabilistic inference reduces to the problem of summing out variables from a product of functions. To solve this efficiently we use the distribution law that we learned in high school: to compute a sum of products such as xy+xz efficiently, we distribute out the common factors (which here is x) which results in x(y+z). This is the essence of the VE algorithm. We call the elements multiplied together "factors" because of the use of the term in mathematics. Initially the factors represent the conditional probability tables, but the intermediate factors are just functions on variables that are created by adding and multiplying factors.
A factor on variables V_{1},...,V_{d} is a representation of a function from dom(V_{1}) ×...×dom(V_{d}) into the real numbers.
Suppose that the Y_{i}'s are ordered according to some elimination ordering. We sum out the variables one at at time.
To sum out a variable Y_{i} from a product, we distribute all of the factors that don't involve Y_{i} out of the sum. Suppose f_{1},...,f_{k} are some functions of the variables that are multiplied together (initially these are the conditional probabilities), then
SUM_{Yi} f_{1}...f_{k} = f_{1}...f_{m} SUM_{Yi} f_{m+1}...f_{k}where f_{1}...f_{m} are those functions that don't involve Y_{i}, and f_{m+1}...f_{k} are those that do involve Y_{i}. We explicitly construct a representation for the new function SUM_{Yi} f_{m+1}...f_{k}, and continue summing out the remaining variables. After all the Y_{i}'s have been summed out, the result is a function on X that is proportional to X's posterior distribution.
To compute P(X|E_{1}=o_{1} &...&E_{s}=o_{s}) | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Let F be the factors obtained from the original conditional probabilities. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
1. | Replace each f in F that involves some E_{i} with f_{{E1=o1 ,..., Es=os}}. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
2. | While there is a factor involving a non-query variable | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Select non-query variable Y to eliminate | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Set F= eliminate(Y,F). | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
3. | Return renormalize(F) |
Procedure eliminate(Y,F): | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Partition F into | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
{f_{1},..., f_{m}} that don't contain Y and | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
{f_{m+1},..., f_{r}} that do contain Y | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Compute f = SUM_{Y} f_{m+1}×_{t}...×_{t}f_{r} | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Return {f_{1},...,f_{m},f} |
Procedure renormalize({f_{1},...,f_{r}}): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Compute f = f_{1}×_{t}...×_{t}f_{r} | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Compute c = SUM_{X} f | % c is normalizing constant | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Return f/c | % divide each element of f by c |
In the tabular implementation of the VE algorithm (Figure *), a function of d discrete variables V_{1},...,V_{d}, is represented as a d-dimensional table (which can be implemented, for example, as a d-dimensional array, as a tree of depth d, or, as in our implementation, as a 1-dimensional array based on a lexicographic ordering on the variables). If f is such a table, let variables(f) = {V_{1},...,V_{d}}. We sometimes write f as f[V_{1},...,V_{d}] to make the variables explicit. f is said to involve V_{i} if V_{i} in variables(f).
There are three primitive operations on tables: setting variables, forming the product of tables, and summing a variable from a table.
Definition. Suppose C is a set of variables, c is an assignment C = v, and f is a factor on variables X. Let Y = X-C, let Z = X intersect C, and let Z = v' be the assignment of values to Z that assigns the same values to elements of Z as c does. Define set(f,c) be the factor on Y given by:
set(f,c)(Y) = f(Y,Z=v').That is, set(f,c) is a function of Y, the variables of f that are not in c, that is like f, but with some values already assigned. Note that, as a special case of this, if c doesn't involve any variable in f then set(f,c)=f.
Example.
Consider the factor f(A,B,C,D,E) defined by the table of Figure
*. Some examples of the value of this function are
f(a,b,c,d,e) = 0.55, and f(~
a,b,c,d,~
e) = 1-0.08 = 0.92.
set(f,~
a&b &e) is a function of C and
D defined by the table:
C | D | value |
c | d | 0.08 |
c | ~ d | 0.08 |
~ c | d | 0.025 |
~ c | ~ d | 0.5 |
Definition. The product of tables f_{1} and f_{2}, written f_{1}×_{t}f_{2} is a table on the union of the variables in f_{1} and f_{2} (i.e., variables(f_{1}×_{t}f_{2}) = variables(f_{1}) union variables(f_{2})) defined by:
(f_{1}×_{t}f_{2})(X,Y,Z) = f_{1}(X,Y)f_{2}(Y,Z)where Y is variables(f_{1}) intersect variables(f_{2}), X is variables(f_{1})- variables(f_{2}), and Z is variables(f_{2})- variables(f_{1}).
To construct the product of tables, f_{m+1}×_{t}···×_{t}f_{k}, we union all of the variables in f_{m+1}...f_{k}, say these are X_{1},...,X_{r}. Then we construct an r-dimensional table so there is an entry in the table for each combination v_{1},...,v_{r} where v_{i} in dom(X_{i}). The value for the entry corresponding to v_{1},...,v_{r} is obtained by multiplying the values obtained from each f_{i} applied to the projection of v_{1},...,v_{r} onto the variables of f_{i}.
Definition. The summing out of variable Y from table f, written SUM_{Y} f is the table with variables Z = variables(f)-{Y} such that^{5}
(SUM_{Y} f)(Z) = SUM_{vi in dom(Y)} f(Z&Y=v_{i})where dom(Y) = {v_{1},...,v_{s}}.
Example. Consider eliminating B from the factors of Example * (representing the belief network of Figure *), where all of the variables are Boolean. The factors that contain B, namely those factors that represent P(E|ABCD) and P(B|YZ), are removed from the set of factors. We construct a factor f_{1}(A,B,C,D,E,Y,Z)=P(E|A,B,C,D) ×_{t}P(B|Y,Z), thus, for example,
and similarly for the other values of A...Z. We then need to sum out B from f_{1}, producing f_{2}(A,C,D,E,Y,Z) where, for example,
f_{1}(a,b,c,d,e,y,z) = P(e|a&b&c&d) P(b|y&z) f_{1}( ~
a,b,c,d,e,y,z) = P(e|~
a&b&c&d) P(b|y&z)f_{1}( ~
a,~
b,c,d,e,y,z) = P(e|~
a&~
b&c&d) P(~
b|y&z)f_{1}(a, ~
b,c,d,e,y,z) = P(e|a&~
b&c&d) P(~
b|y&z)
f_{2}(a,c,d,e,y,z) = f_{1}(a,b,c,d,e,y,z)+f_{1}(a,~
b,c,d,e,y,z).
f_{2} is then added to the set of factors.
Note that the construction of f_{1} is for exposition only; we don't
necessarily have to construct a table for it explicitly.
Belief Network Inference |