Exploiting Contextual Independence In Probabilistic Inference -- Absorption

Absorption

Absorption

In this section we characterize a case when we don't need to split during multiplication. Note that the result of eliminating a variable is exactly the same as before; we are saving because we don't create the residuals, but rather use the original confactor.

Definition. A multiset of confactors R is complete if the contexts of the confactors are mutually exclusive and covering.

If we have a multiset R of confactors that is complete, it means that we don't have to split other confactors r that may need to be multiplied by all of the confactors in R. For each residual of r, there will be another element of R with which it will be compatible. Instead of splitting r, we can just multiply it by each element of R. This is the intuition behind absorption. Note that we may need to split elements of R during multiplication.

Suppose we have a complete multiset of confactors R and another confactor r₁=<c₁,t₁>. Define

absorb(R,<c₁,t₁>)

= {<c_i,p_i> in R : incompatible(c_i,c₁)}

union union _{<c_i,t_i> in R

compatible(c₁,c_i)} residual(<c_i,t_i>,c₁) union {<c₁ union c_i, set(t₁,c_i)×_tset(t_i,c₁)>}

where ×_t is table multiplication (Definition *).

Correctness:

We can replace R union {r₁} with absorb(R,r₁) and the program invariant is preserved. First note that the confactors in absorb(R,r₁) are complete (and so the second part of the invariant holds). To see why the first part of the invariant is preserved, consider a complete context c. If c is compatible with c₁, then in the original confactors, we use one confactor from R as well as r₁. In the revised confactor base we use the appropriate confactor with the product of the original probabilities. If c is incompatible with c₁ then it is compatible with one element <c₂,p₂> of R. If c₂ is incompatible with c₁, the confactor to be used is the original confactor, and if c₂ is compatible with c₁, then we use the residual confactor. In each case we get the same contributions from R union {r₁} and from absorb(R,r₁).

Example. If we were to eliminate B from the confactors of Figure * (as in example *), we can treat the two confactors for P(B|Y,Z) as the complete multiset of confactors. This means that we don't need to split the other confactors on Y.

Note that if we try to use the confactors for E as the complete set of confactors when eliminating B, we don't have to split the confactors on A, C or D, but we have to consider confactors that don't involve B when eliminating B, and we end up multiplying confactors that don't need to be multiplied.

Note that if R cannot be represented as a decision tree (e.g., if R has confactors corresponding to the contexts in {{a,b}, {~a,c}, {~b,~c}, {a,~b,c}, {~a,b,~c}}), it's possible that there is no way to split r₁ that results in as compact a representation as results from absorb(R,r₁).

It seems as though it is very useful to have a multiset of confactors that is complete, but it is not of much use if we cannot easily find such a set. First note that if we have a multiset R of confactors that is complete, then absorb(R,r₁) is also a multiset of confactors that is complete, which can, in turn, be used to combine with another confactor.

Initially, for each variable X, the confactors that represent the conditional probability table P(X|pi_X) are complete. Moreover they will all contain X and so need to be involved when eliminating X. These can be used as the initial seed for absorbing other confactors. Unfortunately, after we have eliminated some variables, the confactors that define the initial conditional probability tables for some variable X don't exist anymore; they have been split, multiplied by other confactors and added to other confactors. However, for each variable X, we can still extract a useful multiset of confactors that all contain X that are complete on the empty context (i.e., are mutually exclusive and covering). These will be called the confactors for X, and correspond to the confactors with X in the head in an earlier version of contextual variable elimination [25].

Definition. If X is a variable, the confactors for X are a subset of the confactor base defined by:

Initially the confactors for X are the confactors that define the conditional probability P(X|pi_X).
When we split a confactor for X, the confactors created are also confactors for X. Note that this does not cover the case where we are splitting a confactor on a confactor for X.
When we multiply a confactor for X with another confactor, the confactor created is a confactor for X.
When we add a confactor for X with another confactor (when eliminating another variable Y, for example), the resulting confactor is also a confactor for X.

Proposition. The confactors for X at any stage of contextual variable elimination are complete.

To show this, we will show that the three basic operations preserve this property.

splitting preserves this property, as the resulting confactors are exclusive and cover the context of the confactor being split.
multiplication preserves this property as, for any variable X, only one of the confactors involved in a multiplication can be a confactor for X (as the confactors for X are mutually exclusive) and the set of contexts covered by the confactors isn't changed by multiplication. This argument also holds for absorption.
for addition, note that, for any variable X, either all of the confactors or none of the confactors involved in an addition are confactors for X when summing out Y. To show this suppose we have r₁=<c&Y=v₁,p₁> and r₂=<c&Y=v₂,p₂> where confactor r₁ is a confactor for X and confactor r₂ isn't. Because the confactors for X are mutually exclusive and covering, there must be a confactor that is covered by X that is applicable in a context when c&Y=v₂ is applicable. This confactor cannot mention Y, for otherwise addition isn't applicable, and so it must also be applicable when c&y=v₁ is true, which contradicts the mutual exclusiveness of the confactors for X. Now it is easy to see that addition preserves this property as the confactors being summed cover the same contexts as the resulting confactor.

We can also use the idea of the confactors for X to recognise when summing out a variable will create a table of 1's that can be removed (see Section *). First note that in the original confactors for X, if, when eliminating X, we don't have to multiply these by other confactors (i.e., they have no children in this context), then we know that summing out X will produce a table of 1's. We can do better than this for recognising when we will produce ones. We will use one bit of information to encode whether a confactor for X is pure for X. Initially all of the confactors for X are pure for X. If a confactor for X is pure for X, and, when eliminating Y is absorbed into a confactors for Y that is pure for Y, then the resulting confactor is pure for X. For every other case when a confactor for X is multiplied by another confactor, the result is not pure for X. If we are summing out X, after absorption, we can remove all confactors for X that are pure for X. This is correct because we have maintained the invariant that if we sum out X from the table of a confactor for X that is pure for X we create a table with only ones. Note that this procedure generalises the idea that we can recursively remove variables with no children that are neither observed nor queried.

The idea of absorption into the rules for the variable being eliminated described in this section should only be seen as a heuristic to avoid splitting. It does not necessarily reduce the amount of work. First note that in variable elimination there is a choice in the order that we multiply factors. Multiplication of factors is associative and commutative, however, the order in which the multiplication is carried out affects efficiency, as the following example shows.

Example. Suppose variable B has one parent, A and two children C and D, who each only have B as a parent. When eliminating B we have to multiply the factors representing P(B|A), P(C|B) and P(D|B). Suppose that B, C and D are binary, and that A has domain size of 1000. When multiplying two factors the number of multiplications required is equal to the size of the resulting factor. If we save intermediate results, and multiply these in the order (P(B|A)×_tP(C|B)) ×_tP(D|B), we will do 12000 multiplications. If we save intermediate results, and multiply these in the order P(B|A)×_t(P(C|B) ×_tP(D|B)), we will do 8008 multiplications. If we don't save intermediate tables, but instead recompute every product, we will do 16000 multiplications.

If you need to multiply k>1 factors, where m is the size of the resulting factor, the number of multiplications is bounded below by k-2+m (as the final product requires m multiplications and each other requires at least one multiplication) and bounded above by (k-1)*m (as there are k-1 factor multiplications and each of these requires at most m multiplications).

The associative ordering imposed by absorption into the rules for the variable being eliminated (which for the example above implies absorbing P(C|B) and P(D|B) into P(B|A)) may not be the optimal multiplication ordering. The absorption ordering (that saves because it reduced splitting) should be seen as a heuristic; it may be worthwhile to do a meta-level analysis to determine what order to multiply [2][28][8], but this is beyond the scope of this paper.

David Poole and Nevin Lianwen Zhang,Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.

Absorption