Absorption |

**Definition.**
*
A multiset of confactors R is complete if the contexts of the
confactors are mutually exclusive and covering.
*

If we have a multiset *R* of confactors that is complete, it means
that we don't have to split other confactors *r* that may need to be
multiplied by all of the confactors in *R*. For each residual of *r*,
there will be another element of *R* with which it will be
compatible. Instead of splitting *r*, we can just multiply it by each
element of *R*. This is the intuition behind absorption. Note that we
may need to split elements of *R* during multiplication.

Suppose we have a complete multiset of confactors *R* and
another confactor *r _{1}=<c_{1},t_{1}>*. Define

where

absorb(R, <c_{1},t_{1}>)= { <c_{i},p_{i}>in R : incompatible(c_{i},c_{1})}union union _{ <ci,ti> in R compatible(c1,ci) }residual(<c_{i},t_{i}>,c_{1}) union {<c_{1}union c_{i}, set(t_{1},c_{i})×_{t}set(t_{i},c_{1})>}

**Example.**
*
If we were to eliminate B from the confactors of Figure
* (as in example *), we can treat
the two confactors for P(B|Y,Z) as the complete multiset of confactors. This
means that we don't need to split the other confactors on Y.
*

*Note that if we try to use the confactors for E as the complete set of
confactors when eliminating B, we don't have to split the
confactors on A, C or D, but we have to consider confactors that don't
involve B when eliminating B, and we end up multiplying confactors that
don't need to be multiplied.
*

Note that if *R* cannot be represented as a decision tree (e.g., if
*R* has confactors corresponding to the contexts in *{{a,b},
{ ~a,c}, {~b,~c}, {a,~b,c}, {~a,b,~c}}*), it's possible that there is no way to split

It seems as though it is very useful to have a multiset of confactors that
is complete, but it is not of much use if we cannot easily find such a
set. First note that if we have a multiset *R* of confactors that is
complete, then *absorb(R,r _{1})* is also a multiset of confactors that is
complete, which can, in turn, be used to combine with another confactor.

Initially, for each variable *X*, the confactors that represent the
conditional probability table *P(X|pi _{X})* are complete. Moreover they
will all contain

**Definition.**
*
If X is a variable, the confactors for X are a subset
of the confactor base defined by:
*

*Initially the confactors for**X*are the confactors that define the conditional probability*P(X|pi*._{X})*When we split a confactor for**X*, the confactors created are also confactors for*X*. Note that this does not cover the case where we are splitting a confactor on a confactor for*X*.*When we multiply a confactor for**X*with another confactor, the confactor created is a confactor for*X*.*When we add a confactor for**X*with another confactor (when eliminating another variable*Y*, for example), the resulting confactor is also a confactor for*X*.

**Proposition.**
*The confactors for X at any stage of contextual variable elimination
are complete.
*

- splitting preserves this property, as the resulting confactors are exclusive and cover the context of the confactor being split.
- multiplication preserves this property as, for any variable
*X*, only one of the confactors involved in a multiplication can be a confactor for*X*(as the confactors for*X*are mutually exclusive) and the set of contexts covered by the confactors isn't changed by multiplication. This argument also holds for absorption. - for addition, note that, for any variable
*X*, either all of the confactors or none of the confactors involved in an addition are confactors for*X*when summing out*Y*. To show this suppose we have*r*and_{1}=`<`c&Y=v_{1},p_{1}`>`*r*where confactor_{2}=`<`c&Y=v_{2},p_{2}`>`*r*is a confactor for_{1}*X*and confactor*r*isn't. Because the confactors for_{2}*X*are mutually exclusive and covering, there must be a confactor that is covered by*X*that is applicable in a context when*c&Y=v*is applicable. This confactor cannot mention_{2}*Y*, for otherwise addition isn't applicable, and so it must also be applicable when*c&y=v*is true, which contradicts the mutual exclusiveness of the confactors for_{1}*X*. Now it is easy to see that addition preserves this property as the confactors being summed cover the same contexts as the resulting confactor.

We can also use the idea of the confactors for *X* to recognise when
summing out a variable will create a table of 1's that can be removed
(see Section *). First note that in the original
confactors for *X*, if, when eliminating *X*, we don't have to
multiply these by other confactors (i.e., they have no children in
this context), then we know that summing out *X* will produce a table
of 1's. We can do better than this for recognising when we will
produce ones. We will use one bit of information to encode whether a
confactor for *X* is *pure* for *X*. Initially all of the
confactors for *X* are pure for *X*. If a confactor for *X* is pure
for *X*, and, when eliminating *Y* is absorbed into a confactors for
*Y* that is pure for *Y*, then the resulting confactor is pure for
*X*. For every other case when a confactor for *X* is multiplied by
another confactor, the result is not pure for *X*. If we are summing
out *X*, after absorption, we can remove all confactors for *X* that
are pure for *X*. This is correct because we have maintained the
invariant that if we sum out *X* from the table of a confactor for
*X* that is pure for *X* we create a table with only ones. Note that
this procedure generalises the idea that we can recursively remove
variables with no children that are neither observed nor queried.

The idea of absorption into the rules for the variable being eliminated described in this section should only be seen as a heuristic to avoid splitting. It does not necessarily reduce the amount of work. First note that in variable elimination there is a choice in the order that we multiply factors. Multiplication of factors is associative and commutative, however, the order in which the multiplication is carried out affects efficiency, as the following example shows.

**Example.**
*
Suppose variable B has one parent, A and two children C and D,
who each only have B as a parent. When eliminating B we have to
multiply the factors representing P(B|A), P(C|B) and
P(D|B). Suppose that B, C and D are binary, and that A has
domain size of 1000. When multiplying two factors the number of
multiplications required is equal to the size of the resulting
factor. If we save intermediate results, and multiply these in the
order (P(B|A)×_{t}P(C|B)) ×_{t}P(D|B), we will do 12000
multiplications. If we save intermediate results, and multiply these in
the order P(B|A)×_{t}(P(C|B) ×_{t}P(D|B)), we will do 8008
multiplications. If we don't save intermediate tables, but instead
recompute every product, we will do
16000 multiplications.
*

The associative ordering imposed by absorption into the
rules for the variable being eliminated (which for
the example above implies absorbing *P(C|B)* and *P(D|B)* into
*P(B|A)*) may not be the optimal multiplication ordering. The
absorption ordering (that saves because it reduced splitting) should
be seen as a heuristic; it may be worthwhile to do a meta-level
analysis to determine what order to multiply
[2][28][8], but this is beyond the
scope of this paper.

David Poole and Nevin Lianwen Zhang,Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.

Absorption |