Exploiting Contextual Independence In Probabilistic Inference -- CVE Compared To VE

In order to understand the relationship between VE and CVE for the same query and the same elimination order, we can consider the VE derivation tree of the final answer. The tree contains all initial and intermediate factors created in VE. The parents in the tree of any factor are those factors that were multiplied or had a variable summed out to produce this factor. Note that this is a tree (each factor has only one child) as each factor is only used once in VE; once it is multiplied or a variable is summed from it, it is removed.

For each node in this tree that is created by multiplying two other factors, the number of multiplications in VE is equal to the table size of the resulting factor. For each factor created by summing out a variable, the number of additions is equal to the size of its parent minus its size.

We can define tree-based variable elimination (TVE) to be a composite of VE and CVE. It uses confactors as in CVE. Associated with each factor in the VE derivation tree is a set of confactors. When VE multiplies two factors, TVE multiplies (and does the requisite splitting) all of the compatible confactors associated with the factors being multiplied. TVE is essentially the same as the tree-based merging of [3] (but Boutilier also does maximization at decisions).

Whenever VE multiplies two factors, TVE multiplies all of the confactors associated with the factors. The TVE confactors associated with the VE factors will always have a total table size that is less than or equal to the VE factor size. TVE maintains a set of confactors with mutually exclusive and covering contexts. The number of multiplications is equal to the resulting table size for each pairwise multiplication (as each entry is computed by multiplying two numbers). It is easy to see that TVE always does fewer or an equal number of multiplications than VE.

CVE is like TVE except that CVE doesn't multiply some of the confactors when VE multiplies two factors. It delays the multiplications until they need to be done. It relies on the hope that the confactors can be separately simplified before they need to be multiplied. This hope is not unjustified because if eliminating a variable means that both of the factors need to be multiplied by other confactors, then they need to be multiplied by each other.

Example. If we were to use TVE for Example *, once OT is eliminated, TVE builds a tree representing the probability on both FH and MH. This entails multiplying out the confactors that were not combined in CVE, for example multiplying the third, fifth and sixth factors of the result of Example *, which produces the confactor of the form <~fb &~mb,t(FH,FT,MH,MT,S)>. Eliminating OT results in a set of confactors with total table size 72, compared to 24 for VE. Without any contextual structure, VE builds a table with 2⁷ = 128 values.

It is possible that not multiplying compatible confactors earlier means that we will eventually have to do more multiplications. The following example is the simplest example we could find where CVE does more multiplications than VE or TVE. Slight variations in the structure of this example, however result in CVE doing fewer multiplications.

Example. Consider the belief network shown in Figure *(a).

Belief network for Example *.

First we will sum out a variable, A, to create two confactors that are not multiplied in CVE but are multiplied in TVE. We then multiply one of these confactors by another factor when summing out the second variable, B. We then force the multiplication when eliminating C.
Suppose that all of the variables are binary except for variable W that has domain size 1000. (The counter example doesn't rely on non-binary variables; you could just have B having 10 binary parents, but this makes the arithmetic less clear). In the analysis below we only discuss the multiplications that involve W, as the other multiplications sum up to less than a hundred, and are dominated by the multiplications that involve W.
Suppose that we have the following confactors for S:
<x,t₁(A,B,C,S)>

<~x,t₂(B,C,S)>

Suppose that we have the following confactors for T:
<x,t₃(A,B,C,T)>

<~x,t₄(C,T)>

Suppose first that T is observed. Then the confactors for T are replaced by:
<x,t₅(A,B,C)>

<~x,t₆(C)>

Let's now eliminate A. In both CVE and TVE, confactors (*) and (*) and the prior on A are multiplied together, and A is summed out resulting in:
<x,t₇(B,C,S)>

In TVE, confactors (*) and (*) are also multiplied together resulting in:
<~x,t₈(B,C,S)>

Next we eliminate B. In both CVE and TVE, confactor (*) is multiplied by the factors representing P(C|B) and P(B|W). We assume that the factor representing P(B|W) is multiplied last as this minimises the number of multiplications. This involves 8008 multiplications (as the product has a table size of 8000 and the intermediate table is of size 8). Then B is summed out resulting in the factor:
<x,t₉(C,S,W)>

In CVE, confactor (*) is multiplied by the factors representing P(C|B) and P(B|W). This also involves 8008 multiplications. Then B is summed out from the product resulting in:
<~x,t₁₀(C,S,W)>

In TVE, confactor (*) is multiplied by the factors representing P(C|B) and P(B|W). This also involves 8008 multiplications. Then B is summed out from the product resulting in:
<~x,t₁₁(C,S,W)>

When C is eliminated, TVE requires no more multiplications. It just sums out C from the table of confactor (*). However in CVE, we need to multiply confactors (*) and (*), which involves 4000 multiplications. The resulting confactors from CVE and TVE are identical.

Thus CVE requires about 20000 multiplications, TVE requires about 16000 multiplications. VE, for the same elimination order, requires about 16000 multiplications.

This should not be surprising, particularly when you realise that VE is not optimal. For a given elimination ordering, it is sometimes optimal to multiply factors before VE actually multiplies them, as the following example shows:

Example. Consider the belief network in Figure *(b), with the same domain sizes as in the previous example. The factors represent P(W), P(B|W), P(C), P(S|BC). If we were to eliminate B then C, it is more efficient to preemptively multiply P(C) by P(S|BC) than to delay the multiplication till after summing out B (as VE does).

It may seem negative to show that CVE doesn't always do fewer multiplications than VE but has the overhead of maintaining contexts. However, there seems no reason why the preemptive multiplication of TVE is optimal either. One possibility is to treat "when to multiply" and "when to sum out variables" as a secondary optimisation problem [2][28][8]; unfortunately this optimization is also computationally difficult [8]. This, however is beyond the scope of this paper.