When to Split

## When to Split

Confactor splitting makes the multiset of confactors more complicated, so we have to be careful to apply this operation judiciously. We need to carry out confactor splitting in order to make identical or complementary contexts so we can carry out the operations of summing out a variable or multiplying confactors. These are the only cases we need to split.

Definition. Given confactor r1 = <c1,T1> and context c, such that c1 and c are compatible, to split r1 on c means to split r1 sequentially on each of the variables that are assigned in c that aren't assigned in c1.

When we split r1 on c, we end up with a single confactor with a context that is compatible with c; the contexts of all of the other confactors that are produced by the splitting are incompatible with c. These confactors that are incompatible with c are called residual confactors.

More formally, we can recursively define residual(r1,c), where r1 = <c1,t1> and c and c1 are compatible, by:

• residual(r1,c)={} if c subset c1
• Else if c is not a subset of c1, select a variable X that is assigned in c but not in c1.
 residual(r1,c) = {: vi in dom(X) & vi != cX} union residual(,c)
where cX is the value assigned to X in context c. Recall (Definition *) that set(t,X=vi) is t if t doesn't involve X and is the selection of the X=vi values from the table, followed by the projection onto the remaining variables, if t does involve X.

The results of splitting a confactor on a context is a set of confactors:

split(<c1,t1>,c)=residual(<c1,t1>,c) union {<c1 union c,t1>}.

Example. Consider residual(<a&b,t1[C,D]>,c &e). Suppose we split on C first, then on E. This results in two residual confactors: <a&b&`~`c,t2[D]> and <a&b&c &`~`e,t3[D]>. Note that t2[D] is the projection of t1[C,D] onto C=false and t3[D] is the projection of t1[C,D] onto C=true. The non-residual confactor that we want from the split is <a&b&c &e,t3[D]>.

If instead we split on E then C, we get the residual confactors: <a&b&`~`e,t1[C,D]> and <a&b&`~`c &e,t2[D]>, with the same non-residual confactor.

Note that the result can depend on the order in which variables are selected (see below for some useful splitting heuristics). The algorithms that use the split will be correct no matter which order the variables are selected, however some orderings may result in more splitting in subsequent operations.

Example * highlights one heuristic that seems generally applicable. When we have to split a confactor on variables that appear in its body and on variables in its table, it's better to split on variables in the table first, as these simplify the confactors that need to be subsequently split.

We can use the notion of a residual to split two rules that are compatible, and need to be multiplied. Suppose we have confactors r1 = <c1,t1> and r2 = <c2,t2>, that both contain the variable being eliminated and where c1 and c2 are compatible contexts. If we split r1 on c2, and split r2 on c1, we end up with two confactors whose contexts are identical. Thus we have the prerequisite needed for multiplying.

Example. Suppose we have confactors r1 = <a&b &`~`c,t1> and r2 = <a &d,t2> that both contain the variable being eliminated. We can split r1 on the body of r2, namely a &d, producing the confactors


Only the first of these is compatible with r2. The second confactor is a residual confactor.

We can split r2 on the body of r1, namely a&b &`~`c, by first splitting r2 on B, then on C, producing the confactors:


Only the second confactor (confactor (*))is compatible with r1 or any of the residual confactors produced by splitting r1. Confactors (*) and (*) have identical contexts and so can be multiplied.

Suppose we have confactors r1 = <c1&Y=vi,t1> and r2 = <c2&Y=vj,t2>, where c1 and c2 are compatible contexts, and vi != vj. If we split r1 on c2, and split r2 on c1, we end up with two confactors whose contexts are identical except for the complementary values for Y. This is exactly what we need for summing out Y.

If Y is binary with domain {vi,vj}, and there are confactors r1 = <c1&Y=vi,t1> and r2 = <c2&Y=vj,t2>, where c1 and c2 are compatible contexts, and there is no other confactor that contains Y that is compatible with c1 and c2, summing out Y in the context c1 union c2 results in the confactors:

residual(r1,c2) union residual(r2,c1) union {<c1 union c2,t1 +tt2>}.
If there are more than two values in the domain, we may need to split each pair of confactors, always using the results of previous splits for subsequent splits.

Proposition. Splitting confactor <c1,t1> on c creates

SUMX in vars(c)-vars(c1) (dom(X)-1)
extra confactors, independently of the order in which the variables are selected to be split, where vars(c) is the set of variables assigned in context c.

When we have to split, there is a choice as to which variable to split on first. While this choice does not influence the number of confactors created for the single split, it can influence the number of confactors created in total because of subsequent splitting. One heuristic was given above. Another useful heuristic seems to be: given a confactor with multiple possible splits, look at all of the confactors that need to be combined with this confactor to enable multiplication or addition, and split on the variable that appears most. For those cases where the conditional probability forms a tree structure, this will tend to split on the root of the tree first.

David Poole and Nevin Lianwen Zhang,Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.

 When to Split