When to Split |

**Definition.**
*
Given confactor r_{1} = <c_{1},T_{1}> and context c, such that c_{1} and
c are compatible, to split r_{1} on
c means to split r_{1} sequentially on each of the variables
that are assigned in c that aren't assigned in c_{1}.
*

*When we split r_{1} on c, we end up with a single confactor with a
context that is compatible with c; the contexts of all of the other
confactors that are produced by the splitting are incompatible with
c. These confactors that are incompatible with c are called residual confactors.
*

*More formally, we can recursively define residual(r_{1},c), where r_{1}
= <c_{1},t_{1}> and c and c_{1} are compatible, by:
*

*residual(r*if_{1},c)={}*c subset c*_{1}*Else if**c is not a subset of c*, select a variable_{1}*X*that is assigned in*c*but not in*c*._{1}

whereresidual(r _{1},c)= { `<`c_{1}&X=v_{i},set(t_{1}, X=v_{i})`>`: v_{i}in dom(X) & v_{i}!= c^{X}}union residual( `<`c_{1}&X=c^{X},set(t_{1},X=c^{X})`>`,c)*c*is the value assigned to^{X}*X*in context*c*. Recall (Definition *) that*set(t,X=v*is_{i})*t*if*t*doesn't involve*X*and is the selection of the*X=v*values from the table, followed by the projection onto the remaining variables, if_{i}*t*does involve*X*.

The results of splitting a confactor on a context is a set of confactors:

split(<c_{1},t_{1}>,c)=residual(<c_{1},t_{1}>,c) union {<c_{1}union c,t_{1}>}.

**Example.**
*
Consider residual(<a&b,t_{1}[C,D]>,c &e).
Suppose we split on C first, then on E.
This results in two residual confactors:
<a&b&*

`~`

c,t_{2}[D]> and
<a&b&c &`~`

e,t_{3}[D]>. Note that t_{2}[D] is the
projection of t_{1}[C,D] onto C=false and t_{3}[D] is the
projection of t_{1}[C,D] onto C=true. The non-residual confactor that
we want from the split is
<a&b&c &e,t_{3}[D]>.
*If instead we split on E then C, we
get the residual confactors:
<a&b&*

`~`

e,t_{1}[C,D]> and <a&b&`~`

c
&e,t_{2}[D]>, with the same non-residual confactor.
Note that the result can depend on the order in which variables are selected (see below for some useful splitting heuristics). The algorithms that use the split will be correct no matter which order the variables are selected, however some orderings may result in more splitting in subsequent operations.

Example * highlights one heuristic that seems generally applicable. When we have to split a confactor on variables that appear in its body and on variables in its table, it's better to split on variables in the table first, as these simplify the confactors that need to be subsequently split.

We can use the notion of a residual to split two rules that are
compatible, and need to be multiplied. Suppose we have confactors
*r _{1} = <c_{1},t_{1}>* and

**Example.**
*
Suppose we have confactors r_{1} = <a&b &*

`~`

c,t_{1}> and
r_{2} = <a &d,t_{2}> that both contain the variable being eliminated.
We can split r_{1} on the body of r_{2}, namely a &d, producing the confactors

<a&b &`~`

c &d,t_{1}><a&b &`~`

c &`~`

d,t_{1}>

We can split *r _{2}* on the body of

`~`

cOnly the second confactor (confactor (*))is compatible with

<a &b &c &d,t_{2}><a &b &`~`

c &d,t_{2}><a &`~`

b &d,t_{2}>

Suppose we have confactors *r _{1} = <c_{1}&Y=v_{i},t_{1}>* and

If *Y* is binary with domain *{v _{i},v_{j}}*, and there are confactors

If there are more than two values in the domain, we may need to split each pair of confactors, always using the results of previous splits for subsequent splits.residual(r_{1},c_{2}) union residual(r_{2},c_{1}) union {<c_{1}union c_{2},t_{1}+_{t}t_{2}>}.

**Proposition.**
*
Splitting confactor <c_{1},t_{1}> on c creates
*

SUM_{X in vars(c)-vars(c1)}(dom(X)-1)

When we have to split, there is a choice as to which variable to split on first. While this choice does not influence the number of confactors created for the single split, it can influence the number of confactors created in total because of subsequent splitting. One heuristic was given above. Another useful heuristic seems to be: given a confactor with multiple possible splits, look at all of the confactors that need to be combined with this confactor to enable multiplication or addition, and split on the variable that appears most. For those cases where the conditional probability forms a tree structure, this will tend to split on the root of the tree first.

David Poole and Nevin Lianwen Zhang,Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.

When to Split |