Exploiting Contextual Independence In Probabilistic Inference -- Parent Contexts and Contextual Belief Networks

Parent Contexts and Contextual Belief Networks

As in the definition of a belief network, let's assume that we have a total ordering of the variables, X₁,...,X_n.

Definition. Given variable X_i, we say that C=c, where C ⊆ {X_{i-1}, ..., X₁} and c ∈ dom(C), is a parent context for X_i if X_i is contextually independent of the predecessors of X_i (namely {X_{i-1}, ..., X₁}) given C=c.

Example. Consider the belief network and conditional probability table of Figure *. The predecessors of variable E are A,B,C,D,Y,Z. A set of minimal parent contexts for E is { {a,b}, {a,~b}, {~a,c}, {~a,~c,d,b}, {~a,~c,d,~b}, {~a,~c,~d}}. This is a mutually exclusive and exhaustive set of parent contexts. The probability of E given values for its predecessors can be reduced to the probability of E given a parent context. For example:

P(e|a,b,c,~d,y,~z) = P(e|a,b)

P(e|~a,b,c,~d,y,~z) = P(e|~a,c)

P(e|~a,~b,c, d,~y,~z) = P(e|~a,c).

In the belief network, the parents of E are A,B,C,D. To specify the conditional probability of E given its parents, the traditional tabular representation (as in Figure *) requires 2⁴ = 16 numbers, whereas the parent contexts above need only 6. Adding an extra variable as a parent of E doubles the size of the tabular representation, but if that variable is relevant in only a single context, it may increase the number of parent contexts by just one.
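
The reduction from a full assignment of the predecessors to a parent context can be sketched in Python as follows. This is only an illustration: the dictionary encoding of contexts and the names `PARENT_CONTEXTS`, `compatible`, and `parent_context` are assumptions made for this sketch, not part of the original formulation. It also checks that every assignment to the six predecessors has exactly one compatible context.

```python
from itertools import product

# The six minimal parent contexts for E listed in the example.
# Each context maps a variable name to the value it must take.
PARENT_CONTEXTS = [
    {"A": True, "B": True},
    {"A": True, "B": False},
    {"A": False, "C": True},
    {"A": False, "C": False, "D": True, "B": True},
    {"A": False, "C": False, "D": True, "B": False},
    {"A": False, "C": False, "D": False},
]
PREDECESSORS = ["A", "B", "C", "D", "Y", "Z"]


def compatible(context, assignment):
    """True if the assignment agrees with the context on every variable it mentions."""
    return all(assignment[var] == val for var, val in context.items())


def parent_context(assignment):
    """Return the unique parent context compatible with a full assignment."""
    matches = [c for c in PARENT_CONTEXTS if compatible(c, assignment)]
    assert len(matches) == 1, "contexts must be mutually exclusive and exhaustive"
    return matches[0]


# Every one of the 2^6 assignments to the predecessors has exactly one context.
for values in product([True, False], repeat=len(PREDECESSORS)):
    parent_context(dict(zip(PREDECESSORS, values)))

# P(e|~a,b,c,~d,y,~z) reduces to P(e|~a,c).
full = dict(A=False, B=True, C=True, D=False, Y=True, Z=False)
print(parent_context(full))            # {'A': False, 'C': True}
print(2 ** 4, len(PARENT_CONTEXTS))    # 16 6
```

The last line prints the size of the tabular representation next to the number of parent contexts.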
We can often (but not always) represent contextual independence in terms of trees.

Figure *: Tree-structured representations of the conditional probabilities for E, B, and D given their parents. Left branches correspond to true and right branches to false. Thus, for example, P(e|a & b) = 0.55, P(e|a & ~b) = 0.3, and P(e|~a & ~c & d & b) = 0.025.

The left side of Figure * gives a tree-based representation for the conditional probability of E given its parents. In this tree, internal nodes are labelled with parents of E in the belief network. The left child of a node corresponds to the variable labelling the node being true, and the right child to the variable being false. The leaves are labelled with the probability that E is true. For example, P(e|a & ~b) = 0.3, irrespective of the values of C and D. In the tree-based representation, the variable (E in this case) is contextually independent of its predecessors given the context defined by a path through the tree. The paths through the tree correspond to parent contexts.
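
The correspondence between paths and parent contexts can be sketched as follows. The nested-tuple encoding and the names `TREE_E`, `paths`, and `lookup` are assumptions made for this sketch. The tree shape is inferred from the parent contexts listed earlier, and only the three leaf values stated in the text are filled in; the remaining leaves are left as `None` rather than guessed.

```python
# A tree is either a leaf (the probability that E is true, or None where the
# text does not state the value) or a tuple (variable, true_subtree, false_subtree).
TREE_E = (
    "A",
    ("B", 0.55, 0.3),            # a & b, a & ~b
    ("C",
     None,                       # ~a & c (value not given in the text)
     ("D",
      ("B", 0.025, None),        # ~a & ~c & d & b, ~a & ~c & d & ~b
      None)),                    # ~a & ~c & ~d
)


def paths(tree, context=()):
    """Yield (parent_context, leaf) for every root-to-leaf path in the tree."""
    if not isinstance(tree, tuple):
        yield dict(context), tree
        return
    var, if_true, if_false = tree
    yield from paths(if_true, context + ((var, True),))
    yield from paths(if_false, context + ((var, False),))


def lookup(tree, assignment):
    """Follow the branches selected by the assignment until a leaf is reached."""
    while isinstance(tree, tuple):
        var, if_true, if_false = tree
        tree = if_true if assignment[var] else if_false
    return tree


for context, prob in paths(TREE_E):
    print(context, prob)

# The values of C and D are never consulted on this path.
print(lookup(TREE_E, dict(A=True, B=False, C=True, D=True)))  # 0.3
```

Enumerating the paths yields six contexts, one per leaf, matching the set of parent contexts given for E.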

Before showing how the structure of parent contexts can be exploited in inference, there are a few properties to note:

The elements of a mutually exclusive and exhaustive set of parent contexts are not always the minimal parent contexts. For example, suppose we have a variable A with parents B and C, all of which are Boolean. Suppose the probability of a is p₁ when both B and C are true, and p₂ otherwise. One mutually exclusive and exhaustive set of parent contexts for A is {b & c, b & ~c, ~b}. Here b & ~c is not minimal, as ~c is also a parent context. Another mutually exclusive and exhaustive set of parent contexts for this example is {b & c, ~b & c, ~c}. The set of minimal parent contexts, {b & c, ~b, ~c}, isn't a mutually exclusive and exhaustive set, as its elements are not pairwise incompatible.

One could imagine using arbitrary Boolean formulae in the contexts. This was not done as it would entail using theorem proving (or a more sophisticated subsumption algorithm) during inference. We doubt that this would be worth the extra overhead for the limited savings.

A compact decision tree representation of conditional probability tables [5] always corresponds to a compact set of parent contexts (one context for each path through the tree). However, a mutually exclusive and exhaustive set of parent contexts cannot always be directly represented as a decision tree (as there isn't always a single variable to split on). For example, the mutually exclusive and exhaustive set of contexts {{a,b}, {~a,c}, {~b,~c}, {a,~b,c}, {~a,b,~c}} doesn't directly translate into a decision tree. More importantly, the operations we perform don't necessarily preserve the tree structure. Section * shows how we can do much better than an analogous tree-based formulation of our inference algorithm.
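
The claims about the example context sets in these properties can be checked by brute force. The sketch below is illustrative only: the helper names `is_mee` and `common_variables` are assumptions for this example. A set of contexts can be read directly as a decision tree only if some variable appears in every context, so that it can label the root. That condition is necessary at the root but is not a full test of tree-representability.

```python
from itertools import product


def compatible(context, assignment):
    """True if the assignment agrees with the context on every variable it mentions."""
    return all(assignment[var] == val for var, val in context.items())


def is_mee(contexts, variables):
    """True if every assignment is compatible with exactly one context."""
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if sum(compatible(c, assignment) for c in contexts) != 1:
            return False
    return True


def common_variables(contexts):
    """Variables mentioned by every context: candidates for the root of a tree."""
    return sorted(set.intersection(*(set(c) for c in contexts)))


T, F = True, False

# Contexts for variable A with parents B and C.
set1 = [{"B": T, "C": T}, {"B": T, "C": F}, {"B": F}]
set2 = [{"B": T, "C": T}, {"B": F, "C": T}, {"C": F}]
minimal = [{"B": T, "C": T}, {"B": F}, {"C": F}]
print(is_mee(set1, ["B", "C"]))      # True
print(is_mee(set2, ["B", "C"]))      # True
print(is_mee(minimal, ["B", "C"]))   # False: ~b and ~c both match ~b & ~c

# The five-context set that does not translate directly into a decision tree.
no_tree = [
    {"A": T, "B": T}, {"A": F, "C": T}, {"B": F, "C": F},
    {"A": T, "B": F, "C": T}, {"A": F, "B": T, "C": F},
]
print(is_mee(no_tree, ["A", "B", "C"]))  # True
print(common_variables(no_tree))         # []: no single variable to split on
print(common_variables(set1))            # ['B']
```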

Definition. A contextual belief network is an acyclic directed graph where the nodes are random variables. Associated with each node X_i is a mutually exclusive and exhaustive set of parent contexts, Pi_i, and, for each pi in Pi_i, a probability distribution P(X_i|pi) on X_i. Thus a contextual belief network is like a belief network, but we only specify the probabilities for the parent contexts.

For each variable X_i and for each assignment X_i-1=v_i-1,..., X₁=v₁ of values to its preceding variables, there is a compatible parent context pi_{X_i}^v_i-1...v₁. The probability of a complete context (an assignment of a value to each variable) is given by:

P(X₁=v₁,...,X_n=v_n)

= PROD_{i = 1}ⁿ P(X_i=v_i|X_i-1=v_i-1,..., X₁=v₁)

= PROD_{i = 1}ⁿ P(X_i=v_i|pi_{X_i}^v_i-1...v₁)
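
As a minimal sketch of this product, consider the earlier example of a Boolean variable A with parents B and C, using the ordering B, C, A. All numeric values below are hypothetical, chosen only for illustration (0.9 stands in for p₁ and 0.2 for p₂), and the names `NETWORK`, `prob_true`, and `joint` are assumptions for this sketch.

```python
from itertools import product

ORDER = ["B", "C", "A"]  # total ordering of the variables

# For each variable: a mutually exclusive and exhaustive list of
# (parent context, P(variable = true | context)). Numbers are hypothetical.
NETWORK = {
    "B": [({}, 0.4)],
    "C": [({}, 0.7)],
    "A": [
        ({"B": True, "C": True}, 0.9),    # plays the role of p1
        ({"B": True, "C": False}, 0.2),   # plays the role of p2
        ({"B": False}, 0.2),              # plays the role of p2
    ],
}


def prob_true(var, assignment):
    """P(var = true | the parent context compatible with the assignment)."""
    matches = [
        p for context, p in NETWORK[var]
        if all(assignment[v] == val for v, val in context.items())
    ]
    assert len(matches) == 1, "exactly one parent context must be compatible"
    return matches[0]


def joint(assignment):
    """Probability of a complete context as a product over the variables."""
    result = 1.0
    for var in ORDER:
        p = prob_true(var, assignment)
        result *= p if assignment[var] else 1.0 - p
    return result


# The joint probabilities of all complete contexts should sum to 1.
total = sum(
    joint(dict(zip(ORDER, values)))
    for values in product([True, False], repeat=len(ORDER))
)
print(abs(total - 1.0) < 1e-9)  # True
```

Each factor is looked up by parent context rather than by a full row of a conditional probability table, so A needs three entries here instead of four.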
