Exploiting Contextual Independence In Probabilistic Inference -- Belief Networks

Belief Networks

We treat random variables as primitive. We use upper case letters to denote random variables. The domain of a random variable X, written dom(X), is a set of values. If X is a random variable and v in dom(X), we write X=v to mean the proposition that X has value v. The function dom can be extended to tuples of variables. We write tuples of variables in upper-case bold font. If X is a tuple of variables, <X₁,...,X_k>, then dom(X) is the cross product of the domains of the variables. We write <X₁,...,X_k>=<v₁,...,v_k> as X₁=v₁&...&X_k=v_k. This is called an instantiation of X. For this paper we assume there is a finite number of random variables, and that each domain is finite.
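As a small illustration of the notation above, the following sketch builds dom(X) for a tuple of variables as a cross product and enumerates its instantiations. The variable names, the domains, and the helper names `tuple_domain` and `instantiations` are hypothetical and chosen only for this example.

```python
from itertools import product

# Hypothetical finite domains for three random variables (illustrative only).
domains = {"X1": ["t", "f"], "X2": ["low", "mid", "high"], "X3": [0, 1]}

def tuple_domain(variables, domains):
    """Return dom(<X1,...,Xk>) as a list of value tuples (the cross product)."""
    return list(product(*(domains[v] for v in variables)))

def instantiations(variables, domains):
    """Yield each instantiation X1=v1 & ... & Xk=vk as a dict."""
    for values in tuple_domain(variables, domains):
        yield dict(zip(variables, values))

dom_X = tuple_domain(["X1", "X2", "X3"], domains)
print(len(dom_X))  # 2 * 3 * 2 = 12 value tuples
print(next(instantiations(["X1", "X2"], domains)))
```

Because every domain is assumed finite, the cross product is finite too, so it can be enumerated explicitly as done here.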

We start with a total ordering X₁,...,X_n of the random variables.

Definition. The parents of random variable X_i, written pi_{X_i}, are a minimal¹ set of the predecessors of X_i in the total ordering such that the other predecessors of X_i are independent of X_i given pi_{X_i}. That is, pi_{X_i} is a subset of {X₁,...,X_{i-1}} such that P(X_i|X_{i-1}...X₁) = P(X_i|pi_{X_i}).
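The condition in this definition can be checked numerically on a small joint distribution. The sketch below assumes a hypothetical chain X1 -> X2 -> X3 over binary variables with made-up probabilities. The helper `screens_off` (a name introduced here, not from the paper) tests whether a candidate set satisfies P(X_i|X_{i-1}...X₁) = P(X_i|candidate). It does not test minimality.

```python
from itertools import product

def bern(p_true, value):
    """Probability that a binary variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

# Hypothetical chain X1 -> X2 -> X3; all numbers are illustrative.
order = ["X1", "X2", "X3"]
joint = {
    (x1, x2, x3): bern(0.2, x1)
    * bern({True: 0.7, False: 0.1}[x1], x2)
    * bern({True: 0.9, False: 0.4}[x2], x3)
    for x1, x2, x3 in product([True, False], repeat=3)
}

def prob(fixed):
    """P(fixed): sum the joint over all rows that agree with the dict `fixed`."""
    return sum(p for row, p in joint.items()
               if all(row[order.index(v)] == val for v, val in fixed.items()))

def screens_off(i, candidate, tol=1e-9):
    """True if P(X_i | all predecessors) equals P(X_i | candidate) everywhere."""
    target, preds = order[i], order[:i]
    for vals in product([True, False], repeat=i):
        full = dict(zip(preds, vals))
        if prob(full) == 0:
            continue  # conditioning event has probability zero; nothing to check
        sub = {v: full[v] for v in candidate}
        p_full = prob({**full, target: True}) / prob(full)
        p_sub = prob({**sub, target: True}) / prob(sub)
        if abs(p_full - p_sub) > tol:
            return False
    return True

print(screens_off(2, ["X2"]))  # {X2} as candidate parents of X3
print(screens_off(2, []))      # the empty set as candidate parents of X3
```

In this chain, {X2} passes the check for X3 while the empty set does not, so {X2} is a minimal set satisfying the condition.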

A belief network [22] is an acyclic directed graph, where the nodes are random variables². We use the terms node and random variable interchangeably. There is an arc from each element of pi_{X_i} into X_i. Associated with the belief network is a set of probabilities of the form P(X|pi_X), the conditional probability of each variable given its parents (this includes the prior probabilities of those variables with no parents).

By the chain rule for conjunctions and the independence assumption:

P(X₁,...,X_n) = PROD_{i=1}^{n} P(X_i|X_{i-1}...X₁)

= PROD_{i=1}^{n} P(X_i|pi_{X_i})

This factorization of the joint probability distribution is often given as the formal definition of a belief network.
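The factorization can be sketched in code as a product of one conditional probability per variable. The network below (Y -> A <- Z, all binary) and its numbers are hypothetical. The `parents` and `cpt` layout is just one possible representation, not the one used in the paper.

```python
from itertools import product

# Hypothetical network: Y and Z have no parents; A has parents Y and Z.
parents = {"Y": (), "Z": (), "A": ("Y", "Z")}

# cpt[var][parent_values] = P(var = True | parents = parent_values)
cpt = {
    "Y": {(): 0.3},
    "Z": {(): 0.6},
    "A": {(True, True): 0.9, (True, False): 0.5,
          (False, True): 0.4, (False, False): 0.1},
}

def joint(assignment):
    """P(assignment) as the product over variables of P(X | pi_X)."""
    p = 1.0
    for var, pa in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

variables = list(parents)
total = sum(joint(dict(zip(variables, vals)))
            for vals in product([True, False], repeat=len(variables)))

print(joint({"Y": True, "Z": True, "A": True}))  # 0.3 * 0.6 * 0.9
print(total)  # the joint sums to 1 over all assignments (up to rounding)
```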

A	B	C	D	P(e|ABCD)
a	b	c	d	0.55
a	b	c	~d	0.55
a	b	~c	d	0.55
a	b	~c	~d	0.55
a	~b	c	d	0.3
a	~b	c	~d	0.3
a	~b	~c	d	0.3
a	~b	~c	~d	0.3
~a	b	c	d	0.08
~a	b	c	~d	0.08
~a	b	~c	d	0.025
~a	b	~c	~d	0.5
~a	~b	c	d	0.08
~a	~b	c	~d	0.08
~a	~b	~c	d	0.85
~a	~b	~c	~d	0.5

Figure: A simple belief network and a conditional probability table for E.

Example. Consider the belief network of Figure *. This represents a factorization of the joint probability distribution:

P(A,B,C,D,E,Y,Z) = P(E|ABCD) P(A|YZ) P(B|YZ) P(C|YZ) P(D|YZ) P(Y) P(Z)

If the variables are binary³, the first term, P(E|ABCD), requires the probability of E for all 16 cases of assignments of values to A,B,C,D. One such table is given in Figure *.
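To make the table size concrete, the sketch below stores the conditional probability table for E as a dictionary with one entry per assignment to A, B, C, D, using the values from the table above. It then counts how many numbers the whole factorization needs under the binary assumption. The dictionary layout and the names `p_e`, `parents`, and `table_sizes` are illustrative choices, not part of the paper.

```python
from itertools import product

# Rows in the same order as the table: True stands for a, b, c, d and False for
# ~a, ~b, ~c, ~d, with D varying fastest.
rows = list(product([True, False], repeat=4))
values = [0.55] * 4 + [0.3] * 4 + [0.08, 0.08, 0.025, 0.5, 0.08, 0.08, 0.85, 0.5]
p_e = dict(zip(rows, values))  # p_e[(A, B, C, D)] = P(e | A, B, C, D)

# Parent sets read off the factorization in the example.
parents = {"E": "ABCD", "A": "YZ", "B": "YZ", "C": "YZ", "D": "YZ", "Y": "", "Z": ""}

# For a binary variable, the table needs one number per assignment to its parents.
table_sizes = {var: 2 ** len(pa) for var, pa in parents.items()}

print(len(p_e))                          # 16 cases for P(E|ABCD)
print(p_e[(False, True, False, True)])   # row ~a, b, ~c, d -> 0.025
print(sum(table_sizes.values()))         # 16 + 4*4 + 1 + 1 = 34
print(2 ** len(parents) - 1)             # 127 for an unrestricted joint
```

Under these assumptions the factorization needs 34 numbers, compared with 127 for an unrestricted joint distribution over seven binary variables.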

David Poole and Nevin Lianwen Zhang, Exploiting Contextual Independence In Probabilistic Inference, Journal of Artificial Intelligence Research, 18, 2003, 263-313.
