Reasoning with uncertain knowledge and beliefs has long been recognized as an important research issue in AI [33,11]. Several methodologies have been proposed, including certainty factors, fuzzy sets, Dempster-Shafer theory, and probability theory. The probabilistic approach is now by far the most popular among all those alternatives, mainly due to a knowledge representation framework called Bayesian networks or belief networks [28,18].
Bayesian networks are a graphical representation of (in)dependencies amongst random variables. A Bayesian network (BN) is a DAG with nodes representing random variables, and arcs representing direct influence. The independence that is encoded in a Bayesian network is that each variable is independent of its non-descendents given its parents.
Bayesian networks aid in knowledge acquisition by specifying which probabilities are needed. Where the network structure is sparse, the number of probabilities required can be much less than the number required if there were no independencies. The structure can be exploited computationally to make inference faster [28,23,19,32].
The definition of a Bayesian network does not constrain how a variable depends on its parents. Often, however, there is much structure in these probability functions that can be exploited for knowledge acquisition and inference. One such case is where some dependencies depend on particular values of other variables; such dependencies can be stated as rules , trees  or as multinets . Another is where the the function can be described using a binary operator that can be applied to values from each of the parent variables. It is the latter, known as `causal independencies', that we seek to exploit in this paper.
Causal independence refers to the situation where multiple causes contribute independently to a common effect. A well-known example is the noisy OR-gate model . Knowledge engineers have been using specific causal independence models in simplifying knowledge acquisition [17,26,25]. Heckerman  was the first to formalize the general concept of causal independence. The formalization was later refined by Heckerman and Breese .
Kim and Pearl  showed how the use of noisy OR-gate can speed up inference in a special kind of BNs known as polytrees; D'Ambrosio [8,7] showed the same for two level BNs with binary variables. For general BNs, Olesen et al.  and Heckerman  proposed two ways of using causal independencies to transform the network structures. Inference in the transformed networks is more efficient than in the original networks (see Section 9).
This paper proposes a new method for exploiting a special type of causal independence (see Section 4) that still covers common causal independence models such as noisy OR-gates, noisy MAX-gates, noisy AND-gates, and noisy adders as special cases. The method is based on the following observation. A BN can be viewed as representing a factorization of a joint probability into the multiplication of a list of conditional probabilities [31,36,24]. The type of causal independence studied in this paper leads to further factorization of the conditional probabilities (Section 5). A finer-grain factorization of the joint probability is obtained as a result. We propose to extend exact inference algorithms that only exploit conditional independencies to also make use of the finer-grain factorization provided by causal independence.
The state-of-art exact inference algorithm is called clique tree propagation (CTP) [23,19,32]. This paper proposes another algorithm called variable elimination (VE ) (Section 3), that is related to SPI [31,24], and extends it to make use of the finer-grain factorization (see Sections 6, 7, and 8). Rather than compiling to a secondary structure and finding the posterior probability for each variable, VE is query-oriented; it needs only that part of the network relevant to the query given the observations, and only does the work necessary to answer that query. We chose VE instead of CTP because of its simplicity and because it can carry out inference in large networks that CTP cannot deal with.
Experiments (Section 10) have been performed with two CPCS networks provided by Pradhan. The networks consist of 364 and 421 nodes respectively and they both contain abundant causal independencies. Before this paper, the best one could do in terms of exact inference would be to first transform the networks by using Jensen et al.'s or Heckerman's technique and then apply CTP. In our experiments, the computer ran out of memory when constructing clique trees for the transformed networks. When this occurs one cannot answer any query at all. However, the extended VE algorithm has been able to answer almost all randomly generated queries with twenty or less observations (findings) in both networks.
One might propose to first perform Jensen et al.'s or Heckerman's transformation and then apply VE . Our experiments show that this is significantly less efficient than the extended VE algorithm.
We begin with a brief review of the concept of a Bayesian network and the issue of inference.