David Poole

This page contains some information on research by David Poole and students on probabilistic reasoning and decision making. It is not intended to be an introduction to the vast literature on these topics, but only the incremental work done by me. For more different perspectives, see the pointers from the Uncertainty in AI (UAI) home page. Maybe someday I will write an online introduction.

Probabilistic Horn Abduction
Searching Bayesian Networks
Variable Elimination
Exploiting Contextual Independence in Probabilistic Inference
Decision Making Under Uncertainty
Lifted Inference

Probabilistic Horn abduction

Probabilistic Horn abduction is a pragmatic combination of logic and probability. The general idea is to have a set of independent choices with a logic program that tells the consequences of choices. The logic is weak: it does not contain disjunction [all of the "uncertainty" is embodied in the choices], and it does not impose any probabilistic dependencies on the choices. This makes for a semantically simple framework that is surprisingly general. It is of interest because it naturally extends Bayesian networks and logic programming.

Two early papers on this were:

D. Poole, ``Representing Bayesian Networks within Probabilistic Horn Abduction'', Proceedings of the Seventh Conference on Uncertainty in AI, Los Angeles, July 1991, pp. 271-278.
D. Poole, `` Representing Diagnostic Knowledge for Probabilistic Horn Abduction'', Proc. Twelfth International Joint Conference on Artificial Intelligence, Sydney, Australia, August 1991, pp. 1129-1135. Reprinted in W. Hamscher, L. Console and J. de Kleer (Eds.), Readings in Model-based Diagnosis, Morgan Kaufmann, 1992.

There was a slight change in the formalism after these in order to make it more straight forward to interpret semantically. The following paper is the `definitive' version of probabilistic Horn abduction. In retrospect, it may have been simpler to describe it semantically (as in the appendix of this paper), and then to show that the abductive characterization is a consequence of this semantic definition. At the time, I thought presenting it in terms of the abductive characterization was simpler.

D. Poole, ``Probabilistic Horn abduction and Bayesian networks'', Artificial Intelligence, 64(1), 81-129, 1993.

A top-down proof procedure is described in:

D. Poole, ``Logic Programming, Abduction and Probability: a top-down anytime algorithm for estimating prior and posterior probabilities'', New Generation Computing, 11(3-4), 377-400, 1993.

The code described there is available, along with the example programs described in the AI Journal paper.

Anecdote: How did this work come about? I had been working on assumption-based reasoning (the Theorist system), where I saw the need to distinguish explaining observations (abduction) and predicting (default reasoning), and how they could be combined. I also saw how credulous default reasoning could be viewed as an agent being able to choose its assumptions and sceptical default reasoning as an adversary choosing assumptions; the next step was having nature choosing assumptions. I was studying Bayesian networks and realised that they could also be seen in terms of evidential reasoning (abduction) followed by causal reasoning (prediction). Moreover to interpret the assumption-based reasoning with probabilities over assumptions simply, I needed to restrict the logic to definite clauses (I didn't want the uncertainty of disjunction as well as the uncertainty of probabilities). It turns out that you can get an exact match between these two views (the abduction with probabilities over assumptions and Bayesian networks) if you do it right; this lead to probabilistic Horn abduction and more recently to the Independent Choice Logic.

I following up on this work in two different directions. In the first direction the logic can be made richer to include negation as failure, and to weaken the constraints on disjointedness of rules. This doesn't sacrifice semantic simplicity, but makes it much more useful. The second direction is to allow for choices by other agents as well as by "nature". The formalism then becomes a form of the strategic form of a game, but with a logic program giving the consequences of a complete play (each agent, including nature, making one choice). This is the basis of the "Independent Choice Logic". There are a number of papers published on this - see logical and probabilistic reasoning.

Searching Bayesian Networks

Inspired by the probabilistic Horn abduction characterization of Bayesian networks, and by the work in the consistency-based diagnosis community (particularly the work of de Kleer and Reiter), I worked on some search algorithms for estimating probabilities in Bayesian networks.

The general idea is to enumerate some of the most likely possible worlds, and then to estimate probabilities based on these worlds. In general this is a really stupid idea, but when there are skewed probability distributions, even a very naive algorithm can work well:

D. Poole, ``Average-case analysis of a search algorithm for estimating prior and posterior probabilities in Bayesian networks with extreme probabilities'', Proc. Thirteenth International Joint Conference on Artificial Intelligence, pages 606-612, France, August 1993.

We can however do much better than this. A general idea is to build the possible worlds bottom-up and always choose the most likely value for the next variable (given the value for the parents already chosen). We can do much better than the naive algorithm if, when we find a prediction that doesn't coincide with the observations, we try and find out on which values that prediction depended (forming a "conflict"), and then use that information to refine the search:

D. Poole, ``Probabilistic conflicts in a search algorithm for estimating posterior probabilities in Bayesian networks'', Artificial Intelligence, 88, 69-100, 1996.

This is a revised version of the paper:

D. Poole, ``The Use of Conflicts in Searching Bayesian Networks'', Proceedings of the Ninth Conference on Uncertainty in AI, Washington D.C., pages 359-367, July 1993.

Prolog code and the example described in the paper is available online. The example is diagnosing a cascaded adder with thousands of bits (tens of thousands of nodes in the Bayesian network).

Anecdote: You may wonder how one finds such algorithms. I actually implemented a number of search-based algorithms for Bayesian networks (about fifteen, I recall, although some were closely related). This was inspired by trying to understand whether some logic-based algorithms could be adopted for probabilistic inference (once I had realized their close relationship via probabilistic Horn abduction). The aim was to try to complement the algorithms that exploited the sparseness of the network for fast inference. These implementations ranged from a top-down implementation on the rules, to one that used lemmas (or should this be "lemmata") to not recompute shared subgoals, to ones based on ATMSs. One of these just blew the others away in the sense that I could keep increasing the size of the network, and it still ran. Whereas the others bogged down on tens of nodes (for my Prolog implementations), this could run with tens of thousands of nodes. This was the naive version that is extremely simple to implement, and is fast because it does very little bookkeeping. The conflict algorithm is much more complicated conceptually, and it wins by really only doing bookkeeping when an observation conflicts with a prediction, and then only does work in proportion to the size of the conflict (mainly because it only looks for a single conflict in a greedy manner; this is sensible because finding small conflicts only affects the speed of the algorithm, not the correctness).

Where to now? This paper has considered the case of where we can exploit the skewness of distributions independently of the structure (there are good algorithms for evaluating sparse Bayesian networks, using clique tree propagation - thus I looked at the case where the networks are not sparse). One interesting idea to follow up would be to consider Bayesian networks where part of the network is sparse (but doesn't have Skewed distributions) and part of the network is skewed but not sparse - how can we then get efficient algorithms? How can this algorithm be combined with other search-based algorithms such as recursive conditioning.

Variable Elimination

There is a simple way to understand probabilistic inference that exploits the independence structure of Bayesian networks. Given a Bayesian network, a query variable, a set of observations, and an elimination ordering on the remaining variables. The basic idea is to set the observed variables, sum out the other variables according to the elimination ordering, and then renormalize. Nevin Zhang and I called this the variable elimination (VE) algorithm:

N.L. Zhang and D. Poole, ``A simple approach to Bayesian network computations'', Proceedings of the Tenth Biennial Canadian Artificial Intelligence Conference (AI-94), Banff, May 1994.

This is a simple idea that keeps being rediscovered. It was the basis behind D'Ambrosio's SPI (but in SPI they were worried about finding good elimination orderings). We were the first to characterize the algorithm as distributing sums overs products. VE is the same as Dechter's bucket elimination for belief revision, except that bucket elimination has a way to index factors when there is a static elimination ordering (but our '94 paper also explains how to do this). VE forms a good basis for understanding the structured Bayesian network algorithms and as the basis for exploiting more structure than traditional Bayesian network algorithms.

The following paper shows how we can exploit a form of "causal independence" where the conditional probability of a variable can be described in terms of an associative and commutative operator on the values of its parents:

N.L. Zhang and D. Poole, ``Exploiting Causal Independence in Bayesian Network Inference'', Journal of Artificial Intelligence Research, 5, 301-328, 1996.

VE is also the basis behind the algorithms for exploiting contextual independence described below.

Exploiting Contextual Independence in Probabilistic Inference

One of the themes of my research is to find structure in problems and to exploit that structure computationally. The following two papers show how a notion of contextual independence (context-specific independence) leads to a rule-based representation of Bayesian networks, and shows how the rule-structure can be exploited for computational gain. In the worst case we get the same as the structured Bayesian network algorithms. In the best case we can have an exponential saving.

D. Poole, ``Probabilistic Partial Evaluation: Exploiting rule structure in probabilistic inference'', Proc. Fifteenth International Joint Conference on Artificial Intelligence (IJCAI-97), Nagoya, Japan, August 1997, pp. 1284-1291.

The following paper shows how the structured algorithm can be used for approximation. This gives us much more control than say removing arcs in a Bayesian network as we can carry out different approximations in different contexts. We can also compute the bounds on the resulting errors.

D. Poole, ``Context-specific approximation in probabilistic inference'', Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98), Madison, Wisconsin, July 1998.

Nevin and I did a lot of work trying to get efficient algorithms, in particular algorithms where the advantage on context-specific independence could outweigh the advantage of efficient indexing that the dense tabular representation of factors allows. This cumulated in the following paper that can exploit more structure than a version of variable elimination that exploits context-specific independence in factors.

David Poole and Nevin Lianwen Zhang, ``Exploiting contextual independence in probabilistic inference'', Journal of Artificial Intelligence Research,18, 263-313, 2003.

Decision Making Under Uncertainty

Bayesian decision theory is a normative framework for decision making under uncertainty. I am most interested in sequential decision making (choosing a sequence of actions) under imperfect sensors. These decision problems can be cast as influence diagrams or partially observable Markov Decision processes (POMDPs).

Runping Qi and Nevin Zhang developed a method for influence diagram evaluation that used a form of reachability analysis to prune the states needed for dynamic programming. This is useful for asymmetric decision problems (when there are some values that are impossible, or very unlikely, based on previous values (which are very common when problems are cast as influence diagrams)). This uses alpha-beta search combined with standard Bayesian network inference algorithms.

R. Qi, N. L. Zhang and D. Poole, ``Solving Asymmetric Decision Problems with Influence Diagrams'', Proceedings of the Tenth Conference on Uncertainty in AI, Seattle, July 1994, 491-497.
R. Qi and D. Poole, ``A New Method for Influence Diagram Evaluation'', Computational Intelligence, 11(3), 498-528, 1995.

In other work, Nevin Zhang, Runping Qi and I developed a generalization of influence diagrams that are the most general kind that allow for dynamic programming. This forms the basis for a divide-and-conquer method for evaluating these forms of influence diagrams.

N.L. Zhang, R. Qi and D. Poole, ``A Computational Theory of Decision Networks'', International Journal of Approximate Reasoning, 11(2), 83-158, 1994.

Michael Horsch and I developed flexible (anytime) algorithm for decision making in large influence diagrams by "information refinement". The idea is to build an approximation of future actions that can be used to derive a good approximation of the value functions. We used a decision tree expressing how each decision node depends on its information predecessors. Often good decisions can be expressed with small decision trees. The following paper shows how this can be done for a single decision:

M.C. Horsch and D. Poole. Flexible Policy Construction by Information Refinement, Proc. Twelfth Conference on Uncertainty in Artificial Intelligence, Portland, Oregon, August 1996.

The following paper shows how information refinement can be carried out for influence diagrams with multiple decision nodes.

M. C. Horsch and D. Poole, ``An Anytime Algorithm for Decision Making under Uncertainty'', Proc. Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98), Madison, Wisconsin, July 1998.

In work related to the contextual independence, we have also been working on ways to exploit this for decision making. In the following paper I show how the structure of the value function and structure in the observation models impose a structure on the decision function. This allows us to extract structured (exact) policies.

D. Poole, ``Exploiting the Rule Structure for Decision Making within the Independent Choice Logic'', Proceedings of the Eleventh Conference on Uncertainty in AI, Montreal, August 1995, 454-463.

The following related paper shows how structure in the value function can be exploited in dynamic programming for POMDPs.

C. Boutilier and D. Poole, ``Computing Optimal Policies for Partially Observable Decision Processes using Compact Representations'', Proc. Thirteenth National Conference on Artificial Intelligence (AAAI-96), Portland, Oregon, August 1996.

Work on decision making is also in ICL.

Lifted Inference

to come...

Last updated 2008-01-21 - David Poole, poole@cs.ubc.ca

Probabilistic Research Overview

Contents