A general theme of my research is to find natural representations for problems and to exploit those representations for efficient computation. We want a representation that is compact, so that it is easy to map between a problem and its representation, and we want general methods that exploit that compactness. In this way we seek to exploit the special features of a problem to make inference fast.

The main problem I consider is: what should an agent do based on its prior knowledge, what it observes about the world, and its values or goals?

Independent Choice Logic

Much of this work is based on the "independent choice logic" (ICL). This is a semantic framework that allows for independent choices made by various agents, with a logic program giving the consequences of the choices. It expands Probabilistic Horn abduction to include a richer logic (including negation as failure) and choices by multiple agents, and it extends logic programs, Bayesian networks, influence diagrams, MDPs, and game-theoretic representations.

This work can be reached from a number of different starting positions.

  • Abductive logic programming, but we structure the assumables into disjoint sets (alternatives) and let different agents (e.g., nature, self, adversary) choose which values get selected.
  • Game theory, where we specify the consequences of the choices by the various agents by a logic program.
  • Bayesian networks and influence diagrams, where we use rules to specify the conditional probability tables, the value function, and what information will be available when the agent must make a decision.
  • Dynamical systems where we structure the state in terms of propositions and specify the (stochastic) state transition functions in terms of logic programs.
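To make the semantics concrete, here is a minimal sketch in Python (the atoms, alternatives, and probabilities are all invented for illustration): nature picks one atom from each alternative, an acyclic logic program determines the unique model of each total choice, and the probability of a proposition is the sum of the probabilities of the total choices whose models entail it.

```python
from itertools import product

# Alternatives: disjoint sets of atoms; nature chooses exactly one from each,
# with the stated probabilities. (All names here are invented for illustration.)
alternatives = {
    "weather": {"rain": 0.3, "dry": 0.7},
    "sprinkler": {"on": 0.2, "off": 0.8},
}

# Acyclic logic program: (head, body) pairs, body a conjunction of atoms.
rules = [
    ("wet", ("rain",)),
    ("wet", ("on",)),
    ("slippery", ("wet",)),
]

def model(total_choice):
    """Forward-chain the rules to the unique model of a total choice."""
    world = set(total_choice)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in world and all(b in world for b in body):
                world.add(head)
                changed = True
    return world

def prob(query):
    """P(query): sum the probabilities of the total choices entailing it."""
    total = 0.0
    for combo in product(*(alt.items() for alt in alternatives.values())):
        p = 1.0
        for _, p_i in combo:
            p *= p_i
        if query in model(atom for atom, _ in combo):
            total += p
    return total

print(round(prob("slippery"), 2))  # 1 - P(dry)*P(off) = 1 - 0.56 = 0.44
```

The enumeration over all total choices is only there to show the semantics; the point of the papers below is to avoid it.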

The main highlights of this work are:

  • It forms a generalization of the strategic (normal) form of a game (as studied in game theory).
  • When restricted to just choices by nature (with probabilities over the choices), it forms a generalization of Bayesian networks (see the Probabilistic Horn abduction description) with a richer language (like parametrized Bayesian networks or first-order Bayesian networks), and a natural specification of "contextual independence".
  • It generalizes, in a natural way, influence diagrams, Markov decision processes, etc. Moreover, the language lets us exploit the propositional nature of the state space (i.e., the decomposition of the state space into variables and the contextual independence).
  • It lets us use arbitrary (acyclic) logic programs to axiomatise the world. This is very flexible, as the language lets us model uncertainty in the choice space (there is no need for disjunction in the language).
  • It is essentially abductive. The explanations of a proposition form a sound and complete description of all of the worlds in which the proposition is true.
  • We can exploit the structure for computational gain.

For the best overview see:

There are also slides (in PDF format) from a talk, "The Independent Choice Logic: A pragmatic combination of logic and decision theory", March 1998.

For a more informal overview see:

The following recent paper abstracts the ideas of the ICL to more general programming languages.

Abductive Characterization

Apart from the representational advantages, there are computational advantages due to the rule-based representation. The rule structure imposes a partitioning of the state space into sets of states that do not need to be considered separately. Put another way, the framework is essentially abductive: the sets of explanations of a proposition provide a concise description of the worlds in which the proposition is true. Thus we can see a unification of abductive logic programming and decision-theoretic representations. See:
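As a sketch of the idea (in Python, with invented atoms and rules), a backward-chaining abductive engine collects, for each goal, the sets of assumables (choice atoms) from which the goal follows; in this small acyclic program those sets are exactly the explanations:

```python
# Assumables (choice atoms) and an acyclic rule base: head -> list of bodies.
# (The atoms and rules are invented for illustration.)
assumables = {"rain", "dry", "on", "off"}
rules = {
    "wet": [("rain",), ("on",)],
    "slippery": [("wet",)],
}

def explanations(goal):
    """Backward-chain: the sets of assumables from which the goal follows."""
    if goal in assumables:
        return [frozenset([goal])]
    found = set()
    for body in rules.get(goal, []):
        # An explanation of a conjunction combines explanations of its conjuncts.
        partial = [frozenset()]
        for atom in body:
            partial = [e | e2 for e in partial for e2 in explanations(atom)]
        found.update(partial)
    return sorted(found, key=sorted)

# Each explanation describes a set of worlds in which "slippery" is true.
print(explanations("slippery"))
```

Here the explanations of "slippery" are {rain} and {on}: together they cover every world in which "slippery" holds, which is the sound-and-complete property referred to above.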

This leads to a reasonably straightforward implementation based on a backward-chaining abductive engine. You can get the ICL code distribution (a gzipped tar file) that is based on this idea. This contains Prolog code, documentation, and the examples from the various papers. The implementation includes some knowledge-level debugging tools that have been found to be (very) useful in building ICL knowledge bases.

Inference

The work in this area has considered both inference algorithms and representations. The algorithms have been developed for simpler cases than the representations; this is deliberate, as we want to get the foundations of the algorithms solid before applying them to more complicated representations. The hope is that the algorithms developed for the straightforward (but quite general) case of rule-structured probability models will carry over to more complicated (but often more restricted) models such as POMDPs. It is important that we understand (and debug) the simple case that will form the foundation for the more complicated algorithms.

The rule structure can be exploited to speed up probabilistic inference. The following paper shows how to exploit the contextual independence that can easily be stated in terms of rules. In the worst case this algorithm is as good as the structured Bayesian network evaluation algorithms, but it can be exponentially better:
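The kind of contextual independence meant here can be sketched as follows (in Python; the variables, contexts, and numbers are invented): a rule-based conditional probability says that, in the context fire = true, the probability of the alarm does not depend on the other parent, so three rule parameters replace the four rows of a full table:

```python
# Context-specific rules for P(alarm = true | fire, tamper): the first rule
# whose context matches applies. (Names and numbers are invented.)
alarm_rules = [
    ({"fire": True}, 0.99),                   # tamper is irrelevant given fire
    ({"fire": False, "tamper": True}, 0.85),
    ({"fire": False, "tamper": False}, 0.01),
]

def p_alarm(parents):
    """Look up P(alarm = true) in the first rule whose context holds."""
    for context, p in alarm_rules:
        if all(parents[var] == val for var, val in context.items()):
            return p
    raise ValueError("the contexts should be exhaustive")

# Three rule parameters stand in for the four rows of the full table.
print(p_alarm({"fire": True, "tamper": True}))   # 0.99
print(p_alarm({"fire": True, "tamper": False}))  # 0.99
```

With many parents the saving compounds: a variable whose probability is determined by a few contexts needs a few rules, not a table exponential in the number of parents.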

This can also be extended to approximate reasoning, where we can make different approximations in different contexts and compute bounds on the resulting posterior probabilities:

One of the ways the rule-based structure can be exploited is in structured dynamic programming. Instead of considering all information states and optimizing for each (e.g., as in an influence diagram), we can use the rule structure to determine structure over the information states (the values for the parents in an influence diagram); we then only need to consider the structured partitioning of the information space. See:
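The flavour of the idea can be sketched as follows (in Python, with an invented utility and invented variable names): when the utility depends on only one of the information variables, the rules partition the 2^3 information states into two contexts, and we optimize once per context rather than once per state:

```python
from itertools import product

# Invented example: the utility depends only on "x" and the action, so the
# optimal decision need only distinguish two contexts, not all 2**3 states.
def utility(state, action):
    if state["x"]:
        return 10 if action == "go" else 2
    return 1 if action == "go" else 5

info_vars = ("x", "y", "z")
actions = ("go", "stay")

# Flat optimization: one decision per full information state (8 entries).
flat_policy = {}
for values in product((True, False), repeat=len(info_vars)):
    state = dict(zip(info_vars, values))
    flat_policy[values] = max(actions, key=lambda a: utility(state, a))

# Structured optimization: one decision per context in the partition {x, not-x}.
structured_policy = {}
for x in (True, False):
    state = {"x": x, "y": True, "z": True}  # y and z are irrelevant here
    structured_policy[x] = max(actions, key=lambda a: utility(state, a))

print(len(flat_policy), len(structured_policy))  # 8 2
```

The two-entry structured policy agrees with the eight-entry flat one; in general the number of contexts can be exponentially smaller than the number of information states.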

Similarly we can consider exploiting the rule structure in inference in POMDPs, where we can build structured representations of value functions using essentially regression (as used in planning).

Representation

If we want to use the ICL in a planning situation we need to combine it with a model of time. This is done by allowing time to be represented in the logic program. I have been looking at three different cases:

Learning

The independent choice logic provides a natural specification language for specifying learning problems. See:

I am currently working on learning ICL theories, combining Inductive Logic Programming and Bayesian network learning.

Applications

We are working on applications in diagnosis, robotics, user modelling, and other areas.

Building applications is important to ground the work in real problems, so that we don't work on abstractions that don't correspond to any actual problem.

Misrepresentations of the ICL

There have been a number of places where the ICL has been referenced, but where the claims made are false or misleading. Here are a few (please let me know of any more):

  • "Poole considers only models without recursion." [Laskey, Artificial Intelligence Journal, Vol 172 (2008) 140-178] This is false; ICL allows recursive models. Note that she references ICL, but only talks about parametrized belief networks, which are much more like [Horsch and Poole 1990].
  • "Bayesian logic programs [compared to ICL theories] have not as many constraints on the representation language, ..., have a richer representation language and their independence assumptions represent the causality of the domain" [K. Kersting and L. De Raedt, Bayesian logic programming: Theory and tool. In L. Getoor and B. Taskar, editors, An Introduction to Statistical Relational Learning. MIT Press, 2007]. All these claims are false. ICL can represent arbitrary logic programs with no restrictions except the acyclicity restriction that recursions have to be well-founded. (None of the textbooks on logic programming I know of contains any logic programs that are not acyclic, although they do contain logic programs with cuts and input/output, which ICL does not handle.) ICL is Turing-complete. [Pearl 2000] defines causality in terms of structural equation models with exogenous noise. The ICL represents this causality directly, with the structural equation models represented as logic programs.

Last updated 2008-01-21 - David Poole, poole@cs.ubc.ca