Intelligence 2E

foundations of computational agents

Probability theory is built on the foundation of worlds and
variables.
The variables in probability theory are referred to as random variables. The term *random variable* is somewhat of a
misnomer because it is neither random nor
variable. As discussed in Section 4.1, worlds could be
described in terms of variables; a world is a function that maps each
variable to its value. Alternatively, variables could
be described in terms of worlds; a variable is a function from worlds into
the domain of the variable.

Variables will be written starting with an uppercase letter. Each variable has a domain which is the set of values that the variable can take. A Boolean variable is a variable with domain $\{true,false\}$. We will write the assignment of $true$ to a variable as the lower-case variant of the variable, e.g., $Happy=true$ is written as $happy$ and $Fire=true$ is $fire$. A discrete variable has a domain that is a finite or countable set.

A primitive proposition is an assignment of a value to a variable, or an inequality between variables and values or between variables (e.g., $A=true$, $$ or $Y>Z$). Propositions are built from primitive propositions using logical connectives.

This chapter mainly considers discrete variables with finite domains. The examples will have few variables, but modern applications may have thousands, millions or even billions of variables (or even infinitely many variables). For example, a world could consist of the symptoms, diseases and test results for all of the patients and care providers in a hospital throughout time. The model effectively goes on forever into the future, but we will only ever reason about a finite past and future. We might be able to answer questions about the probability that a patient with a particular combination of symptoms may come into the hospital in the next few years. There are infinitely many worlds whenever some variables have infinite domains or there are infinitely many variables.

We first define a probability over finite sets of worlds with finitely many variables and use this to define the probability of propositions.

A probability measure is a function $P$ from worlds into the non-negative real numbers such that,

$$\sum _{w\in \mathrm{\Omega}}P(w)=1$$ |

where $\mathrm{\Omega}$ is the set of all possible worlds.

The use of 1 as the probability of the set of all of the worlds is just by convention. You could just as well use 100.

The definition of $P$ is extended to cover propositions. The probability of proposition $\alpha $, written $P(\alpha )$, is the sum of the probabilities of possible worlds in which $\alpha $ is true. That is,

$$P(\alpha )=\sum _{\omega :\alpha \text{is true in}\omega}P(\omega ).$$ |

Note that this definition is consistent with the probability of worlds, because if proposition $\alpha $ completely describes a single world, the probability of $\alpha $ and the probability of the world are equal.

Consider the ten worlds of Figure 8.1, with Boolean variable ${F}{\mathit{}}{i}{\mathit{}}{l}{\mathit{}}{l}{\mathit{}}{e}{\mathit{}}{d}$, and with variable ${S}{\mathit{}}{h}{\mathit{}}{a}{\mathit{}}{p}{\mathit{}}{e}$ with domain ${\mathrm{\{}}{c}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{c}{\mathit{}}{l}{\mathit{}}{e}{\mathrm{,}}{t}{\mathit{}}{r}{\mathit{}}{i}{\mathit{}}{a}{\mathit{}}{n}{\mathit{}}{g}{\mathit{}}{l}{\mathit{}}{e}{\mathrm{,}}{s}{\mathit{}}{t}{\mathit{}}{a}{\mathit{}}{r}{\mathrm{\}}}$. Each world is defined by its shape, whether it’s filled and its position. Suppose the probability of each of these 10 worlds is ${\mathrm{0.1}}$, and any other worlds have probability 0. Then ${P}{\mathrm{(}}{S}{h}{a}{p}{e}{\mathrm{=}}{c}{i}{r}{c}{l}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.5}}$ and ${P}{\mathrm{(}}{F}{i}{l}{l}{e}{d}{\mathrm{=}}{f}{a}{l}{s}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.4}}$. ${P}{\mathrm{(}}{S}{h}{a}{p}{e}{\mathrm{=}}{c}{i}{r}{c}{l}{e}{\mathrm{\wedge}}{F}{i}{l}{l}{e}{d}{\mathrm{=}}{f}{a}{l}{s}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.1}}$

If $X$ is a random variable, a probability distribution, $P(X)$, over $X$ is a function from the domain of $X$ into the real numbers such that, given a value $x\in domain(X)$, $P(x)$ is the probability of the proposition $X=x$. A probability distribution over a set of variables is a function from the values of those variables into a probability. For example, $P(X,Y)$ is a probability distribution over $X$ and $Y$ such that $P(X=x,Y=y)$, where $x\in domain(X)$ and $y\in domain(Y)$, has the value $P(X=x\wedge Y=y)$, where $X=x\wedge Y=y$ is a proposition and $P$ is the function on propositions defined above. Whether $P$ refers to a function on propositions or a probability distribution should be clear from context.

If ${X}_{1}\mathrm{\dots}{X}_{n}$ are all of the random variables, then an assignment to all of the random variables corresponds to a world, and the probability of the proposition defining a world is equal to the probability of the world. The distribution over all worlds, $P({X}_{1},\mathrm{\dots},{X}_{n})$ is called the joint probability distribution.

Beyond Finitely Many Worlds
The definition of probability given in this chapter works when
there are finitely many worlds.
There are infinitely many worlds when
•
the domain of a variable is infinite, for example the domain of
a variable $\colorbox[rgb]{1,1,0.701960784313725}{$h$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$e$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$i$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$g$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$h$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$t$}$ might be the set of nonnegative real numbers or
•
there are infinitely many variables, for example, there might
be a variable for the location of a robot for every millisecond from
now infinitely into the future
When there are infinitely many worlds, probability is defined on a measure over sets of
worlds. A probability measure is a nonnegative function $\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}$ from sets of worlds into the
real numbers, that satisfies the axioms: $\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$($}{\colorbox[rgb]{1,1,0.701960784313725}{$S$}}_{\colorbox[rgb]{1,1,0.701960784313725}{$1$}}\colorbox[rgb]{1,1,0.701960784313725}{$\cup $}{\colorbox[rgb]{1,1,0.701960784313725}{$S$}}_{\colorbox[rgb]{1,1,0.701960784313725}{$2$}}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$($}{\colorbox[rgb]{1,1,0.701960784313725}{$S$}}_{\colorbox[rgb]{1,1,0.701960784313725}{$1$}}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$\cup $}\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$($}{\colorbox[rgb]{1,1,0.701960784313725}{$S$}}_{\colorbox[rgb]{1,1,0.701960784313725}{$2$}}\colorbox[rgb]{1,1,0.701960784313725}{$)$}$ if ${\colorbox[rgb]{1,1,0.701960784313725}{$S$}}_{\colorbox[rgb]{1,1,0.701960784313725}{$1$}}\colorbox[rgb]{1,1,0.701960784313725}{$\cap $}{\colorbox[rgb]{1,1,0.701960784313725}{$S$}}_{\colorbox[rgb]{1,1,0.701960784313725}{$2$}}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$\{$}\colorbox[rgb]{1,1,0.701960784313725}{$\}$}$, and $\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$\mathrm{\Omega}$}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$1$}$
where $\colorbox[rgb]{1,1,0.701960784313725}{$\mathrm{\Omega}$}$ is the set of all worlds. $\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}$ does not have to be
defined over all sets of worlds, just those defined by
logical formulas. The probability of proposition $\colorbox[rgb]{1,1,0.701960784313725}{$\alpha $}$ is
defined by $\colorbox[rgb]{1,1,0.701960784313725}{$P$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$\alpha $}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$\mu $}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$\{$}\colorbox[rgb]{1,1,0.701960784313725}{$w$}\colorbox[rgb]{1,1,0.701960784313725}{$:$}\colorbox[rgb]{1,1,0.701960784313725}{$\alpha $}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$\text{is true in}$}\colorbox[rgb]{1,1,0.701960784313725}{$$}\colorbox[rgb]{1,1,0.701960784313725}{$w$}\colorbox[rgb]{1,1,0.701960784313725}{$\}$}\colorbox[rgb]{1,1,0.701960784313725}{$)$}$.
Variables with continuous domains typically do not have a probability distribution, because
the probability of a set of worlds can be non-zero even though the
measure of each individual world is 0. For variables with real-valued
domains, a probability density
function, written as $\colorbox[rgb]{1,1,0.701960784313725}{$p$}$, is a function from reals into
non-negative reals that integrates to $\colorbox[rgb]{1,1,0.701960784313725}{$1$}$. The probability that
a real-valued random variable
$\colorbox[rgb]{1,1,0.701960784313725}{$X$}$ has value between $\colorbox[rgb]{1,1,0.701960784313725}{$a$}$ and $\colorbox[rgb]{1,1,0.701960784313725}{$b$}$ is given by
$$\colorbox[rgb]{1,1,0.701960784313725}{$P$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$a$}\colorbox[rgb]{1,1,0.701960784313725}{$\le $}\colorbox[rgb]{1,1,0.701960784313725}{$X$}\colorbox[rgb]{1,1,0.701960784313725}{$\le $}\colorbox[rgb]{1,1,0.701960784313725}{$b$}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}{\colorbox[rgb]{1,1,0.701960784313725}{$\int $}}_{\colorbox[rgb]{1,1,0.701960784313725}{$a$}}^{\colorbox[rgb]{1,1,0.701960784313725}{$b$}}\colorbox[rgb]{1,1,0.701960784313725}{$p$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$X$}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$\text{d}$}\colorbox[rgb]{1,1,0.701960784313725}{$X$}\colorbox[rgb]{1,1,0.701960784313725}{$.$}$$
This allows the probability of any formula about intervals and less
than to be well defined. It is possible that, for every real number $\colorbox[rgb]{1,1,0.701960784313725}{$a$}$, $\colorbox[rgb]{1,1,0.701960784313725}{$P$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$X$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$a$}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$P$}\colorbox[rgb]{1,1,0.701960784313725}{$($}\colorbox[rgb]{1,1,0.701960784313725}{$a$}\colorbox[rgb]{1,1,0.701960784313725}{$\le $}\colorbox[rgb]{1,1,0.701960784313725}{$X$}\colorbox[rgb]{1,1,0.701960784313725}{$\le $}\colorbox[rgb]{1,1,0.701960784313725}{$a$}\colorbox[rgb]{1,1,0.701960784313725}{$)$}\colorbox[rgb]{1,1,0.701960784313725}{$=$}\colorbox[rgb]{1,1,0.701960784313725}{$0$}$.
A parametric distribution is one where the probability or density function
is described by a formula. Although
not all distributions can be described by formulas, all of the ones
that are able to be represented are. Sometimes statisticians use the term
parametric to mean a distribution described using a fixed,
finite number of parameters. A non-parametric distribution is one
where the number of parameters is not fixed. (Oddly, non-parametric
typically means “many parameters”.)
Another common method is to consider only discretizations of
finitely many worlds. For example, only consider height to the
nearest centimeter or micron, and only consider heights up
to some finite number (e.g., a kilometer). Or only consider the
location of the robot for a millennium. While there might be a lot of
worlds, there are only finitely many. A challenge is to define
representations that work for any (fine enough) discretization.