Artificial Intelligence 2E: Foundations of Computational Agents

Probability is a measure of belief. Beliefs need to be updated when new evidence is observed.

The measure of belief in proposition $h$ given proposition $e$ is called the conditional probability of $h$ given $e$, written $P(h\mid e)$.

A proposition $e$ representing the conjunction of *all* of the agent’s
observations of the world is called
evidence. Given evidence $e$, the
conditional probability $P(h\mid e)$ is the agent’s posterior probability
of $h$. The probability $P(h)$ is the prior
probability of $h$
and is the same as $P(h\mid true)$ because it is the probability before the
agent has observed anything.

The evidence used for the posterior probability is *everything* the agent observes about a particular
situation. Everything observed, and not just a few select observations, must be conditioned on to obtain the correct
posterior probability.

For the diagnostic assistant, the prior probability distribution over possible diseases is used before the diagnostic agent finds out about the particular patient. Evidence is obtained through discussions with the patient, observing symptoms, and the results of lab tests. Essentially any information that the diagnostic assistant finds out about the patient is evidence. The assistant updates its probability to reflect the new evidence in order to make informed decisions.

The information that the delivery robot receives from its sensors is its evidence. When sensors are noisy, the evidence is what is known, such as the particular pattern received by the sensor, not that there is a person in front of the robot. The robot could be mistaken about what is in the world but it knows what information it received.

Evidence $e$, where $e$ is a proposition, will rule out all possible worlds that are incompatible with $e$. Like the definition of logical consequence, the given proposition $e$ selects the possible worlds in which $e$ is true. As in the definition of probability, we first define the conditional probability over worlds, and then use this to define a probability over propositions.

Evidence $e$ induces a new probability $P(w\mid e)$ of world $w$ given $e$. Any world where $e$ is false has conditional probability $0$, and the remaining worlds are normalized so that the probabilities of the worlds sum to $1$:

$$P(w\mid e)=\begin{cases} c*P(w) & \text{if } e \text{ is true in world } w\\ 0 & \text{if } e \text{ is false in world } w \end{cases}$$

where $c$ is a constant (that depends on $e$) that ensures the posterior probability of all worlds sums to 1.

For $P(w\mid e)$ to be a probability measure over worlds for each $e$:

$$\begin{aligned}
1 &= \sum_{w} P(w\mid e)\\
&= \sum_{w: e\text{ is true in } w} P(w\mid e) + \sum_{w: e\text{ is false in } w} P(w\mid e)\\
&= \sum_{w: e\text{ is true in } w} c*P(w) + 0\\
&= c*P(e).
\end{aligned}$$

Therefore, $c=1/P(e)$. Thus, the conditional probability is only defined if $P(e)>0$. This is reasonable because, if $P(e)=0$, $e$ is impossible and so can never be observed.

The conditional probability of proposition $h$ given evidence $e$ is the sum of the conditional probabilities of the possible worlds in which $h$ is true. That is,

$$\begin{aligned}
P(h\mid e) &= \sum_{w: h\text{ is true in } w} P(w\mid e)\\
&= \sum_{w: h\wedge e\text{ is true in } w} P(w\mid e) + \sum_{w: \neg h\wedge e\text{ is true in } w} P(w\mid e)\\
&= \sum_{w: h\wedge e\text{ is true in } w} \frac{1}{P(e)}*P(w) + 0\\
&= \frac{P(h\wedge e)}{P(e)}.
\end{aligned}$$

The last form above is typically given as the definition of conditional probability. Here we have derived it as a consequence of a more basic definition.
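The world-based construction above can be sketched directly in code. The following is a minimal illustration; the worlds and their probabilities are made up for the example:

```python
# A toy probability model: each world assigns values to (weather, temperature).
# The worlds and probabilities are hypothetical illustration data.
worlds = {
    ("sunny", "warm"): 0.4,
    ("sunny", "cold"): 0.1,
    ("rainy", "warm"): 0.2,
    ("rainy", "cold"): 0.3,
}

def condition(worlds, evidence):
    """Return P(w | e): zero out worlds where e is false, then renormalize."""
    surviving = {w: p for w, p in worlds.items() if evidence(w)}
    p_e = sum(surviving.values())                      # P(e)
    assert p_e > 0, "conditional probability undefined when P(e) = 0"
    return {w: p / p_e for w, p in surviving.items()}  # c = 1 / P(e)

def prob(worlds, proposition):
    """P(h): sum the probabilities of the worlds in which h is true."""
    return sum(p for w, p in worlds.items() if proposition(w))

# Condition on the evidence e = "the weather is sunny".
posterior = condition(worlds, lambda w: w[0] == "sunny")

# P(warm | sunny) two ways: via the posterior over worlds, and via P(h ∧ e)/P(e).
p1 = prob(posterior, lambda w: w[1] == "warm")
p2 = (prob(worlds, lambda w: w == ("sunny", "warm"))
      / prob(worlds, lambda w: w[0] == "sunny"))
print(p1, p2)  # both 0.8
```

Both routes give the same answer, mirroring the derivation: conditioning over worlds and the ratio $P(h\wedge e)/P(e)$ are the same operation.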

As in Example 8.2, consider the worlds of Figure 8.1, each with probability 0.1. Given the evidence $\mathit{Filled}{=}\mathit{false}$, only 4 worlds have a non-zero posterior probability. $P(\mathit{Shape}{=}\mathit{circle}\mid \mathit{Filled}{=}\mathit{false})=0.25$ and $P(\mathit{Shape}{=}\mathit{star}\mid \mathit{Filled}{=}\mathit{false})=0.5$.

A conditional probability distribution, written $P(X\mid Y)$ where $X$ and $Y$ are variables or sets of variables, is a function of the variables: given a value $x\in domain(X)$ for $X$ and a value $y\in domain(Y)$ for $Y$, it gives the value $P(X=x\mid Y=y)$, where the latter is the conditional probability of the propositions.

Background Knowledge and Observation
The difference between background knowledge and observation was
described in Section 5.4.1. When reasoning with uncertainty,
the background model is described in terms of a probabilistic model,
and the observations form evidence that must be conditioned on.
Within probability, there are two ways to state that $a$ is true:
•
The first is to state that the probability of $a$ is $1$ by writing $P(a)=1$.
•
The second is to condition on $a$, which involves using $a$ on the right-hand side of the conditional bar, as in $P(\cdot \mid a)$.
The first method states that $a$ is true in all possible worlds. The second says that the agent is only interested in worlds where $a$ happens to be true.
Suppose an agent was told about a particular animal:

$$P(\mathit{flies}\mid \mathit{bird})=0.8,$$
$$P(\mathit{bird}\mid \mathit{emu})=1.0,$$
$$P(\mathit{flies}\mid \mathit{emu})=0.001.$$

If the agent determines the animal is an emu, it cannot add the statement $P(\mathit{emu})=1$. No probability distribution satisfies these four assertions. If $\mathit{emu}$ were true in all possible worlds, it would not be the case that in 0.8 of the possible worlds the individual flies. The agent, instead, must condition on the fact that the individual is an emu.
To build a probability model, a knowledge base designer takes some knowledge
into consideration and builds a probability model based on this
knowledge. All knowledge acquired subsequently must be treated as
observations that are conditioned on.
Suppose proposition $k$ represents an agent’s observations up to some time. The agent’s subsequent belief states can be modeled by either of the following:
•
construct a probability model for the agent’s belief before it had observed $k$ and then condition on the evidence $k$ conjoined with the subsequent evidence $e$ (i.e., for each proposition $\alpha$ use $P(\alpha \mid e\wedge k)$)
•
construct a probability model, call it $P_k$, which models the agent’s beliefs after observing $k$, and then condition on subsequent evidence $e$ (i.e., use $P_k(\alpha \mid e)$ for proposition $\alpha$).
All subsequent probabilities will be identical no matter which construction was used. Building $P_k$ directly is sometimes easier because the model does not have to cover the cases when $k$ is false. Sometimes, however, it is easier to build $P$ and condition on $k$.
What is important is that there is a coherent stage where the
probability model is reasonable and every subsequent observation is
conditioned on.
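The equivalence of the two constructions can be checked numerically. The following sketch uses a small hypothetical joint distribution over three Boolean propositions $k$, $e$, $\alpha$; all the numbers are made up:

```python
# Hypothetical joint distribution over worlds (k, e, alpha); numbers are made up.
P = {
    (True,  True,  True):  0.20, (True,  True,  False): 0.10,
    (True,  False, True):  0.15, (True,  False, False): 0.05,
    (False, True,  True):  0.05, (False, True,  False): 0.15,
    (False, False, True):  0.10, (False, False, False): 0.20,
}

def cond_prob(dist, h, e):
    """P(h | e) = P(h ∧ e) / P(e) over a dict mapping world -> probability."""
    p_e = sum(p for w, p in dist.items() if e(w))
    return sum(p for w, p in dist.items() if h(w) and e(w)) / p_e

k     = lambda w: w[0]
e     = lambda w: w[1]
alpha = lambda w: w[2]

# Construction 1: condition the original model P on e ∧ k.
direct = cond_prob(P, alpha, lambda w: e(w) and k(w))

# Construction 2: first build P_k, the model after observing k, then condition on e.
p_k = sum(p for w, p in P.items() if k(w))
P_k = {w: p / p_k for w, p in P.items() if k(w)}
staged = cond_prob(P_k, alpha, e)

print(direct, staged)  # identical values
```

Both constructions yield the same posterior, as the text asserts.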

The definition of conditional probability allows the decomposition of a conjunction into a product of conditional probabilities:

(Chain rule) For any propositions $\alpha_1,\dots,\alpha_n$:

$$\begin{aligned}
P(\alpha_1\wedge \alpha_2\wedge \dots\wedge \alpha_n) ={}& P(\alpha_1)*{}\\
& P(\alpha_2\mid \alpha_1)*{}\\
& P(\alpha_3\mid \alpha_1\wedge \alpha_2)*{}\\
&\ \ \vdots\\
& P(\alpha_n\mid \alpha_1\wedge \dots\wedge \alpha_{n-1})\\
={}& \prod_{i=1}^{n} P(\alpha_i\mid \alpha_1\wedge \dots\wedge \alpha_{i-1}),
\end{aligned}$$

where the right-hand side is assumed to be zero if any of the factors is zero (even if some of the later factors are undefined).
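The chain rule can be verified numerically on a small joint distribution. The following is a sketch for $n=3$; the distribution is made up for illustration:

```python
# Hypothetical joint distribution over three Boolean propositions a1, a2, a3.
joint = {
    (True,  True,  True):  0.10, (True,  True,  False): 0.20,
    (True,  False, True):  0.05, (True,  False, False): 0.15,
    (False, True,  True):  0.25, (False, True,  False): 0.05,
    (False, False, True):  0.12, (False, False, False): 0.08,
}

def P(event):
    """Probability of a proposition: sum over worlds where it is true."""
    return sum(p for w, p in joint.items() if event(w))

def P_cond(h, given):
    """Conditional probability P(h | given) = P(h ∧ given) / P(given)."""
    return P(lambda w: h(w) and given(w)) / P(given)

a1 = lambda w: w[0]
a2 = lambda w: w[1]
a3 = lambda w: w[2]

# Chain rule: P(a1 ∧ a2 ∧ a3) = P(a1) * P(a2 | a1) * P(a3 | a1 ∧ a2)
lhs = P(lambda w: a1(w) and a2(w) and a3(w))
rhs = (P(a1)
       * P_cond(a2, a1)
       * P_cond(a3, lambda w: a1(w) and a2(w)))
print(lhs, rhs)  # both 0.10
```

The decomposition holds for any ordering of the propositions, which is what makes the chain rule useful for building models one variable at a time.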

Note that any theorem about unconditional probabilities can be converted into a theorem about conditional probabilities by adding the same evidence to each probability. This is because the conditional probability measure is a probability measure. For example, case (e) of Proposition 8.2 implies $P(\alpha \vee \beta \mid k)=P(\alpha \mid k)+P(\beta \mid k)-P(\alpha \wedge \beta \mid k)$.

An agent using probability updates its belief when it observes new evidence. A new piece of evidence is conjoined to the old evidence to form the complete set of evidence. Bayes’ rule specifies how an agent should update its belief in a proposition based on a new piece of evidence.

Suppose an agent has a current belief in proposition $h$ based on evidence $k$ already observed, given by $P(h\mid k)$, and subsequently observes $e$. Its new belief in $h$ is $P(h\mid e\wedge k)$. Bayes’ rule tells us how to update the agent’s belief in hypothesis $h$ as new evidence arrives.

(Bayes’ rule) As long as $P(e\mid k)\ne 0$,

$$P(h\mid e\wedge k)=\frac{P(e\mid h\wedge k)*P(h\mid k)}{P(e\mid k)}.$$ |

This is often written with the background knowledge $k$ implicit. In this case, if $P(e)\ne 0$, then

$$P(h\mid e)=\frac{P(e\mid h)*P(h)}{P(e)}.$$ |

$P(e\mid h)$ is the likelihood and $P(h)$ is the prior probability of the hypothesis $h$. Bayes’ rule states that the posterior probability is proportional to the likelihood times the prior.

The commutativity of conjunction means that $h\wedge e$ is equivalent to $e\wedge h$, and so they have the same probability given $k$. Using the rule for multiplication in two different ways,

$$\begin{aligned}
P(h\wedge e\mid k) &= P(h\mid e\wedge k)*P(e\mid k)\\
= P(e\wedge h\mid k) &= P(e\mid h\wedge k)*P(h\mid k).
\end{aligned}$$

The theorem follows from dividing the right-hand sides by $P(e\mid k)$, which is not 0 by assumption. ∎

Often, Bayes’ rule is used to compare various hypotheses (different ${h}_{i}$s). The denominator $P(e\mid k)$ is a constant that does not depend on the particular hypothesis, and so when comparing the relative posterior probabilities of hypotheses, the denominator can be ignored.

To derive the posterior probability, the denominator may be computed by reasoning by cases. If $H$ is an exclusive and covering set of propositions representing all possible hypotheses, then

$$\begin{aligned}
P(e\mid k) &= \sum_{h\in H} P(e\wedge h\mid k)\\
&= \sum_{h\in H} P(e\mid h\wedge k)*P(h\mid k).
\end{aligned}$$

Thus, the denominator of Bayes’ rule is obtained by summing the numerators for all the hypotheses. When the hypothesis space is large, computing the denominator is computationally difficult.

Generally, one of $P(e\mid h\wedge k)$ or $P(h\mid e\wedge k)$ is much easier to estimate than the other. Bayes’ rule is used to compute one from the other.

In medical diagnosis, the doctor observes a patient’s symptoms, and would like to know the likely diseases. Thus the doctor would like $P(\mathit{Disease}\mid \mathit{Symptoms})$. This is difficult to assess as it depends on the context (e.g., some diseases are more prevalent in hospitals). It is typically easier to assess $P(\mathit{Symptoms}\mid \mathit{Disease})$, because how the disease gives rise to the symptoms is typically less context dependent. These two are related by Bayes’ rule, where the prior probability of the disease, $P(\mathit{Disease})$, reflects the context.

The diagnostic assistant may need to know whether the light switch $s_1$ of Figure 1.8 is broken or not. You would expect that the electrician who installed the light switch in the past would not know if it is broken now, but would be able to specify how the output of a switch is a function of whether there is power coming into the switch, the switch position, and the status of the switch (whether it is working, shorted, installed upside-down, etc.). The prior probability for the switch being broken depends on the maker of the switch and how old it is. Bayes’ rule lets an agent infer the status of the switch given the prior and the evidence.

Suppose an agent has information about the reliability of fire alarms. It may know how likely it is that an alarm will work if there is a fire. To determine the probability that there is a fire, given that there is an alarm, Bayes’ rule gives:

$$\begin{aligned}
P(\mathit{fire}\mid \mathit{alarm}) &= \frac{P(\mathit{alarm}\mid \mathit{fire})*P(\mathit{fire})}{P(\mathit{alarm})}\\
&= \frac{P(\mathit{alarm}\mid \mathit{fire})*P(\mathit{fire})}{P(\mathit{alarm}\mid \mathit{fire})*P(\mathit{fire})+P(\mathit{alarm}\mid \neg \mathit{fire})*P(\neg \mathit{fire})}
\end{aligned}$$

where $P(\mathit{alarm}\mid \mathit{fire})$ is the probability that the alarm worked, assuming that there was a fire. It is a measure of the alarm’s reliability. The expression $P(\mathit{fire})$ is the probability of a fire given no other information. It is a measure of how fire-prone the building is. $P(\mathit{alarm})$ is the probability of the alarm sounding, given no other information. $P(\mathit{fire}\mid \mathit{alarm})$ is more difficult to directly represent because it depends, for example, on how much vandalism there is in the neighborhood.
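With some assumed numbers for the alarm’s reliability, its false-alarm rate, and the prior probability of fire (all hypothetical), the computation looks like:

```python
# Hypothetical numbers: the alarm's reliability and the building's fire-proneness.
p_alarm_given_fire    = 0.99   # P(alarm | fire): the alarm is reliable
p_alarm_given_no_fire = 0.05   # P(alarm | ¬fire): false-alarm rate
p_fire                = 0.01   # P(fire): prior, how fire-prone the building is

# Denominator by reasoning by cases: P(alarm) summed over the exclusive and
# covering hypotheses {fire, ¬fire}.
p_alarm = (p_alarm_given_fire * p_fire
           + p_alarm_given_no_fire * (1 - p_fire))

# Bayes' rule: posterior = likelihood * prior / evidence.
p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm
print(round(p_fire_given_alarm, 3))
```

Note that even a reliable alarm yields a modest posterior here, because the prior probability of fire is low; the posterior is proportional to the likelihood times the prior.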

Other Possible Measures of Belief
Justifying other measures of belief is problematic. Consider, for example,
the proposal that the belief in $\alpha \wedge \beta$ is some function of the
belief in $\alpha$ and the belief in $\beta$. Such a measure of belief is
called compositional. To see why this is not sensible, consider the single toss of
a fair coin. Compare the case where $\alpha$ is “the coin will land
heads”, $\beta_1$ is “the coin will land tails” and $\beta_2$ is “the coin will land
heads.” The belief in $\beta_1$ would be
the same as the belief in $\beta_2$. But the belief in $\alpha \wedge \beta_1$,
which is impossible, is very different from the belief in $\alpha \wedge \beta_2$, which is the same as $\alpha$.

The conditional probability $P(f\mid e)$ is very different
from the probability of the implication $P(e\to f)$. The
latter is the same as $P(\neg e\vee f)$, which is the measure of the
interpretations for which $f$ is true or $e$ is false. For example,
suppose there is a domain where birds are relatively rare, and
non-flying birds are a small proportion of the birds. Here $P(\neg \mathit{flies}\mid \mathit{bird})$ would be the proportion of birds that do not fly, which
would be low. $P(\mathit{bird}\to \neg \mathit{flies})$ is the same as $P(\neg \mathit{bird}\vee \neg \mathit{flies})$, which would be dominated by non-birds and so
would be high. Similarly, $P(\mathit{bird}\to \mathit{flies})$ would also be
high, the probability also being dominated by the non-birds. It is difficult to
imagine a situation where the probability of an implication is
the kind of knowledge that is appropriate or useful.
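The gap between $P(\neg \mathit{flies}\mid \mathit{bird})$ and $P(\mathit{bird}\to \neg \mathit{flies})$ can be made concrete with a small made-up distribution in which non-birds dominate:

```python
# A small hypothetical domain matching the sidebar: individuals are mostly
# non-birds, and non-flying birds are a small fraction of the birds.
# Each world is (bird, flies) with a made-up probability.
worlds = {
    (True,  True):  0.019,   # flying birds
    (True,  False): 0.001,   # non-flying birds (rare among birds)
    (False, True):  0.000,   # non-birds do not fly in this domain
    (False, False): 0.980,   # non-birds dominate
}

def P(event):
    return sum(p for w, p in worlds.items() if event(w))

bird  = lambda w: w[0]
flies = lambda w: w[1]

# Conditional probability: the proportion of birds that do not fly -- low.
p_notflies_given_bird = P(lambda w: bird(w) and not flies(w)) / P(bird)

# Probability of the implication bird -> ¬flies, i.e. P(¬bird ∨ ¬flies) -- high,
# because it is dominated by the non-birds.
p_implication = P(lambda w: (not bird(w)) or (not flies(w)))

print(p_notflies_given_bird, p_implication)  # 0.05 versus 0.981
```

The conditional probability tracks what is known about birds; the implication’s probability mostly reflects how common non-birds are, which is why it is rarely the useful quantity.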