Intelligence 2E

foundations of computational agents

Instead of creating a sample and then rejecting it, it is possible to mix sampling with inference to reason about the probability that a sample would be rejected. In importance sampling methods, each sample has a weight, and the sample average is computed using the weighted average of samples. Likelihood weighting is a form of importance sampling where the variables are sampled in the order defined by a belief network, and evidence is used to update the weights. The weights reflect the probability that a sample would not be rejected.

Consider the belief network of Figure 8.3. In this ${P}{\mathit{}}{\mathrm{(}}{f}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.01}}$, ${P}{\mathrm{(}}{s}{m}{o}{k}{e}{\mathrm{\mid}}{f}{i}{r}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.9}}$ and ${P}{\mathrm{(}}{s}{m}{o}{k}{e}{\mathrm{\mid}}{\mathrm{\neg}}{f}{i}{r}{e}{\mathrm{)}}{\mathrm{=}}{\mathrm{0.01}}$. Suppose ${S}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ is observed, and another descendant of ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}$ is queried.

Starting with 1000 samples, approximately 10 will have ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$, and the other 990 samples will have ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{f}{\mathit{}}{a}{\mathit{}}{l}{\mathit{}}{s}{\mathit{}}{e}$. In rejection sampling, of the 990 with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{f}{\mathit{}}{a}{\mathit{}}{l}{\mathit{}}{s}{\mathit{}}{e}$, 1%, which is approximately 10, will have ${S}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ and so will not be rejected. The remaining 980 samples will be rejected. Of the 10 with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$, about 9 will not be rejected. Thus about 98% of the samples are rejected.

Instead of rejecting so many samples, the samples with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{t}{\mathit{}}{r}{\mathit{}}{u}{\mathit{}}{e}$ are weighted by ${\mathrm{0.9}}$ and the samples with ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}{\mathrm{=}}{f}{\mathit{}}{a}{\mathit{}}{l}{\mathit{}}{s}{\mathit{}}{e}$ are weighted with ${\mathrm{0.01}}$. This potentially give a much better estimate of any of the probabilities that use these samples.

Figure 8.30 shows the details of the likelihood weighting for computing $P(Q\mid e)$ for query variable $Q$ and evidence $e$. The $for$ loop (from line 15) creates a sample containing a value for all of the variables. Each observed variable changes the weight of the sample by multiplying by the probability of the observed value given the assignment of the parents in the sample. The variables not observed are sampled according the probability of the variable given its parents in the sample. Note that the variables are sampled in an order to ensure that the parents of a variable have been assigned in the sample before the variable is selected.

To extract the distribution of the query variable $Q$, the algorithm maintain an array $counts$, such that $counts[v]$ is the sum of the weights of the samples where $Q=v$. This algorithm can also be adapted to the case where the query is some complicated condition on the values; we just have to count the cases where the condition is true and those where the condition is false.

Suppose we want to use likelihood weighting to compute ${P}{\mathrm{(}}{T}{a}{m}{p}{e}{r}{i}{n}{g}{\mathrm{\mid}}{s}{m}{o}{k}{e}{\mathrm{\wedge}}{\mathrm{\neg}}{r}{e}{p}{o}{r}{t}{\mathrm{)}}$.

The following table gives a few samples. In this table, ${s}$ is the sample; ${e}$ is ${\mathrm{\neg}}{\mathit{}}{s}{\mathit{}}{m}{\mathit{}}{o}{\mathit{}}{k}{\mathit{}}{e}{\mathrm{\wedge}}{r}{\mathit{}}{e}{\mathit{}}{p}{\mathit{}}{o}{\mathit{}}{r}{\mathit{}}{t}$. The weight is ${P}{\mathrm{(}}{e}{\mathrm{\mid}}{s}{\mathrm{)}}$, which is equal to ${P}{\mathrm{(}}{s}{m}{o}{k}{e}{\mathrm{\mid}}{F}{i}{r}{e}{\mathrm{)}}{\mathrm{*}}{P}{\mathrm{(}}{\mathrm{\neg}}{r}{e}{p}{o}{r}{t}{\mathrm{\mid}}{L}{e}{a}{v}{i}{n}{g}{\mathrm{)}}$, where the value for ${F}{\mathit{}}{i}{\mathit{}}{r}{\mathit{}}{e}$ and ${L}{\mathit{}}{e}{\mathit{}}{a}{\mathit{}}{v}{\mathit{}}{i}{\mathit{}}{n}{\mathit{}}{g}$ are from the sample.

${T}{}{a}{}{m}{}{p}{}{e}{}{r}{}{i}{}{n}{}{g}$ | ${F}{}{i}{}{r}{}{e}$ | ${A}{}{l}{}{a}{}{r}{}{m}$ | ${S}{}{m}{}{o}{}{k}{}{e}$ | ${L}{}{e}{}{a}{}{v}{}{i}{}{n}{}{g}$ | ${R}{}{e}{}{p}{}{o}{}{r}{}{t}$ | weight |
---|---|---|---|---|---|---|

${f}{}{a}{}{l}{}{s}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${0.9}{*}{0.25}{=}{0.225}$ |

${t}{}{r}{}{u}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${0.9}{*}{0.99}{=}{0.891}$ |

${f}{}{a}{}{l}{}{s}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${0.01}{*}{0.25}{=}{0.0025}$ |

${f}{}{a}{}{l}{}{s}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${t}{}{r}{}{u}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${f}{}{a}{}{l}{}{s}{}{e}$ | ${0.9}{*}{0.99}{=}{0.891}$ |

${P}{(}{t}{a}{m}{p}{e}{r}{i}{n}{g}{\mid}{\mathrm{\neg}}{s}{m}{o}{k}{e}{\wedge}{r}{e}{p}{o}{r}{t}{)}$ is estimated from the weighted proportion of the samples that have ${T}{\mathit{}}{a}{\mathit{}}{m}{\mathit{}}{p}{\mathit{}}{e}{\mathit{}}{r}{\mathit{}}{i}{\mathit{}}{n}{\mathit{}}{g}$ true.