9 Planning with Uncertainty

9.3 Sequential Decisions

Dimensions: flat, features, finite horizon, partially observable, stochastic, utility, non-learning, single agent, offline, perfect rationality

Generally, agents do not make decisions in the dark without observing something about the world, nor do they make just a single decision. A more typical scenario is that the agent makes an observation, decides on an action, carries out that action, makes observations in the resulting world, then makes another decision conditioned on the observations, and so on. Subsequent actions can depend on what is observed, and what is observed can depend on previous actions. In this scenario, the sole reason for carrying out an action is often to provide information for future actions. Actions carried out just to acquire information are called information-seeking actions. Such actions are only ever needed in partially observable environments. The formalism does not need to distinguish information-seeking actions from other actions. Typically, actions have both information outcomes and effects on the world.
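The cycle of observing, deciding, and acting can be sketched as a loop in which each decision is a function of everything observed so far. The following is only an illustrative sketch, not code from the text: the names `run_episode`, `make_coin_world`, and `peek_then_guess` are hypothetical, and the coin world is an invented toy environment in which "peek" is a purely information-seeking action.

```python
from typing import Callable, List

def run_episode(policy: Callable[[tuple], str],
                step: Callable[[str], str],
                first_observation: str,
                horizon: int) -> List[str]:
    """Alternate decisions and observations for a fixed number of stages."""
    history = [first_observation]
    for _ in range(horizon):
        action = policy(tuple(history))   # decision conditioned on the history so far
        observation = step(action)        # the world responds with a new observation
        history += [action, observation]
    return history

def make_coin_world(hidden: str) -> Callable[[str], str]:
    """Toy world (an assumption): a coin whose face is initially unobserved."""
    def step(action: str) -> str:
        if action == "peek":
            return hidden                 # information-seeking: reveals the face
        if action == "guess_" + hidden:
            return "win"
        return "lose"
    return step

def peek_then_guess(history: tuple) -> str:
    """Hypothetical policy: peek first, then guess whatever was seen."""
    if "peek" not in history:
        return "peek"
    seen = history[history.index("peek") + 1]   # the observation that followed the peek
    return "guess_" + seen

trace = run_episode(peek_then_guess, make_coin_world("tails"), "start", horizon=2)
print(trace)
```

Here the first action changes nothing in the world; its only value is that its outcome is available when the second decision is made.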

A sequential decision problem models

  • what actions are available to the agent at each stage

  • what information is, or will be, available to the agent when it has to act

  • the effects of the actions, and

  • the desirability of these effects.

Example 9.12.

Consider a simple case of diagnosis where a doctor first chooses some tests and then treats a patient, taking into account the outcome of the tests. The reason the doctor may decide to do a test is so that some information (the test results) will be available at the next stage when treatment may be performed. The test results will be information that is available when the treatment is decided, but not when the test is decided. It is often a good idea to test, even if testing itself may harm the patient.

The actions available are the possible tests and the possible treatments. When the test decision is made, the information available will be the symptoms exhibited by the patient. When the treatment decision is made, the information available will be the patient’s symptoms, what tests were performed, and the test results. The effect of the test is the test result, which depends on what test was performed and what is wrong with the patient. The effect of the treatment is some function of the treatment and what is wrong with the patient. The utility may include, for example, costs of tests and treatments, the pain and inconvenience to the patient in the short term, and the long-term prognosis.
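To make the components of this example concrete, the sketch below enumerates every policy (a test decision together with a rule mapping each possible test result to a treatment) and computes its expected utility. All of the numbers are assumptions chosen purely for illustration: the prior probability of disease, the test's accuracy, the utilities, and the cost of testing do not come from the text. For simplicity, there is a single test, and the patient's symptoms are treated as fixed and already reflected in the prior. The names `result_distribution`, `expected_utility`, and `all_policies` are hypothetical.

```python
from itertools import product

# All numbers below are illustrative assumptions, not values from the text.
P_DISEASE = 0.3                          # prior probability of disease, given the symptoms
P_POSITIVE = {True: 0.9, False: 0.2}     # P(positive result | disease status)
TEST_COST = 5                            # cost and harm of testing, in utility units
OUTCOME_UTILITY = {                      # (disease, treatment) -> utility
    (True, "treat"): 70, (True, "no_treat"): 0,
    (False, "treat"): 80, (False, "no_treat"): 100,
}

def result_distribution(test, disease):
    """P(test result | test decision, disease): the effect of the test."""
    if test == "no_test":
        return {"none": 1.0}
    p = P_POSITIVE[disease]
    return {"positive": p, "negative": 1 - p}

def expected_utility(test, treatment_rule):
    """Expected utility of a policy: a test decision plus a rule that maps
    each possible test result to a treatment."""
    total = 0.0
    for disease, p_d in ((True, P_DISEASE), (False, 1 - P_DISEASE)):
        for result, p_r in result_distribution(test, disease).items():
            treatment = treatment_rule[result]   # treatment sees the result, not the disease
            utility = OUTCOME_UTILITY[(disease, treatment)]
            if test == "test":
                utility -= TEST_COST
            total += p_d * p_r * utility
    return total

def all_policies():
    """Enumerate every (test decision, treatment rule) pair."""
    treatments = ("treat", "no_treat")
    for t in treatments:
        yield "no_test", {"none": t}
    for on_pos, on_neg in product(treatments, repeat=2):
        yield "test", {"positive": on_pos, "negative": on_neg}

best_test, best_rule = max(all_policies(), key=lambda p: expected_utility(*p))
print(best_test, best_rule, round(expected_utility(best_test, best_rule), 2))
```

With these assumed numbers, the best policy is to test and then treat only on a positive result, with expected utility of about 81.1, compared with 77 for the best policy that does not test. Testing is worthwhile here despite its cost because the treatment decision can depend on the result.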