Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have been proposed recently to allow dynamic programming to be applied without the need for complete state enumeration. We present algebraic decision diagrams (ADDs) as efficient data structures for solving very large MDPs. We describe a new value iteration algorithm for exact dynamic programming, using an ADD input representation of the MDP. We extend this algorithm with an approximate version for generating near-optimal value functions and policies with much lower time and space requirements than the exact version. We demonstrate our methods on a class of large MDPs (up to 34 billion states). We show that significant gains can be had with our optimal value iteration algorithm when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions). We then demonstrate our approximate algorithm and compare results with the optimal ones. Finally, we examine various variable reordering techniques and demonstrate their use within the context of our methods.
If you have any questions or comments regarding this page please send mail to email@example.com.