Explaining Automated Policies for Sequential Decision Making

By Pascal Poupart

In many situations, a sequence of decisions must be taken by an individual or system (e.g., course selection by students or inspection of parts for testing in a factory). However, deciding on a course of action is notoriously difficult when there is uncertainty in the effects of the actions and the objectives are complex. Markov decision processes (MDPs) provide a principled approach for automated planning under uncertainty. While the beauty of an automated approach is that the computational power of machines can be harnessed to optimize difficult sequential decision making tasks, the drawback is that users no longer understand why certain actions are recommended. This lack of understanding is a serious bottleneck that is currently holding back the widespread use of automated tools such as MDPs in recommender systems. Hence, there is a need for explanations that enhance the user's understanding of, and trust in, these recommendations.

In this talk, I will present a generic technique to explain policies in arbitrary domains where the sequential decision making problem is formulated as a factored Markov decision process. The explanations consist of template sentences that are filled with relevant information to justify why some action was recommended in a given state. I will describe a mechanism to determine a minimal set of templates that are sufficient to completely justify the action choice. The approach will be demonstrated and evaluated with a user study in the context of advising undergraduate students in their course selection.
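The idea of a minimal sufficient set of templates can be illustrated with a small sketch. It is not the authors' algorithm, only one possible reading under stated assumptions: the value of the recommended action is assumed to decompose into additive terms (one per template), every omitted term is assumed to contribute at least a known lower bound `floor`, and the set is sufficient once the selected terms alone already beat the best alternative action. The function names, the template wording, and the course-advising numbers are all hypothetical.

```python
from typing import List, Tuple

Term = Tuple[str, float]  # (outcome description, contribution to the action's value)


def minimal_sufficient_terms(
    terms: List[Term], best_alternative_value: float, floor: float = 0.0
) -> List[Term]:
    """Pick the fewest terms that justify the action on their own.

    Assumption: each omitted term is worth at least `floor`, so the
    guaranteed value is the selected contributions plus `floor` for
    every term left out. Taking terms from largest to smallest gives
    the smallest sufficient set under this uniform bound.
    """
    ranked = sorted(terms, key=lambda t: t[1], reverse=True)
    guaranteed = floor * len(ranked)  # value if nothing is selected yet
    chosen: List[Term] = []
    for label, contribution in ranked:
        if guaranteed > best_alternative_value:
            break
        chosen.append((label, contribution))
        guaranteed += contribution - floor
    if guaranteed <= best_alternative_value:
        raise ValueError("the action is not strictly better than the alternative")
    return chosen


def explain(action: str, chosen: List[Term]) -> List[str]:
    """Fill a (hypothetical) template sentence for each selected term."""
    return [
        f"'{action}' is recommended because it is expected to lead to "
        f"{label} (contribution {value:.1f})."
        for label, value in chosen
    ]


# Hypothetical course-advising example: the recommended course is worth
# 10.0 in total, and the best alternative course is worth 8.0.
example_terms = [
    ("a light workload", 1.0),
    ("completing the degree requirements", 6.0),
    ("a high grade", 3.0),
]
selected = minimal_sufficient_terms(example_terms, best_alternative_value=8.0)
for sentence in explain("Take the hypothetical course A", selected):
    print(sentence)
```

In this example the two largest terms already exceed the alternative's value, so the third template is left out of the explanation.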

Joint work with Omar Zia Khan and James Black

Reference: Minimal Sufficient Explanations for Factored Markov Decision Processes. Omar Zia Khan, Pascal Poupart and James Black. International Conference on Automated Planning and Scheduling (ICAPS), Thessaloniki, Greece, 2009.

Pascal Poupart is an Associate Professor in the David R. Cheriton School of Computer Science at the University of Waterloo, Waterloo (Canada). He received the B.Sc. in Mathematics and Computer Science at McGill University, Montreal (Canada) in 1998, the M.Sc. in Computer Science at the University of British Columbia, Vancouver (Canada) in 2000 and the Ph.D. in Computer Science at the University of Toronto, Toronto (Canada) in 2005. His research focuses on the development of algorithms for reasoning under uncertainty and machine learning, with applications to Assistive Technologies, Natural Language Processing and Information Retrieval. He is best known for his contributions to the development of approximate scalable algorithms for partially observable Markov decision processes (POMDPs) and their applications in real-world problems, including spoken dialog management and automated prompting to help people with dementia through the task of handwashing. Other notable projects that his research team is currently working on include a smart walker to assist older people and a wearable sensor system to assess and monitor the symptoms of Alzheimer's disease.

Pascal Poupart received the Early Researcher Award, a competitive honor for top Ontario researchers, awarded by the Ontario Ministry of Research and Innovation in 2008. He was also a co-recipient of the Best Paper Award Runner Up at the 2008 Conference on Uncertainty in Artificial Intelligence (UAI) and the IAPR Best Paper Award at the 2007 International Conference on Computer Vision Systems (ICVS). He is a member of the editorial board of the Journal of Artificial Intelligence Research (JAIR) and the Journal of Machine Learning Research (JMLR). His research collaborators include Google, Intel, AideRSS, the Alzheimer Association, the UW-Schlegel Research Institute for Aging, Sunnybrook Health Science Centre, the Toronto Rehabilitation Institute and the Intelligent Assistive Technology and Systems Laboratory at the University of Toronto.  
