Extrinsic Evaluation of Speech Summarization: A Decision Audit Task

By Gabriel Murray

In much of the research on automatic summarization, summary quality is evaluated with intrinsic measures, e.g. n-gram overlap between a machine summary and multiple reference summaries. While such measures are very useful during development, we ultimately want to know whether and how automatically generated summaries aid a real-world task. We have designed and conducted a large-scale extrinsic evaluation of summarization in the meetings domain. The task is a decision audit, in which a participant must review the decision-making process of a group whose discussion spans several meetings. This task involves a complex information need, as the participant must determine the reasoning that led the group to a given decision. Using this evaluation framework, we compare the performance of extractive and abstractive approaches to the challenge of summarizing meetings.
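Intrinsic n-gram overlap scores a machine summary by how many of the reference summaries' n-grams it also contains. The following is a minimal sketch of one such score: a ROUGE-N-style recall with clipped counts pooled over several references. The function names and the lowercase whitespace tokenization are illustrative assumptions, not the scoring procedure used in this study.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the contiguous n-grams in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, references, n=2):
    """ROUGE-N-style recall: matched reference n-grams / total reference n-grams.

    Assumes simple lowercase whitespace tokenization. Each reference n-gram is
    matched at most as many times as it occurs in the candidate (clipped counts),
    and matches and totals are pooled across all references.
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        # Counter returns 0 for n-grams absent from the candidate.
        matched += sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


if __name__ == "__main__":
    refs = ["the group chose a remote", "a remote was chosen"]
    # 4 of 4 bigrams match the first reference, 1 of 3 the second: 5/7.
    print(ngram_recall("the group chose a remote", refs, n=2))
```

Pooling counts over references, rather than averaging per-reference scores, lets longer references carry more weight; either choice is common, and which one a given toolkit uses should be checked against its documentation.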

