Extrinsic Evaluation of Speech Summarization: A Decision Audit Task

By Gabriel Murray

In much of the research on automatic summarization, summary quality is evaluated with intrinsic measures, e.g., by measuring n-gram overlap between a machine-generated summary and multiple human reference summaries. While such measures are very useful during development, we ultimately want to know how well automatically generated summaries support a real-world task. We have designed and conducted a large-scale extrinsic evaluation of summarization in the meetings domain. The task is a decision audit, in which a participant must review the decision-making process of a group carrying on a discussion across several meetings. This task involves a complex information need: the participant must reconstruct the reasoning that led the group to a given decision. Using this evaluation framework, we compare the performance of extractive and abstractive summarization approaches to the challenge of summarizing meetings.
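For readers unfamiliar with the intrinsic measures mentioned above, the following is a minimal Python sketch of ROUGE-style n-gram recall against multiple references. It is illustrative only, not the specific metric or implementation used in this work; the pooling of counts over all references is one common variant, and the toy sentences are invented for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_recall(candidate, references, n=2):
    """ROUGE-style n-gram recall: matched n-grams / total reference n-grams.
    Counts are pooled over all references (one common multi-reference variant)."""
    cand_counts = ngrams(candidate.lower().split(), n)
    matched, total = 0, 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        total += sum(ref_counts.values())
        # Clipped overlap: each candidate n-gram is credited at most as
        # often as it appears in the reference.
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
    return matched / total if total else 0.0

# Toy example: one machine summary scored against two human references.
machine = "the group decided to use a plastic case for the remote"
refs = ["the group chose a plastic case for the remote control",
        "they decided on a plastic case"]
print(round(ngram_recall(machine, refs, n=2), 3))
```

Intrinsic scores of this kind are cheap to compute during development; the decision audit described above is the complementary extrinsic test of whether a summary actually helps a user complete the task.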
