Extrinsic Evaluation of Speech Summarization: A Decision Audit Task
By Gabriel Murray
In much of the research on automatic summarization, intrinsic measures are used
to evaluate summary quality, e.g. by measuring n-gram overlap between a
machine summary and multiple reference summaries. While such measures are very
useful for development purposes, we ultimately want to know how well our
automatically generated summaries aid a real-world task. We have designed and
conducted a large-scale extrinsic evaluation of summarization in the meetings
domain. The particular task is a decision audit, in which a participant must
review the decision-making process of a group carrying out a discussion across
several meetings. This task involves a complex information need, as the
participant must determine the reasoning leading up to a given decision by the
group. Using this evaluation framework, we compare the performance of extractive
and abstractive approaches to summarizing meetings.
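
To make the notion of an intrinsic measure concrete, the following is a minimal sketch of n-gram overlap between a machine summary and multiple reference summaries. It is an illustrative, simplified recall-style score in the spirit of ROUGE-N, not the metric or implementation used in this work; the function names, the whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_recall(candidate, references, n=2):
    """Fraction of reference n-grams also found in the candidate summary.

    Matches are clipped per n-gram by the candidate's own count and pooled
    over all references. Tokenization is naive lowercased whitespace
    splitting (an assumption; real toolkits do more).
    """
    cand_counts = ngrams(candidate.lower().split(), n)
    matched = 0
    total = 0
    for ref in references:
        ref_counts = ngrams(ref.lower().split(), n)
        matched += sum(min(count, cand_counts[gram])
                       for gram, count in ref_counts.items())
        total += sum(ref_counts.values())
    return matched / total if total else 0.0


# Hypothetical machine summary and two hypothetical reference summaries.
machine = "the group chose a red remote"
refs = [
    "the group chose a red remote control",
    "the team picked a red remote",
]
score = ngram_recall(machine, refs, n=2)
print(f"bigram recall: {score:.3f}")
```

A score like this rewards lexical agreement with the references but says nothing about whether a reader could actually complete a task with the summary, which is the gap an extrinsic evaluation is meant to address.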