The BC3: British Columbia Conversation Corpus

The First Publicly Available Annotated Corpus for Email Summarization

The corpus consists of 40 email threads (3222 sentences) from the W3C corpus. Each thread has been annotated by three different annotators. The annotation consists of the following:

If you use the BC3 corpus, please cite the following paper:

Ulrich J., Murray G., Carenini G., A Publicly Available Annotated Corpus for Supervised Email Summarization, AAAI08 EMAIL Workshop, Chicago, USA, 2008. [pdf] [bib]

Papers using the BC3 corpus:

Minwoo Jeong, Chin-Yew Lin and Gary Geunbae Lee, Semi-Supervised Speech Act Recognition in Emails and Forums, Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 1250–1259, Singapore, 6-7 August 2009. [pdf] [bib]

The BC3 Annotation Software

An open-source tool for annotating email threads or other conversations

The BC3 corpus was annotated using a web-based annotation framework. This framework is open source and available for download for conversation annotation.

The framework is built with Ruby on Rails and a MySQL database, so researchers can set up a web server to import and manage an email corpus. It also lets users annotate email threads for summaries and label email features.

The BC3 Corpus is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. The BC3 framework is licensed under the MIT license.

If you have any questions or comments, please contact one of the following team members:

Previous team members include: