The Laboratory for Computational Intelligence, UBC

The BC3: British Columbia Conversation Corpus

The First Publicly Available Annotated Corpus for Email Summarization


The corpus consists of 40 email threads (3,222 sentences) drawn from the W3C corpus. Each thread was annotated by three different annotators. The annotations consist of the following:
  • Extractive Summaries
  • Abstractive Summaries with linked sentences
  • Labeled Sentences, with the following labels:
    • Speech Acts: Propose, Request, Commit, Meeting
    • Meta Sentences
    • Subjectivity
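The annotation scheme above can be pictured as a small data model. The sketch below is illustrative only: the class names, enum values, and the `email.sentence` ID format (e.g. `"1.3"`) are hypothetical and do not describe the format of the distributed BC3 files. It also shows one simple way to combine the three annotators' extractive selections, a strict-majority vote. That vote is an assumption for illustration, not the scoring used in the BC3 paper.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class SentenceLabel(Enum):
    """Sentence labels named in the BC3 annotation list (enum values are illustrative)."""
    PROPOSE = "Propose"
    REQUEST = "Request"
    COMMIT = "Commit"
    MEETING = "Meeting"
    META = "Meta"
    SUBJECTIVE = "Subjective"


@dataclass
class AbstractiveSentence:
    text: str
    linked_ids: list[str]  # IDs of thread sentences this summary sentence is linked to


@dataclass
class ThreadAnnotation:
    """One annotator's annotation of one thread (hypothetical model, not the BC3 file format)."""
    annotator: str
    extractive_ids: set[str]  # sentence IDs selected for the extractive summary
    abstractive: list[AbstractiveSentence] = field(default_factory=list)
    labels: dict[str, set[SentenceLabel]] = field(default_factory=dict)  # sentence ID -> labels


def sentence_key(sid: str) -> tuple:
    """Sort hypothetical 'email.sentence' IDs numerically, so '1.10' follows '1.9'."""
    return tuple(int(part) for part in sid.split("."))


def extractive_votes(annotations: list[ThreadAnnotation]) -> Counter:
    """Count how many annotators selected each sentence ID."""
    votes = Counter()
    for ann in annotations:
        votes.update(ann.extractive_ids)
    return votes


def majority_extract(annotations: list[ThreadAnnotation]) -> list[str]:
    """IDs selected by a strict majority of annotators (2 of 3 when there are three)."""
    threshold = len(annotations) // 2 + 1
    votes = extractive_votes(annotations)
    return sorted((sid for sid, n in votes.items() if n >= threshold), key=sentence_key)


annotations = [
    ThreadAnnotation("A", {"1.1", "1.3", "2.1"},
                     [AbstractiveSentence("They agree to meet on Friday.", ["1.3", "2.1"])]),
    ThreadAnnotation("B", {"1.1", "2.1"}),
    ThreadAnnotation("C", {"1.3", "2.2"}, labels={"1.3": {SentenceLabel.MEETING}}),
]
print(majority_extract(annotations))  # ['1.1', '1.3', '2.1']
```

Linking each abstractive sentence to the extracted sentences it draws on (`linked_ids`) keeps the two summary types connected, so either can be traced back to the source thread.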

If you use the BC3 corpus please cite the following paper:


Ulrich J., Murray G., Carenini G., A Publicly Available Annotated Corpus for Supervised Email Summarization. AAAI-08 EMAIL Workshop, Chicago, USA, 2008.

Papers using the BC3 corpus:


Jeong M., Lin C.-Y., Lee G. G., Semi-Supervised Speech Act Recognition in Emails and Forums. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), pages 1250–1259, Singapore, 6–7 August 2009.


Murray G. and Carenini G., Predicting Subjectivity in Multimodal Conversations. Empirical Methods in NLP (EMNLP 2009), Singapore, 2009.


Ulrich J., Carenini G., Murray G., Ng R., Regression-Based Summarization of Email Conversations. 3rd Int'l AAAI Conference on Weblogs and Social Media (ICWSM-09), San Jose, CA, 2009.


Murray G. and Carenini G., Summarizing Spoken and Written Conversations. Empirical Methods in NLP (EMNLP 2008), Waikiki, Hawaii, 2008.

The BC3 Annotation Software

An open-source tool for annotating email threads and other conversations

The BC3 corpus was annotated using a web-based annotation framework. The framework is open source and available for download, so it can be used to annotate other conversations as well.
The framework is built with Ruby on Rails and a MySQL database, so researchers can set up a web server to import and manage an email corpus. Annotators then use it to write summaries of email threads and to label email features.
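To give a rough idea of how a relational back end can hold a corpus together with its annotations, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for MySQL. The tables, columns, toy data, and the `annotator_summary` helper are all hypothetical and are not the BC3 framework's actual schema. The query reconstructs one annotator's extractive summary of a thread.

```python
import sqlite3

# Hypothetical schema for illustration; sqlite3 stands in for MySQL so the sketch runs anywhere.
SCHEMA = """
CREATE TABLE threads    (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE emails     (id INTEGER PRIMARY KEY, thread_id INTEGER REFERENCES threads(id), sender TEXT);
CREATE TABLE sentences  (id INTEGER PRIMARY KEY, email_id INTEGER REFERENCES emails(id), text TEXT);
CREATE TABLE annotators (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- One row per (annotator, sentence) pair chosen for an extractive summary.
CREATE TABLE extract_selections (
    annotator_id INTEGER REFERENCES annotators(id),
    sentence_id  INTEGER REFERENCES sentences(id),
    PRIMARY KEY (annotator_id, sentence_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database and 'import' one toy thread with three annotators."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO threads VALUES (1, 'Meeting agenda')")
    conn.executemany("INSERT INTO emails VALUES (?, 1, ?)", [(1, "alice"), (2, "bob")])
    conn.executemany("INSERT INTO sentences VALUES (?, ?, ?)", [
        (1, 1, "Can we meet on Friday?"),
        (2, 1, "I have a draft agenda."),
        (3, 2, "Friday works for me."),
    ])
    conn.executemany("INSERT INTO annotators VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
    conn.executemany("INSERT INTO extract_selections VALUES (?, ?)",
                     [(1, 1), (1, 3), (2, 1), (3, 1), (3, 2)])
    conn.commit()
    return conn


def annotator_summary(conn: sqlite3.Connection, thread_id: int, annotator: str) -> list[str]:
    """Return one annotator's extractive summary of a thread, in sentence order."""
    rows = conn.execute("""
        SELECT s.text
        FROM extract_selections x
        JOIN sentences  s ON s.id = x.sentence_id
        JOIN emails     e ON e.id = s.email_id
        JOIN annotators a ON a.id = x.annotator_id
        WHERE e.thread_id = ? AND a.name = ?
        ORDER BY s.id
    """, (thread_id, annotator)).fetchall()
    return [text for (text,) in rows]


conn = build_demo_db()
print(annotator_summary(conn, 1, "A"))  # ['Can we meet on Friday?', 'Friday works for me.']
```

Storing selections in a join table (one row per annotator and sentence) makes it straightforward both to pull up a single annotator's summary and to aggregate across annotators.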


The BC3 Corpus is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

The BC3 framework is licensed under the MIT license.

If you have any questions or comments, please contact one of the following team members:

Previous team members include: