Software and Data

The software listed here is open source, licensed under the GNU General Public License. Note that this is the full GPL, which allows many free uses but does not allow incorporation of the software (even in part or in translation) into any type of proprietary software that you distribute.

BC3 Corpus

The corpus consists of 40 email threads (3222 sentences) from the W3C corpus. Each thread has been annotated by three different annotators. The annotation consists of the following:

  • Extractive Summaries
  • Abstractive Summaries with linked sentences
  • Sentences labeled with Speech Acts and Subjectivity
More recently, topics have also been annotated for this email corpus as well as for 20 blog conversations.

Topic Segmentation and Labelling

More information on Topic Segmentation and Labelling.

Discourse Parser

This parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing.

Document-level Discourse Parser:  Demo  
For the latest version of the Discourse Parser see QCRI's Source Code Page  
More information on the Discourse Parser.
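
To illustrate the idea of finding an optimal tree from inferred probabilities, the following is a minimal sketch of a CKY-style dynamic program in Python. It is not the parser's actual code: the function name best_tree, the callback join_prob, and the toy probabilities are all hypothetical, and in the real system the probabilities would come from the Conditional Random Fields rather than a lookup table. The sketch also ignores relation labels and simply picks the most probable binary bracketing.

```python
import math
from functools import lru_cache


def best_tree(n, join_prob):
    """Return (log_prob, tree) for the most probable binary tree over units 0..n-1.

    join_prob(i, k, j) is a hypothetical callback returning the probability
    that span [i, k] and span [k+1, j] combine into a single constituent.
    """

    @lru_cache(maxsize=None)
    def solve(i, j):
        if i == j:
            return 0.0, i  # a single unit is a leaf with log-probability 0
        best_score, best_split = -math.inf, None
        for k in range(i, j):  # try every split point of the span [i, j]
            p = join_prob(i, k, j)
            if p <= 0.0:
                continue  # skip impossible combinations
            left_score, left_tree = solve(i, k)
            right_score, right_tree = solve(k + 1, j)
            score = math.log(p) + left_score + right_score
            if score > best_score:
                best_score, best_split = score, (left_tree, right_tree)
        return best_score, best_split

    return solve(0, n - 1)


# Toy probabilities for three units (assumed values, for illustration only).
toy = {(0, 0, 1): 0.9, (1, 1, 2): 0.2, (0, 0, 2): 0.5, (0, 1, 2): 0.6}
score, tree = best_tree(3, lambda i, k, j: toy.get((i, k, j), 0.0))
print(tree, math.exp(score))
```

With these toy values, grouping units 0 and 1 first scores 0.6 × 0.9 = 0.54, while grouping units 1 and 2 first scores 0.5 × 0.2 = 0.1, so the sketch selects the bracketing ((0, 1), 2).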

Abstractive Meeting Summarization

An automatic abstractive summarization system for meeting conversations.

More information on abstractive meeting summarization: Paper published in INLG 2014, Master's thesis

ConVis: Visual text analytic system for Asynchronous conversations

More information about this project.