Software and Data
Unleashing the Power of Neural Discourse Parsers - A Context and Structure Aware Approach Using Large Scale Pretraining (COLING 2020)
We investigate the benefits of large-scale language models and silver-standard tree pretraining for RST discourse parsing. Our results indicate that both additions significantly improve the parsing results.
We enhance a neural topic segmenter based on a hierarchical attention BiLSTM network to better model context by adding a coherence-related auxiliary task and restricted self-attention.
We organize the current redundancy reduction methods into categories based on when and how redundancy is considered, and propose three new methods that balance non-redundancy and importance in a general and flexible way.
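One classic way to trade off importance against redundancy is greedy Maximal Marginal Relevance (MMR), shown below as a generic illustration of the balancing idea; it is not one of the paper's proposed methods, and the scores and similarity matrix are hypothetical inputs.

```python
def mmr_select(importance, sim, k, lam=0.7):
    """Greedy MMR: at each step pick the sentence maximizing
    lam * importance[i] - (1 - lam) * (max similarity to already-selected).
    importance: list of per-sentence scores; sim: pairwise similarity matrix."""
    selected = []
    candidates = list(range(len(importance)))
    while candidates and len(selected) < k:
        def mmr_score(i):
            # Redundancy = highest similarity to any sentence chosen so far.
            redundancy = max((sim[i][j] for j in selected), default=0.0)
            return lam * importance[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with two near-duplicate high-importance sentences, MMR picks one of them and then prefers a less important but non-redundant sentence over the duplicate.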
We propose an approach for classifying document coherence on the Grammarly Corpus of Discourse Coherence using silver-standard RST discourse trees.
MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision (EMNLP 2020)
We present a novel scalable methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets, creating MEGA-DT, a new large-scale discourse-annotated corpus.
Towards Domain-Independent Text Structuring Trainable on Large Discourse Treebanks (Findings of EMNLP 2020)
We propose a pretraining approach for learning content structuring/ordering for long-document neural NLG. The results indicate that our approach learns better groupings of the semantically relevant content than the pointer-based baseline.
Do We Really Need That Many Parameters In Transformer For Extractive Summarization? Discourse Can Help! (CODI workshop at EMNLP 2020)
We incorporate discourse information into the attention module of a neural extractive summarization model, reducing the size of the model while maintaining competitive performance.
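The general idea of constraining attention with discourse structure can be sketched as follows: mask out attention between sentence pairs that are far apart in a discourse tree, then renormalize. This is a minimal illustration assuming a precomputed matrix of pairwise tree distances, not the paper's exact architecture.

```python
import numpy as np

def discourse_masked_attention(scores, tree_dist, max_dist=2):
    """Restrict attention using discourse structure (a sketch).
    scores:    raw attention logits, shape (n, n)
    tree_dist: pairwise distances between sentences in a discourse tree
    Pairs farther than max_dist get zero attention; rows are re-softmaxed."""
    mask = tree_dist <= max_dist
    # Disallowed pairs get -inf so softmax assigns them zero weight.
    masked = np.where(mask, scores, -np.inf)
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)
```

Because the mask is fixed by the discourse tree rather than learned, each sentence only attends to structurally nearby sentences, which is one way fewer attention parameters can suffice.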
We experiment with incorporating coreference resolution information into an RST discourse parser.
The software listed below is open source, licensed under the GNU General Public License. Note that this is the full GPL, which allows many free uses but does not allow incorporation (even in part or in translation) into any type of proprietary software that you distribute.
The corpus consists of 40 email threads (3222 sentences) from the W3C corpus. Each thread has been annotated by three different annotators. The annotation consists of the following:
This parser builds a discourse tree by applying an optimal parsing algorithm to probabilities inferred from two Conditional Random Fields: one for intra-sentential parsing and the other for multi-sentential parsing. Document-level Discourse Parser: Demo. For the latest version of the Discourse Parser see QCRI's Source Code Page. More information on Discourse Parser.
An automatic abstractive summarization system for meeting conversations. Paper published in INLG 2014, Masters thesis. More information about this project.