MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision (EMNLP 2020)

Abstract

The lack of large and diverse discourse treebanks hinders the application of data-driven approaches, such as deep-learning, to RST-style discourse parsing. In this work, we present a novel scalable methodology to automatically generate discourse treebanks using distant supervision from sentiment-annotated datasets, creating and publishing MEGA-DT, a new large-scale discourse-annotated corpus. Our approach generates discourse trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient heuristic beam-search strategy, extended with a stochastic component. Experiments on multiple datasets indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains when compared to parsers trained on human-annotated discourse corpora.
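As a rough illustration of the idea in the abstract, the toy sketch below builds a binary tree with nuclearity labels over a document's elementary discourse units (EDUs) using a beam search that keeps a few randomly sampled candidates alongside the best-scoring ones. It is not the authors' implementation. The `Node` class, the weighted-average `merge` rule, the `forest_error` heuristic, and the parameters `beam_size` and `n_random` are all hypothetical choices made for this example; the actual procedure is described in the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A (sub)tree with an aggregated sentiment and an importance weight (both assumed)."""
    sentiment: float
    weight: float
    nuclearity: str = "leaf"  # "NS", "SN" or "NN" for internal nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def merge(left: Node, right: Node) -> Node:
    """Combine two adjacent subtrees; the heavier child is treated as the nucleus."""
    total = left.weight + right.weight
    sentiment = (left.sentiment * left.weight + right.sentiment * right.weight) / total
    if left.weight > right.weight:
        nuclearity = "NS"
    elif left.weight < right.weight:
        nuclearity = "SN"
    else:
        nuclearity = "NN"
    return Node(sentiment, total / 2, nuclearity, left, right)


def forest_error(forest, gold: float) -> float:
    """Heuristic: distance between the forest's weighted sentiment and the document label."""
    total = sum(node.weight for node in forest)
    estimate = sum(node.sentiment * node.weight for node in forest) / total
    return abs(estimate - gold)


def stochastic_beam_search(leaves: List[Node], gold: float,
                           beam_size: int = 4, n_random: int = 1, seed: int = 0) -> Node:
    """Merge adjacent subtrees bottom-up, keeping the best forests plus a few random ones."""
    if not leaves:
        raise ValueError("at least one EDU is required")
    rng = random.Random(seed)
    beam = [tuple(leaves)]
    for _ in range(len(leaves) - 1):
        candidates = []
        for forest in beam:
            for i in range(len(forest) - 1):
                merged = merge(forest[i], forest[i + 1])
                candidates.append(forest[:i] + (merged,) + forest[i + 2:])
        candidates.sort(key=lambda f: forest_error(f, gold))
        n_best = max(1, beam_size - n_random)
        kept, rest = candidates[:n_best], candidates[n_best:]
        # Stochastic component: also keep some candidates outside the top of the beam.
        kept += rng.sample(rest, min(n_random, len(rest)))
        beam = kept
    # Every surviving forest is now a single tree; return the closest to the gold label.
    return min((forest[0] for forest in beam), key=lambda t: abs(t.sentiment - gold))


def leaves_of(node: Node) -> List[Node]:
    """Return the EDUs under a node in left-to-right order."""
    if node.left is None:
        return [node]
    return leaves_of(node.left) + leaves_of(node.right)


# Made-up EDU sentiment scores and weights for a four-EDU document.
edus = [Node(0.9, 0.5), Node(-0.2, 0.1), Node(0.4, 0.3), Node(-0.7, 0.6)]
tree = stochastic_beam_search(edus, gold=0.3)
print(tree.nuclearity, round(tree.sentiment, 3))
```

Because the beam is bounded, the number of candidates per step grows with the document length rather than with the number of possible trees, which is what makes this kind of search applicable to long documents.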

Source code

GitHub

Paper

ACL Anthology, arXiv

Dataset

License Agreement

SOFTWARE/DATASET LICENSE AGREEMENT

IMPORTANT! The Software/Dataset you seek to use is licensed only on the condition that YOU agree with The University of British Columbia to the terms and conditions set forth below.

PLEASE CAREFULLY READ THE TERMS OF THIS SOFTWARE LICENSE AGREEMENT.

If you do not agree to the terms of this agreement, delete and do not use the Software/Dataset.

1) License to use the UBC Software. The MEGA-DT software/dataset (the “Software”) you seek to use is licensed only on the condition that you ("YOU") agree with The University of British Columbia, a corporation continued under the University Act of British Columbia (“UBC”) with offices at:

103 – 6190 Agronomy Road,
Vancouver, British Columbia,
V6T 1Z3

to the terms and conditions set forth below. UBC grants to YOU a non-exclusive, non-transferable, non-sublicensable right to use the Software on a single computer at a single location and on the terms and conditions set out in this Agreement, for internal trial and evaluation purposes only. YOU shall not use the Software for any other purposes including without limitation for any commercial purposes whatsoever. UBC specifically retains the right to grant licenses of the Software to other persons. If YOU are interested in using the Software for any commercial purposes, please contact UBC’s University-Industry-Liaison Office.

2) Representation of Authority. YOU represent and warrant to UBC that YOU possess the legal authority to enter into this Agreement, and that YOU will be financially responsible for your use of the Software. You agree to be responsible for all license fees, costs, charges and taxes arising out of your use of the Software. YOU also acknowledge that YOU are solely responsible for supplying any hardware necessary to use the Software pursuant to this Agreement.

3) Confidential Information. YOU agree that the Software and any and all documentation, knowledge, know-how and/or techniques relating to the Software, is and will remain the sole and absolute property of UBC. YOU acknowledge that all documentation, trade-marks, trade names, inventions, discoveries, improvements, software, dataset, copyright, know-how or other intellectual property, whether or not patentable or copyrightable, created by UBC prior to, during, or after the termination of, this license, pertaining to the Software are and will remain the sole and absolute property of UBC. Furthermore, YOU agree that the Software (including any and all confidential information, documentation, or computer code which UBC may at any time disclose to YOU relating to the Software) is the proprietary and confidential information of UBC (the “Confidential Information”) and YOU will not disclose any Confidential Information, directly or indirectly, to any third parties, or allow any third party to use the Software or Confidential Information, and YOU shall use the Confidential Information solely in accordance with the provisions of this Agreement. UBC retains all rights, title and interest in and to the Software.

4) Use of Third Party Code. The Software may use or incorporate certain third party code libraries which UBC has obtained under various licenses or permissions. Information on the libraries, and where applicable, source code to the libraries, may be obtained as indicated in the Appendix. Any licenses, notices or statements with respect to the 3rd party code apply to those specific libraries only and do not apply to the Software.

5) No Warranty. YOU further acknowledge and agree that the Software is experimental in nature and is provided to YOU on an “as is” basis and for internal evaluation purposes only. UBC has no obligation to provide any services, modifications, upgrades, updates, or replacements relating to the Software. YOU acknowledge and agree that UBC makes no representations and extends no warranties of any kind, either express or implied. There are no express or implied warranties of merchantability or fitness of the Software for a particular purpose, or that the use of the Software will not infringe any patent, copyright, trademark or other rights, or any other express or implied representations or warranties. In particular, nothing in this Agreement is or will be construed as a warranty or representation by UBC as to the validity or scope of copyright or other intellectual property rights in the Software; an obligation to furnish any other software, technology, or technological information; an obligation on UBC to correct malfunctions that arise in the version of the Software that YOU receive. UBC does not warrant that the software is free from malfunction, nor that any malfunctions can or will be corrected or that UBC will develop or provide to you any operations, capabilities or features not present in the version of the software that you receive.

6) Limitation of liability. You agree that in no event shall UBC be liable to YOU or any third party for any indirect, consequential, incidental, punitive or special damages whatsoever, without regard to cause or theory of liability, or any damages (whether direct or indirect) incurred for loss of business, profits or revenue, loss of privacy, loss of use of any device or software (including but not limited to the Software), costs of procuring substitute or replacement goods and services, business interruption, loss of business information or other pecuniary loss arising out of this Agreement or the Software provided hereunder, even if UBC has been advised of the possibility of such damages. No member of UBC, its board of governors, officers, employees, faculty, students, staff or agents will be liable for any unauthorized access to, or any corruption, erasure, theft, destruction, alteration, inadvertent disclosure or loss of data, information or content transmitted, received or stored by or in connection with the Software regardless of the cause. UBC’s total liability, whether under the express or implied terms of the Agreement, in tort (including negligence), or at common law, for any loss or damage suffered by the licensee whether direct, indirect or special, or any other similar or like damage that may arise from any breaches of this agreement by UBC, its board of governors, officers, employees, faculty, students, staff or agents, is strictly limited to five Canadian dollars (CA $5.00).

7) Restrictions of Use. YOU SHALL NOT and will NOT authorize any third party to:

  • Make copies of the Software, other than a single backup copy, and any such copy together with the original must be kept in YOUR possession or control. YOU shall reproduce and include the copyright notice of UBC on any backup copy;
  • Reverse engineer, reproduce, derive source code, modify, improve, adapt, translate, decompile, disassemble, copy, translate into another computer language, create data or executable programs which mimic data or functionality in the Software, and/or create derivative works from the Software, in whole or in part;
  • Distribute, sell, resell, lease, transfer, loan, assign, trade, rent, publish or otherwise transfer the Software or any part thereof and/or copies thereof, to others;
  • License or sublicense the use of the Software to others without the written permission of UBC;
  • Use, without its express permission, the name of UBC or any trademark or logo of UBC in advertising, publicity, or otherwise;
  • Use the Software, or permit use of the Software, or make the Software or any portion of it, in any form, available for use on the Internet, in a network, multi-user arrangement, remote access arrangement, including without limitation in circumstances where the Software could be downloaded by multiple users;
  • Remove, disable or circumvent any security protections, proprietary notices or labels contained on or within the Software; and
  • Export or re-export the Software or any copy or adaptation, whether in violation of any applicable laws or regulations or otherwise.

8) Indemnification. You agree to indemnify, defend and hold harmless UBC, its board of governors, officers, employees, faculty, students, staff or agents from and against any and all liability, loss, damage, action, claim or expense (including attorney’s fees and costs at trial and appellate levels) in connection with any claim, suit, action, demand or judgement arising out of, connected with, resulting from, or sustained as a result of your use of the Software or in executing and performing this Agreement.

9) Termination. YOU may terminate the license at any time by ceasing all use of the Software and destroying or deleting the Software (including the related documentation), together with all copies in any form. UBC may terminate this license immediately, and this license shall be deemed to have automatically terminated, if YOU breach or fail to comply with any term or condition of this Agreement. Upon any termination, including termination by YOU, YOU must destroy or return to UBC the Software (including the related documentation or materials), together with all copies in any form, and YOU will have no further right to use the Software. Article 3 (Confidential Information), Article 5 (No Warranty), Article 6 (Limitation of Liability), Article 7 (Restrictions of Use), Article 8 (Indemnification) and Article 9 (Termination) of this Agreement will however survive any termination of this Agreement.

10) Governing law. You agree this agreement shall be governed by, interpreted and construed in accordance with, the laws of the Province of British Columbia, and where applicable, the laws of Canada, without regard to any conflict of laws principles that would result in the application of laws of any other jurisdiction. You agree that by accepting the terms of this agreement and using the Software you have attorned to the exclusive jurisdiction of the Supreme Court of British Columbia. The parties agree that the Supreme Court of British Columbia has exclusive jurisdiction over this agreement.

11) Miscellaneous. No modification of this Agreement will be binding on the parties, unless in writing and signed by an authorized representative of each party. Should any provision of this Agreement be declared invalid or unenforceable, then such provision shall be deemed severable from this Agreement and shall not affect the validity or enforceability of the remaining provisions hereof. All rights in the Software not specifically granted in this Agreement are reserved by UBC. YOU may not assign or transfer this Agreement (by merger, operation of law or in any other manner) without the prior written consent of UBC and any attempt to do so without such consent shall be void, with no legal force and effect, and shall constitute a material breach of this Agreement by YOU. YOU acknowledge that YOU have read this Agreement, understand it, and that by using the Software YOU agree to be bound by its terms and conditions. YOU further agree that it is the complete and exclusive statement of the agreement between UBC and YOU, and supersedes any proposal or prior agreement, oral or written, and any other communication between UBC and YOU relating to the subject matter of this Agreement.

APPENDIX

THIRD PARTY LIBRARIES

PyTorch, Copyright (c) 2016- Facebook, Inc (Adam Paszke), obtained under the BSD licence, found here: https://github.com/pytorch/pytorch/blob/master/LICENSE

Two-Stage Parser, Copyright (c) 2019 Yizhong Wang, found here: https://github.com/yizhongw/StageDP

MILNet, Copyright (c) Stefanos Angelidis, found here: https://github.com/stangelid/oposum

Updates

Citation(s)

  • If you use our dataset, code or any parts thereof, please cite this paper:

    @inproceedings{huber-carenini-2020-mega,
    title = "{MEGA} {RST} Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision",
    author = "Huber, Patrick and Carenini, Giuseppe",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.603",
    pages = "7442--7457",
    abstract = "The lack of large and diverse discourse treebanks hinders the application of data-driven approaches, such as deep-learning, to RST-style discourse parsing. In this work, we present a novel scalable methodology to automatically generate discourse treebanks using distant supervision from sentiment annotated datasets, creating and publishing MEGA-DT, a new large-scale discourse-annotated corpus. Our approach generates discourse trees incorporating structure and nuclearity for documents of arbitrary length by relying on an efficient heuristic beam-search strategy, extended with a stochastic component. Experiments on multiple datasets indicate that a discourse parser trained on our MEGA-DT treebank delivers promising inter-domain performance gains when compared to parsers trained on human-annotated discourse corpora.",
    }