CPSC 503 - 2021-22 Term1 - Computational Linguistics

Readings, Syllabus, Assignments, Software & Data



Sep  9 Thu  Intro and Course Overview

We will communicate through Canvas; log in with your CWL.

Chp. 1  


- NLP demos/videos
- Ambiguity

NLP toolkits: NLTK (Python), Stanford CoreNLP (Java)

ProbInfoTheory Handout

Sep  14 Tue English Morphology and Finite State Machines: FSA and FST
Chp. 2&3 (2nd Edition)
missing pages: a, b, c
Assignment1 on Canvas (due Sept 23)
Dementia Material: instructions, data, lib, run.py

Applications of FSTs in NLP Lauri Karttunen, CIAA, 2000.

Sep 16 Thu Finish FST + Stemming + Spelling
Chp. 3&4
Sep 21 Tue  Minimum Edit Distance + Probabilistic Models: N-grams - N-grams Evaluation

Chp. 4

Google ngrams model

Google books Ngrams viewer

An Empirical Study of Smoothing Techniques for Language Modeling, S.F. Chen, J. Goodman - TR CS Harvard Univ - 1998

Sep 23 Thu Intro - Neural Networks and Neural Language Models - Start Markov Models

J&M 3Ed
Chp. 7-8

Neural Network Demos



Sep 28 Tue

Markov Sequence Labelling Models - Part-of-speech Tagging


J&M 3Ed
Some of Appendix A
Chp. 8

- state of the art POS tagging

why tagging can be challenging for humans: Penn tagging scheme

Part-of-Speech Tagging from 97% to 100% C. Manning 2011

Assignment2 on Canvas (due Oct 15) Corpora: wsj-p.txt  wsj-ps.txt  atis3.pos.tags.txt cmpt-hw2-3.txt

  Sep 30 Thu National Day for Truth and Reconciliation. University closed.    

Oct 5 Tue

 Neural Sequence processing with Recurrent Neural Networks (RNN)    J&M 3Ed
Chp. 9
see also Goldberg Chps 14-15-16

Oct 7 Thu  Start English Syntax and Context-free Grammars -- Parsing Algorithms J&M 3Ed
Chp. 12-13-14

 Interactive tutorials on English grammar (not working as of 2020?)
English Dept., University of Calgary.

Another resource on grammar from UCL

- NLTK (demos) - look at *Getting Started*
 - Some public parsers (including Stanford and MINIPAR visualization tools)


Oct 12 Tue
Chunking / Dependency Grammars and Transition-based Dep. Parsing/ Treebanks -
J&M 3Ed
Chp. 13
- Stanford Parser - Popular Statistical Parser

- MaltParser - State of the Art Dependency Parser

- Penn Treebank - Universal Dependencies Treebanks

Oct 14 Thu
Probabilistic CFGs - PCFGs Parsing + Lexicalized PCFGs - Neural Constituency and Dependency Parsing  J&M 3Ed
 Chp 14 Appendix C
- Berkeley Parser with demo!
11 Oct 19 Tue Representing Meaning and Semantic Analysis
J&M Chp. 15 (16 not available yet); book on Computational Semantics

TimeML

Semantic Parser (Cornell - Yoav Artzi)

 Oct 21 Thu Lexical Semantics J&M Chp. 19 - WordNet and YAGO (Wikipedia + WordNet + GeoNames). See also Probase, Freebase, and BabelNet

- (Domain specific thesaurus) Medical Subject Headings (MeSH)
- FrameNet
- PropBank (adding semantic annotations to the Penn Treebank)

Assignment3 on Canvas (due Nov 2)  needed files

 Oct 26  Tue Computational Lexical Semantics (focus on Vector Semantics) J&M Chp. 6 - word2vec 

- Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors (predicting is clearly better)

- generalization of skip-grams to sentences (skip-thought vectors) 2015

- SENSEVAL (Evaluation for WSD)

- WSD online public systems

- WSD with Deep Belief Networks
- Dependency-based word similarity demo
- TREC (Text REtrieval Conference)
- Semantic Labeling (ASSERT)

- Illinois Semantic Role Labeler


14 Oct 28 Thu CNNs, Semantic Role Labeling, start Encoder-Decoder
- Applied CL Discourse Research Lab
- RST annotation tool
Nov 2 Tue  Encoder-Decoder, Attention and Transformers; Brief Intro Pragmatics: J&M 9.7-9.9; 10.2-10.6
Conditioned Neural Generation (Encoder-Decoder framework) pp. 195-211 - Y. Goldberg book 2017 - Chp. 17
Assignment4 on Canvas (due Nov 15)

HuggingFace's Transformers: State-of-the-art Natural Language Processing - Best demo at EMNLP 2020
16 Nov 4 Thu   Project Proposal Presentations -

see project work plan



17  Nov 9 Tue

Project Proposal Presentations - (cont'd)

READINGS  (what to do?)  

Contextual Embeddings + Distributed Representations for Sentences

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova. NAACL-HLT 2019 [pdf] [Xu (Neil) Fan, Li Zha]



18 Nov 16 Tue

NOTE: Consider these two papers (DocBERT and M2Lens) as ONE reading 18(1)

19 Nov 18 Thu

Generic Topic Modeling (background reading Comm. ACM) and Topic Modeling in Asynchronous Conversations

  • Steyvers, M. & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum. [pdf] or [pdf] [Helen Zhang, Reza Soltani]

  • Joe Barrow, Rajiv Jain, Vlad Morariu, Varun Manjunatha, Douglas Oard, and Philip Resnik. 2020. A joint model for document segmentation and segment labeling. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 313–322. [pdf] [Ghazal Ebrahimi, Changbing Yang]

  • S. Joty, G. Carenini and R. T. Ng (2013) Topic Segmentation and Labeling in Asynchronous Conversations. JAIR, Volume 47, pages 521-573 (only the intro, conclusions, and the sections on topic segmentation, not labeling) [pdf]

20  Nov 23 Tue

Visual Text Analytics and Interactive Topic Modeling

 Intelligent User Interfaces (IUI), 2016 ] VIDEO  [Felipe González-Pizarro]  

Nov 25 Thu


  • Wen Xiao, Giuseppe Carenini: Extractive Summarization of Long Documents by Combining Global and Local Context. EMNLP/IJCNLP (1) 2019: 3009-3019 [pdf] [Yiwei Hou, Swati Kanwal]

  • Jingqing Zhang, Yao Zhao, Mohammad Saleh, and Peter Liu. PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization. In Proceedings of the 37th International Conference on Machine Learning, volume 119, pages 11328–11339. PMLR, ICML 2020 [pdf] [Sahithya Ravi, Yixin (Helen) Wang]

Nov 30 Tue Discourse Parsing

Patrick Huber, Giuseppe Carenini: MEGA RST Discourse Treebanks with Structure and Nuclearity from Scalable Distant Sentiment Supervision. (EMNLP 2020) [pdf] [Shen Chenxinran, Weirui Chen]

Grigorii Guz, Patrick Huber, Giuseppe Carenini: Unleashing the Power of Neural Discourse Parsers - A Context and Structure Aware Approach Using Large Scale Pretraining. (COLING 2020) [pdf] [Inna Ivanova, LinJian Li]
Dec 2 Thu Natural Language Generation (data2text)
  • F Portet, E Reiter, A Gatt, J Hunter, S Sripada, Y Freer, C Sykes. Automatic Generation of Textual Summaries from Neonatal Intensive Care Data. Artificial Intelligence 173:789-816. 2009 (pdf) [Melika Farahani, Zhipeng Zhu]
  • Data-to-Text Generation with Content Selection and Planning. Ratish Puduppully, Li Dong, Mirella Lapata. AAAI 2019 (pdf) [Md Tawkat Islam Khondaker, Xiaoxuan Liang]

Dec 7 Tue Project Update Presentations
25 TBD

deadline for grade submission TBD
Project Final Presentations

Final Project Report Hand in

Lidong Bing, Piji Li, Yi Liao, Wai Lam, Weiwei Guo, Rebecca J. Passonneau,  Abstractive Multi-Document Summarization via Phrase Selection and Merging. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics  (ACL 2015) pdf  [ ]
  X Not in this offering X Discourse Parsing:  Applications + Distant Supervision
  • Ji and Smith – Neural Discourse Structure for Text Categorization ACL-17  [pdf] [  ]
  • pre-reading (not mandatory) for paper2: Chapter 20 (sec 20.1 and 20.2) of Y. Goldberg (only 9 pages, lots of figures ;-)
  • Yang Liu and Sujian Li. Implicit discourse relation classification via multi-task neural networks. In Proceedings of AAAI Conference, (2016). [pdf] [ ] 




One-Shot, Few-Shot - Interpretability


 Yin, W., Hay, J. & Roth, D. Benchmarking Zero-shot Text Classification: Datasets, Evaluation and
Entailment Approach. in Proceedings of the 2019 Conference on Empirical Methods in Natural
Language Processing and the 9th International Joint Conference on Natural Language Processing
(EMNLP-IJCNLP) 3914–3923 (aclweb.org, 2019).
 Wei, J., Huang, C., Vosoughi, S., Cheng, Y. & Xu, S. Few-Shot Text Classification with Triplet
Networks, Data Augmentation, and Curriculum Learning. in Proceedings of the 2021 Conference of
the North American Chapter of the Association for Computational Linguistics: Human Language
Technologies 5493–5500 (aclweb.org, 2021).



carenini at cs.ubc.ca