CPSC 503 - 2020-21 Term 1 - Computational Linguistics

Readings, Syllabus, Assignments, Software & Data


Readings
Required
References


Syllabus, Assignments, Software & Data

1
Sep 9 Wed  Intro and Course Overview

We will communicate through Canvas; to log in, use your CWL.

J&M
Chp. 1  

Intro

- ACL
- NLP demos/videos
- Ambiguity

NLP toolkits: NLTK (Python), Stanford CoreNLP (Java)

Probability and Information Theory Handout

2
Sep 14 Mon  English Morphology and Finite State Machines: FSA and FST
J&M
Chp. 2 & 3 (2nd Edition)
missing pages: a, b, c
Assignment 1 on Canvas (due Sep 23)
Dementia Material: instructions, data, lib, run.py


Applications of FSTs in NLP. Lauri Karttunen, CIAA, 2000.


3
Sep 16 Wed Finish FST + Stemming + Spelling
J&M
Chp. 3 & 4
4
Sep 21 Mon  Minimum Edit Distance + Probabilistic Models: N-grams, N-gram Evaluation


J&M
Chp. 4

Google N-gram model

Google Books Ngram Viewer

An Empirical Study of Smoothing Techniques for Language Modeling. S.F. Chen, J. Goodman. Technical Report, Computer Science, Harvard University, 1998.

5
Sep 23 Wed Intro - Neural Networks and Neural Language Models - Start Markov Models


J&M 3Ed
Chp. 7-8



Neural Network Demos

 

 

6
Sep 28 Mon

Markov Sequence Labelling Models - Part-of-speech Tagging

 


J&M 3Ed
Some of Appendix A
Chp. 8

- State-of-the-art POS tagging

Why tagging can be challenging for humans: the Penn tagging scheme

Part-of-Speech Tagging from 97% to 100%: Is It Time for Some Linguistics? C. Manning, 2011

Assignment 2 on Canvas (due Oct 15). Corpora: wsj-p.txt, wsj-ps.txt, atis3.pos.tags.txt, cmpt-hw2-3.txt

7
Sep 30 Wed  Neural Sequence Processing with Recurrent Neural Networks (RNNs)
J&M 3Ed
Chp. 9
See also Goldberg, Chps. 14-16



 8
Oct 5 Mon  Start English Syntax and Context-Free Grammars - Parsing Algorithms
J&M 3Ed
Chp. 10-11


Interactive tutorials on English grammar (possibly not working as of 2020)
English Dept. University of Calgary.

Another resource on grammar from UCL

- NLTK (demos) - look at *Getting Started*
- Some public parsers (including Stanford and MINIPAR visualization tools)

 9

Oct 7 Wed
Chunking / Dependency Grammars and Transition-based Dependency Parsing / Treebanks
 
J&M 3Ed
Chp. 13
- Stanford Parser: popular statistical parser

- MaltParser: state-of-the-art dependency parser

- Penn Treebank
- Universal Dependencies treebanks

Oct 12 Mon  Thanksgiving Day: no class
 10
Oct 15 Wed
Probabilistic CFGs: PCFG Parsing + Lexicalized PCFGs - Neural Constituency and Dependency Parsing
J&M 3Ed
Chp. 12
- Berkeley Parser with demo!
11 Oct 19 Mon  Representing Meaning and Semantic Analysis
J&M Chp. 16 (Chp. 17 not available yet)
Book on Computational Semantics

TimeML

Semantic Parser (Cornell - Yoav Artzi)

12
Oct 21 Wed  Lexical Semantics
J&M Chp. 19
- WordNet and YAGO (Wikipedia + WordNet + GeoNames); see also Probase, Freebase, and BabelNet

- Medical Subject Headings (MeSH), a domain-specific thesaurus
- FrameNet
- PropBank (adding semantic annotations to the Penn Treebank)

Assignment 3 on Canvas (due ...); needed files

13
Oct 26 Mon  Computational Lexical Semantics (focus on Vector Semantics)
J&M Chp. 6
- word2vec

- A systematic comparison of context-counting vs. context-predicting semantic vectors (predicting is clearly better)

- Generalization of skip-grams to sentences: skip-thought vectors (2015)

- SENSEVAL (evaluation for WSD)

- Public online WSD systems

- WSD with Deep Belief Networks
- Dependency-based word similarity demo
- TREC (Text REtrieval Conference)
- Semantic Role Labeling (ASSERT)

- Illinois Semantic Role Labeler

 

14
Nov 2 Mon  CNNs, Semantic Role Labeling, Brief Intro to Pragmatics:
- Applied CL Discourse Research Lab
- DAMSL
- RST annotation tool
15 Nov 6 Wed  Encoder-Decoder, Attention and Transformers
Conditioned Neural Generation (Encoder-Decoder framework), pp. 195-211, Y. Goldberg book (2017), Chp. 17
Assignment 4 on Canvas (due ......)
16 Nov 9 Mon  Project Proposal Presentations

See project work plan

    READINGS (what to do?)  Avg. year: 2015
17 Nov 16 Mon

Generic Topic Modeling (background reading Comm. ACM) and Topic Modeling in Asynchronous Conversations

  • Steyvers, M. & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum. [pdf] or [pdf] [Eric Lee]

  • Joe Barrow, Rajiv Jain, Vlad Morariu, Varun Manjunatha, Douglas Oard, and Philip Resnik. 2020. A joint model for document segmentation and segment labeling. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 313–322. [pdf] [Sean La]

  • S. Joty, G. Carenini and R. T. Ng (2013). Topic Segmentation and Labeling in Asynchronous Conversations. JAIR, Volume 47, pages 521-573 (read only the introduction, the conclusions, and the sections on topic segmentation, not labeling) [pdf]

18 Nov 18 Wed

Visual Text Analytics and Interactive Topic Modeling

 Intelligent User Interfaces (IUI), 2016 ] VIDEO  [Cloris Feng]  

19  Nov 23 Mon

Distributed Representations for Sentences + Summarization (1)

  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. J. Devlin, M.-W. Chang, K. Lee, K. Toutanova. 2018 [pdf] [Mark Ma]

  • (Background reading: Summarization Evaluation notes)
    Lidong Bing, Piji Li, Yi Liao, Wai Lam, Weiwei Guo, Rebecca J. Passonneau. Abstractive Multi-Document Summarization via Phrase Selection and Merging. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015) [pdf] [Cara Reisle]

20
Nov 25 Wed

 Summarization (2)

  • Jianpeng Cheng, Mirella Lapata. Neural Summarization by Extracting Sentences and Words. ACL 2016 [pdf] [Weiwei Su]

  • Wen Xiao, Giuseppe Carenini: Extractive Summarization of Long Documents by Combining Global and Local Context. EMNLP/IJCNLP (1) 3009-3019 [pdf] [Jason Yoo]

  

21
Nov 30 Mon  Sentiment + Graph-Based WSD

Pre-reading for paper 1: Chapter 18 of Y. Goldberg (only 5 pages)
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng and Christopher Potts, Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank Conference on Empirical Methods in Natural Language Processing (EMNLP 2013) [pdf] [Muchen Li]

  • Navigli, R. and Lapata, M. (2010) An experimental study of graph connectivity for unsupervised word sense disambiguation.  IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 678–692. [Junze Wu ]


22
Dec 2 Wed

Neural: Text Classification

- Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., Hovy, E.: Hierarchical attention networks for document classification. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 1480-1489. NAACL (2016) [pdf] [Aritro R Arko]

- Weirui Kong, Hyeju Jang, Giuseppe Carenini, Thalia Shoshana Field: A Neural Model for Predicting Dementia from Language. MLHC (2019): 270-286 [pdf] [Ryan]


23
Dec 7 Mon  Natural Language Generation (data2text) + Discourse Parsing
  • F. Portet, E. Reiter, A. Gatt, J. Hunter, S. Sripada, Y. Freer, C. Sykes. Automatic Generation of Textual Summaries from Neonatal Intensive Care Data. Artificial Intelligence 173:789-816, 2009 (pdf) [TBA]
  • Patrick Huber, Giuseppe Carenini: Predicting Discourse Structure using Distant Supervision from Sentiment. EMNLP/IJCNLP (1) 2306-2316 [pdf] [TBA]
  (Not in this offering) Discourse Parsing: Applications + Distant Supervision
  • Ji and Smith: Neural Discourse Structure for Text Categorization. ACL 2017 [pdf] [ ]
  • Pre-reading (not mandatory) for paper 2: Chapter 20 (sec. 20.1 and 20.2) of Y. Goldberg (only 9 pages, lots of figures ;-)
  • Yang Liu and Sujian Li. Implicit discourse relation classification via multi-task neural networks. In Proceedings of the AAAI Conference (2016). [pdf] [ ]

 

 

23 Dec 7 Mon Project Update Presentations    
25
Dec 16 Wed  (12:00-2:30 pm)

Deadline for grade submission: TBD
Project Final Presentations

Final Project Report Hand-in






carenini at cs.ubc.ca