CPSC 503 - Winter 2014 - Computational Linguistics
Readings, Syllabus, Assignments, Software & Data
Natural Language Processing with Python. Bird, Steven; Klein, Ewan; Loper, Edward. O'Reilly, 2009. Free HTML version. You can order this book directly from O'Reilly.
Introduction to Information Retrieval, by Manning, Raghavan, Schütze (webpage)
Graph-based Natural Language Processing and Information Retrieval, by Rada Mihalcea and Dragomir Radev
Foundations of Statistical Natural Language Processing by Christopher D. Manning, Hinrich Schütze (M&S). 680 pages, 1st edition (1999), M.I.T. Press/Triliteral, ISBN: 0262133601. In many cases the statistical approaches are covered in more detail in this book. However, it does not contain all the topics that we will cover in this course. This book will be useful when you want a different presentation of material that is required reading from J&M. This book also has a webpage.
Contemporary Linguistics: An introduction by W. O'Grady, J. Archibald, M. Aronoff, J. Rees-Miller. 684 pages 5th Edition (2004). ISBN: 0312419368. This book will be useful in cases where you want a more detailed description of linguistic theories. It also contains lots of clear examples of linguistic phenomena. This book also has a webpage.
Synthesis Lectures in Natural Language Processing webpage
Syllabus, Assignments, Software & Data
1
Sep 4 Th Intro and Course Overview. We will communicate through Connect: to log in, use your CWL
J&M
Chp. 1
- ACL
- NLP demos/videos
- Ambiguity
2
Sep 9 Tu English Morphology and Finite State Machines: FSA and FST
J&M
Chp. 2&3
Applications of FSTs in NLP Lauri Karttunen, CIAA, 2000.
Assignment1 (due Sept 18)
- Recent book and software
- Xerox: FiniteState Technology
- Finite State Utilities (Van Noord)
3
Sep 11 Th Finish FST + Stemming + Spelling
J&M
Chp. 3&4
- The Porter Stemmer (includes perl implementation)
- ProbInfoTheory Handout
- min-edit-dist demo
- A spelling correction program based on a noisy channel model, Kernighan et al., COLING, 1990.
- minimal Python implementation of spelling correction (by P. Norvig)
4
Sep 16 Tu Minimum Edit Distance + Probabilistic Models: N-grams
J&M
Chp. 4
An empirical study of smoothing techniques for NLP S.F. Chen, J. Goodman - TR CS Harvard Univ - 1998
5
Sep 18 Th N-grams Evaluation - Markov Models - Part-of-speech Tagging
J&M
Chp. 4-5-6
- State-of-the-art POS tagging
- Why tagging can be challenging for humans: Penn tagging scheme
- Part-of-Speech Tagging from 97% to 100%, C. Manning, 2011
6
Sep 23 Tu English Syntax and Context-free Grammars
J&M
Chp. 12
- Interactive tutorials on the English grammar, English Dept., University of Calgary
Assignment2 on Connect (due Oct 7)
Corpora: wsj-p.txt wsj-ps.txt atis3.pos.tags.txt cmpt-hw2-3.txt
7
Sep 25 Th Parsing Algorithms
J&M Chp. 13
- NLTK (demos) - look at *Getting Started*
- Some public parsers (including Stanford and MINIPAR visualization tools)
8
Sep 30 Tu Chunking / Dependency Grammars / Treebank - Start Probabilistic CFGs
J&M Chp. 14
- Penn Treebank
- Stanford Parser
- Popular Stat Parser
- MaltParser - State of the Art Dependency Parser
9
Oct 2 Th PCFGs Parsing + Lexicalized PCFGs
- Berkeley Parser with demo!
10
Oct 7 Tu Representing Meaning and Semantic Analysis
J&M Chp. 17-18
- book on Computational Semantics
11
Oct 9 Th Lexical Semantics
J&M Chp. 19
- WordNet and YAGO (Wikipedia + WordNet + GeoNames). See also Probase and Freebase
- (Domain-specific thesaurus) Medical Subject Headings (MeSH)
- FrameNet
- PropBank (adding semantic annotations to the Penn Treebank)
12
Oct 14 Tu Computational Lexical Semantics
J&M Chp. 20
- SENSEVAL (Evaluation for WSD)
- WSD with Deep Belief Networks
- Dependency-based word similarity demo
- TREC (Text REtrieval Conference)
- Semantic Labeling (ASSERT)
Assignment3 on Connect (due Oct 28) needed files
13
Oct 16 Th Pragmatics: Discourse & Dialog
J&M Chp. 21 & 24
- DAMSL
- RST annotation tool
14
Oct 21 Tu Project Proposal Presentations
Buffer topic: Natural Language Generation (NLG). Sample system: Generator of Evaluative Arguments (GEA)
handout
- SIGGEN
- NLG systems book, STOP system, SimpleNLG
- NLG companies: data2text, CoGenTex
READINGS (what to do?)
15
Oct 23 Th Generic Topic Modeling (background reading Comm. ACM) and Topic Modeling in Synchronous Conversations
Steyvers, M. & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum. [pdf] [Alim]
Galley, M., McKeown, K., Fosler-Lussier, E., & Jing, H. (2003). Discourse segmentation of multi-party conversation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, Sapporo, Japan. ACL. [pdf] [Chris]
16
Oct 28 Tu Topic Modeling and Labelling in Asynchronous Conversations. Guest (PhD student Enamul Hoque)
- S. Joty, G. Carenini and R. T. Ng (2013) Topic Segmentation and Labeling in Asynchronous Conversations. JAIR, Volume 47, pages 521-573 [pdf]
NOTE: Split in two presentations. One student will cover topic segmentation [Chris], the other topic labelling [Johanna]
17
Oct 30 Th Visual Text Analytics and Interactive Topic Modeling. Guest (PhD student Enamul Hoque)
E. Hoque and G. Carenini, ConVis: A Visual Text Analytic System for Exploring Blog Conversations, Journal of Computer Graphics Forum (Proc. EuroVis), 2014 [pdf] [Enamul]
E. Hoque and G. Carenini, Exploring Asynchronous Conversations Through Human-in-the-loop Topic Model (submitted) 2014 [pdf] [Enamul]
18
Nov 4 Tu Natural Language Generation (Sentence Planning)
Walker, M., Stent, A., Mairesse, F., & Prasad, R. (2007). Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research, 30, 413-456. (long but lots of tables / figures) [pdf] [Daniel]
Natural Language Generation (data2text)
F. Portet, E. Reiter, A. Gatt, J. Hunter, S. Sripada, Y. Freer, C. Sykes. Automatic Generation of Textual Summaries from Neonatal Intensive Care Data. Artificial Intelligence 173:789-816. 2009 [pdf] [Olivia]
19
Nov 6 Th Summarization (1)
Regina Barzilay, Kathleen McKeown "Sentence Fusion for Multidocument News Summarization", Computational Linguistics, 2005. [ps] [Johanna]
(background reading: textbook sec. 23.7 Summarization Evaluation)
Ani Nenkova et al. The Pyramid Method: Incorporating human content selection variation in summarization evaluation ACM Trans. on Speech and Language Processing (TSLP), 2007 pdf [Alim]
Nov 11 Tu holiday
20
Nov 13 Th Summarization (2)
(Biographies) Fadi Biadsy, Julia Hirschberg, Elena Filatova, "An Unsupervised Approach to Biography Production using Wikipedia", ACL-08: HLT, Columbus, Ohio, Jun 2008 pdf [Daniel]
Janara Christensen, Stephen Soderland, Gagan Bansal, and Mausam "Hierarchical Summarization: Scaling Up Multi-Document Summarization" Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014) [pdf] [Johanna]
21
Nov 18 Tu Subjectivity and Sentiment
Fangzhong Su and Katja Markert. Subjectivity recognition on word senses via semi-supervised mincuts. HLT-NAACL 2009 [Daniel]
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2009). Recognizing Contextual Polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics, 35:3, pages 399-433. [Olivia]
22
Nov 20 Th Online Discussions + Graph Based WSD
- Arjun Mukherjee, Vivek Venkataraman, Bing Liu, and Sharon Meraz. Public Dialogue: Analysis of Tolerance in Online Discussions. Proceedings of The 51st Annual Meeting of the Association for Computational Linguistics (ACL-2013), August 4-9, 2013, Sofia, Bulgaria. [Olivia]
- Navigli, R. and Lapata, M. (2010) An experimental study of graph connectivity for unsupervised word sense disambiguation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(4), 678–692. [Chris]
23
Nov 25 Tu Discourse Parsing
- R. Soricut and D. Marcu. 2003. Sentence Level Discourse Parsing Using Syntactic and Lexical Information. NAACL 2003 [pdf] [Alim]
- Please read the first 4 sections (up to page 23) of this draft paper under submission. [pdf] [Giuseppe]
24
Nov 27 Th Project Update Presentations
25
Dec 11 Th 11-1:30 (tentative)
Project Final Presentations; Final Project Report hand-in