CPSC 503 - Winter 2012 - Computational Linguistics
Readings, Syllabus, Assignments, Software & Data
Natural Language Processing with Python. Bird, Steven; Klein, Ewan; Loper, Edward. O'Reilly, 2009. Free HTML version. You can order this book directly from O'Reilly.
Introduction to Information Retrieval, by Manning, Raghavan, Schütze. webpage
Graph-based Natural Language Processing and Information Retrieval. Rada Mihalcea (Author), Dragomir Radev (Author)
Foundations of Statistical Natural Language Processing by Christopher D. Manning, Hinrich Schütze (M&S). In many cases the statistical approaches are covered in more detail in this book. However, it does not contain all the topics that we will cover in this course. This book also has a webpage. 680 pages, 1st edition (1999), M.I.T. Press/Triliteral, ISBN: 0262133601. This book will be useful in cases where you want a different presentation of the same material that is required reading from J&M.
Contemporary Linguistics: An Introduction by W. O'Grady, J. Archibald, M. Aronoff, J. Rees-Miller. 684 pages, 5th edition (2004). ISBN: 0312419368. This book will be useful in cases where you want a more detailed description of linguistic theories. It also contains lots of clear examples of linguistic phenomena. This book also has a webpage.
Synthesis Lectures in Natural Language Processing webpage
Syllabus, Assignments, Software & Data
1
Jan 8 Tu Intro and Course Overview. We will communicate through Connect; to log in, use your CWL.
J&M
Chp. 1
- ACL
- NLP demos
- Ambiguity
2
Jan 10 Th English Morphology and Finite State Machines: FSA and FST
J&M
Chp. 2&3
Applications of FSTs in NLP, Lauri Karttunen, CIAA, 2000.
Assignment1 (due Jan 17)
- Recent book and software
- Xerox: Finite State Technology
- Finite State Utilities (Van Noord)
3
Jan 15 Tu Finish FST + Stemming + Spelling
J&M
Chp. 3&4
- The Porter Stemmer (includes Perl implementation)
- ProbInfoTheory Handout
- min-edit-dist demo
- A spelling correction program based on a noisy channel model, Kernighan et al., COLING, 1990.
- minimal Python implementation of spelling correction (by P. Norvig)
4
Jan 17 Th Minimum Edit Distance + Probabilistic Models: N-grams
J&M
Chp. 4
An empirical study of smoothing techniques for language modeling, S.F. Chen, J. Goodman, TR CS Harvard Univ., 1998
5
Jan 22 Tu N-grams Evaluation - Markov Models - Part-of-speech Tagging
J&M
Chp. 4-5-6
- State-of-the-art POS tagging
- Why tagging can be challenging for humans: Penn tagging scheme
6
Jan 24 Th English Syntax and Context-free Grammars
J&M
Chp. 12
Interactive tutorials on the English grammar, English Dept., University of Calgary.
Assignment2 on Connect (due Feb 7)
Corpora: wsj-p.txt wsj-ps.txt atis3.pos.tags.txt cmpt-hw2-3.txt
7
Jan 29 Tu Parsing Algorithms
J&M
Chp. 13
- NLTK (demos) - look at *Getting Started*
- Some public parsers (including Stanford and MINIPAR visualization tools)
8
Jan 31 Th Chunking / Dependency Grammars / Treebank - Start Probabilistic CFGs
J&M
Chp. 14
- Penn Treebank
- Stanford Parser - Popular Stat Parser
- MaltParser - State of the Art Dependency Parser
9
Feb 5 Tu PCFGs Parsing + Lexicalized PCFGs
- Berkeley Parser with demo!
10
Feb 7 Th Representing Meaning and Semantic Analysis
J&M
Chp. 17-18
- Book on Computational Semantics
11
Feb 12 Tu Lexical Semantics
J&M
Chp. 19
- WordNet and YAGO (Wikipedia + WordNet + GeoNames). See also Probase and Freebase
- (Domain-specific thesaurus) Medical Subject Headings (MeSH)
- FrameNet
- PropBank (adding semantic annotations to the Penn Treebank)
12
Feb 14 Thu Computational Lexical Semantics
J&M
Chp. 20
- SENSEVAL (evaluation for WSD)
- WSD online public systems
- Dependency-based word similarity demo
- TREC (Text REtrieval Conference)
- Semantic Labeling (ASSERT)
Assignment3 on Connect (due Feb 28) needed files
Midterm break Feb 18 - 24
13
Feb 26 Tu Pragmatics: Discourse & Dialog
J&M
Chp. 21 & 24
- DAMSL
- RST annotation tool
Buffer: Natural Language Generation (NLG). Sample system: Generator of Evaluative Arguments (GEA)
handout
- SIGGEN
- NLG systems book, STOP system, SimpleNLG
- NLG companies: data2text CoGenTex
14
Feb 28 Thu Project Proposal Presentations
15
Mar 5 Tu Project Proposal Presentations
READINGS (what to do?)
16
Mar 7 Thu (data2text) Natural Language Generation (1)
F. Portet, E. Reiter, A. Gatt, J. Hunter, S. Sripada, Y. Freer, C. Sykes. Automatic Generation of Textual Summaries from Neonatal Intensive Care Data. Artificial Intelligence 173:789-816. 2009 (pdf) [Matthew up to 4.5 excluded] [Baipeng from 4.5 to end]
Ryuichiro Higashinaka et al. Learning to generate naturalistic utterances using reviews in spoken dialogue systems. Proceedings of ACL 2006 pdf [Sanjana]
17
Mar 12 Tu Summarization (1)
(Biographies) Fadi Biadsy, Julia Hirschberg, Elena Filatova, "An Unsupervised Approach to Biography Production using Wikipedia", ACL-08: HLT, Columbus, Ohio, Jun 2008 pdf [Suman]
(Evaluative Text, e.g., customer reviews) Carenini, G., Ng, R., & Pauls, A. (2006). Multi-document summarization of evaluative text. In Proceedings of EACL, 2006. (pdf) [Enamul]
18
Mar 14 Thu Summarization (2)
Regina Barzilay, Kathleen McKeown. "Sentence Fusion for Multidocument News Summarization", Computational Linguistics, 2005. [ps] [Connor]
(read about ROUGE in my book)
Ani Nenkova et al. The Pyramid Method: Incorporating human content selection variation in summarization evaluation. ACM Trans. on Speech and Language Processing (TSLP), 2007 pdf [Kaya]
19
Mar 19 Tu Summarization (3)
Gabriel Murray and Giuseppe Carenini. Summarizing Spoken and Written Conversations. EMNLP 2008 [pdf] [Maryam]
Giuseppe Carenini, Raymond Ng, Xiaodong Zhou. Summarizing Emails with Conversational Cohesion and Subjectivity. ACL 2008 [pdf] [Tatsuro]
20
Mar 21 Thu Subjectivity and Sentiment (1)
Fangzhong Su and Katja Markert. Subjectivity recognition on word senses via semi-supervised mincuts. HLT-NAACL 2009 [Sylvie]
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2009). Recognizing Contextual Polarity: An exploration of features for phrase-level sentiment analysis. Computational Linguistics, 35:3, pages 399-433. [Kalan]
21
Mar 26 Tu Guest (postdoc Yashar Mehdad): Information Extraction + Textual Entailment
IE: Open Information Extraction from the Web, IJCAI 2007 [Mahsa]
TE: A Survey of Paraphrasing and Textual Entailment Methods (only up to page 18, i.e., Sections 1 and 2), JAIR 2010
22
Mar 28 Thu Guest (finishing PhD student Shafiq Joty): Discourse Parsing
23
Apr 2 Tu Natural Language Generation (2) / Topic Modeling
Walker, M., Stent, A., Mairesse, F., & Prasad, R. (2007). Individual and domain adaptation in sentence planning for dialogue. Journal of Artificial Intelligence Research, 30, 413-456. (long, but with lots of tables and figures) pdf [Arni]
Kino Coursey and Rada Mihalcea and William Moen, Using Encyclopedic Knowledge for Automatic Topic Identification, in Proceedings of the Conference on Natural Language Learning (CONLL 2009), pp. 210-218, Boulder, Colorado, May 2009. [ pdf ] [ Vincent ]
BUFFER Topic Modeling and Topic Identification (background reading Comm. ACM)
Y. Chali, S. R. Joty and S. A. Hasan (2009) "Complex Question Answering: Unsupervised Learning Approaches and Experiments", JAIR, Volume 35, pages 1-47, 2009 pdf
Steyvers, M. & Griffiths, T. (2006). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum [pdf]
24
Apr 4 Thu Project Update Presentations
25
Apr 15 Mon Project Final Presentations. Final Project Report hand-in