Information

Canvas: TBD (follow for updates)
Piazza
Time: Term 1 (Sep-Dec 2022), TR 9:30-11:00
Location: DMP 110
Instructor: Vered Shwartz. Office hours: by appointment; contact through Piazza.
TA: Felipe Gonzalez-Pizarro. Office hours: TBD
TA: Raymond Li. Office hours: TBD
TA: Mehar Bhatia. Office hours: TBD

Course Description

Natural Language Processing (NLP) is one of the fastest-growing sub-areas of Artificial Intelligence, with applications in all sectors of society, including healthcare, business, science, and government. In this course, building on a solid background in computer science, students will learn how to analyze and apply fundamental NLP algorithms and techniques, combining traditional and neural models to address given requirements while weighing the trade-offs among accuracy, time/space efficiency, and the interpretability of a model’s output.

In particular, the course teaches the fundamentals of modern data-driven natural language processing, including applications (such as question answering and machine translation), text representations (word embeddings, language models), and approaches to natural language understanding and generation (classification, tagging, parsing, encoder-decoder architectures). Students will also learn how to perform informed error analysis and revise preliminary solutions to address the most common sources of error (e.g., by adding data or changing learning methods). Finally, special emphasis will be placed on the critical skill of reasoning about what happens when a model is deployed in a given context, focusing on how ethical issues intersect with NLP (e.g., societal bias in text representations, fake news generation, sustainable models). The course involves attending class, homework assignments, a midterm, a final exam, and reading and discussing papers.

Tentative Syllabus

Course Overview + Introduction to NLP
Finite State Text Processing + Morphology
Text normalization + Spelling Correction
Language Models: Traditional vs Neural
Text Classification: Traditional Methods + Sentiment Analysis
Text Classification: Neural Methods
Sequence labeling: Traditional Methods + POS Tagging and NER
Sequence labeling: Neural Methods - RNN, LSTM
Encoder-Decoder + Attention
Transformers
Pre-trained Language Models + Transfer Learning with Contextual Embeddings
Syntax + Context Free Grammars and Parsing
Chunking + Dependency Parsing + Treebanks
Traditional and Neural Methods for Constituency and Dependency Parsing
Semantics
Lexical Semantics
Topic Modeling (LDA)
Discourse + Discourse Parsing and Neural topic segmentation
Summarization

Grading

Final Exam 35%*
Midterm 30%
Assignments 25%
Readings 10%
iClickers 3%**

The instructor reserves the right to modify this scheme at any time; however, it is likely that the scheme will remain similar to that stated here.

*You must pass the final exam in order to pass the course.

**bonus (1% participation + 2% correct answers).
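To make the weighting above concrete, here is a small sketch of how a final mark could be computed under the stated scheme. This is an illustration only, not official course policy; it assumes each component mark is a percentage out of 100 and that the iClicker bonus (up to 3%) is simply added on top of the weighted total.

```python
# Illustrative grade calculation for the scheme above (not official policy).
# Assumes component marks are percentages (0-100) and the iClicker bonus
# (up to 3 points) is added on top of the weighted total.

WEIGHTS = {"final": 0.35, "midterm": 0.30, "assignments": 0.25, "readings": 0.10}

def course_grade(marks, iclicker_bonus=0.0):
    """Return the weighted course total plus any iClicker bonus."""
    total = sum(WEIGHTS[component] * marks[component] for component in WEIGHTS)
    return total + iclicker_bonus

# Hypothetical student: weighted total is 28 + 22.5 + 22.5 + 10 = 83,
# plus the full 3-point bonus gives 86.
example = {"final": 80, "midterm": 75, "assignments": 90, "readings": 100}
print(course_grade(example, iclicker_bonus=3.0))  # 86.0
```

Note that, per the starred footnote above, a passing final-exam mark is also required to pass the course regardless of the weighted total.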

Textbook

Selected chapters of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (J&M). We will follow the draft chapters of the planned 3rd edition. Although this text is our main reference for the class, it must be stressed that you will need to know all the material covered in class, whether or not it is included in the readings or available online. Likewise, you are responsible for all the material in the assigned readings, whether or not it is covered in class.