Information

Canvas
Piazza
Time: Term 1 (Sep-Dec 2023), TR 9:30-11:00
Location: Woodward IRC 1
Instructor: Vered Shwartz (office hours: by appointment)
TAs:
Shruthi Chockkalingam (office hours: TBD)
Tanzila Rahman (office hours: TBD)

Course Description

Natural Language Processing (NLP) is one of the fastest-growing sub-areas of Artificial Intelligence, with applications in all sectors of our society, including healthcare, business, science, and government. In this course, starting from a solid background in computer science, students will learn how to analyze and apply fundamental NLP algorithms and techniques, combining traditional and neural models to better address the given requirements while considering possible trade-offs between accuracy, time/space efficiency, and interpretability of the model's output. In particular, the course will teach the fundamentals of modern data-driven natural language processing, including applications (such as question answering and machine translation), text representations (word embeddings, language models), and various approaches to natural language understanding and generation (classification, tagging, parsing, encoder-decoder models). Importantly, students will also learn how to perform informed error analysis and revise preliminary solutions to address the most common sources of error (e.g., by collecting more data or using different learning methods). Finally, special emphasis will be placed on the critical skill of reasoning about what will happen when a model is deployed in some context, with particular focus on how ethical issues intersect with NLP considerations (e.g., societal bias in text representations, fake news generation, sustainable models). The course will involve attending class, homework assignments, a midterm, a final exam, and reading and discussing papers.

Tentative Syllabus

Course Overview + Introduction to NLP
Finite State Text Processing + Morphology
Text Normalization + Spelling Correction
Language Models: Traditional vs Neural
Text Classification: Traditional Methods + Sentiment Analysis
Text Classification: Neural Methods
Sequence Labeling: Traditional Methods + POS Tagging and NER
Sequence Labeling: Neural Methods - RNN, LSTM
Sequence-to-Sequence: Encoder-Decoder + Attention
Transformers
Pre-trained Language Models
Syntax + Context Free Grammars and Parsing
Chunking + Dependency Parsing + Treebanks
Traditional and Neural Methods for Constituency and Dependency Parsing
Semantics - Lexical Semantics, Semantic Role Labeling, Semantic Parsing
Topic Modeling (LDA)
Discourse + Coreference
Summarization
Advanced topics (e.g., prompting, ethics, efficient NLP, commonsense reasoning)

Grading

Final Exam 35%*
Midterm 30%
Assignments 25%
Readings 10%

The instructor reserves the right to modify this scheme at any time; however, it is likely that the scheme will remain similar to that stated here.

*You must pass the final exam in order to pass the course.

Textbook

Selected chapters of Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition by Daniel Jurafsky and James H. Martin (J&M). We will follow the draft chapters of the planned 3rd edition. Although this text will be our main reference for the class, it must be stressed that you will need to know all the material covered in class, whether or not it is included in the readings or available online. Likewise, you are responsible for all the material in assigned readings, whether or not it is covered in class.