COMP394 - Natural Language Processing (Fall 2024)
Welcome to the COMP394 (Natural Language Processing) Course Page. For course policies, please check the syllabus.
Resources
Course Staff & Office Hours
Instructor:
Suhas Arehalli
Tu/Weds/Fri 2–3pm
OLRI 229
Preceptors:
Kien Nguyen
Mon 1–2pm, Th 4:30–6:30pm
Smail Gallery
Textbook
Jurafsky & Martin, Speech and Language Processing
Schedule
The schedule below will be updated to keep track of all released course materials. Keep in mind that I may shift planned topics to adjust pace as necessary.
Week | Date | Topic | Reading | Materials |
---|---|---|---|---|
1 | 9/3 | Introduction | Syllabus | Survey Set-up |
1 | 9/5 | Language is Hard: Sentence Structure (Syntax) | Skim Carnie (2011) Unit 1 | NACLO Problem 1, 2 |
2 | 9/10 | Language is Hard: Word Structure (Morphology) | NACLO Problem 1, 2, Spaces | |
2 | 9/12 | Modeling with Probability | Probability Notes | |
3 | 9/17 | N-grams (Maximum Likelihood Estimation) | Jurafsky & Martin 3.1–3.5 | |
3 | 9/19 | N-grams (Smoothing Techniques) | Jurafsky & Martin 3.6, 3.8, and Historical Notes | |
4 | 9/24 | Tokenization | J&M 2.5 | |
4 | 9/26 | CFGs | J&M 18.1–18.6 | |
5 | 10/1 | Parsing | J&M 18.1–18.6 | |
5 | 10/3 | Probabilistic Parsing 1 | J&M Appendix C | |
6 | 10/8 | Probabilistic Parsing 2 | J&M 4.1–4.6 | |
6 | 10/10 | Exam Prep | Practice Exam | |
7 | 10/15 | Exam 1 | Unit 1 Extended Readings | |
7 | 10/17 | No Class (Fall Break) | ||
8 | 10/22 | Text Classification (Naive Bayes) | J&M 4.1–4.6 | |
8 | 10/24 | Text Classification (Logistic Regression) & Evaluation | J&M 4.7–4.9, 5.1–5.5 | |
9 | 10/29 | Vector Embeddings | J&M 6 | |
9 | 10/31 | Intro to Deep Learning/Feedforward Networks | J&M 7.1–7.5 | |
10 | 11/5 | No Class (Election Day) | Complete the “Pytorch Practice” activity on Moodle | |
10 | 11/7 | Modern NLP Architectures 1 (RNNs/Self-Attention/Transformers) | J&M 8.1–8.3, 9.1–9.6 | |
11 | 11/12 | Modern NLP Architectures 2 (Self-Attention & Transformers) | Reread J&M 9.1–9.6 | Matrices, Practice Exam + solutions |
11 | 11/14 | Exam 2 | ||
12 | 11/19 | Intro to Bias and Model Interpretability | Bender et al. 2021. Optional: De-Arteaga et al. 2019 | Slides |
12 | 11/21 | Targeted Evaluation, Probing, and De-biasing | Skim Bolukbasi et al. 2016. Optional: Cynthia Rudin Q&A, Been Kim Q&A | |
13 | 11/26 | LLMs, Trust, and Factuality | Weizenbaum 1967 and Turkle 1984 pp. 33–45. Optional: Vaithilingam et al. (2022) | A14: LLM Factuality |
14 | 12/3 | Guest Lecture: Elizabeth Engle (Economics) | slides | |
14 | 12/5 | Data Ethics | ||
15 | 12/10 | Audits and Algorithmic Fairness | slides | |
16 | 12/16 1:30–3:30pm | Final Project Poster Presentation | | |
Homeworks
Homework 1: N-Gram Language Modeling
Released: Sep. 17th 5pm
Due: Oct. 2nd 9pm
Enrolled students should access the Github classroom link through Moodle.
Updates
Autograding and Revision Policy
Autograding test cases can be found here, with a README that explains how to use them. The README also explains where to find the inputs you were tested on, the expected outputs, and detailed logs of how those outputs were computed and the data they were computed from.
I will re-run the autograder at the end of the semester to determine the implementation score for this assignment that counts toward your final grade. Consider this a rolling “part 2” of the assignment. The goal is not just to give you a second chance at getting the points, but to encourage you to view writing code, testing it (externally and internally), and revising it as the standard practice for any substantial piece of software. It also gives you a reason to revisit an assignment you did poorly on rather than forget about it as we move forward.
Homework 2: Text Classification
Released: Oct 31st 3PM
Part 1 Due: Nov 7th 9PM
Due: Nov 22nd 9PM
Enrolled students can access the Github classroom link through Moodle.
Updates
- A few bugs in the data.py file and a typo in the README are fixed in a pull request sent through Github classroom.
Final Project
Released: Oct 29th 3PM
Proposal Due: Nov 21st 9PM
Report/Code Due: Dec 15th 9PM
Posters Due: Dec 9th 12PM
Presentations: Dec 16th 1:30–3:30PM
Guidelines can be accessed by Macalester students here.
Updates
- TBD