COMP394 - Natural Language Processing (Fall 2024)

Welcome to the COMP394 (Natural Language Processing) Course Page. For course policies, please check the syllabus.

Resources

Course Staff & Office Hours

Instructor:

Suhas Arehalli
Tue/Wed/Fri 2–3pm
OLRI 229

Preceptors:

Kien Nguyen
Mon 1–2pm, Th 4:30–6:30pm
Smail Gallery

Textbook

Jurafsky & Martin, Speech and Language Processing

Schedule

The schedule below will be updated as course materials are released. Keep in mind that I may shift planned topics to adjust the pace as necessary.

Week | Date | Topic | Reading | Materials
1 | 9/3 | Introduction | Syllabus | Survey, Set-up
1 | 9/5 | Language is Hard: Sentence Structure (Syntax) | Skim Carnie (2011) Unit 1 | NACLO Problem 1, 2
2 | 9/10 | Language is Hard: Word Structure (Morphology) | | NACLO Problem 1, 2, Spaces
2 | 9/12 | Modeling with Probability | Probability Notes |
3 | 9/17 | N-grams (Maximum Likelihood Estimation) | Jurafsky & Martin 3.1–3.5 |
3 | 9/19 | N-grams (Smoothing Techniques) | Jurafsky & Martin 3.6, 3.8, and Historical Notes |
4 | 9/24 | Tokenization | J&M 2.5 |
4 | 9/26 | CFGs | J&M 18.1–18.6 |
5 | 10/1 | Parsing | J&M 18.1–18.6 |
5 | 10/3 | Probabilistic Parsing 1 | J&M Appendix C |
6 | 10/8 | Probabilistic Parsing 2 | J&M 4.1–4.6 |
6 | 10/10 | Exam Prep | | Practice Exam
7 | 10/15 | Exam 1 | | Unit 1 Extended Readings
7 | 10/17 | No Class (Fall Break) | |
8 | 10/22 | Text Classification (Naive Bayes) | J&M 4.1–4.6 |
8 | 10/24 | Text Classification (Logistic Regression) & Evaluation | J&M 4.7–4.9, 5.1–5.5 |
9 | 10/29 | Vector Embeddings | J&M 6 |
9 | 10/31 | Intro to Deep Learning/Feedforward Networks | J&M 7.1–7.5 |
10 | 11/5 | No Class (Election Day) | Complete the “Pytorch Practice” activity on Moodle |
10 | 11/7 | Modern NLP Architectures 1 (RNNs/Self-Attention/Transformers) | J&M 8.1–8.3, 9.1–9.6 |
11 | 11/12 | Modern NLP Architectures 2 (Self-Attention & Transformers) | Reread J&M 9.1–9.6 | Matrices, Practice Exam + solutions
11 | 11/14 | Exam 2 | |
12 | 11/19 | Intro to Bias and Model Interpretability | Bender et al. 2021. Optional: De-Arteaga et al. 2019 | Slides
12 | 11/21 | Targeted Evaluation, Probing, and De-biasing | Skim Bolukbasi et al. 2016. Optional: Cynthia Rudin Q&A, Been Kim Q&A |
13 | 11/26 | LLMs, Trust, and Factuality | Weizenbaum 1967 and Turkle 1984, pp. 33–45. Optional: Vaithilingam et al. (2022) | A14: LLM Factuality
14 | 12/3 | Guest Lecture: Elizabeth Engle (Economics) | | Slides
14 | 12/5 | Data Ethics | |
15 | 12/10 | Audits and Algorithmic Fairness | | Slides
16 | 12/16, 1:30–3:30pm | Final Project Poster Presentation | |

Homeworks

Homework 1: N-Gram Language Modeling

Released: Sep. 17th 5pm
Due: Oct. 2nd 9pm
Enrolled students should access the GitHub Classroom link through Moodle.
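
To give a flavor of the assignment, here is a minimal, illustrative sketch of bigram estimation with maximum-likelihood counts and add-one (Laplace) smoothing, in the spirit of the J&M Chapter 3 readings. The corpus, function names, and smoothing choice below are assumptions made for illustration only; the assignment's required interface and data live in the GitHub Classroom starter code.

```python
# Illustrative sketch only: a tiny bigram model with maximum-likelihood counts
# and add-one (Laplace) smoothing. The corpus, names, and smoothing choice here
# are assumptions; the homework's required interface is in the starter repo.
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, smooth=True):
    """P(w | w_prev); add-one smoothing keeps unseen bigrams from getting probability 0."""
    if smooth:
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_counts(corpus)
print(bigram_prob("the", "cat", uni, bi, vocab_size=len(uni)))  # smoothed P(cat | the)
```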

Updates
Autograding and Revision Policy

Autograding test cases can be found here, along with a README that explains their use, where to find the inputs you were tested on, the expected outputs, and detailed logs of how those outputs were computed and the data they were computed from.

I will re-run the autograder at the end of the semester to determine the implementation score that counts toward your final grade for this assignment. Consider this a rolling “part 2” of the assignment. The goal is not just to give you a second chance at the points, but to encourage you to treat writing code, testing it (externally and internally), and revising it as the standard practice for any substantial piece of software. It is also motivation not to set aside an assignment you did poorly on as we move forward.

Homework 2: Text Classification

Released: Oct 31st 3PM
Part 1 Due: Nov 7th 9PM
Due: Nov 22nd 9PM (extended from Nov 19th)
Enrolled students can access the GitHub Classroom link through Moodle.
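
As with Homework 1, here is a small, illustrative sketch of one of the techniques the assignment covers: a bag-of-words logistic regression classifier trained in PyTorch, in the spirit of the J&M Chapter 5 readings. The vocabulary, toy dataset, and hyperparameters below are made up for illustration; the assignment's actual data handling and required interface are in the GitHub Classroom starter code.

```python
# Illustrative sketch only: binary logistic regression over bag-of-words features.
# The vocabulary, toy dataset, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

vocab = {"good": 0, "bad": 1, "great": 2, "awful": 3}

def featurize(tokens):
    """Binary bag-of-words vector over a fixed vocabulary."""
    x = torch.zeros(len(vocab))
    for tok in tokens:
        if tok in vocab:
            x[vocab[tok]] = 1.0
    return x

# Tiny toy dataset: label 1 = positive, 0 = negative.
docs = [(["good", "great"], 1), (["bad", "awful"], 0), (["great"], 1), (["awful"], 0)]
X = torch.stack([featurize(toks) for toks, _ in docs])
y = torch.tensor([label for _, label in docs], dtype=torch.float32)

model = nn.Linear(len(vocab), 1)      # weights + bias; sigmoid is folded into the loss
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(200):                  # full-batch gradient descent on the toy data
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

print(torch.sigmoid(model(featurize(["good"]))).item())  # estimated P(positive | "good")
```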

Updates
  • A few bugs in the data.py file and a typo in the README have been fixed in a pull request sent through GitHub Classroom.

Final Project

Released: Oct 29th 3PM
Proposal Due: Nov 21st 9PM
Report/Code Due: Dec 15th 9PM
Posters Due: Dec 9th 12PM
Presentations: Dec 16th 1:30–3:30PM

Guidelines can be accessed by Macalester students here.

Updates
  • TBD
