COMP394 - Natural Language Processing (Fall 2024)

Welcome to the COMP394 (Natural Language Processing) Course Page. For course policies, please check the syllabus.

Resources

Course Staff & Office Hours

Instructor:

Suhas Arehalli
Tue/Wed/Fri 2–3pm
OLRI 229

Preceptors:

Kien Nguyen
Mon 1–2pm, Th 4:30–6:30pm
Smail Gallery

Textbook

Jurafsky & Martin, Speech and Language Processing

Schedule

The schedule below will be updated as course materials are released. Keep in mind that I may shift planned topics to adjust the pace as necessary.

Week | Date | Topic | Reading | Materials
1 | 9/3 | Introduction | Syllabus | Survey, Set-up
1 | 9/5 | Language is Hard: Sentence Structure (Syntax) | Skim Carnie (2011) Unit 1 | NACLO Problem 1, 2
2 | 9/10 | Language is Hard: Word Structure (Morphology) | | NACLO Problem 1, 2, Spaces
2 | 9/12 | Modeling with Probability | Probability Notes |
3 | 9/17 | N-grams (Maximum Likelihood Estimation) | Jurafsky & Martin 3.1–3.5 |
3 | 9/19 | N-grams (Smoothing Techniques) | Jurafsky & Martin 3.6, 3.8, and Historical Notes |
4 | 9/24 | Tokenization | J&M 2.5 |
4 | 9/26 | CFGs | J&M 18.1–18.6 |
5 | 10/1 | Parsing | J&M 18.1–18.6 |
5 | 10/3 | Probabilistic Parsing 1 | J&M Appendix C |
6 | 10/8 | Probabilistic Parsing 2 | J&M 4.1–4.6 |
6 | 10/10 | Exam Prep | | Practice Exam
7 | 10/15 | Exam 1 | | Unit 1 Extended Readings
7 | 10/17 | No Class (Fall Break) | |
8 | 10/22 | Text Classification (Naive Bayes) | J&M 4.1–4.6 |
8 | 10/24 | Text Classification (Logistic Regression) & Evaluation | J&M 4.7–4.9, 5.1–5.5 |
9 | 10/29 | Vector Embeddings | J&M 6 |
9 | 10/31 | Intro to Deep Learning/Feedforward Networks | J&M 7.1–7.5 |
10 | 11/5 | No Class (Election Day) | Complete the “Pytorch Practice” activity on Moodle |
10 | 11/7 | Modern NLP Architectures 1 (RNNs/Self-Attention/Transformers) | J&M 8.1–8.3, 9.1–9.6 |
11 | 11/12 | Modern NLP Architectures 2 (Self-Attention & Transformers) | Reread J&M 9.1–9.6 | Matrices, Practice Exam + solutions
11 | 11/14 | Exam 2 | |
12 | 11/19 | Intro to Bias and Model Interpretability | Bender et al. 2021. Optional: De-Arteaga et al. 2019 | Slides
12 | 11/21 | Targeted Evaluation, Probing, and De-biasing | Skim Bolukbasi et al. 2016. Optional: Cynthia Rudin Q&A, Been Kim Q&A |
13 | 11/26 | LLMs, Trust, and Factuality | Weizenbaum 1967 and Turkle 1984, pp. 33–45. Optional: Vaithilingam et al. (2022) | A14: LLM Factuality
14 | 12/3 | Guest Lecture: Elizabeth Engle (Economics) | | Slides
14 | 12/5 | Data Ethics | |
15 | 12/10 | Audits and Algorithmic Fairness | | Slides
16 | 12/16, 1:30–3:30pm | Final Project Poster Presentation | |

Homeworks

Homework 1: N-Gram Language Modeling

Released: Sep. 17th 5pm
Due: Oct. 2nd 9pm
Enrolled students should access the GitHub Classroom link through Moodle.
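
To give a flavor of the assignment, here is a minimal, illustrative sketch of bigram estimation with maximum-likelihood counts and add-one (Laplace) smoothing, in the spirit of the J&M Chapter 3 readings. The corpus, function names, and smoothing choice below are assumptions made for illustration only; the assignment's required interface and data live in the GitHub Classroom starter code.

```python
# Illustrative sketch only: a tiny bigram model with maximum-likelihood counts
# and add-one (Laplace) smoothing. The corpus, names, and smoothing choice here
# are assumptions; the homework's required interface is in the starter repo.
from collections import Counter

def train_bigram_counts(sentences):
    """Count unigrams and bigrams over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, vocab_size, smooth=True):
    """P(w | w_prev); add-one smoothing keeps unseen bigrams from getting probability 0."""
    if smooth:
        return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + vocab_size)
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram_counts(corpus)
print(bigram_prob("the", "cat", uni, bi, vocab_size=len(uni)))  # smoothed P(cat | the)
```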

Updates
Autograding and Revision Policy

Autograding test cases can be found here, along with a README that explains their use, where to find the inputs you were tested on, the expected outputs, and detailed logs of how those outputs were computed and the data they were computed from.

I will re-run the autograder at the end of the semester to determine the implementation score that counts toward your final grade for this assignment. Consider this a rolling “part 2” of the assignment. The goal is not just to give you a second chance at the points, but to encourage you to treat writing code, testing it (externally and internally), and revising it as the standard practice for any substantial piece of software. It is also motivation not to set aside an assignment you did poorly on as we move forward.

Homework 2: Text Classification

Released: Oct 31st 3PM
Part 1 Due: Nov 7th 9PM
Due: Nov 22nd 9PM (extended from Nov 19th)
Enrolled students can access the GitHub Classroom link through Moodle.
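
As with Homework 1, here is a small, illustrative sketch of one of the techniques the assignment covers: a bag-of-words logistic regression classifier trained in PyTorch, in the spirit of the J&M Chapter 5 readings. The vocabulary, toy dataset, and hyperparameters below are made up for illustration; the assignment's actual data handling and required interface are in the GitHub Classroom starter code.

```python
# Illustrative sketch only: binary logistic regression over bag-of-words features.
# The vocabulary, toy dataset, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

vocab = {"good": 0, "bad": 1, "great": 2, "awful": 3}

def featurize(tokens):
    """Binary bag-of-words vector over a fixed vocabulary."""
    x = torch.zeros(len(vocab))
    for tok in tokens:
        if tok in vocab:
            x[vocab[tok]] = 1.0
    return x

# Tiny toy dataset: label 1 = positive, 0 = negative.
docs = [(["good", "great"], 1), (["bad", "awful"], 0), (["great"], 1), (["awful"], 0)]
X = torch.stack([featurize(toks) for toks, _ in docs])
y = torch.tensor([label for _, label in docs], dtype=torch.float32)

model = nn.Linear(len(vocab), 1)      # weights + bias; sigmoid is folded into the loss
loss_fn = nn.BCEWithLogitsLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.5)

for _ in range(200):                  # full-batch gradient descent on the toy data
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

print(torch.sigmoid(model(featurize(["good"]))).item())  # estimated P(positive | "good")
```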

Updates
  • A few bugs in the data.py file and a typo in the README have been fixed in a pull request sent through GitHub Classroom.

Final Project

Released: Oct 29th 3PM
Proposal Due: Nov 21st 9PM
Report/Code Due: Dec 15th 9PM
Posters Due: Dec 9th 12PM
Presentations: Dec 16th 1:30–3:30PM

Guidelines can be accessed by Macalester students here.

Updates
  • TBD
