Description: Overview of modern natural language processing techniques: text normalization, language models, part-of-speech tagging, hidden Markov models, syntactic and dependency parsing, semantics, word sense, reference resolution, dialog agents, machine translation. Two class projects to design, implement, and evaluate classic NLP algorithms. Credit will not be given for both CSE 398 and CSE 498.
Lectures: Tuesday/Thursday 1:10-2:25, Packard Lab 258
Office Hours: Thursday 4:30pm - 6:30 pm, Packard Lab 329
For CSE 398: (MATH 231 or ECO 045) and CSE 017. For CSE 498: instructor permission is required.
We will mainly use Java for projects. Relevant programming and math concepts will be discussed briefly only when necessary.
Formats: 1 closed-book mid-term, 2 coding projects, 4 homework assignments, 1 final presentation.
Grading: Mid-term (15%), 4 coding projects (15% each), final (25%). There is no homework or quiz. Late submissions will be penalized 20% of the total grade per late day (24 hours or part thereof), and no assignment will be accepted more than four days after its due date. The projects will be graded partly on your programs' performance, measured by the metrics defined in each project.
SLP2= Speech and Language Processing, 2nd Edition by Daniel Jurafsky, James H. Martin.
SLP3= Speech and Language Processing, 3rd Edition by Daniel Jurafsky, James H. Martin. Most chapters freely available at Link.
FSNLP= Foundations of Statistical Natural Language Processing, by Christopher D. Manning, Hinrich Schütze. Cambridge, Mass.: MIT Press, 2000. Paper book available at Linderman Reserve and Ebook available to Lehigh users.
Coursesite: for posting grades only Link.
Piazza: for posting questions that the instructor and other students can answer Link.
This website: for general information and resources (code, data, projects).
The following topics will be covered (tentatively): words (language models); grammar (part-of-speech tagging, inference and training algorithms for HMMs, grammar and syntactic parsing, dependency parsing); semantics (word sense, semantic role labeling); discourse (coreference resolution and summarization); applications (machine translation, conversational agents).