CS 322: Natural Language Processing

Course Information

Textbook

Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, by Daniel Jurafsky and James H. Martin. It's a good book.

The Plan

This course will be organized around a sequence of problems chosen to give you experience with a collection of core NLP techniques. Our interaction with each problem will go roughly like this:

  1. I'll describe the problem briefly in class.
  2. We will discuss approaches to the problem based on whatever ideas you have, just to get a feel for the space of solution possibilities.
  3. I'll introduce a particular approach that I would like you to pursue (often, I imagine the class will have foreshadowed or outright named the approach I have in mind during step 2).
  4. You will go out and start working to write code or use existing software to solve the problem.
  5. While step 4 is going on outside of class, I'll lecture on the core solution techniques, answer questions about ideas or problems you're having, demonstrate relevant software, etc.
  6. If appropriate to the problem, you'll collect data to evaluate the success of your solution and submit a report (including code, if any).
  7. We'll spend a class day having each group report on its experiences and results.

For most of the problems, I'm going to ask you to work in groups of two or three, partly to make our wrap-up discussion work better, and partly because having somebody to bounce ideas off is very valuable for these sorts of problems. That said, I'll give you a break or two from partner work.

Here are the problems. Since each one will take somewhere from 3 to 7 class days, we'll probably be able to fit about 5 problems into the term, but we'll see how it goes. Close to the end of the term, I'll give you a take-home exam to give you the opportunity to revisit the core ideas of the course.

  1. Document Classification. Can n-gram language models be used to detect the difference between a paragraph from the Washington Post and a paragraph from an Agatha Christie novel? This problem will introduce not just n-grams, but also some essential techniques for evaluating the effectiveness of NLP algorithms. (A minimal sketch of the core idea appears after this list.)
  2. Spelling Suggestions. How do you decide when to issue suggestions like "Did you mean 'receive'?" or "Did you mean 'emu handler'?" There are lots of ways to do this, but we'll use a dynamic programming technique called minimum edit distance as part of our solution (see the edit-distance sketch after this list).
  3. Yoda-fication, Elmo-fication, and Oden-ification of Sentences. How can you create a tool that makes predictable transformations of sentences? For Yoda, for example, one might devise a tool that turns sentences like "He is too impatient" into "Too impatient he is". Just as with Eliza, we could use a collection of little tricks to pull off any particular transformation goal. But we'll take a more general route: first use a parser to determine a parse tree describing the syntax of a sentence, and then apply tree-transformation rules to the parse tree to obtain the transformed sentence. (Note that though this problem has a silly goal, a more complex version of the same problem could be used to do important parts of machine translation. A toy tree-transformation sketch appears after this list.)
  4. Interlude: everybody gets to find a cool NLP tool out in the world and show it to the class.
  5. Part-of-speech Tagging. Given a sentence, mark each word with a part of speech (or a list of parts of speech accompanied by probabilities). Once again, there are many approaches to this problem, but we will use this problem to motivate a study of Hidden Markov Models, which are a very important tool in NLP. (A sketch of the Viterbi algorithm, the standard decoder for HMM taggers, appears after this list.)
  6. [Something about Semantic Analysis, not yet decided]
  7. What Does "it" Mean? (a.k.a. Anaphora Resolution) Given a sequence of sentences, identify the pronouns and figure out which noun phrase each pronoun refers to.
  8. Real-Word Spelling Error Detection/Correction. Go ahead and reed a book, take a wok, or bear you're sole.
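
To make problem 1 concrete, here is a minimal Python sketch of bigram-based classification with add-one smoothing. It is illustrative scaffolding, not assigned starter code: the whitespace tokenizer is deliberately crude, and the models dictionary (one entry per training corpus, e.g. one for the Washington Post and one for Agatha Christie) is assumed to have been built elsewhere.

    import math
    from collections import Counter

    def bigram_model(tokens):
        """Collect the unigram and bigram counts that define the model."""
        return Counter(tokens), Counter(zip(tokens, tokens[1:]))

    def log_prob(tokens, unigrams, bigrams, vocab_size):
        """Log-probability of a token sequence under an
        add-one-smoothed bigram model."""
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            numerator = bigrams[(prev, word)] + 1
            denominator = unigrams[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total

    def classify(paragraph, models):
        """Return the label whose model assigns the paragraph the
        highest log-probability. `models` maps each label to a
        (unigrams, bigrams, vocab_size) triple."""
        tokens = paragraph.lower().split()  # crude tokenizer; fine for a sketch
        return max(models, key=lambda label: log_prob(tokens, *models[label]))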
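
For problem 2, the core of the solution is the minimum-edit-distance dynamic program. Here is a minimal sketch using unit costs for insertion, deletion, and substitution (the Levenshtein variant; other cost schemes are possible, and the textbook discusses them):

    def min_edit_distance(source, target):
        """Minimum number of insertions, deletions, and substitutions
        needed to turn `source` into `target`."""
        m, n = len(source), len(target)
        # dist[i][j] = cost of turning source[:i] into target[:j]
        dist = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            dist[i][0] = i                          # i deletions
        for j in range(1, n + 1):
            dist[0][j] = j                          # j insertions
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0 if source[i - 1] == target[j - 1] else 1
                dist[i][j] = min(dist[i - 1][j] + 1,        # delete
                                 dist[i][j - 1] + 1,        # insert
                                 dist[i - 1][j - 1] + sub)  # substitute/copy
        return dist[m][n]

    # e.g. min_edit_distance("recieve", "receive") == 2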
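
For problem 3, here is a toy tree-transformation rule, written against nltk.Tree as one convenient tree representation (the course doesn't commit to a particular toolkit). The hand-written parse and the single rule are illustrative only; a real solution would get its trees from a parser and apply a whole collection of rules.

    from nltk import Tree

    def yodify(tree):
        """If the parse looks like (S (NP ...) (VP (VBZ is) (ADJP ...))),
        move the ADJP to the front, Yoda-style."""
        if (tree.label() == "S" and len(tree) == 2
                and tree[1].label() == "VP" and len(tree[1]) == 2
                and tree[1][1].label() == "ADJP"):
            verb = tree[1][0]
            adjp = tree[1][1]
            return Tree("S", [adjp, tree[0], Tree("VP", [verb])])
        return tree  # the rule doesn't apply; leave the tree alone

    parse = Tree.fromstring(
        "(S (NP (PRP He)) (VP (VBZ is) (ADJP (RB too) (JJ impatient))))")
    print(" ".join(yodify(parse).leaves()))  # -> too impatient He is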
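
For problem 5, the standard decoding algorithm for HMM taggers is Viterbi. Here is a minimal sketch that assumes the start, transition, and emission probability tables have already been estimated from a tagged corpus:

    import math

    def viterbi(words, tags, trans, emit, start):
        """Most likely tag sequence for `words`.
        trans[(t1, t2)] = P(t2 | t1); emit[(t, w)] = P(w | t);
        start[t] = P(t at the start of a sentence). Missing table
        entries get a tiny floor probability instead of zero."""
        def lp(p):
            return math.log(p) if p > 0 else math.log(1e-12)

        # best[i][t] = (score, backpointer) for tagging words[i] with t
        best = [{t: (lp(start.get(t, 0)) + lp(emit.get((t, words[0]), 0)), None)
                 for t in tags}]
        for i in range(1, len(words)):
            column = {}
            for t in tags:
                score, prev = max(
                    (best[i - 1][p][0] + lp(trans.get((p, t), 0))
                     + lp(emit.get((t, words[i]), 0)), p)
                    for p in tags)
                column[t] = (score, prev)
            best.append(column)

        # Recover the best path by following backpointers from the end.
        tag = max(tags, key=lambda t: best[-1][t][0])
        path = [tag]
        for i in range(len(words) - 1, 0, -1):
            tag = best[i][tag][1]
            path.append(tag)
        return list(reversed(path))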

Grading

Your grade in the course will be determined by your reports on the problems we work on (85%) plus a take-home exam (15%).