CS 322: Natural Language Processing

Exam

Due 5:00 PM Monday 11/24. You may submit your exam on paper (either to me or to my mailbox on the second floor of the CMC) or by placing it into your hand-in folder. Any code you write should go into your hand-in folder in either case. Please put final-related electronic submissions in a folder called hand-in/final/. For electronic text, I prefer .txt, but .doc or .rtf are OK if you have something that doesn't work in straight text.

This is an open-book, open-Internet, open-library take-home exam. You may not consult with any person other than me (Jeff Ondich, to be precise) about this exam.

Several of the questions on this exam concern a tiny grammar and lexicon restricted to the domain of Ernie's breakfast.

  1. (20 points) Parsing and its friends.
    1. Run the Earley algorithm (either by hand or using the parser you wrote this term) on the sentence "Ernie eats a pancake." Show me the resulting chart.
    2. How many lines of the form "# -> S ." does your chart contain? Explain what they mean.
    3. Give two sequences of words that are accepted by our grammar/lexicon/algorithm but that are not grammatically correct English sentences.
    4. How can you modify the Earley algorithm as presented in class so as to recognize any S, NP, or VP? For example, if you want the algorithm to accept "Ernie eats a pancake," "eats a pancake," and "a pancake," what do you do? (Hint: This requires a very simple change.)
    5. After the fashion of the syntax-driven semantic analysis presented in Chapter 18, show semantic attachments for each of the rules in our grammar and lexicon. Then, show the parse tree for "Ernie eats a big pancake" with semantic attachments next to each of the non-terminal constituents in the tree. These attachments should reflect not just the rule whose left side is the constituent in question, but rather the semantic description of the entire subtree rooted at the constituent. Note that this problem will require you to figure out suitable semantic attachments for a couple of grammar rules that are not included in the chart on page 591 of the book.
    6. Suppose we want to model the difference between mass nouns and count nouns ("some food" vs. "three pancakes") using feature structures and unification. Our goal in the context of Ernie's breakfast is to prevent "Ernie eats pancake" and "Ernie eats a food" from being accepted.

      Add suitable feature structures to the grammar and lexicon. Note that you will not need to add feature structures to every rule.

      Show the parse trees for "Ernie eats a pancake" and "Ernie eats pancake" with unified feature structures drawn next to each constituent in the tree. Again, this is the tree where all the unification operations have been performed from the bottom up, rather than a simple labeling of the non-terminals with their associated feature structures from the grammar and lexicon. Note that some constituents will have no feature structure, while some may be labeled "failure to unify" or something like that.
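
      If it helps to have a concrete model of unification in hand while you work on this part, here is a minimal Python sketch that treats feature structures as nested dictionaries and merges them recursively. The representation, the function name, and the example features are my own choices for illustration (and the sketch ignores reentrant structures entirely); your written solution does not need to use any of this.

        # Feature structures as nested dicts; atomic values as strings.
        # unify() returns a merged structure, or None for "failure to unify".
        def unify(f1, f2):
            # Atomic values unify only if they are equal.
            if not isinstance(f1, dict) or not isinstance(f2, dict):
                return f1 if f1 == f2 else None
            result = dict(f1)
            for key, value in f2.items():
                if key in result:
                    merged = unify(result[key], value)
                    if merged is None:
                        return None
                    result[key] = merged
                else:
                    result[key] = value
            return result

        # Hypothetical count/mass features, just to watch unification succeed and fail.
        pancake = {'head': {'count': 'yes'}}
        bare_np_rule = {'head': {'count': 'no'}}
        print(unify(pancake, bare_np_rule))     # None: failure to unify
        print(unify(pancake, {'head': {}}))     # {'head': {'count': 'yes'}}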

  2. (12 points) N-gram smoothing.

    We didn't discuss smoothing in much detail this term, so here's an exercise to give you a little hands-on experience with smoothing. These questions refer to the hickory corpus.

    1. Show a complete chart of the bigram frequencies found in this corpus.
    2. Show a complete chart of the bigram probabilities found in this corpus.
    3. Show a complete chart of the bigram probabilities obtained by applying add-1 smoothing to the bigram counts from part 1.
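
    Since the hickory corpus itself isn't reproduced on this page, the sketch below uses a stand-in token list; everything else is the computation the three parts ask for, with the add-1 probability of a bigram computed as (C(w1 w2) + 1) / (C(w1) + V). The corpus variable is a placeholder for the real data.

      from collections import Counter

      # Stand-in token list; replace with the tokens of the hickory corpus.
      corpus = "hickory dickory dock the mouse ran up the clock".split()

      vocab = sorted(set(corpus))
      V = len(vocab)
      unigram_counts = Counter(corpus)
      bigram_counts = Counter(zip(corpus, corpus[1:]))

      def mle_prob(w1, w2):
          # Unsmoothed (maximum-likelihood) estimate: C(w1 w2) / C(w1).
          return bigram_counts[(w1, w2)] / unigram_counts[w1]

      def add_one_prob(w1, w2):
          # Add-1 (Laplace) smoothing: (C(w1 w2) + 1) / (C(w1) + V).
          return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

      # One row of the chart per first word.
      for w1 in vocab:
          print(w1, [round(add_one_prob(w1, w2), 3) for w2 in vocab])
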
  3. (12 points) WordNet.

    We have installed WordNet 3.0 at /usr/local/WordNet-3.0. Before working on these exercises, you should add /usr/local/WordNet-3.0/bin to your path.

    Using the WordNet command-line tools

    Using the "User Commands" documentation in the WordNet 3.0 Reference Manual, try to answer the following questions.

    1. List the senses of the noun "cat".
    2. List the synonyms of the noun "cat".
    3. List the hypernyms of the noun "cat".
    4. List the hyponyms of the noun "cat".
    5. What is a meronym? Does the noun "cat" have any meronyms in the WordNet database? If so, show them. If not, find a word that does have meronyms and show them.
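
    To get you started, here is the shape of a command-line query. I am recalling the search-option names from the reference manual rather than copying them, so double-check them there; running wn with no arguments prints the full list of search types.

      wn cat -over     (overview of every sense of "cat", with glosses)
      wn cat -hypen    (hypernyms of the noun "cat")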

    Using PyWordNet

    PyWordNet is a Python module for WordNet, and it is installed on the Macs in our labs. The PyWordNet home page has examples, but not much documentation beyond that. If you are so inclined, you can study the source code in /Library/Python/2.5/site-packages/wordnet.py and /Library/Python/2.5/site-packages/wntools.py.

    You may find it helpful to read and run this small test of PyWordNet.

    1. Show how to find a list of car parts using PyWordNet code.
    2. Write a Python function that takes two words as parameters, determines whether they share a parent hypernym (i.e. exactly one level up the hypernym hierarchy), and, if so, reports what it is. For example, if the words are "feline" and "canine", then they certainly should share a parent hypernym, but "moose" and "thought" probably do not.
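
    If you want a starting point for these two, the fragment below shows the access pattern from the PyWordNet examples page as best I remember it: look a word up in the noun dictionary N, pick a sense, and follow pointers by name. The method names (getSenses, getPointerTargets) and the pointer-name string are my recollection of that API rather than a verified quote of it, so check them against wordnet.py before building on them.

      from wordnet import N          # PyWordNet's dictionary of nouns

      # Look up "cat" and take its first sense.
      cat = N['cat'].getSenses()[0]

      # Follow hypernym pointers from that sense.
      # NOTE: getPointerTargets and the pointer name 'hypernym' are my best
      # recollection of the PyWordNet API; consult wordnet.py if this fails.
      for target in cat.getPointerTargets('hypernym'):
          print(target)
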
  4. (3 points) The Continuing Education of Jeff.

    I like to keep track of what college students think is interesting and cool, even though I'm old and clueless. I'd like your help in my on-going education. What do you think I should read or watch or listen to if I want to know what's interesting in the world these days?

  5. (8 points) Yoda.

    I promised long ago that we would investigate the mechanisms involved in transforming ordinary sentences into sentences more likely to be spoken by either Yoda or Rob Oden. Unfortunately, I have not been able to collect a sufficient number of really characteristic Oden sentences, so I'll have to keep working on that for the next time I teach this course.

    Fortunately, there are plenty of Yoda quotes available. Here are a few interesting sentences:

    • Size matters not.
    • Judge me by size, do you?
    • Ready are you?
    • My own counsel will I keep.
    • Named must your fear be.
    • The dark side I sense in you.

    Your job for this problem is to create a detailed outline of a computational process you would use to translate normal sentences into Yoda sentences (or vice versa, if you prefer). You will probably need to describe relevant grammar rules, along with transformation operations to be performed on parse trees once they are created. Also, you should make note of the kinds of errors your process is likely to make.
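
    As a tiny illustration of the kind of tree operation such a process might use (it is nowhere near the outline I'm asking for), here is a Python sketch that fronts the object of a verb phrase in a parse tree represented as nested lists. The tree format and the single transformation rule are invented for this example.

      # Parse trees as nested lists: [label, child, child, ...]; leaves are strings.
      # Toy transformation: move the VP's object NP to the front of the sentence,
      # turning "Ernie eats a pancake" into "a pancake Ernie eats".
      def front_object(tree):
          if len(tree) != 3 or tree[0] != 'S':
              return tree                      # leave unexpected shapes alone
          _, np, vp = tree
          if vp[0] != 'VP' or len(vp) != 3:
              return tree
          verb, obj = vp[1], vp[2]
          return ['S', obj, np, ['VP', verb]]

      def words(tree):
          # Read the leaves back off as a list of words.
          if isinstance(tree, str):
              return [tree]
          return [w for child in tree[1:] for w in words(child)]

      sentence = ['S', ['NP', 'Ernie'],
                       ['VP', ['V', 'eats'],
                              ['NP', ['Det', 'a'], ['N', 'pancake']]]]
      print(' '.join(words(front_object(sentence))))   # a pancake Ernie eats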