Morphological Parsing
Due Wednesday 4/4/01, by class time. Submit your code via
HSP .
Be prepared to do a brief (five minutes) demo of your program
in class, including explanation of your approach to the problem
and any features of your solution that strike you as interesting.
I encourage you to work with a partner or two on this and
all non-exam assignments for this class.
The Input
Your program will take three text files as input.
- A file consisting of nouns whose plural forms are irregular.
Each line of this file will contain a singular noun followed by
a comma, followed by the plural of the noun.
- A file consisting of one noun per line. The nouns may be
singular or plural, regular or irregular. For effective testing
of your program, you'll want some of each.
- A file consisting of one lexical representation of a noun
on each line. For example, one such line might be
"goose +N +PL".
The Output
Your program should take the names of the three input files as
command-line arguments (for your users' convenience, please provide a
correct usage statement if the program is executed with other
than three arguments). The program should then deliver to standard
output (cout) first a list of the lexical representations of
the lines in file number 2, followed by the surface representations
(i.e. the correct spellings) of the lines in file number 3.
For example, if file 2 is
ox
oxen
ax
axes
socks
sock
and file 3 is
goose +N +PL
moose +N +PL
excuse +N +SG
truce +N +PL
truss +N +PL
then the output should be something like
ox -> ox +N +SG
oxen -> ox +N +PL
ax -> ax +N +SG
axes -> ax +N +PL
socks -> sock +N +PL
sock -> sock +N +SG
goose +N +PL -> geese
moose +N +PL -> moose
excuse +N +SG -> excuse
truce +N +PL -> truces
truss +N +PL -> trusses
Advice
Try to develop a single finite state transducer that can be used
in both directions. It's good to think of FSTs as two-way
translators (among other things--see page 72 of Jurafsky and Martin).
Also, I would develop a small FST class interface to isolate the
translation code inside a small number of well-chosen functions.
The non-deterministic FSA search code on page 44 might be handy
(in modified form, to be sure).
Work together. Need help finding a compatible work partner?
Let me know.
Need an example to help you handle command-line arguments? Try
slowmapwordcounter.cpp (developed
to illustrate the C++ Standard Template Library "map" template, but
it includes a small amount of command-line handling code).
Ignore my advice if it seems appropriate to do so.
Have fun, start early, and keep in touch.
Jeff Ondich,
Department of Mathematics and Computer Science,
Carleton College, Northfield, MN
55057,
(507) 646-4364,
jondich@carleton.edu