CS 117, Winter 1997
Assignment 2: Words, due Wednesday 4/8/98
You may work with a partner for this assignment. No groups of
three or more, please.
When you are done with this assignment, please submit it using the
Homework Submission Program.
The problem
I often want to collect some statistics on my writing. If I'm writing
a proposal that has to be shorter than 2000 words, I will want
to count the number of words in each of my drafts. If I'm trying
to write software documentation at a sixth-grade reading level (tragically,
that's what the software experts usually recommend), I'll want to keep
an eye on the number of letters and syllables in my words, the number
of words in my sentences, and the number of sentences in my paragraphs.
Actually, I just like fooling around with this kind of thing. It pleases
me to know how many words there are in Middlemarch, or
how many times the word "whilst" appears in Hamlet (326398 and 4,
respectively, by the way).
For this assignment, you will write a program that takes a
text file as input and reports
- the number of words in the file,
- the average number of letters per word,
- the longest word and its length,
- the shortest word and its length,
- the number of occurrences of the word "the",
- and the word from the file that is alphabetically last
(for example, "zyzzyvas" would probably win if it were in your
file).
For now, we won't look at syllables, sentences, or paragraphs.
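One way to organize the bookkeeping for the list above is to keep running totals and update them once per word. The sketch below is only an illustration, not a required design: the names WordStats, AddWord, and AverageLength are made up for this example. It also makes some assumptions the assignment leaves open: ties for longest and shortest go to the first word seen, hyphens and apostrophes count toward a word's length, and comparisons are case-sensitive (so "The" is not counted as "the", and capitalized words sort before lowercase ones).

```cpp
#include <cstddef>
#include <string>
using namespace std;

// Hypothetical container for the statistics the assignment asks for.
struct WordStats
{
    size_t count = 0;         // number of words seen so far
    size_t totalLetters = 0;  // sum of the lengths of all words
    size_t theCount = 0;      // occurrences of exactly "the"
    string longest;           // first word of maximum length
    string shortest;          // first word of minimum length
    string last;              // alphabetically last word
};

// Updates the running statistics with one more word.
void AddWord( WordStats& stats, const string& word )
{
    // The first word seen is the longest, shortest, and last so far.
    if( stats.count == 0 || word.length() > stats.longest.length() )
        stats.longest = word;
    if( stats.count == 0 || word.length() < stats.shortest.length() )
        stats.shortest = word;
    if( stats.count == 0 || word > stats.last )
        stats.last = word;

    if( word == "the" )
        stats.theCount++;

    stats.totalLetters += word.length();
    stats.count++;
}

// Returns the average word length, or 0 if no words have been added.
double AverageLength( const WordStats& stats )
{
    if( stats.count == 0 )
        return 0.0;
    return static_cast<double>( stats.totalLetters ) / stats.count;
}
```

Because the totals are updated one word at a time, the program never has to store the whole file in memory.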
A little more detail
When you compile your program (g++ -Wall -o words words.cpp or
something similar) and run it like so:
words < somefile.txt
your program should produce an easy-to-read report that looks something
like this:
Number of words: 1729
Average word length: 5.72 letters
Longest word: electroencephalography (22 letters)
Shortest word: a (1 letter)
Occurrences of "the": 62
Alphabetically latest word: zygomorphic
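Two details of the sample report are easy to miss: the average is shown with two digits after the decimal point, and the shortest word says "1 letter" rather than "1 letters". The following sketch shows one way to handle both using the standard <iomanip> and <sstream> facilities. The helper names LetterCount and FormatAverage are invented for this example, and your report does not need to match the sample character for character.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
using namespace std;

// Returns "1 letter", "22 letters", and so on, choosing the
// singular form only when n is exactly 1.
string LetterCount( size_t n )
{
    ostringstream out;
    out << n << ( n == 1 ? " letter" : " letters" );
    return out.str();
}

// Returns the average as text with exactly two digits after
// the decimal point.
string FormatAverage( double average )
{
    ostringstream out;
    out << fixed << setprecision( 2 ) << average;
    return out.str();
}
```

With these helpers, a report line could be written as cout << "Shortest word: " << shortest << " (" << LetterCount( shortest.length() ) << ")" << endl; where shortest is whatever string variable your program uses.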
A small offering
You might be surprised to learn that the hardest part of writing this
program is reading one word at a time out of the input file.
Punctuation, extra spaces, ends of lines, and the end of the file
itself can all cause trouble if you aren't careful. With this
in mind, I have written a C++ function GetWord to help
you out. You can see how it is used in the laboratory exercise
getword.cpp. Here is GetWord itself:
//////////////////////////////////////////////////////////////////
//
// GetWord retrieves the next word from standard input (cin),
// returning true if there is a word remaining in the input,
// and false if not. In the former case, the word is
// returned via the reference parameter "word".
//
// GetWord defines a "word" to be any contiguous block of
// letters, hyphens, and apostrophes. Thus, "don't" and
// "willy-nilly" are both considered words (as they
// should be), but so is "--'--xxxx-'". Defining "word"
// better than this is possible, but complicated.
//
// We'll talk about how GetWord works some time in class.
//
//////////////////////////////////////////////////////////////////
bool GetWord( string& word )
{
    char c;
    cin.get( c );
    while( !cin.fail() && !isalpha(c) && c != '-' && c != '\'' )
        cin >> c;
    if( cin.fail() )
        return( false );
    word = c;
    cin.get( c );
    while( !cin.fail() && (isalpha(c) || c == '-' || c == '\'') )
    {
        word = word + c;
        cin.get( c );
    }
    return( true );
}
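To make the definition of "word" concrete, here is a hedged sketch of the same idea applied to a string instead of standard input, so it can be tried on small examples. GetWordFrom, IsWordChar, and WordsIn are hypothetical names for this illustration and are not part of getword.cpp. The cast to unsigned char before calling isalpha is also an addition of this sketch, not something the original function does.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// True for the characters GetWord treats as part of a word:
// letters, hyphens, and apostrophes.
bool IsWordChar( char c )
{
    return isalpha( static_cast<unsigned char>( c ) ) || c == '-' || c == '\'';
}

// Hypothetical variant of GetWord that reads from any input stream.
bool GetWordFrom( istream& in, string& word )
{
    char c;

    // Skip everything that cannot start a word.
    in.get( c );
    while( !in.fail() && !IsWordChar( c ) )
        in.get( c );
    if( in.fail() )
        return false;

    // Collect word characters until a non-word character or end of input.
    word = c;
    in.get( c );
    while( !in.fail() && IsWordChar( c ) )
    {
        word += c;
        in.get( c );
    }
    return true;
}

// Returns every word in the given text, in order of appearance.
vector<string> WordsIn( const string& text )
{
    istringstream in( text );
    vector<string> words;
    string word;
    while( GetWordFrom( in, word ) )
        words.push_back( word );
    return words;
}
```

Calling WordsIn on "Don't go -- willy-nilly, OK?" yields five words: "Don't", "go", "--", "willy-nilly", and "OK". Note that "--" counts as a word under this definition. With the original GetWord, the loop has the same shape: while( GetWord( word ) ) followed by whatever you want to do with each word.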
That's all
Make sure to include a comment at the top of your program giving
your name, the date, and a brief description of what your
program does.
Start early, keep in touch, and have fun.
Jeff Ondich,
Department of Mathematics and Computer Science,
Carleton College, Northfield, MN 55057
(507) 646-4364,
jondich@carleton.edu