CS 111: Introduction to Computer Science

Word Statistics

Your program should be called wordstats.py. Please submit it by copying it to your hand-in folder (Courses/f07/cs/cs111-00-f07/Student Work/yourusername/hand-in/).

The problem

I often want to collect some statistics on my writing. If I'm writing a proposal that has to be shorter than 2000 words, I will want to count the number of words in each of my drafts. If I'm trying to write software documentation at a sixth-grade reading level (tragically, that's what the software experts usually recommend), I'll want to keep an eye on the number of letters and syllables in my words, the number of words in my sentences, and the number of sentences in my paragraphs.

Actually, I just like fooling around with this kind of thing. It pleases me to know how many words there are in Middlemarch, or how many times the word "whilst" appears in Hamlet (326398 and 4, respectively).

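For this assignment, you will write a program that takes a text file as input and reports statistics like those in the sample session below: the number of words, the average word length, the longest word, the number of occurrences of the word "and", and the alphabetically latest word.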

For now, we won't look at syllables, sentences, or paragraphs.

A little more detail

When you run your program, the session should look something like this:

    For what file do you wish to collect statistics? somefile.txt
    Number of words: 1729
    Average word length: 5.72 letters
    Longest word: electroencephalography (22 letters)
    Occurrences of "and": 62
    Alphabetically latest word: zygomorphic
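To make the report format concrete, here is a minimal sketch of how those five lines could be computed once the words are already in a list. It is written for Python 3 and is only one possible approach, not the expected solution. The names collect_stats and format_report are hypothetical, and the sketch assumes the list is non-empty and holds lowercase words with punctuation already removed.

```python
def collect_stats(words, target="and"):
    """Return the statistics shown in the sample session for a list of words.

    Assumption: `words` is a non-empty list of lowercase strings with
    punctuation already stripped.
    """
    total_letters = sum(len(w) for w in words)
    return {
        "count": len(words),
        "average_length": total_letters / len(words),
        "longest": max(words, key=len),      # on a tie, the first longest word wins
        "occurrences": words.count(target),
        "latest": max(words),                # greatest word in string order
    }


def format_report(stats, target="and"):
    """Build the report text in the layout of the sample session."""
    longest = stats["longest"]
    return "\n".join([
        "Number of words: %d" % stats["count"],
        "Average word length: %.2f letters" % stats["average_length"],
        "Longest word: %s (%d letters)" % (longest, len(longest)),
        'Occurrences of "%s": %d' % (target, stats["occurrences"]),
        "Alphabetically latest word: %s" % stats["latest"],
    ])


# A tiny made-up word list, just to exercise the two functions.
sample = ["the", "cat", "and", "the", "zebra", "and", "elephant"]
print(format_report(collect_stats(sample)))
```

Keeping the arithmetic in one function and the printing in another makes it easy to check the numbers on a short, hand-counted list before trying a whole novel.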

A small offering

You might be surprised to learn that the hardest part of writing this program is reading one word at a time out of the input file. Punctuation, extra spaces, ends of lines, and the end of the file itself can all cause trouble if you aren't careful. With this in mind, I have written a Python module words.py and a test program printwords.py that uses words.py. You may use these programs as a starting point for your own program.
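To see why punctuation and spacing cause trouble, here is a small standalone sketch written for Python 3. It does not use words.py, whose interface is not described here; the function names extract_words and read_words are hypothetical. It also assumes one particular definition of a word (a run of letters, optionally joined by apostrophes), which may differ from the definition words.py uses.

```python
import re

# Assumed definition of a word: letters, with optional internal apostrophes,
# so "it's" stays whole while "well-known" splits into two words.
WORD_PATTERN = re.compile(r"[a-z]+(?:'[a-z]+)*")


def extract_words(text):
    """Return the lowercase words in `text`, ignoring punctuation and spacing."""
    return WORD_PATTERN.findall(text.lower())


def read_words(path):
    """Read the whole file at `path` and return its words as a list."""
    with open(path) as infile:
        return extract_words(infile.read())


# Punctuation, doubled spaces, and a line break do not produce extra "words".
print(extract_words("Hello, world!  It's\nthe end."))
# prints ['hello', 'world', "it's", 'the', 'end']
```

Compare this with simply splitting the text on spaces, which would leave "Hello," and "world!" with their punctuation attached and would count them as different words from "hello" and "world".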

That's all

Make sure to include a comment at the top of your program giving your name(s), the date, and a brief description of what your program does.

Start early, keep in touch, and have fun.