CS 117, Winter 1997

Assignment 2: Words, due Wednesday 1/15/97

You may work with a partner for this assignment. No groups of three or more, please.

When you are done with this assignment, please submit it using the Homework Submission Program.

The problem

I often want to collect some statistics on my writing. If I'm writing a proposal that has to be shorter than 2000 words, I will want to count the number of words in each of my drafts. If I'm trying to write software documentation at a sixth-grade reading level (tragically, that's what the software experts usually recommend), I'll want to keep an eye on the number of letters and syllables in my words, the number of words in my sentences, and the number of sentences in my paragraphs.

For this assignment, you will write a program that takes a text file as input and reports the number of words in the file, the average word length, and the longest and shortest words.

For now, we won't look at syllables, sentences, or paragraphs.

A little more detail

When you compile your program (gpc -o words words.p or something similar) and run it like so:

  words < somefile.txt

your program should produce an easy-to-read report that looks something like this:

   Number of words: 1729
   Average word length: 5.72 letters
   Longest word: electroencephalography (22 letters)
   Shortest word: a (1 letter) 

A small offering

You might be surprised to learn that the hardest part of writing this program is reading one word at a time out of the input file. Punctuation, extra spaces, ends of lines, and the end of the file itself can all cause the Pascal procedures read and readln to behave surprisingly if the programmer isn't careful. With this in mind, I have written a Pascal function ReadWord to help you out. You can see how it is used in the laboratory exercise readingWords.p. Here is ReadWord itself:

{====================================================
	ReadWord reads the next "word" from standard
	input (i.e. the keyboard, or the input file
	specified by "< file" at the Unix command line),
	returning it via the variable parameter "word".
	Here, a word consists of a block of contiguous
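	letters and/or apostrophes.  Note that
	ReadWord will read past eoln when it is searching
	for the next word.  If the end of the input is
	reached before any word is found, "word" is
	returned as the empty string.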

	We'll discuss this rather complicated code in
	class sometime in the next few weeks.  
 ====================================================}
procedure ReadWord( var word : string );

const	apostrophe = chr(39);     {The apostrophe character has ASCII value 39}

type	stateType = (beforeWord,duringWord,done);
  {See pages 198-200 of Abernethy & Allen for an
   explanation of this.}

var		ch : char;
		state : stateType;

	function IsAlpha( c : char ) : boolean;
	begin
		IsAlpha := ((c >= 'a') and (c <= 'z')) or ((c >= 'A') and (c <= 'Z'));
	end;

begin
	word := '';
	state := beforeWord;

	while state <> done do
	begin
		if eof then
			state := done

		else if eoln then
		begin
			if state = beforeWord then
				readln
			else
				state := done
		end
		
		else
		begin
			read( ch );
			if (ch = apostrophe) or (IsAlpha(ch)) then
			begin
				word := word + ch;
				state := duringWord
			end
			
			else if state = duringWord then
				state := done
		end
	end
end;  {end of ReadWord}
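The numbers in the sample report can be kept up to date one word at a time. Here is a minimal, self-contained sketch of that bookkeeping. The helper names Tally and WriteWordLine are hypothetical (they are not part of the assignment), and three words are hard-coded in place of real input so the program runs on its own. In a full program you would presumably call ReadWord in a loop and stop once it hands back the empty string, which happens only after the end of the input has been reached. Also note that length counts apostrophes as letters, which may or may not be what you want.

```pascal
program TallyDemo;

{ Sketch only: Tally and WriteWordLine are hypothetical helpers.
  The words passed to Tally are hard-coded; in a real solution
  each word would come from ReadWord. }

var	wordCount, totalLetters : integer;
	longest, shortest : string[80];

{ Update the running statistics with one more word.
  Note: length(w) counts apostrophes as well as letters. }
procedure Tally( w : string );
begin
	wordCount := wordCount + 1;
	totalLetters := totalLetters + length(w);
	if (wordCount = 1) or (length(w) > length(longest)) then
		longest := w;
	if (wordCount = 1) or (length(w) < length(shortest)) then
		shortest := w
end;

{ Print a word and its length, using "letter" vs. "letters". }
procedure WriteWordLine( caption : string; w : string );
begin
	write( caption, w, ' (', length(w):1, ' letter' );
	if length(w) <> 1 then
		write( 's' );
	writeln( ')' )
end;

begin
	wordCount := 0;
	totalLetters := 0;
	longest := '';
	shortest := '';

	Tally( 'a' );
	Tally( 'electroencephalography' );
	Tally( 'words' );

	writeln( 'Number of words: ', wordCount:1 );
	if wordCount > 0 then
	begin
		{ "/" on two integers yields a real, so no explicit conversion is needed }
		writeln( 'Average word length: ', totalLetters / wordCount :1:2, ' letters' );
		WriteWordLine( 'Longest word: ', longest );
		WriteWordLine( 'Shortest word: ', shortest )
	end
end.
```

Checking wordCount before dividing avoids a division by zero when the input file contains no words at all.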

That's all

Make sure to include a comment at the top of your program giving your name, the date, and a brief description of what your program does.

Start early, keep in touch, and have fun.



Jeff Ondich, Department of Mathematics and Computer Science, Carleton College, Northfield, MN 55057
(507) 646-4364, jondich@carleton.edu