CS 117 Assignment, due 1/27/97

Word Lengths

I want you to work alone on this assignment. You may discuss your work with fellow students, but write your own code.

The program

Your goal is to write a program that will read the words from a text file and report the frequencies of the various word lengths.

For example, if the file contained the following:


  program: (noun) A magic spell cast over a computer allowing
  it to turn one's input into error messages.

your program should report that there were 2 words of length 1, 2 of length 2, 0 of length 3, 5 of length 4, etc. One way to display this information would be:


Length  Frequency
-----------------
1       2
2       2
3       0
4       5
5       5
6       0
7       1
8       3

(Note that I have considered "one's" to be a 5-character word rather than a 4-letter word. Do whatever is easiest when handling words containing apostrophes.)
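Punctuation such as the colon after "program" and the parentheses around "noun" should not count toward a word's length. One easy rule that reproduces the counts above is to treat a word as a maximal run of letters and apostrophes. The sketch below applies that rule to one line of text. It assumes a Turbo Pascal or Free Pascal dialect, and the names IsWordChar and NextWordLength are hypothetical. ReadWord may already handle this for you.

```pascal
program SplitDemo;

var
  sampleLine : string;
  p, len : integer;

{ One hypothetical rule: letters and apostrophes belong to words;
  every other character (spaces, colons, parentheses, periods)
  separates them. }
function IsWordChar(c : char) : boolean;
begin
  IsWordChar := ((c >= 'a') and (c <= 'z')) or
                ((c >= 'A') and (c <= 'Z')) or
                (c = '''')
end;

{ Skip separators starting at position start, then measure the word
  found there. Leaves start just past that word; returns 0 when the
  line has no more words. }
function NextWordLength(const line : string; var start : integer) : integer;
var
  n : integer;
begin
  while (start <= length(line)) and not IsWordChar(line[start]) do
    start := start + 1;
  n := 0;
  while (start <= length(line)) and IsWordChar(line[start]) do
  begin
    n := n + 1;
    start := start + 1
  end;
  NextWordLength := n
end;

begin
  sampleLine := 'it to turn one''s input into error messages.';
  p := 1;
  len := NextWordLength(sampleLine, p);
  while len > 0 do
  begin
    write(len, ' ');
    len := NextWordLength(sampleLine, p)
  end;
  writeln
end.
```

For the second line of the sample definition this yields the lengths 2, 2, 4, 5, 5, 4, 5, 8, with "one's" counted as 5 characters and the final period ignored.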

Suggestions

Use an array of integers to keep track of how many words of each length there are. You might declare this array like so:

type     intarray = array[1..30] of integer;

var      frequency : intarray;

At the start of your program, set frequency[1] through frequency[30] all equal to 0. Then read the words one at a time, and for each word add 1 to frequency[n], where n is the length of the word. When the program finishes, frequency[1] should contain the number of 1-letter words in the input file, frequency[2] the number of 2-letter words, and so on. You may use ReadWord again.
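Here is a minimal sketch of the clear-then-tally steps, plus one way to print the table. It assumes a Turbo Pascal or Free Pascal dialect. Because ReadWord's exact parameters aren't given here, a hypothetical constant array holding the words of the sample definition stands in for the reading loop. The procedure and function names are also hypothetical.

```pascal
program TallyDemo;

const
  MaxLength = 30;  { same bound as the handout's array[1..30] }
  SampleCount = 18;
  { Hypothetical stand-in for the words ReadWord would deliver: the
    sample definition, with punctuation already removed. }
  Sample : array[1..SampleCount] of string[10] =
    ('program', 'noun', 'A', 'magic', 'spell', 'cast', 'over', 'a',
     'computer', 'allowing', 'it', 'to', 'turn', 'one''s', 'input',
     'into', 'error', 'messages');

type
  intarray = array[1..MaxLength] of integer;

var
  frequency : intarray;
  i : integer;

{ Set frequency[1] through frequency[MaxLength] to 0. }
procedure ClearCounts(var freq : intarray);
var
  k : integer;
begin
  for k := 1 to MaxLength do
    freq[k] := 0
end;

{ Add 1 to the counter for this word's length. Empty words and words
  longer than MaxLength are skipped so the index stays in range. }
procedure CountWord(var freq : intarray; w : string);
var
  len : integer;
begin
  len := length(w);
  if (len >= 1) and (len <= MaxLength) then
    freq[len] := freq[len] + 1
end;

{ Largest length with a nonzero count, or 0 if all counts are zero. }
function LongestLength(var freq : intarray) : integer;
var
  k, last : integer;
begin
  last := 0;
  for k := 1 to MaxLength do
    if freq[k] > 0 then
      last := k;
  LongestLength := last
end;

{ Print rows 1 through the longest length, keeping zero rows. Numbers
  are right-aligned, unlike the left-aligned table in the handout. }
procedure PrintTable(var freq : intarray);
var
  k : integer;
begin
  writeln('Length  Frequency');
  writeln('-----------------');
  for k := 1 to LongestLength(freq) do
    writeln(k:6, freq[k]:11)
end;

begin
  ClearCounts(frequency);
  for i := 1 to SampleCount do
    CountWord(frequency, Sample[i]);
  PrintTable(frequency)
end.
```

Stopping at the longest length that actually occurs matches the sample table, which ends at 8 but still lists the empty rows for lengths 3 and 6.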

You may assume that no word is longer than 30 letters.

You should probably create your own small text file for early testing of your program. Once things are working pretty well, you should try your program on the dictionary file words.txt, or on /LocalLibrary/Intel_LocalLibrary/Literature/ByTitle/CanterburyTales/Group_A/The Milleres Tale, or on any other file you might find in /LocalLibrary/Intel_LocalLibrary/Literature or on the Web. An interesting source of text files is http://www.promo.net/pg/, the home page for Project Gutenberg.
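One way to make that first small test file is to let Pascal write it. The sketch below writes the two-line sample definition to a file, then reads it back one line at a time. It uses the Turbo/Free Pascal text-file routines assign, rewrite, reset, readln, eof, and IOResult, so your compiler's file-opening calls may differ. The names MakeSampleFile and CountLines and the file name sample.txt are hypothetical.

```pascal
program FileDemo;

{ Write the two-line sample definition to a small test file. }
procedure MakeSampleFile(const name : string);
var
  f : text;
begin
  assign(f, name);
  rewrite(f);
  writeln(f, 'program: (noun) A magic spell cast over a computer allowing');
  writeln(f, 'it to turn one''s input into error messages.');
  close(f)
end;

{ Read the file one line at a time. Here each line is only counted;
  a real program would break it into words instead. Returns -1 if the
  file cannot be opened. Lines longer than 255 characters are cut off
  because string here is a short string. }
function CountLines(const name : string) : integer;
var
  f : text;
  line : string;
  n : integer;
begin
  assign(f, name);
  {$I-}
  reset(f);
  {$I+}
  if IOResult <> 0 then
    CountLines := -1
  else
  begin
    n := 0;
    while not eof(f) do
    begin
      readln(f, line);
      n := n + 1
    end;
    close(f);
    CountLines := n
  end
end;

begin
  MakeSampleFile('sample.txt');
  writeln('sample.txt has ', CountLines('sample.txt'), ' lines')
end.
```

To try a larger file, pass its full path instead. A name containing spaces, such as The Milleres Tale, is fine because assign takes the whole string.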

Something extra

If you get the above working, you might want to try displaying your data in histogram form. For example, suppose you run your program on a dictionary file containing 2 one-letter words, 52 two-letter words, 514 three-letter words, 2011 four-letter words, 3275 five-letter words, and so on. If you let each "x" stand for 250 words, rounding down (so 514 words get 2 x's and 52 words get none), you could produce a histogram like so:


1:2
2:52
3:514   xx
4:2011  xxxxxxxx
5:3275  xxxxxxxxxxxxx

etc.
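The bar lengths above come from integer division that rounds down: 514 div 250 = 2, 2011 div 250 = 8, 3275 div 250 = 13, and the 2 one-letter and 52 two-letter words get no x's. Here is a minimal sketch of that arithmetic, assuming a Turbo Pascal or Free Pascal dialect. The names WordsPerX, BarLength, and PrintRow are hypothetical, and the rows are printed without the column alignment shown above.

```pascal
program HistogramDemo;

const
  WordsPerX = 250;  { each 'x' stands for 250 words }

{ Number of x's for a count, rounded down by integer division. }
function BarLength(count : integer) : integer;
begin
  BarLength := count div WordsPerX
end;

{ Print one row in the "length:count xxx" style. }
procedure PrintRow(len, count : integer);
var
  k : integer;
begin
  write(len, ':', count, ' ');
  for k := 1 to BarLength(count) do
    write('x');
  writeln
end;

begin
  PrintRow(1, 2);
  PrintRow(2, 52);
  PrintRow(3, 514);
  PrintRow(4, 2011);
  PrintRow(5, 3275)
end.
```

For a bigger or smaller input file, changing WordsPerX rescales every bar at once.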


We'll talk more about this program in class. Start early, keep in touch, and have fun.



Jeff Ondich, Department of Mathematics and Computer Science, Carleton College, Northfield, MN 55057
(507) 646-4364, jondich@carleton.edu