CS 111: Introduction to Computer Science

Takehome exam. Due 9:40AM Friday, November 13, 2009.

This is an exam. You may not speak with anybody other than Jeff Ondich about its contents. You may, however, use your book, your notes, the books in the library, the Internet, and divine guidance (should any be offered to you). If you obtain information from some source other than your own brain, cite your source and use quotation marks where appropriate.

Hand your exam in on paper.

(8 points) Sections 11.6.1 through 11.6.3 of your textbook include a discussion of an important Python structure called a dictionary. Your job for this problem is to read about dictionaries, and then write a Python program that does the following:
- Opens the file names.txt, which consists of lines of the form "Last name,First name,Middlename", where the middle name is usually but not always the empty string. (These names happen to be American film actresses taken from a Wikipedia page on that topic, though your program will work with any similarly formatted file of names.)
- Reads the file one line at a time, keeping track of how many times each first name appears in the file. (This part uses a dictionary.)
- Closes the file.
- Prints the first names in descending order of the number of times they appeared in the file.
Hand in a copy of your program on paper, but also hand it in electronically in the usual way. Please call it dictionary.py.
(6 points) I have a list of positive integers, and I want to know the largest product that can be formed by two of the integers. For example, if the list is [3, 5, 1, 8, 10, 6], then the product I'm looking for is 80 (i.e. the product of 8 and 10). Elmo suggests the following code, which does the job:
```
    def elmo(theList):
        N = len(theList)
        biggestProduct = 0
        for k in range(N):
            for j in range(N):
                if theList[j] * theList[k] > biggestProduct:
                    biggestProduct = theList[j] * theList[k]

        return biggestProduct
    
```
- When I used Elmo's code on an array of 10000 integers on my slow old computer at home, it took 4.0 seconds (I tried it several times--4.0 seconds every time). Approximately how long will Elmo's code take to run if N is 35000 (assuming I use the same computer)? Justify your answer. (It is not sufficient to tell me that you ran the code yourself and it took a certain amount of time. I want you to explain why it will take as long as you say it will.)
- Not surprisingly, given that it's written by Elmo, Elmo's code is silly. Describe an algorithm that will run significantly faster than Elmo's, and explain why it's faster.

(8 points) In this problem, you will use sorter.py to compare the performance of Selection Sort and Insertion Sort.

As we discussed in class, selection sort and insertion sort are members of a class of sorting algorithms whose performance is usually measured by counting the number of list element comparisons that need to be done to complete the sorting of the list. That is, we count the number of times something like "a[j] < a[k]" gets executed during the run of the algorithm. When we count comparisons, we are counting the number of iterations of the inner loop, and by counting inner loop iterations, we are counting the most time-consuming portion of the algorithm.

To see how this plays out, try running "python sorter.py selection 10 verbose" or "python sorter.py selection 1000" to see a report of the sorting times and number of comparisons for a 10-element list or a 1000-element list. (The "verbose" causes the program to print out the list before and after sorting, to help you believe that sorting is actually taking place.) To try this with Insertion Sort, replace "selection" with "insertion".

In what follows, you're going to count comparisons and measure running times for the two algorithms and several values of N.

Fill in the following chart. For the "Unshuffled" columns, you should comment out the line that calls the shuffle function. This will measure the running times and comparison counts when you sort a list that is already sorted before you begin (an admittedly odd thing to do).

N	Comp. count for Sel. Sort	Running time for Sel. Sort	Comp. count for Sel. Sort Unshuffled	Running time for Sel. Sort Unshuffled	Comp. count for Ins. Sort	Running time for Ins. Sort	Comp. count for Ins. Sort Unshuffled	Running time for Ins. Sort Unshuffled
5000
10000
15000
20000

Can you use the comparison counts to predict the running times? If so, how? If not, why not?
Is the comparison count for selection sort dependent on the initial data? (For example, if you run selection sort several different times on different randomly generated arrays of 10000 items, is the comparison count the same every time?) How about insertion sort?
If you knew that your starting data would be nearly sorted (say, from a sorted list that had a few new items added to the end), which algorithm would you use?

(2 points) I'm going to do a little relaxing over break. Could you recommend a book for me to read?
(8 points) Steganography is the art of hiding messages in places where people will be unaware there is a message at all, let alone an encoded message. In this exercise, you will work with a form of steganography that involves hiding messages in digital images.

The image ox-with-message.png is my usual Babe the Blue Ox photograph, with a slight modification. The red values of the pixels in the leftmost column have been altered to hold a message consisting of a sequence of 8-bit ASCII characters. Each character is represented by 8 pixels, where an even red value represents a 0 bit, and an odd red value represents a 1 bit. For example, if the top 8 pixels in the left column of the photograph have red values equal to 50, 51, 48, 48, 50, 52, 52, 51 (i.e. even, odd, even, even, even, even, even, odd), then the first character in the hidden message is the character whose binary ASCII representation is 01000001 (that is, the capital letter A).

Your job is to write a program to extract the hidden message from the modified ox photograph. Write the message on your test paper, and submit your program as usual. Please call the program problem5.py.