CS 204: Software Design

Due 5:00PM Monday, 8 June 2009

This is not a take-home exam, so you may speak with each other about it. However, please do the work without a partner.

A miscellany

Processing XML files

A lot of the data on computers these days is stored in XML form. XML is many things, but at its simplest level, it is a system of "tags" (<tagname>...</tagname>) that may be nested around data (<a>data in a <b>data in b in a</b>more data in a</a>) and may include "attributes" (like the href attribute in <a href="http://carleton.edu">Carleton College</a>). If you search your computer for "*.xml" files, you'll find tons of them: your iTunes library, for example, or left-over RSS feed files, or various inscrutable system files.

Python comes standard with a nice XML parsing library. Here is a sample program that just counts the opening tags in an XML file.
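For orientation, a tag counter of this kind might look roughly like the sketch below. It is not necessarily the sample program referred to above: it assumes the standard xml.sax module, and the names TagCounter and count_opening_tags are made up for illustration.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts every opening tag that the SAX parser reports."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag, e.g. <a href="..."> or <b>.
        # attrs holds the tag's attributes; it is ignored here.
        self.count += 1


def count_opening_tags(xml_text):
    """Return the number of opening tags in a string of XML."""
    handler = TagCounter()
    # parseString is given bytes here, so encode the text first.
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


if __name__ == "__main__":
    sample = ('<a href="http://carleton.edu">data in a '
              '<b>data in b in a</b>more data in a</a>')
    print(count_opening_tags(sample))
```

The handler's other callbacks (endElement and characters) are the natural places to hook in if a task needs closing tags or the text between tags.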

For this assignment, I want you to extend my sample program (or rewrite it completely, if you prefer) to perform the following tasks on XML files.

Profiling

Profilers are programs that help you determine how long each portion of your program takes to run. If you write a program that doesn't run fast enough, and you've done everything you can think of to squeeze out better performance, it's time to pull out the profiler and see if it can point you to trouble spots.

Python comes standard with a profiler module called cProfile.
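As a rough illustration of how cProfile can be driven from inside a program, here is a minimal sketch. The function slow_sum_of_squares and the helper profile_call are hypothetical stand-ins for whatever code you decide to profile; only cProfile, pstats, and io come from the standard library.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive: builds a whole list before summing it."""
    squares = []
    for i in range(n):
        squares.append(i * i)
    return sum(squares)


def profile_call(func, *args):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)

    # Send the report to a string instead of straight to stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time and show at most ten entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_call(slow_sum_of_squares, 100000)
    print(result)
    print(report)
```

The report lists each function with its call count and the time spent in it, which is usually enough to spot where a program is slow. A whole script can also be profiled from the command line without changing it, for example with python -m cProfile -s cumulative myscript.py, where myscript.py is a placeholder name.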

For this part of your assignment, I want you to try to improve the performance of some Python code by using cProfile. One possible challenge would be to improve the performance of John Zelle's graphics.py library that we use in CS 111. For example, the graphics lab from my current CS 111 section included an animation of a bouncing Circle, a polygon drawer, etc. These all depend on Zelle's library. Could they be made more efficient by modifying the library?

Another possible target would be your own RSS filter from earlier this term. The network access may be the slowest part of the program, but you could use cProfile to profile your filtration code to see if there are ways to speed it up.

Whatever program you choose to profile and improve, here's what I want you to hand in:

Hungarian notation

Six months after graduating in 1998, one of our CS majors who went off to become a programmer wrote to me raving about a paper that I needed to read and then start teaching to my students. The paper, Hungarian Notation by Charles Simonyi of Microsoft, describes Simonyi's naming system for identifiers (variables and constants, in particular).

Read Simonyi's paper, and then answer the following questions.