CS 204: Software Design

Mailing addresses and automated testing

Test system due 5:00PM Friday, April 24. Program due 11:59PM Monday, April 27.

The problem

When you order software on-line, you are usually asked for your mailing address, even though you're going to download the software. Some people reject this request, considering it none of the vendor's business. Others enter gibberish or false addresses. But in my experience, most people dutifully fill in their addresses. From the vendor's point of view, there are at least two good reasons to ask for an address: taxes and market research. (My company pays less tax on non-US sales, and more tax on Minnesota sales, so we need documentation. Furthermore, we can learn a lot about the most promising places to advertise by tracking sales by country.)

For this assignment, I want you to imagine the following problem. You are given a text file, each line of which consists of:

e-mail address [tab] mailing address

The task is to identify the country represented by each line, along with the state or province if the country is the US or Canada. This task can be trickier than it sounds. For example, there's a France Avenue in the Twin Cities, and the two-letter abbreviations of Canada and California are the same. Furthermore, people often leave off the country name, especially if they're in the US.
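To see why this is trickier than it sounds, here is a minimal sketch of a deliberately naive approach that searches for a country name anywhere in the address. The lookup table, the function name, and both sample addresses are invented for illustration; this is not a suggested solution.

```python
# Hypothetical, deliberately tiny lookup table. A real program would
# load the full ISO list instead.
NAIVE_COUNTRIES = {"france": "FR", "canada": "CA", "germany": "DE"}


def naive_country_guess(mailing_address):
    """Return the code of the first country name found anywhere in the
    address, or '??' if no country name appears."""
    lowered = mailing_address.lower()
    for name, code in NAIVE_COUNTRIES.items():
        if name in lowered:
            return code
    return "??"


# A Twin Cities street address with no country given: the street name
# is mistaken for the country.
print(naive_country_guess("3400 France Avenue South, Minneapolis, MN 55416"))

# A US address with the country left off: no country name appears, so
# the naive guess gives up even though a person would read this as US.
print(naive_country_guess("1 Main St, Fresno, CA 93721"))
```

Both made-up addresses are in the US, yet the first is reported as FR and the second as ??. Cases like these are good candidates for test data.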

You should assume that the output format for this program looks like this:

e-mail address [tab] mailing address [tab] country abbreviation [[tab] state/province abbreviation]

That is, each output line repeats the data from the corresponding line of the input file, then adds the ISO two-letter country abbreviation (the ISO 3166-1 alpha-2 code) computed by your program, and then, only if the country is the US or Canada, the two-letter state or province code. If the program is unable to identify the country, use ?? as the country code. If the program identifies the US or Canada, but is not able to identify the state or province, use ?? for the state/province.
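The output rules above can be sketched as a small helper. The function name and its parameters are hypothetical, and the sketch assumes the country and state/province have already been identified (or not) by some other part of the program.

```python
def format_output_line(input_line, country=None, region=None):
    """Build one output line from one input line.

    country: two-letter country code, or None if it was not identified.
    region:  two-letter state/province code, or None if not identified.
             It is only used when the country is US or CA.
    """
    # Repeat the input data unchanged, minus the trailing newline.
    fields = [input_line.rstrip("\r\n"), country or "??"]
    # The fourth field appears only for the US and Canada.
    if country in ("US", "CA"):
        fields.append(region or "??")
    return "\t".join(fields)


# Made-up sample data, for illustration only.
print(format_output_line("a@example.com\t1 Main St, Fresno, CA 93721\n", "US", "CA"))
print(format_output_line("b@example.com\t12 Rue Exemple, Paris\n", "FR"))
print(format_output_line("c@example.com\tasdf qwerty\n"))
```

Note that an unidentified country produces a single ?? field, since the program cannot know whether a state or province applies.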

Country names (in English and French) and the corresponding two-letter abbreviations can be found on the ISO 3166-1 list of country codes. You can search for state and province abbreviations, too.

Your jobs

  1. Your first job is not to write a program to solve this problem. Instead, I want you to develop test data and an automated system for performing your tests. You may assume that the program you are testing reads from standard input and writes to standard output, and that the input and output formats are as described above. You may also assume that the program is invoked by the command "python getcountry.py".

    Your testing system is going to be simple. First, create one or more text files consisting of input data. Second, create a corresponding file or files showing the expected output from running your input file(s) through the getcountry.py program. Finally, create a Makefile that will run getcountry.py on the input data and compare the output to the expected output data, reporting any differences. (I wonder if there's a Unix command that compares files and reports differences between them...?)

    By designing the test system as described in the previous paragraph, you can focus most of your attention on coming up with a thorough set of test cases. You'll want to test boundary cases, ambiguous cases, respectably normal cases, cases typed by evil people trying to break your software, etc. Spend some time thinking about ways your program could fail (even though you haven't written it yet), and create test cases to probe those areas of potential failure.

    Turn in your test code and data in a folder called "addresstest", and include a short readme.txt file describing what the tester needs to do to use your test system.

  2. Your second job, not surprisingly, is to write a Python program to solve the country/state identification problem described above. Turn in your source code, any required data (e.g., a file of ISO country codes), and a readme.txt in a folder called "addresses". In the readme.txt, describe your program's command-line syntax, and write a brief paragraph about whether you found your test system helpful in the development of the program.
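The comparison at the heart of the first job (actual output versus expected output, with any differences reported) can be illustrated in Python with the standard difflib module. This is only a sketch of the idea, not a substitute for the Makefile the assignment asks for; the function name and the sample data are made up.

```python
import difflib


def report_differences(expected_text, actual_text):
    """Return a list of unified-diff lines comparing the two texts.
    An empty list means the texts match line for line."""
    return list(difflib.unified_diff(
        expected_text.splitlines(),
        actual_text.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    ))


# Made-up example: the program confused California with Canada.
expected = "a@example.com\t1 Main St, Fresno, CA 93721\tUS\tCA\n"
actual = "a@example.com\t1 Main St, Fresno, CA 93721\tCA\t??\n"

for line in report_differences(expected, actual):
    print(line)
```

When the two texts are identical the list is empty, so a passing test prints nothing.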

Have fun.