CS 257 Assignment

Work with a partner (or two, if that's how it shakes out). You should plan to stick with this partner for phases 1, 2, and 3 of the Books assignment.

Goals

Learn how to work with command-line arguments in Python
Learn the basics of reading comma-separated values files using Python's csv module

The data: books, authors, and comma-separated values

Take a look at books.csv, a file full of data about books and authors. There are only a couple dozen books in this dataset, so you wouldn't want to base an important book-related application on it. But for learning about how to manipulate datasets like this, a couple dozen books will be plenty.

The format of books.csv is known as comma-separated values (CSV). CSV is a very simple format used to store tables of textual data. Each line of text represents a row in the table, and the fields/columns in each row are separated by commas. The few lines below illustrate the principle, with each row in the form title,publication year,author:

Jane Eyre,1847,Charlotte Brontë (1816-1855)
To Say Nothing of the Dog,1997,Connie Willis (1945-)
The Stone Sky,2017,N.K. Jemisin (1972-)

CSV gets tricky only when the data in one of the table cells contains either a comma or a newline character. For example, consider the novel "Right Ho, Jeeves" by P.G. Wodehouse. If you just comma-separate the fields, you get this:

Right Ho, Jeeves,1934,Pelham Grenville Wodehouse (1881-1975)

which will make software misinterpret " Jeeves" as the second column of this row, instead of the tail end of the first column. CSV solves this problem with quotation marks:

"Right Ho, Jeeves",1934,Pelham Grenville Wodehouse (1881-1975)

But of course now you have the question of what happens if the title of your book includes quotation marks. You should read up on how CSV handles these situations.
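
As a starting point for that reading, here is a minimal sketch of how Python's csv module treats quoted fields. The second sample row is made up for illustration (it is not from books.csv); it assumes the common CSV convention of doubling a quotation mark inside a quoted field, which is what csv.reader expects by default.

```python
import csv
import io

# Two sample rows held in memory instead of a file.
# The second row is hypothetical: it shows a title containing quotation marks,
# written with each embedded quotation mark doubled.
sample = io.StringIO(
    '"Right Ho, Jeeves",1934,Pelham Grenville Wodehouse (1881-1975)\n'
    '"A ""Quoted"" Title",2000,Some Author (1950-)\n'
)

# csv.reader yields each row as a list of strings, one string per column.
rows = list(csv.reader(sample))
for row in rows:
    print(row)
```

Each row comes back as a list of three strings: the comma in "Right Ho, Jeeves" stays inside the title, and the doubled quotation marks come back as single ones.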

For the Books assignment, phases 1, 2, and 3, you'll be using the books.csv file as your database. Your programs will read data from this file as needed to satisfy the requirements of the assignment. To do this, you'll use Python's csv module.

Command-line arguments in Python

Writing programs that use command-line arguments to determine their behavior is an important skill. In my day-to-day life as a programmer, I write a lot of short programs (and some long ones) to do all manner of tasks for me. Sometimes, in very short programs that I plan to run exactly once, I'll hard-code input values into the program. Those programs often amount to: "Open file something.txt, read its contents, do something with the contents, and print out the results." In cases like this, I'll often just put the "something.txt" right in the code.

But even when I expect to run a program only once, I generally have to run it a few times during debugging, and then I often find that it's more useful than I thought, and I end up running it on multiple different input files, sometimes sorting the output one way, other times sorting the output another way, and so on. In such cases, I always wish I had taken the one or two minutes it would have required me to set up a sensible command-line argument syntax for the program.

For the Books assignment, all three phases, I will specify a command-line syntax your code must implement. Among other things, this will enable me to automate the testing of the whole class's programs because they'll all follow the same syntax. But more important, a well-designed and correctly implemented command-line syntax will make it easier for you and your users to use your programs, which in turn will make your software more useful to human beings, which is the point of software development.

So what's the assignment?

For phase 1 of the Books assignment, you will:

Choose one partner's Git repository and create in it a top-level folder called books.
Name your Phase 1 program books1.py and put it in your books folder.
Implement this command-line syntax for books1.py:
  python3 books1.py input-file action [sort-direction]
where
- input-file is any CSV file with format like that of books.csv. This will enable you to make even smaller data files for testing purposes.
- action can be either "books" or "authors".
- sort-direction can be "reverse" or "forward" or nothing at all. Note that the square brackets around [sort-direction] in the syntax above are Unix documentation's way of indicating an optional command-line argument.
- If action is books, then your program will print out the titles of all the books, one per output line, sorted in case-sensitive lexicographic order (i.e. the default order used to sort strings in Python's sort method and sorted function).
- If action is authors, your program will print the authors' names, one per output line, just as they're included in the data, but without the dates (e.g. "Connie Willis", not "Connie Willis (1945-)"), sorted by last name. For this phase, we'll assume that the last word in the name is the last name (e.g. "Gabriel García Marquez" will be treated as having last name "Marquez", not "García Marquez").
- If sort-direction is reverse, then your books or authors list will be printed in reverse lexicographic order (based on title or last name, as appropriate to the action). Otherwise (either sort-direction is forward, or it's absent), print the output in forward lexicographic order.
- If the command-line syntax is incorrect, your program should print a usage statement that briefly describes the correct command-line syntax. This statement should be printed to standard error:
  print('Usage: blah blah blah', file=sys.stderr)
When you're satisfied with your program and everything is committed, tag your repository with the tag books_phase1.
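
To make the sort orders described above concrete, here is a small sketch, not a solution. The helper name last_word and the lowercase title 'emma' are made up for illustration; the point is only to show how Python's sorted function behaves with case, with reverse=True, and with a key function.

```python
def last_word(name):
    """Return the last whitespace-separated word of a name (hypothetical helper)."""
    return name.split()[-1]

# 'emma' is deliberately lowercase to show case-sensitive ordering:
# uppercase letters sort before lowercase ones.
titles = ['emma', 'Jane Eyre', 'Beloved']
forward_titles = sorted(titles)
reverse_titles = sorted(titles, reverse=True)
print(forward_titles)
print(reverse_titles)

# Sorting names by their last word, per the "last word is the last name" assumption.
names = ['Connie Willis', 'Charlotte Brontë', 'Gabriel García Marquez']
by_last_name = sorted(names, key=last_word)
print(by_last_name)
```

Note that the names here already have their dates removed; stripping the dates from the data is a separate step this sketch does not cover.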

Suggestions

The official Python documentation for the csv module includes some good, simple example code.
It's also good to search the internet for things like "python csv examples", but be careful to pay attention to cues about the credibility of whatever websites you land on.
When you run a Python program and import the sys module, you have access to a list of strings called sys.argv. This list includes all the command-line arguments: sys.argv[0] is the name of your program file (e.g. "books1.py"), sys.argv[1] is the first argument after that (e.g. the input file name for this assignment), etc. I recommend using sys.argv directly for this assignment.
Do not be seduced by the many tutorials and blogs that will encourage you to use the getopt module or the argparse module to do your command-line parsing. Once you get the hang of using sys.argv directly, and once you start designing complicated command-line syntaxes for more complicated programs, then argparse can be a good choice. But for now, just use your existing knowledge of how to deal with a list of strings, which is all sys.argv is.
Don't try to make this program in any way more complicated than what I've described above. If you're inclined to keep working once you have the program functioning, use your extra energy to make your program as simple as possible.
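
One way the sys.argv suggestion might look in practice is sketched below. The function names parse_args and main, the return values, and the exit status of 1 are my own assumptions, not requirements; so is the choice to treat an unrecognized sort-direction as a syntax error. Treat it as an illustration of handling a list of strings rather than as the required structure.

```python
import sys

USAGE = 'Usage: python3 books1.py input-file action [sort-direction]'

def parse_args(argv):
    """Return (input_file, action, reverse) or None if argv is malformed.

    argv is expected to look like sys.argv: program name first, then arguments.
    """
    # Two required arguments plus one optional one, after the program name.
    if len(argv) not in (3, 4):
        return None
    input_file, action = argv[1], argv[2]
    if action not in ('books', 'authors'):
        return None
    # An absent sort-direction means forward order.
    direction = argv[3] if len(argv) == 4 else 'forward'
    if direction not in ('forward', 'reverse'):
        return None
    return input_file, action, direction == 'reverse'

def main(argv):
    """Print a usage statement to standard error on bad syntax (assumed exit code 1)."""
    parsed = parse_args(argv)
    if parsed is None:
        print(USAGE, file=sys.stderr)
        return 1
    print(parsed)  # placeholder for the real work
    return 0

# Explicit lists stand in for sys.argv so the behavior is easy to see.
print(parse_args(['books1.py', 'books.csv', 'authors', 'reverse']))
print(parse_args(['books1.py', 'books.csv']))
```

In a real program you would finish with something like sys.exit(main(sys.argv)), so the arguments come from the actual command line.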

Start early, ask questions, and have fun!

(And don't forget Slack—our #questions channel is meant for you!)

CS 257: Software Design

Books, Phase 1: CSV and command-line arguments

Folder name: books
