CS 257 Assignment

Work with the same partner(s) you worked with for Phase 1.

Relevant files

For your convenience, here are the files you'll need for this assignment.

booksdatasource.py
books.csv
authors.csv
books_authors.csv
primecheckertests.py (a model you might use as a basis for your unit tests)

Goals

Discuss design considerations when creating an interface for a books data source.
Understand the benefits of creating such an interface.
Create unit tests for the data source interface using the unittest module (also known as PyUnit).
Think about how to create unit test collections that cover a wide range of inputs, contexts, and possible errors.

Test-driven development

The purpose of Phases 2 and 3 of the Books assignment is to give you an introduction to test-driven development (TDD). Roughly, the process goes like this:

You start with a class whose interfaces have been written and agreed upon. (For our purposes, an interface will refer to a method signature plus the descriptive comment that goes with it.)
You write a collection of unit tests to thoroughly test the agreed-upon interfaces. You do this before implementing the interfaces.
You implement the interfaces, using the unit tests to help you debug and to give you a way to determine whether you're done with the implementation.

For us, the class will be called BooksDataSource, and its purpose will be to provide Python programmers with convenient access to the data in our books dataset.

The trick to writing good unit test suites is to think deeply about the many ways your interfaces might be called. Your tests should, for example, test typical cases, weird cases, and illegal cases. (For a really simple example, a unit test suite for a square-root function ought to include attempts to compute the square roots of positive integers, positive non-integers, negative numbers, and zero, and depending on the language and the completeness of the interface specification, maybe non-numerical input.) You should think hard about the mistakes programmers can make, but also about the ways malicious programmers might try to exploit errors or omissions in your code.
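To make the square-root example concrete, here is a minimal sketch of what such a test suite might look like with the unittest module. The function square_root and its error behavior (ValueError for negative numbers, TypeError for non-numerical input) are assumptions made up for this illustration. In real test-driven development the tests would be written before the function body; a small implementation is included here only so the sketch runs on its own.

```python
import math
import unittest


def square_root(x):
    """Hypothetical function under test (an assumption for this sketch).

    Returns the non-negative square root of x. Raises ValueError for
    negative numbers and TypeError for non-numerical input.
    """
    if isinstance(x, bool) or not isinstance(x, (int, float)):
        raise TypeError('square_root requires a number')
    if x < 0:
        raise ValueError('square_root requires a non-negative number')
    return math.sqrt(x)


class SquareRootTest(unittest.TestCase):
    # Typical cases
    def test_positive_integer(self):
        self.assertEqual(square_root(9), 3.0)

    def test_positive_non_integer(self):
        self.assertAlmostEqual(square_root(2.25), 1.5)

    # Weird case: the boundary between legal and illegal input
    def test_zero(self):
        self.assertEqual(square_root(0), 0.0)

    # Illegal cases
    def test_negative_raises_value_error(self):
        with self.assertRaises(ValueError):
            square_root(-4)

    def test_non_numerical_raises_type_error(self):
        with self.assertRaises(TypeError):
            square_root('nine')
```

Note that each test method checks one kind of input, so a failure points directly at the category of call that went wrong.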

Your job for Phase 2

In class on Monday, we started talking about how we might design an interface for a BooksDataSource class. On Wednesday, we will discuss my choices for this interface, why I made them, and the pros and cons of my choices. Regardless of those pros and cons, this is the interface we will use for Phases 2 and 3 of the Books project. Since you can't do a credible job of testing or implementing an interface until you know its specifications, you should read the whole of booksdatasource.py carefully before you begin.
Copy booksdatasource.py to your books folder in your git repository. Then leave it untouched for the remainder of Phase 2.
Create a new Python file called booksdatasourcetest.py, with a class called BooksDataSourceTest inheriting from unittest.TestCase. You may use primecheckertests.py as a template to get started.
Implement a thorough collection of unit tests for the non-constructor methods in BooksDataSource. The goal of these tests is to provide as wide a range of unit tests as you can think of. Don't repeat yourself (e.g. you probably wouldn't need to test both square_root(3.0) and square_root(5.0)), but also don't be shy about writing lots of tests. Effectively probing the potential vulnerabilities of an interface usually takes lots of little tests.
Include all the CSV files containing your test data in your books directory.
Put a Makefile in your books directory, and include in it a test target so you (and I, and the grader) can just type "make test" in your books directory to run your unit tests. Note that this won't work unless your data files are present.
Once you're happy with your unit tests, tag your repo with the tag books_phase2.
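As a rough sketch of the shape booksdatasourcetest.py might take, the block below shows a BooksDataSourceTest class with a setUp method plus one typical, one unknown-ID, and one wrong-type test. Everything about StandInDataSource, including its title_for method and the ValueError it raises, is hypothetical and exists only so the sketch runs by itself. Your real tests would import BooksDataSource from booksdatasource.py and call the methods it actually specifies.

```python
import unittest


class StandInDataSource:
    """Hypothetical stand-in, present only so this sketch runs by itself.

    The real tests would instead import BooksDataSource from
    booksdatasource.py and construct it as its __init__ comment specifies.
    """

    def __init__(self):
        self._titles = {41: 'Middlemarch', 6: 'Good Omens'}

    def title_for(self, book_id):
        """Hypothetical method: return the title of the book with this ID."""
        if book_id not in self._titles:
            raise ValueError(f'no book with ID {book_id}')
        return self._titles[book_id]


class BooksDataSourceTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test method, so each test gets a fresh object.
        self.data_source = StandInDataSource()

    def test_typical_id(self):
        self.assertEqual(self.data_source.title_for(41), 'Middlemarch')

    def test_unknown_id(self):
        with self.assertRaises(ValueError):
            self.data_source.title_for(9999)

    def test_wrong_type_id(self):
        # A string that merely looks like a valid ID should not match.
        with self.assertRaises(ValueError):
            self.data_source.title_for('41')
```

One way to run a file like this is python3 -m unittest booksdatasourcetest from the books directory.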

IMPORTANT: During Phase 2, you may not change BooksDataSource in any way. (The graders and I will test this by using the Unix command diff, by the way.) In Phase 3, you will implement the methods in the BooksDataSource class, during which process you may add new methods to BooksDataSource if you wish, but you may not change BooksDataSource's original method signatures during either Phase 2 or Phase 3.

The data files

Take a look at the description of the data files in the comment to the __init__ method in booksdatasource.py. You'll see that I have separated out the books and authors into three (yes, three) CSV files: books.csv, authors.csv, and books_authors.csv. This structure gives each book an ID number, each author an ID number, and provides a more consistent and generalizable way of connecting books to authors. Here's a snippet of each file to give you the idea:

books.csv:
    41,Middlemarch,1871
    6,Good Omens,1990
    ...

authors.csv:
    5,Gaiman,Neil,1960,NULL
    6,Pratchett,Terry,1948,2015
    22,Eliot,George,1819,1880
    ...

books_authors.csv:
    41,22
    6,5
    6,6
    ...

That is, author 22 wrote book 41, and authors 5 and 6 wrote book 6. There are many benefits to this data organization, and we'll revisit those benefits in a week or two when we start talking about relational databases. But for now, these are simply CSV files with a new format that you'll need to deal with.
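To see how the three files connect, here is a small sketch that reads the snippet rows above with Python's csv module and lists the authors of each book. The column meanings are inferred from the snippet (book ID, title, year; author ID, surname, given name, birth year, death year), so check them against the __init__ comment in booksdatasource.py. The function names are made up for this illustration, and the data is read from strings rather than files so the sketch is self-contained.

```python
import csv
import io

# The snippet rows from this assignment, held in strings instead of files.
BOOKS_CSV = '41,Middlemarch,1871\n6,Good Omens,1990\n'
AUTHORS_CSV = ('5,Gaiman,Neil,1960,NULL\n'
               '6,Pratchett,Terry,1948,2015\n'
               '22,Eliot,George,1819,1880\n')
BOOKS_AUTHORS_CSV = '41,22\n6,5\n6,6\n'


def load_rows(text):
    """Parse CSV text into a list of rows, each a list of strings."""
    return list(csv.reader(io.StringIO(text)))


def authors_by_book_title(books_text, authors_text, links_text):
    """Map each book title to the list of its authors' full names.

    Column positions are assumptions inferred from the sample rows.
    """
    titles = {row[0]: row[1] for row in load_rows(books_text)}
    names = {row[0]: f'{row[2]} {row[1]}' for row in load_rows(authors_text)}
    result = {title: [] for title in titles.values()}
    # Each link row pairs one book ID with one author ID.
    for book_id, author_id in load_rows(links_text):
        result[titles[book_id]].append(names[author_id])
    return result


if __name__ == '__main__':
    print(authors_by_book_title(BOOKS_CSV, AUTHORS_CSV, BOOKS_AUTHORS_CSV))
```

Keying the result by title keeps the sketch short, but two different books can share a title, so a real implementation would more safely key on book ID.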

You may use my data files as-is, or you may prefer to create your own data files with the same format but data more specifically tuned to your unit tests.

Good luck

Start early, ask lots of questions, and have fun!

CS 257: Software Design

Books, Phase 2: unit tests

Folder name: books