Work with the same partner(s) you worked with for Phase 1.
Relevant files
For your convenience, here are the files you'll need for this assignment.
Goals
- Discuss design considerations when creating an interface for a books data source.
- Understand the benefits of creating such an interface.
- Create unit tests for the data source interface using the unittest module (also known as PyUnit).
- Think about how to create unit test collections that cover a wide range of inputs, contexts,
and possible errors.
Test-driven development
The purpose of Phases 2 and 3 of the Books assignment is to give you an introduction to
test-driven development (TDD).
Roughly, the process goes like this:
- You start with a class whose interfaces have been written and agreed upon. (For our
purposes, an interface will refer to a method signature plus the descriptive
comment that goes with it.)
- You write a collection of unit tests
to thoroughly test the agreed-upon interfaces. You do this before implementing
the interfaces.
- You implement the interfaces, using the unit tests to help you debug and to give you
a way to determine whether you're done with the implementation.
For us, the class will be called BooksDataSource, and its purpose will
be to provide Python programmers with convenient access to the data in our books dataset.
The trick to writing good unit test suites is to think deeply about the many ways your
interfaces might be called. Your tests should, for example, test typical cases,
weird cases, and illegal cases. (For a really simple example, a unit test suite for a square-root function
ought to include attempts to compute the square roots of positive integers, positive non-integers,
negative numbers, and zero, and, depending on the
language and the completeness of the interface specification, maybe non-numerical input.) You
should think hard about the mistakes programmers can make, but also about the ways
malicious programmers might try to exploit errors or omissions in your code.
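Here is one way that square-root example might look with Python's unittest module. It uses the standard library's math.sqrt as a stand-in for the function under test, with one test per category. Treat it as a sketch, not an exhaustive suite.

```python
import math
import unittest


class SquareRootTest(unittest.TestCase):
    """Probes math.sqrt with typical, edge, and illegal inputs."""

    def test_positive_integer(self):
        self.assertEqual(math.sqrt(9), 3.0)

    def test_positive_non_integer(self):
        # Floating-point results are safest to compare approximately.
        self.assertAlmostEqual(math.sqrt(2.25), 1.5)
        self.assertAlmostEqual(math.sqrt(2) ** 2, 2.0)

    def test_zero(self):
        self.assertEqual(math.sqrt(0), 0.0)

    def test_negative_number(self):
        # math.sqrt rejects negative real numbers with ValueError.
        with self.assertRaises(ValueError):
            math.sqrt(-1)

    def test_non_numerical_input(self):
        # A string is not a number, so math.sqrt raises TypeError.
        with self.assertRaises(TypeError):
            math.sqrt('four')
```

In Python, the interface behavior for the illegal cases is an exception, so the last two tests pin down which exception a caller should expect.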
Your job for Phase 2
- In class on Monday, we started talking about how we might design an
interface for a BooksDataSource class. On Wednesday, we will discuss
my choices for this interface, why I made them, and the pros and cons of my choices.
Regardless of those pros and cons, this is the interface we will use for phases 2 and 3
of the books project. Since you can't do a credible job of testing or implementing an
interface until you know its specifications, you should read all of booksdatasource.py carefully
before you begin.
- Copy booksdatasource.py to your books folder in your git repository. Then leave it untouched
for the remainder of Phase 2.
- Create a new Python file called booksdatasourcetest.py, with a class called
BooksDataSourceTest inheriting from unittest.TestCase. You may use
primecheckertests.py as a template to get started.
- Implement a thorough collection of unit tests for the non-constructor methods in BooksDataSource.
Aim for as wide a range of unit tests as you can think of. Don't
repeat yourself (e.g., you probably wouldn't need to test both square_root(3.0) and square_root(5.0)),
but also don't be shy about writing lots of tests. Effectively probing the potential
vulnerabilities of an interface usually takes lots of little tests.
- Include all the CSV files containing your test data in your books directory.
- Put a Makefile in your books directory, and include in it a test target
so you (and I, and the grader) can just type "make test" in your books directory to
run your unit tests. Note that this won't work unless your data files are present.
- Once you're happy with your unit tests, tag your repo with the tag books_phase2.
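One way to write lots of little tests without repeating yourself is unittest's subTest context manager, which reports each failing case separately inside a single test method. The sketch below applies it to a hypothetical parse_year helper for the year fields in the CSV files. That helper is invented for illustration and is not in booksdatasource.py. Note the choice of one representative input per category rather than many near-duplicates.

```python
import unittest


def parse_year(text):
    """Hypothetical helper (NOT part of booksdatasource.py): converts a
    year field from the CSV files to an int, or to None for 'NULL'."""
    if text == 'NULL':
        return None
    return int(text)  # int() raises ValueError for non-numeric text


class ParseYearTest(unittest.TestCase):
    def test_valid_fields(self):
        # One representative per category of input.
        cases = [
            ('1871', 1871),   # typical year
            ('0', 0),         # boundary value
            ('-500', -500),   # negative year (BCE)
            ('NULL', None),   # missing death year
        ]
        for text, expected in cases:
            with self.subTest(text=text):
                self.assertEqual(parse_year(text), expected)

    def test_invalid_fields(self):
        # Empty, non-numeric, non-integer, and wrong-case NULL.
        for text in ['', 'abc', '18.5', 'null']:
            with self.subTest(text=text):
                with self.assertRaises(ValueError):
                    parse_year(text)
```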
IMPORTANT:
During Phase 2, you may not change BooksDataSource in any way.
(The graders and I will test this by using the Unix command diff, by the way.)
In Phase 3, you will implement the methods in the BooksDataSource
class, during which process you may add new methods to BooksDataSource if you wish,
but you may not change BooksDataSource's original method signatures during either
Phase 2 or Phase 3.
The data files
Take a look at the description of the data files in the comment to the __init__ method
in booksdatasource.py. You'll see that I have
separated out the books and authors into three (yes, three) CSV files:
books.csv, authors.csv,
and books_authors.csv. This structure gives
each book an ID number, each author an ID number, and provides a more consistent and
generalizable way of connecting books to authors. Here's a snippet of each file to give
you the idea:
books.csv:
41,Middlemarch,1871
6,Good Omens,1990
...
authors.csv:
5,Gaiman,Neil,1960,NULL
6,Pratchett,Terry,1948,2015
22,Eliot,George,1819,1880
...
books_authors.csv:
41,22
6,5
6,6
...
That is, author 22 wrote book 41, and authors 5 and 6 wrote book 6. There are many
benefits to this data organization, and we'll revisit those benefits in a week or two when we
start talking about relational databases. But for now, these are simply CSV files
with a new format that you'll need to deal with.
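As a sketch of how the three files fit together, the following reads the snippets above with Python's csv module and maps each title to its authors. The column meanings (id, title, publication year for books; id, surname, given name, birth year, death year for authors) are my reading of the snippets, so check them against the __init__ comment in booksdatasource.py. The function names here are hypothetical.

```python
import csv
import io

# Inline copies of the snippets above, used in place of the real files.
BOOKS_CSV = '41,Middlemarch,1871\n6,Good Omens,1990\n'
AUTHORS_CSV = ('5,Gaiman,Neil,1960,NULL\n'
               '6,Pratchett,Terry,1948,2015\n'
               '22,Eliot,George,1819,1880\n')
BOOKS_AUTHORS_CSV = '41,22\n6,5\n6,6\n'


def load(csv_text):
    """Parses CSV text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(csv_text)))


def authors_by_title(books_text, authors_text, links_text):
    """Hypothetical helper: maps each book title to its authors' names.

    Assumed columns: books (id, title, year);
    authors (id, surname, given name, birth year, death year).
    """
    titles = {row[0]: row[1] for row in load(books_text)}
    names = {row[0]: f'{row[2]} {row[1]}' for row in load(authors_text)}
    result = {}
    # Each row of books_authors.csv links one book ID to one author ID.
    for book_id, author_id in load(links_text):
        result.setdefault(titles[book_id], []).append(names[author_id])
    return result


print(authors_by_title(BOOKS_CSV, AUTHORS_CSV, BOOKS_AUTHORS_CSV))
# {'Middlemarch': ['George Eliot'], 'Good Omens': ['Neil Gaiman', 'Terry Pratchett']}
```

The sketch uses csv.reader rather than splitting on commas because a title may itself contain a comma, and a well-formed CSV file quotes such a field. With the real files, you would read from files opened with open(..., newline='') instead of the inline strings.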
You may use my data files as-is, or you may prefer to create your own data files with the
same format but data more specifically tuned to your unit tests.
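If you tune your own data, checked-in CSV files are one route. Another is to generate small cases inside the test itself. This sketch uses unittest's setUp and addCleanup with tempfile.TemporaryDirectory, so each test gets a fresh directory that is removed afterward. The tricky title is invented test data. The final step of building a BooksDataSource from the files is left as a comment, because its constructor arguments are defined in booksdatasource.py.

```python
import csv
import os
import tempfile
import unittest


class TunedDataTest(unittest.TestCase):
    """Sketch: build tiny CSV files tuned to the cases under test."""

    def setUp(self):
        # Fresh temporary directory per test, deleted automatically afterward.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.books_path = os.path.join(tmp.name, 'books.csv')
        rows = [
            ['41', 'Middlemarch', '1871'],
            ['7', 'Title, With a Comma', '2000'],  # hypothetical tricky title
        ]
        with open(self.books_path, 'w', newline='') as f:
            csv.writer(f).writerows(rows)  # quotes the comma-containing title
        # In booksdatasourcetest.py you would now construct a
        # BooksDataSource from these files, following its __init__ comment.

    def test_comma_in_title_survives_round_trip(self):
        with open(self.books_path, newline='') as f:
            rows = list(csv.reader(f))
        self.assertEqual(rows[1], ['7', 'Title, With a Comma', '2000'])
```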
Good luck
Start early, ask lots of questions, and have fun!