CS 334: Design Theory: Programming

Administrative info
- Pseudocode help
Compute closure
Perform decomposition
Optional (not for grading)

Administrative info

This assignment should be done in pairs, if you are working in a pair.

Note carefully: if you are working in a pair, this must be done with the two of you working side-by-side. Take turns typing; perhaps for each query you might swap the keyboard back and forth, or perhaps you might set a timer and swap every 15 minutes. Do not work on queries separately from each other and combine at the end.

This homework will be graded anonymously via Moodle's anonymous marking feature, so do not include your name anywhere in your submission. We will assume that you are working in your assigned pair, if you have one; if you end up doing the assignment individually for some reason, indicate so in a comment in your submission without actually referring to your own name. (You can also contact me separately if need be.) If you are working in your regularly assigned pair, only one of you should submit the assignment.

Pseudocode help

Make sure to notice that pseudocode for all of these exercises is in chapter 8 of your textbook, so make sure to look there. In some cases, multiple versions are presented in varying levels of difficulty/efficiency.

Compute closure

Write a Python function to calculate the closure of a list of functional dependencies for a specific relation schema. In order to keep everything easier to read and avoid tons of quotation marks, we'll use numbers for attributes instead of letters. For example, a typical functional dependency might be 1 2 -> 3 4 5.

To further make your life easier,I'm going to supply some starter code:

Attributes are handles purely as integers.
sets of attributes are handled via my class atset, which is a subclass of a Python frozenset. A frozenset works pretty much the same as a Python set, but it's immutable, and more imoprtantly for our task at hand, it's hashable which means we can use them in dictionaries.
functional dependencies are handled via my class fd, which just keeps track of the left-hand-side (LHS) and right-hand-side (RHS) of the functional dependency each as an atset.
sets of functional dependencies are handled via my class fdset, which is just a subclass of set that makes output prettier.

Why all of this starter code? Try it without it. The catch is that you're going to want to do things like maintain a set of FDs. The problem is that Python sets can only store items which are hashable. My experience with this assignment is that without some sort of framework like the above, your code devolves into a tangled mess of converting back and forth between lists, tuples, and sets.

So to represent the FD 1 2 -> 3 4 5 and store it in a set of FDs, you would do so as:

fds = fdset()
fds.add( fd(atset([1,2]), atset([3,4,5])))

The textbook provides a number of different algorithms for calculating the closure of a set of FDs. You do not need to be concerned about efficiency; you can choose whichever algorithm you like. If you are looking for an extra challenge, however, feel free to implement this as efficiently as you can.

Your function should specifically be called closure and should be used in the following form:

fds = fdset()
fd1 = fd(atset([1]),atset([2]))
fd2 = fd(atset([2,3]),atset([4,5]))
fds.add(fd1)
fds.add(fd2)
print(closure(atset([1,2,3,4,5]),fds))

Perform decomposition

Write a Python function to decompose a relation into BCNF via a lossless-join decomposition. You'll specify the attributes in the relation, and a set of functional dependencies. For example:

fds = fdset()
fd1 = fd(atset([1]),atset([2]))
fd2 = fd(atset([2,3]),atset([4,5]))
fds.add(fd1)
fds.add(fd2)
print(bcnf(atset([1,2,3,4,5]),allfds))

The bcnf function should return a relnset, which is another class I'm providing. It's also a subclass of set that prints things out in a readable way. It's my intention that each relation that you add to a relnset is simply an atset.

The algorithm that you use will be an iterative algorithm that finds an FD, then uses that FD to decompose a relation. To help the grader: at each iteration, display the relation you are decomposing, the FD being used, and the two new relations that result.

Optional (not for grading)

Looking for an extra push? Write a function called thirdnf with the same specifications as above, but it decomposes the original relation into relations in third normal form such that the decomposition is both lossless-join and dependency-preserving.