CS 334: B+ trees

Table of Contents

Overview

Implement a simulation of a B+-tree in Python.

This is a pair assignment.

Details

Structure of simulation

In a real B+-tree, the leaves contain keys and pointers to the corresponding records in pages. To keep this assignment focused on just the B+-tree, we will instead have the pointer in a leaf point to a specific Python value. In that sense, the B+-tree will work like a traditional map in a data structures sense; it will store key/value pairs. That value can even be a Python tuple, so it can simulate a B+-tree that way. Think of this as a secondary index.

Duplicate keys

Your tree should not store duplicate keys. If someone tries to store a key that is already in the B+-tree, raise an exception.

Types

You may assume that all keys are integers. Values can be any type.

Split details, and differences from textbook

When a node splits, if the number of keys is uneven, the node on the right after the split should contain one more key than the node on the left. The data structure works either way, of course, so long as one is consistent. I have chosen to go with this version because it is easier to implement in Python, simply due to the way Python handles 0-indexing, slicing, and truncation.

Also note that I am focusing on the number of keys in describing the tree, rather than the number of pointers. The number of pointers is always one more than the number of keys, so one can always determine one from the other. In my experience, it is more convenient to think in terms of keys rather than pointers, but you'll need to handle both regardless.

Leaf level

You do not need to implement the linked list of pointers that one typically finds in a B+-tree.

Code structure

Create a Python module called bptree (i.e., a file names bptree.py), inside of which is a class called Bptree. The constructor (i.e., the __init__ method) should take a parameter indicating how many keys can be stored in each node. In our tests, the number of keys will always be at least 2 (clarification added on 5/14). Your leaves and your internal nodes should have the same number of keys. This means that your leaves will have one pointer fewer than your internal nodes.

There are only three methods that your Bptree class must have (though you will undoubtedly want others as helpers):

  • insert(self,key,value) inserts a key-value pair into the tree.
  • getValue(self,key): returns the value associated with a particular key. Returns None if the key is not in the tree.
  • printTree(self) prints out a text version of the tree to the screen. The easiest way I found to do this was to rotate the tree vertically, showing the root on the left hand side of the terminal window, expanding to the right, and indenting further for deeper levels. Doing a recursive depth-first traversal of the tree made this pretty easy.

Running your code

When your code is working, I must be able to execute this test program from the same directory. In other words, we must be able to take your program and our program, place it in a directory with no other files, and issue one of the following command:

python3 bptreetest.py

(Note that this is not a sufficient test for your code. We will be running it with more complicated data than this.)

If your code does not work with the above test, we will not grade it. You will need to resubmit a version that does, at a loss of 25% of the credit for the assignment. (I'm being so strict on this because in the past, carefully specifying these guidelines alone wasn't enough, and it literally added many hours to the time it took to grade.)

Deletion

You do not need to delete anything from the B+-tree; you only need to insert. Of course, if you want to implement deletion for fun, go for it!