Search Engine via Binary Search Tree

Overview
Part 1: Create a Map
Part 2a: More map
Part 2b: Fill up the Map with data from IMDb, and run queries
- Evaluation of how well it works
What to turn in
Acknowledgements

This is a pair programming assignment. If you are working in a pair, this means that you and your partner should be doing the entirety of this assignment side-by-side, on a single computer, where one person is "driving" and the other is "navigating." Set a timer to swap every 15 minutes. You can choose your favorite from this online timer page. Make sure your sound volume is audible, but not so loud to disturb the people around you.

We will use anonymous grading on Moodle, which means that the grader won't see your name until after the grading is done. This is an easy way to help add an extra element of fairness to the grading. Therefore, make sure your name doesn't appear on your actual submission. When you submit via Moodle, it will know you are. Thanks!

Overview

This assignment is a repeat of the last one, with the difference that you'll be using a binary search tree instead of a hash table. Additionally, you'll be able to do a prefix search, since binary search trees allow you to search efficiently within a range.

Part 1: Create a Map

Create a class called TreeMapForStrings that will be used to map Strings to Strings. (The intention is to use it to map performer names to biographies.) Implement this map as a binary search tree. For this assignment, you should not the built-in Java classes that do this task such as TreeMap, HashMap, TreeSet, or HashSet. Your map should have the following methods:

void put(String key, String value)

add a key and a value to your map. If the key is already present in the map, replace the old value with the new one.
String getValue(String key)

get the value associated with a particular key. Return null if the key is not in the map.

Make sure to test your map appropriately. A main method would be a nice way to do this.

Java reminder: since you are comparing strings, make sure to use the equals and compareTo methods instead of the <, >, and = operators.

You should not be searching the web or other sources for similar code to this. You are welcome to look at the textbook.

Part 2a: More map

Add this method to your TreeMapForStrings class:

ArrayList<String> getKeysForPrefix(String prefix)

Return a list of all keys that start with prefix. You should do this efficiently: specifically, you should not traverse the entire tree in order to find these keys.

Part 2b: Fill up the Map with data from IMDb, and run queries

This part of the assignment is a complete repeat of Part 2 from the previous hashing version, so you should hopefully be able to use the code that you wrote from the previous assignment and make only minimal changes necessary in order to use your tree instead of the hash table. You'll create a class called TreeSearchEngine that works essentially the same as HashSearchEngine did in the previous assignment. The key difference is that you should be able to do a prefix search, based on your getKeysForPrefix method. At the user prompt, if a user types in a performer name such as Rech, you should return back all titles and summaries that start with that prefix. Do this repeatedly until the user types ####, in which case the program should end.

Evaluation of how well it works

Loading up the entire database is slow. Is that because reading the file itself is slow, or is because adding to the tree is slow? Experimentally determine which is the bottleneck, and explain your results. You may find the method System.nanoTime() very useful here. Submit your results, and an explanation of how you got them and what the relative measurements are, in a readme.txt file.

What to turn in

Part 1: Turn in your TreeMapForStrings class, which should include a main method for testing with some small data.

Part 2: Turn in your updated TreeMapForStrings class and your TreeSearchEngine class, as well as your readme.txt file describing performance measurements.

Acknowledgements

According to IMDb licensing, I should add:

Information courtesy of IMDb (http://www.imdb.com/). Used with permission.