Search Engine via Binary Search Tree
Table of Contents
This is a pair programming assignment. If you are working in a pair, this means that you and your partner should be doing the entirety of this assignment side-by-side, on a single computer, where one person is "driving" and the other is "navigating." Set a timer to swap every 15 minutes. You can choose your favorite from this online timer page. Make sure your sound volume is audible, but not so loud to disturb the people around you.
We will use anonymous grading on Moodle, which means that the grader won't see your name until after the grading is done. This is an easy way to help add an extra element of fairness to the grading. Therefore, make sure your name doesn't appear on your actual submission. When you submit via Moodle, it will know you are. Thanks!
Overview
This assignment is a repeat of the last one, with the difference that you'll be using a binary search tree instead of a hash table. Additionally, you'll be able to do a prefix search, since binary search trees allow you to search efficiently within a range.
Part 1: Create a Map
Create a class called TreeMapForStrings
that will be used to map Strings to
Strings. (The intention is to use it to map performer names to biographies.)
Implement this map as a binary search tree. For this assignment, you should not
the built-in Java classes that do this task such as TreeMap
, HashMap
,
TreeSet
, or HashSet
. Your map should have the following methods:
void put(String key, String value)
add a key and a value to your map. If the key is already present in the map, replace the old value with the new one.
String getValue(String key)
get the value associated with a particular key. Return
null
if the key is not in the map.
Make sure to test your map appropriately. A main method would be a nice way to do this.
Java reminder: since you are comparing strings, make sure to use the equals
and compareTo
methods instead of the <, >, and = operators.
You should not be searching the web or other sources for similar code to this. You are welcome to look at the textbook.
Part 2a: More map
Add this method to your TreeMapForStrings
class:
ArrayList<String> getKeysForPrefix(String prefix)
Return a list of all keys that start with
prefix
. You should do this efficiently: specifically, you should not traverse the entire tree in order to find these keys.
Part 2b: Fill up the Map with data from IMDb, and run queries
This part of the assignment is a complete repeat of Part 2 from the previous
hashing version, so you should hopefully be able to use the code that you wrote
from the previous assignment and make only minimal changes necessary in order to
use your tree instead of the hash table. You'll create a class called
TreeSearchEngine
that works essentially the same as HashSearchEngine
did in
the previous assignment. The key difference is that you should be able to do a
prefix search, based on your getKeysForPrefix
method. At the user prompt, if
a user types in a performer name such as Rech
, you should return back all
titles and summaries that start with that prefix. Do this repeatedly until the
user types ####
, in which case the program should end.
Evaluation of how well it works
Loading up the entire database is slow. Is that because reading the
file itself is slow, or is because adding to the tree is slow? Experimentally
determine which is the bottleneck, and explain your results. You may find the
method System.nanoTime()
very useful here. Submit your results, and
an explanation of how you got them and what the relative measurements are, in
a readme.txt
file.
What to turn in
Part 1: Turn in your TreeMapForStrings
class, which should include a main
method for testing with some small data.
Part 2: Turn in your updated TreeMapForStrings
class and your
TreeSearchEngine
class, as well as your readme.txt
file describing
performance measurements.
Acknowledgements
According to IMDb licensing, I should add:
Information courtesy of IMDb (http://www.imdb.com/). Used with permission.