This is an individual assignment. You may chat with others in the class and trade ideas, but the work that you submit should be your own.
This assignment is a repeat of the previous assignment, but done by using a hash table instead of a binary search tree. Hash tables are typically much, much faster than binary search trees, but you can't efficiently look up values in a range. So for this assignment, we won't be reimplementing the prefix portion of the search. You'll then do a comparison between the two approaches.
Because this assignment is a rework of the previous assignment, you are free and welcome to start with the code that you submitted for the last one if you wish. Because this assignment is an individual one, you should clearly indicate in program comments which portions of your code have been lifted / modified from the previous pair assignment, and which portions you've added yourself.
If you had trouble with the previous assignment, don't panic. There's no need to use the previous code at all if you don't wish to. You can read on for more details.
Create a class called HashMapForStrings that will be used
to map Strings to Strings. (The intention is to use it to map movie
titles to movie plots.) Implement this map as a hash table with
chaining. For this assignment, you should not the built-in Java
classes that do this task such
as TreeMap
, HashMap
,
TreeSet
, or HashSet
. You should also build
your linked list chains from scratch: you should not use arrays or the
built-in LinkedList
or ArrayList
classes for
managing your chains. Your map should have the following methods:
void put(String key, String value)
: add
a key and a value to your tree. If the key is already present in the
tree, replace the old value with the new one.
String get(String key)
: get the value
associated with a particular key. get
should
return null if the word is not in the tree.Make sure to test your map appropriately. A main method would be a nice way to do this.
In a similar fashion to the previous assignment, create a class called HashSearchEngine with a main that creates a HashMapForStrings object, then loads up a file of movie data into it. Specifically, the movie titles should be the keys, and the plot summaries should be the values. Note that this data is messy in a few ways: this is typical for data that you get from other sources. You'll need to think about what kinds of processing you may want to do on the data to make the titles usable.
Since you'll be using more than one input file (the big one, and your small testing one), don't hardcode the name of your file into your program. Instead, add it as command-line parameter. When you run your program, do so like this:
java -Xmx1g HashSearchEngine plot.listsor
java -Xmx1g HashSearchEngine plotsmall.lists
Use the array of args in main to grab the filename and open it appropriately.
The "-Xmx1g" tells Java to allow a maximum about of memory of 1 gigabyte, which you'll need for loading up this much data. Without doing so, the default maximum amount of memory that Java uses is something like 128 MB or 256 MB (depending on the version of Java).
Loading up the data can take a while on the big file. I found it convenient to keep a running counter of how many movies I had loaded, and printing out a status update every 1000 movies.
After loading up the data, you should enter a loop that prompts the user to enter in a movie title. For example, if I enter in Star Wars (1977), I should see the plot summary for that movie. Happily, if you've completed part 1, you've got a method that helps you out directly in doing this. After displaying the plot summaries, prompt the user again for another prefix. Do this until the user types ####, in which case the program should end. I've checked that there are no movies with that title.
Perform a comparison of how hashing scales with the size of the index array, and also with binary search trees, regarding the time it takes to do a lookup. I want you to take some time to do this carefully and prepare a detailed writeup on this: this is the assignment that is being used to enable this course to qualify as a "Quantitative Reasoning Encounter."
In order to do this, I want you to compare the time it takes to do build the map, and how long it takes to do a lookup, under the following scenarios:
Make sure to discuss and interpret the results that you see. Explain how it fits in with the theoretical expectations.
One challenge that you'll face is that you can't get accurate timing measurements by seeing how long it takes to look up a single movie. There are two reasons for this: a single movie might not be representative of the dataset as a whole, and it happens so quickly that the timing measurements are not especially accurate or relevant. A much better way to do it is to do a large number of lookups, and consider either the total time or average time for each within. To do a large number of lookups, you'll need to write code that takes a large number of movie titles and "looks them up" in your map. As with the previous assignment, the method System.currentTimeMillis() is very useful here.
In addition to submitting the code that you write, you should turn in your plots and your explanations in something that resembles a lab report. It should be a coherent document that explains what you did, what results you find, and what conclusions you draw. This should also be submitted electronically: formats such as pdf or standard word processing document formats are fine.
Part 1: Turn in your TreeMapForStrings class, which should include a main method for testing it with some small data.
Part 2: Turn in your HashSearchEngine class, which should be successfully working for querying individual movies. include a main method for testing it with some small data.
Part 3: Turn in your lab report and additional code that you have written in order to generate your experimental results.
Have fun, and good luck!