Type: Group Assignment

Due: by 9:30 PM on Tuesday, October 30, 2018

This assignment is worth 16 points.

Goals

The goal of this assignment is to study the performance characteristics of several sorting algorithms and to get practice and how to analyze algorithms. You will also get practice articulating what an algorithm is doing in prose.

Setup and Requirements

This is a partner assignment (unless you were not assigned a partner). You may certainly discuss the assignment and how to get to your final solutions with your other classmates, but make sure to cite any sources used or classmates consulted!

In this assignment, you’ll be doing a mixture of coding, designing timing experiments, and analyzing your results. You’ll be thinking through what timing data makes sense to collect, and writing your reasoning in a short report. Your report should be typed, and must be a PDF. The easiest way to produce a PDF is to write your text in a text editor or a word processing program, then export or print it to PDF. On a Mac (such as the Macs in the lab), you can choose “Print” and then in the lower left corner there will be a “PDF” button. Click on that button and choose “Save as PDF”. No format other than PDF is acceptable. If you are familiar with LATEX, the PDF created from that is also a good option.

There are two starter files included with this assignment:

Investigation 1: Quicksort and Small Lists

As we discussed in class, quicksort implementations often switch to using a different sort for small arrays. This is because the overhead of the recursive calls and the constant time factors in quicksort dwarf the poorer asymptotic performance of the other sorting algorithm when n is small. The sorting algorithm that we’ll use for these small arrays is insertion sort.

I’ve provided you with a version of quicksort code in QuickSort.java. Your job is to investigate how the performance of quicksort varies depending on when we switch to insertion sort. This investigation would be useful in determining, for instance, what a reasonable cutoff would be for the base case in our code.

Write code in main function of this class that performs experiments varying when this switch to insertion sort occurs and collecting timing data. I’ve included an example of how to collect timing data. You may first want to use one size of array to compare a broad range of possible switch points, and then use another array to gather more fine-grained data differentiating between a smaller range of switch points.

The second section of your report for this assignment should more broadly relate the size of the data being sorted to quicksort’s overall performance (outside of the question of when to switch to insertion sort). Describe what data you collected and why it was appropriate, and present the data in a table or a graph. Explain any patterns you see in your data and any hypotheses you have for why these patterns occur.

Investigation 2: ShellSort and Gap Distance

ShellSort performs a sequence of insertion sorts at varying “gaps”, starting with something large and eventually ending with a gap of one. Any sequence that contains a gap of one will result in a correct sort, but different gap sequences vary significantly in their performance.

A common gap sequence involves creating a sequence where each element differs from the last by some multiplicative factor m. For example, if m is equal to 2, and you start with a gap of 14, then in the next iteration the gap size with be 14/2 = 7. Then in the following iteration you should have a gap of 7/2 = 3.5. However, you need to have integer gap sizes. So, to get the actual gap, you will take the ceiling of this number (that is you will round up). You’ll find the ceil function (part of the java.lang.Math package) helpful. So the next gap is actually 4. Lastly, you never run with a gap of 2. If you calculate a gap of 2 , you should instead automatically change this to a gap of 1. So, for our example where we start with a gap of 14, we would have the following sequence of gaps: 14, 7, 4, 1.

I’ve provided you with a version of ShellSort code in ShellSort.java. Your job is to explore how changing the value of the multiplicative factor m affects the performance of ShellSort. Specifically, you should try out gaps of 2, 2.25, and additional values of your choice to explore the pattern that arises as the gap changes. I’m asking you to try 2.25 as it is used in many actual implementations of ShellSort; this is called Tokuda’s sequence, for its inventor Naoyuki Tokuda. Since 2.25 is a decimal, make sure you are taking the ceiling to get your gap values. Include your timing experiments in the main method of ShellSort.java.

In your report, describe what you find about how the performance of shellsort changes as the multiplicative factor used for the gap changes. For this section, you may have less clear hypotheses about why you see the performance you do and that’s okay; do try to articulate what factors you believe may be leading to differences in timing based on what you know about shellsort and why its performance can be better than insertion sort.

Report Guidelines

Your report should describe the timing experiments you performed, the timing data that you collect, and answers to the questions posed in the two individual investigations. You should make sure to explain how the evidence you collected supports your claims, and explain your choices about what evidence to collect for each question. Your report should include tables and/or graphs to support your points.

Bring the analytical skills you’ve learned elsewhere to bear on this assignment. I encourage you to think carefully about how to connect evidence and explanations. Imagine your report is intended to help someone make decisions related to these three investigations (e.g., when to switch to insertion sort for small lists). What information do they need to know? Why do you think the timing data looks the way it does based on what you know about how the algorithms work?

You should try to design your experiments in a way that will be convincing to the reader. How could you compensate for the fact that timing data is “noisy,” with the same code sometimes taking a little longer or shorter to run? You and/or your partner are welcome to chat with me about your approach in office hours.

Your report should be a PDF file no longer than 5 pages, including any charts and graphs. Note that no format other than PDF is acceptable.

Other Tips and Suggestions

  • You may change the starter code I gave you in any way you please, although you should ensure you don’t break the correctness of any of the sorts.
  • Make sure your timing code times only the thing you actually want to time - i.e., the sorting. It shouldn’t include file loading or printing to the console, as both of these can take long and variable amounts of time that may swamp true differences in execution time.
  • You can use many different array sizes to explore the questions in this assignment. Make sure the arrays you’re choosing are big enough to show real differences. If all of your experiments take only 1 or 2 milliseconds, it will be very hard to understand how your variations are related to differences in timing. Instead, use an array that’s big enough to take at least hundreds of milliseconds to sort.
  • Think carefully about what evidence to collect, and explain your hypotheses about why you see the data that you do. Any time you make an assertion, think about how to support it with evidence from your timing or from the way the code works.
  • If you’re working with a partner, you should both contribute to all parts of the assignment - experimenting and writing things up. Your timing and experiment code should be done in paired programming style, and you should collaboratively decide what timing data to collect; either of you should be able to explain the results that you got from your timing experiments. You each may write drafts of parts of the writeup by yourselves, but you should edit each other’s drafts and both of you should be aware of everything that’s in the writeup; if you aren’t able to explain why a particular section of the report includes the content it does and what the content means, then you have violated the collaboration guidelines for this assignment.
  • You may find it helpful when working on this assignment to consult with the Writing Center, located at the west end of the fourth floor of the library. The consultants there are trained to help with writing in multiple disciplines, including STEM fields like CS!

Submission and Grading

Submit your assignment by placing the following files into an assignment7 directory on the courses drive. Note that only one partner from each pair needs to submit the assignment.

  • Quicksort.java, with main() running your timing experiments
  • ShellSort.java, with main() running your timing experiments
  • Your report. Remember: Your report should be a PDF file no longer than 5 pages including charts and graphs. You can turn any type of a file on a Mac (like the lab computers) into a PDF by selecting “Print…” and then choosing the “PDF” option in the lower right corner. Just changing the file extension does not change the type of a file.

This assignment is worth 16 points. 12 points for the write-up and 4 points for your associated testing code.

Start early, ask lots of questions, and have fun! The instructor, the lab assistants, and the prefect are all here to help you succeed—don’t hesitate to ask for help!

Acknowledgments
This assignment modified from one originally written by Anna Rafferty and Layla Oesper.