CS 334 Final Assignment

This is a team assignment (unless you're working individually).

We've discussed that a number of new database systems have arisen to challenge the dominance of the relational model. They're loosely grouped under the category "NoSQL", which is a vague name that is generally used to mean "something other than relational." For this assignment, you will choose a NoSQL database system and write a paper comparing and contrasting it with relational database systems.

Your paper should be a minimum of 2500 words, not including references. That's a little under 5 single-spaced pages. It's fine to distribute this effort between you and your partner. You don't need to "pair-program" every word in your paper. That said, you should make sure that your paper does read coherently with a single voice.

Choices

There are a large number of systems out there to choose from. To help narrow things down some, here are some major categories of NoSQL database systems. You can click the link on each to see which ones are the most popular. Note that we'll be spending some time in class on MongoDB in particular. If you choose MongoDB, I will be expecting you to research content above and beyond what happens in class.

Key-value stores

The starting idea for a key-value store is essentially a permanent version of a hash table: given a key, the system looks up a stored value. Useful for quickly looking up images, objects, and so on.
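
To make the "permanent hash table" idea concrete, here is a minimal sketch in Python. The class name TinyKVStore, the file name, and the sample key are all made up for illustration; real key-value stores use far more sophisticated storage than rewriting a JSON file on every update.

```python
import json
import os
import tempfile


class TinyKVStore:
    """Hypothetical toy store: a dict that is saved to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Reload whatever an earlier run left on disk.
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        # Rewrite the whole file on each update (simple, not efficient).
        with open(self.path, "w") as f:
            json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)


path = os.path.join(tempfile.mkdtemp(), "store.json")
store = TinyKVStore(path)
store.put("user:42:avatar", "images/42.png")

# A second instance reading the same file still sees the value.
reopened = TinyKVStore(path)
print(reopened.get("user:42:avatar"))
```

The only operations are put and get by key; there is no way to ask "which keys have a value that looks like X" without scanning everything, which is one of the trade-offs relative to a relational system.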

Column-family stores (also known as wide column stores)

These are designed for storing sparse matrices, where it would make sense for a row and a column together to serve as a key. Useful for results from web crawlers, recommendation systems, and other forms of sparse data.
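
As a rough sketch of the "row and column together serve as a key" idea, the Python below stores only the cells that actually exist. The table layout, the put/get helper names, and the sample web-crawler data are assumptions for illustration, not the design of any particular system.

```python
from collections import defaultdict

# row key -> {column name -> value}; cells that were never written take no space
table = defaultdict(dict)


def put(row, column, value):
    table[row][column] = value


def get(row, column, default=None):
    # Use .get on the outer dict so a lookup never creates an empty row.
    return table.get(row, {}).get(column, default)


put("example.com/index", "anchor:home", "Home page")
put("example.com/index", "lang", "en")
put("example.org/about", "lang", "de")

print(get("example.com/index", "lang"))
print(get("example.org/about", "anchor:home"))
```

Notice that each row can have a completely different set of columns, which would require many NULL-filled columns in a relational table.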

Graph databases

These are database systems designed to hold graphs, i.e. data items and the connections between them. Useful for social networks of various kinds (co-authorship, for example), fraud detection, etc.
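
Here is a small sketch of the kind of question a graph model makes natural: how many co-authorship links separate two people? The names and the adjacency-set representation are invented for illustration; an actual graph database would have its own storage format and query language.

```python
from collections import deque

# Each author maps to the set of people they have written a paper with.
coauthors = {
    "Ada": {"Grace", "Alan"},
    "Grace": {"Ada", "Edsger"},
    "Alan": {"Ada"},
    "Edsger": {"Grace"},
}


def distance(graph, start, goal):
    """Breadth-first search: edges on a shortest path, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


print(distance(coauthors, "Alan", "Edsger"))
```

In a relational system, the same question typically requires repeated self-joins or a recursive query over an edge table.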

Document stores

These resemble key-value stores, but the values themselves have a hierarchy and structure of their own. Useful for web content, publishing, document search, etc.
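
The sketch below illustrates the difference from a plain key-value store: the value is a nested document, and a query can reach inside it. The documents, field names, and the lookup/find helpers are hypothetical; they are not the API of MongoDB or any other system.

```python
# document id -> nested document (values have their own structure)
documents = {
    "post-1": {
        "title": "Intro to NoSQL",
        "author": {"name": "Sam", "team": "db"},
        "tags": ["nosql", "intro"],
    },
    "post-2": {
        "title": "Indexing basics",
        "author": {"name": "Lee", "team": "search"},
    },
}


def lookup(doc, dotted_path):
    """Follow a path such as 'author.team' into a nested document."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(dotted_path, expected):
    """Return ids of documents whose field at dotted_path equals expected."""
    return [doc_id for doc_id, doc in documents.items()
            if lookup(doc, dotted_path) == expected]


print(find("author.team", "db"))
```

Note that post-2 has no "tags" field at all; documents in the same collection do not need to share a schema.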

Native XML databases

These are systems that store the data directly in XML as opposed to some other internal format, which has significant advantages regarding interoperability with other systems. You can use the large set of XML tools available to operate directly on the data, in addition to what the database does for you.
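
As one example of operating on the data with a general-purpose XML tool, this sketch uses Python's standard xml.etree.ElementTree module (which supports a limited subset of XPath) to query a small, made-up XML fragment. The element names and data are invented for illustration.

```python
import xml.etree.ElementTree as ET

xml_data = """
<library>
  <book year="2001"><title>XML Basics</title></book>
  <book year="2010"><title>Graph Theory</title></book>
  <book year="2001"><title>Query Languages</title></book>
</library>
"""

root = ET.fromstring(xml_data)

# XPath-style query: titles of all books whose year attribute is 2001.
titles_2001 = [t.text for t in root.findall("./book[@year='2001']/title")]
print(titles_2001)
```

Because the data is already XML, no export or conversion step is needed before a tool like this can read it.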

What the paper should contain

I want you to show me that you can take everything that you've learned about database systems this term, and apply that knowledge to learning about a new one. In my mind, the perfect paper would be one that effectively repeats the content of our course, but does so instead in the context of a different database system. Specifically, here are questions I would like to see answered within your paper:

  • Why does this database system exist? What does it try to do differently from a relational database system?
  • How does its fundamental data model differ from the relational model?
  • If your database system has a query language and if it is different from SQL, how is it different? Why?
  • How does the database system work internally? How is data stored? Does it index? How so? What sorts of algorithms dominate?
  • How are queries to the database system evaluated and optimized?
  • Does the database system support transactions? How is it different?

The above is likely more than you can cover in your paper in the time that you have. I would therefore like you to be able to say at least something in passing about all of the above topics, but your paper should focus in depth on how the database works internally, and how that differs from a relational system. At least 1000 words or so of your paper should be specifically dedicated to discussion of how the internals of the system work.

Optional video component

If you wish, you may install the database system yourself, and experiment with it. You can import some data, write some queries and/or other code, and interact with it. If you go this route, you would submit a video in which you narrate your use of the system and describe what you're doing. This portion of the project, if you wish, could be used to replace portions of the paper that would cover the user side of the database, such as its data model, query language, how you interact with it, and so on. The video portion would not replace the discussion of the database internals, which you would cover in the paper. If you submit a video component, it should be five minutes long, and then you would only need to write 1500 words in your paper instead of 2500 words.

You are free to use any tool you like to make the screencast. Carleton recommends Zoom; in particular, here is a video with instructions on how to use Zoom for screensharing. When you're done, you'll then need to upload your video to YouTube. (I'd recommend choosing "unlisted" for a privacy setting, but that's up to you.) Make sure to include a link to your video in your paper, and verify that it works in a browser where you are not logged into Google via your account.

References

Your paper should have at least three references. Wikipedia is all over the map on these systems; sometimes it's useful, sometimes it's exactly what you need, sometimes it's too brief, and sometimes it's just flat out wrong. You are welcome to take a look at Wikipedia to get yourself oriented, and possibly to find more sources. That said, my own impression from a quick sampling of the Wikipedia articles for some of these database systems is that they are very high level and lacking in precision.

Deadlines

  • On Tuesday of the last week of classes, you should turn in to me via Moodle a list of the references that you intend to use. I will get you feedback within 24 hours. If you want feedback earlier, submit your references earlier; I'll respond within 24 hours of whenever you submit them.
  • The paper itself, and optional video, are due at the end of the last final exam. The paper should be submitted via Moodle, and the video should be uploaded to YouTube. Include the link to the screencast in your paper.

Grading

Here is how grading will be done:

Content (12 points total = points below * 3)

  • Excellent (4 points): Appropriate spread of content about database, both from a usability perspective as well as internals. Required content on internals is there, and shows that the student processed and interpreted them appropriately.
  • Good (3 points): Overall good effort at covering both database usability and internals, but some details vague, missing, or not as clearly defined.
  • Fair (2 points): Covered usability and internals, but significant points missing or not defined very well
  • Poor (1 point): Paper is hard to follow, and/or it seems that the content has been parroted from other sources without really being interpreted at all

Sources (4 points total)

  • Excellent (4 points): sources are appropriately integrated into paper, and it is clear they are used properly
  • Good (3 points): sources generally done correctly, but some citations are awkward
  • Fair (2 points): sources are ok, but some significant issues in how they are integrated
  • Poor (1 point): few or no sources, or thoroughly not integrated with work

Organization (4 points total)

  • Excellent (4 points): paper/video is clearly laid out and organized in a fashion that makes sense
  • Good (3 points): organization is there, but does not flow as well as it could
  • Fair (2 points): there is some sense of organization, but it is more difficult to follow
  • Poor (1 point): structure is nearly or completely non-existent

Format and Style (4 points total)

  • Excellent (4 points): follows given length/format instructions, professional tone, no noticeable spelling/grammar errors
  • Good (3 points): mostly adheres to instructions, and/or may occasionally use awkward language, and/or few spelling/grammar errors
  • Fair (2 points): noticeable deviation from length/format instructions, and/or written in casual tone, and/or some spelling/grammar errors
  • Poor (1 point): violated length/format instructions considerably, and/or sloppy/unfocused language, and/or many spelling/grammar errors

Good luck, and have fun!