CS 257 Assignment

If you haven't read the general description of the web application project, do so now.

We're going to do this project in lots of little phases, generally with 1 or 2 class days between phases. Please pay attention to the specific mechanisms for handing in your work, since I'm going to need to run pretty fast to keep getting you the feedback you need.

Phase 1: partners and data

[By 11:59PM 10/1] Private-message me on Slack with your partnership preferences.
- Who do you want to work with? (You may, but do not need to, name specific people here.)
- Who did you work with on previous assignments, and how did it go?
- Anything else you want me to know?
[Bring to class 10/3] Find one or two datasets based on our 10/1 discussion of what constitutes a good dataset. Make sure you can download the data in CSV form. (You may collaborate with your partner(s) on this if you have time to do so after I send out the partner lists Thursday morning, but you may also do it on your own and then negotiate with your partner(s) about whose data to use.)

What makes good data?

Choosing suitable data for this project can have a big effect on how fun the project is, how useful the final product is, and how much you learn. It's pretty easy to pick well. Here are a few things to keep in mind.

Interesting data. You're going to be working with this data for a few weeks, so make sure you're interested in what it has to say.

Rich enough data version 1: two or more object types.

Occasionally, I'll have students choose data that's too simple. Suppose, for example, you had a dataset of dog breeds consisting of a list of breed names plus simple attributes (size, barking level, aggression, trainability, etc.). You can surely make a database out of this list, but it really only has one key object type: Breeds. The resulting database structure and user features tend to be extremely simple, which is usually frustrating for the students working on the project.

My Books & Authors example has two core object types (Books and Authors, shockingly). Each Book has one or more Authors. Each Author has one or more Books. Combine this with some interesting attributes (dates, titles, etc.), and you get a nice variety of ways to search, sort, and display the data.
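
One way to see why two object types make for a richer database: a flat CSV of book/author pairs splits naturally into a books table, an authors table, and a linking table that records which authors wrote which books. Here is a minimal Python sketch, assuming a hypothetical CSV with columns title, publication_year, and author (one row per book/author pair); the column names and sample rows are illustrative, not part of the assignment.

```python
import csv
import io

# Hypothetical flat CSV: one row per (book, author) pair.
# Column names and rows are illustrative only.
RAW_CSV = """title,publication_year,author
Good Omens,1990,Terry Pratchett
Good Omens,1990,Neil Gaiman
Mort,1987,Terry Pratchett
Coraline,2002,Neil Gaiman
"""

def split_books_and_authors(csv_text):
    """Split flat rows into a books table, an authors table, and a
    books_authors linking table for the many-to-many relationship."""
    book_ids = {}      # title -> book_id
    author_ids = {}    # author name -> author_id
    book_rows = []     # (book_id, title, publication_year)
    author_rows = []   # (author_id, name)
    links = set()      # (book_id, author_id), deduplicated

    for row in csv.DictReader(io.StringIO(csv_text)):
        title = row['title']
        if title not in book_ids:
            book_ids[title] = len(book_ids)  # next unused id
            book_rows.append((book_ids[title], title, int(row['publication_year'])))
        name = row['author']
        if name not in author_ids:
            author_ids[name] = len(author_ids)
            author_rows.append((author_ids[name], name))
        links.add((book_ids[title], author_ids[name]))

    return book_rows, author_rows, sorted(links)

books, authors, books_authors = split_books_and_authors(RAW_CSV)
```

The linking table is what lets you search in both directions: all the books by one author, or all the authors of one book.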

Sports statistics are another good example. Consider Major League Baseball statistics. What are your core object types? You have Players, Teams, Divisions, and Leagues. Players are on Teams, and Teams consist of Players. Divisions consist of Teams, so each Player also has a Division. There are lots of ways to slice the data and present the results.

Any data where you have things that are associated with creators can work well: movies (Movies, Actors, Directors, ...), music (Songs, Artists, Albums, ...), paintings (Paintings, Painters, Museums, ...), and so on.

Data with a natural grouping of the main objects can also work well. The baseball stats are like that (Teams, Divisions, etc.). You could even make the dog breeds example work if your data included breed groupings like "working dogs", "toys", "herders", etc. One dataset last year that fit this model was a dataset of Magic: The Gathering cards, because the cards come in Sets, every card has one or more CardTypes, etc.

Rich enough data version 2: interestingly searchable attributes.

Some of my past students have used crime statistics for this project, and some of their datasets have had just one core object type: Crimes. But those datasets have worked well because some of the Crime attributes are complex enough in their own right. A Crime might have a City, which is in a State. It also has a date, which can lead to interesting searches (e.g., show me all the crimes in Minneapolis in the first week of April 2018).
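
To make the date-search idea concrete, here is a minimal Python sketch of the Minneapolis query run directly against CSV rows. It assumes a hypothetical CSV with columns date (in YYYY-MM-DD form), city, state, and offense; the rows below are invented for illustration. In your project this filtering would eventually happen in the database, but the logic is the same.

```python
import csv
import io
from datetime import date

# Hypothetical crime data; column names and values are invented.
RAW_CSV = """date,city,state,offense
2018-04-02,Minneapolis,MN,burglary
2018-04-09,Minneapolis,MN,theft
2018-04-05,St. Paul,MN,theft
2018-04-07,Minneapolis,MN,vandalism
"""

def crimes_in(csv_text, city, start, end):
    """Return the rows for `city` whose date falls in [start, end], inclusive."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        when = date.fromisoformat(row['date'])  # parse YYYY-MM-DD
        if row['city'] == city and start <= when <= end:
            matches.append(row)
    return matches

first_week = crimes_in(RAW_CSV, 'Minneapolis', date(2018, 4, 1), date(2018, 4, 7))
print([r['offense'] for r in first_week])  # ['burglary', 'vandalism']
```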

Weather and energy statistics can be like this, too. Measurements taken in a variety of locations (which can give you City, State, Country...) at a variety of dates and times can then support interesting aggregate statistics based on location and time.
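
As a sketch of the kind of aggregate statistic this allows, the Python below averages temperature readings per city and month. It assumes a hypothetical CSV with columns date (YYYY-MM-DD), city, state, and temp_c; the readings are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weather readings; column names and values are invented.
RAW_CSV = """date,city,state,temp_c
2018-01-10,Northfield,MN,-12.0
2018-01-20,Northfield,MN,-8.0
2018-07-15,Northfield,MN,27.0
2018-01-12,Austin,TX,10.0
"""

def monthly_averages(f):
    """Group readings by (city, state, 'YYYY-MM') and average temp_c."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in csv.DictReader(f):
        key = (row['city'], row['state'], row['date'][:7])  # '2018-01'
        totals[key][0] += float(row['temp_c'])
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}

averages = monthly_averages(io.StringIO(RAW_CSV))
```

Because location and time are separate attributes, the same grouping trick works at other levels too: by state instead of city, or by year instead of month.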

Obtainable in CSV form.

For our purposes, it's essential that you be able to obtain your data in comma-separated value (CSV) form. Lots of sites (government sites, notably) let you download directly as CSV or Excel files (which can then be saved as CSV). For other data, you can copy-and-paste into a spreadsheet and then save as CSV. It just depends.

The reason I insist on this is to unify some of our in-class and project-development activities, so everybody is doing basically the same steps. It's way easier for me to help you design and populate your database if everyone's data is in a similar form.
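
Before committing to a dataset, it can help to confirm that the file really is clean CSV. A minimal Python sketch using the standard csv module: it reports any row whose field count differs from the header's, which is a common symptom of copy-and-paste problems. The function name and sample data are illustrative; for a real file you would pass in something like `open('your_data.csv', newline='')`, where the filename is a placeholder.

```python
import csv
import io

def check_csv(f):
    """Return (header, row_count, bad_lines), where bad_lines lists rows
    whose field count doesn't match the header."""
    reader = csv.reader(f)
    header = next(reader)
    bad_lines = []
    row_count = 0
    for row in reader:
        row_count += 1
        if len(row) != len(header):
            # line_num counts source lines read so far, so it points
            # at the last line of the offending row.
            bad_lines.append(reader.line_num)
    return header, row_count, bad_lines

# Illustrative data: a quoted comma is fine, an extra field is not.
SAMPLE = 'breed,size\n"Retriever, Golden",large\nMastiff,large,extra\n'
header, row_count, bad_lines = check_csv(io.StringIO(SAMPLE))
print(header, row_count, bad_lines)  # ['breed', 'size'] 2 [3]
```

Using the csv module rather than splitting each line on commas matters: the quoted "Retriever, Golden" field is correctly read as a single value.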

CS 257: Software Design

Web Application, Phase 1: partners and data