Database design and population
You will work with your web application team for this assignment.
Goals
- Learn about database design, including a little bit about normal forms and a few ideas about how best to organize relational database tables.
- Design a database for your team's data.
- Practice writing conversion code to transfer raw data from some external source to your database.
Rubric
Your tasks
- Read the database readings and watch the database videos included on the course page.
- Make sure you still have a directory named
datain your repository, and put all files mentioned below this directory. - Design a database to represent your project's dataset.
Write your design in the form of CREATE TABLE statements, and save those
statements (and only the CREATE TABLE statements, including semi-colons)
in a text file called
database-schema.sqlinside yourdatafolder. - Write a Python program called convert.py that will read your dataset's file and write one CSV file for each of the tables in your database design. If you have designed a table named "animals", for example, the corresponding CSV file should be named "animals.csv", and should have the same columns as the table.
- Add each of your new CSV files to your repository.
- Load your CSV files into your database on stearns.
Write three SQL queries that answer different questions about your dataset. (You can use the queries in the database lab from last week as a model for the kind of thing you might ask your dataset.) Put your SQL queries in a text file named queries.sql.
Make sure to test your queries using
psqlon stearns.Once you are satisfied with your database and your queries, create a dump of your database by running this command at the stearns Unix command line (not at the psql prompt):
pg_dump --clean --no-owner --no-privileges -U YOUR_USER_NAME YOUR_USER_NAME > database-dump.sqlAdd this file to your repository.
If your dataset is really big and GitHub doesn't like it, you can zip up your data files and send them to me as Slack DM attachments.
Some hints and suggestions
You can test whether your conversion is correct for your design like this:
- Launch psql for your database
- Execute your CREATE TABLE statements (copy/paste from
database-schema.sqlwill work) - Use the \copy command to load each of your new CSV files into their corresponding tables (see the database lab for a reminder of how that goes).
- Do a quick "SELECT * FROM tablename LIMIT 5;" or something along those lines for each table to make sure the data you expect has been loaded into the table.
- A detailed discussion of database design principles and database normalization is too big for this course. That's why I created and assigned the "miniscule intro to database design" linked on the course web page. That video will help you understand what I am looking for.