CS 204: Software Design

The story of a simple database

This is what I did to create a small MySQL database out of some population data.

  1. Found the data in tabular form on the web at http://www.demographia.com/db-uscity1790.htm

  2. Copied and pasted the data into an Excel spreadsheet. The tabs in the data put the numbers into the right columns (lucky me!), so I just rearranged the two-part table into a single table by cutting and pasting the bottom table to the right of the top table. Compared the city names to make sure everything was lining up properly.

  3. Saved the spreadsheet as a comma-separated-values (CSV) file called populations.csv.

  4. Opened the CSV file with vi to take a closer look. Noticed some quirks:

    • Excel saved the lines with carriage returns instead of linefeeds as newline characters. This tends to play havoc with the Python scripts I write (since Python by default doesn't view carriage return (a.k.a. CR, a.k.a. \r) as an end-of-line character. I see this often enough that I wrote a dumb little shell script called "cr2lf" to fix it. I run this script by typing "cr2lf populations.csv". Here's the script:
      echo 'tr "\r" "\n" < $1 > $1.tmp'
      tr "\r" "\n" < $1 > $1.tmp
      echo mv $1.tmp $1
      mv $1.tmp $1
      
    • Brooklyn has some weirdness in the table, since its data got merged into New York's data at some point. I fix this by deleting Brooklyn. I could have added the old Brooklyn data to the New York data, but it was easier for this example to just delete it.
    • I noticed that wherever the original chart was missing data (e.g. Minneapolis didn't exist in 1790), there were three spaces in front of every comma. I could have used a vi global search-and-replace command to replace " ," with ",", but I decided to just handle this problem in my up-coming Python script.
  5. Designed a simple-minded SQL table, and wrote the CREATE statement to build the table:

    DROP TABLE IF EXISTS populations;
    CREATE TABLE populations (
      city text,
      year int,
      population int
    );
    
  6. Wrote a Python script to convert the CSV file into a sequence of SQL INSERT statements.

  7. Combined the DROP/CREATE statements with the INSERT statements into a single file called populations.sql.

  8. Logged into cs-research1.mathcs.carleton.edu and Used the mysql command and the login information that Mike Tie sent me to load my table into my MySQL database, like this:

    mysql -u myusername -h localhost -p myusername < populations.sql
  9. Ran the mysql command prompt and tried a quick test:

    mysql -u myusername -h localhost -p myusername

    and then at the prompt:

    mysql> SELECT * FROM populations WHERE year=1900 ORDER BY population DESC;