General materials

Syllabus
Office hours
Piazza questions and answers
Textbook: Mining Massive Datasets, second edition
Online
Hardcover
Some comments about readings
Partner assignments

Week 1

Warmup.
Due Tues, Jan 6, at 11:55 pm.
Associated reading: Section 1.1 of MMDS textbook. ("What is Data Mining?")
Associated reading: Wikipedia article on data science
Associated reading: Controversy on what data science is, isn't, should, or shouldn't be. Read the comments too; they're interesting.
Reading 1: Read the Wikipedia articles on the K-nearest neighbors algorithm and on confusion matrices. Post a question or comment about each.
Due Wed, Jan 7, before class starts.
k-NN, part 1.
Due Fri, Jan 9, at 11:55 pm.

Week 2

Reading 2: Sections 3.1.1, 3.2 (opening), 3.2.1, 3.2.4, 3.3 (opening), 3.3.1, 3.3.2. Post a single question or comment.
Due Mon, Jan 12, before class starts.
k-NN, part 2.
Due Mon, Jan 12, at 11:55 pm.
Associated reading: throughout this week we'll be talking about chapter 3, up to and including section 3.4.
Reading 3: Tutorials on matrix addition and scalar multiplication, and on matrix multiplication. Post a single question or comment.
Due Fri, Jan 16, before class starts.
Locality Sensitive Hashing, part 1
Due Sat, Jan 17, at 11:55 pm.

Week 3

Locality Sensitive Hashing, part 2
Due Wednesday, Jan 21, at 11:55 pm.
Exam 1: Friday, January 23. Read these study guidelines. Make sure to bring a calculator.

Week 4

Associated reading: we will be covering chapter 5 on PageRank and associated topics. Specifically, we'll be covering the entire chapter except for 5.2.
Reading 4: Sections 5.1.2, 5.1.3, and 5.1.4 in the textbook. Post a single question or comment.
Due Mon, Jan 26, before class starts.
PageRank, part 1. To be done individually (see assignment).
Solution to analytical parts
Due Wed, Jan 28, at 11:55 pm.
Reading 5: Chapter 6 intro, and all of 6.1 (including 6.1.1, 6.1.2, 6.1.3, and 6.1.4). Post a single question or comment.
Due Fri, Jan 30, before class starts.
PageRank, part 2. To be done individually (see assignment).
Solution to analytical parts
Due Sat, Jan 31, at 11:55 pm.

Week 5

Associated reading: we will be covering portions of chapter 6 on frequent itemsets and association rules. Specifically, we'll be covering 6.1, 6.2, and 6.4. This chapter, which is from an alternative textbook, is also a great place to look if you want to see the same ideas said differently.
Reading 6: Section 6.4 intro, 6.4.1, 6.4.2, 6.4.3. Post a single question or comment.
Due Wed, Feb 4, before class starts.
Association rules, part 1. To be done with partner if you have one; see end of assignment for part 1 breakdown.
Due Thu, Feb 5, at 11:55 pm.

Week 6

Reading 7: Clustering! In Chapter 8 of the Tan book, read the Chapter 8 intro, and all of 8.1. (This is the same material as the intro of Chapter 7 of our usual MMDS textbook, but I think the Tan book does this part better.) Then read section 7.1.3 in our MMDS text on "The Curse of Dimensionality." Post a single question or comment about something in any of the reading.
Due Wed, Feb 11, before class starts.
Association rules, part 2. To be done with partner if you have one; see end of assignment for part 2 breakdown.
Due Wed, Feb 11, at 11:55 pm.
Exam 2: Friday, February 13. Read these study guidelines. Make sure to bring a calculator.

Week 7

Associated reading: Chapter 7 of our usual MMDS (Mining Massive Datasets) textbook covers clustering, and we'll be covering section 7.1 (all of it), 7.2 (all of it), 7.3.1, 7.3.2, 7.3.3, and 7.4 (all of it). As with association rules, there is also a free chapter from the Tan et al. book, and it covers some of the same material better than the MMDS book (but not all of it). I'll be picking and choosing approaches from each text when talking in class. In the Tan et al. book, we'll be simultaneously covering sections 8.1, 8.2, 8.3, and possibly some bits of 8.5.
K-Means Clustering, part 1. To be done with partner if you have one.
Due Sat, Feb 21, at 11:55 pm.

Week 8

K-Means Clustering, part 2 (and optionally part 3). To be done with partner if you have one.
Due Mon, Feb 23, at 11:55 pm.
Final project proposal
Due Wed, Feb 25, at 11:55 pm.
Reading 8: In our usual MMDS textbook, read sections 9.1.1, 9.2 (prologue), 9.2.1, 9.2.2, 9.2.4, and 9.2.5. (You are welcome to read the skipped sections if you like.) Post a single question or comment.
Due Fri, Feb 27, before class starts.
Agglomerative Clustering. To be done with partner if you have one.
Due Sat, Feb 28, at 11:55 pm, with an automatic extension until Mon, March 2, at 11:55 pm, for anyone who wants it.

Week 9

Associated reading: We'll be covering essentially all of Chapter 9 from our usual MMDS (Mining Massive Datasets) textbook. We may deviate a bit in how we cover section 9.4 on dimensionality reduction.
Exam 3: Friday, March 6. Read these study guidelines. Make sure to bring a calculator.

Week 10

Recommender systems. To be done with partner if you have one.
Due Mon, Mar 9, at 11:55 pm.
Peer evaluations. Submit this form separately for each partner that you worked with. (If you worked entirely alone all term, you do not need to submit.) If you forget to submit this, I'll treat it as if you received a negative evaluation from a partner. Don't forget!
Due Wed, Mar 11, before class.

Finals Week

Final project. To be done with partner if you have one.
Due Mon, Mar 16, at 9:30 pm (end of last final exam). I am forbidden by college policy from granting any extensions unless you gain approval from the Dean of Students office.