| TID | Items |
| 1 | K, A, D, B |
| 2 | D, A, C, E, B |
| 3 | C, A, B, E |
| 4 | B, A, D |
(a) (8 points) Describe how HITS can find a webpage that is authoritative on your search query but does NOT actually contain any of the words from your query.
(b) (8 points) For each of the heuristics mentioned on pages 7-8 of the Chakrabarti et al. paper (HITS), explain which of the three problems mentioned at the top of page 7 it addresses and how it helps to resolve that problem. Indicate whether each of these heuristics would also be useful for PageRank.
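For reference when working on part (a), the following is a minimal sketch of the basic hub/authority iteration, without any of the heuristics that part (b) asks about. The page names `r1`, `r2`, `r3`, and `target` are hypothetical: the `r` pages stand for root-set pages that contain the query words, and `target` stands for a page that was added to the base set only because root-set pages link to it. The function name `hits` and the fixed iteration count are illustrative assumptions, not taken from the paper.

```python
import math

def hits(links, iterations=50):
    """Basic hub/authority iteration; links maps a page to the set of pages it links to."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages that link to it.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        # Normalize both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth

# Hypothetical base set: r1..r3 contain the query words (root set);
# "target" does not, but root-set pages link to it.
links = {"r1": {"target", "r2"}, "r2": {"target"}, "r3": {"target"}}
hub, auth = hits(links)
best = max(auth, key=auth.get)
print(best)  # prints "target"
```

In this toy graph the page with the highest authority score is one that never matched the query text, which is the situation part (a) asks you to explain.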
(15 points) Suppose you are given the following ratings by
Carleton students on four different courses, where a ? indicates that
no rating was given:
| Student ID | course 1 | course 2 | course 3 | course 4 |
| 1 | 3 | ? | 1 | 2 |
| 2 | 1 | 2 | ? | 3 |
| 3 | 3 | 3 | 1 | 5 |
| 4 | ? | 1 | 3 | 5 |
Using the correlation technique described in the Breese et al. paper (collaborative filtering), what would be the predicted vote for student 4 on course 2? What would be the predicted vote using the vector similarity method? Assume that none of the extensions described in Section 2.2 of the paper are used.
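To check hand calculations, the memory-based prediction can be sketched as below. This is a sketch under stated assumptions, not a reference solution: it assumes each student's mean vote is taken over all of that student's votes, that the correlation sums run only over courses both students rated, that the normalizing factor makes the absolute weights sum to one, and that any existing vote by the active student on the target course is held out before predicting. The function names (`mean_vote`, `pearson`, `vector_similarity`, `predict`) are illustrative and do not come from the paper.

```python
import math

# Votes from the ratings table; a "?" entry is simply absent.
# Outer keys are student IDs, inner keys are course numbers.
ratings = {
    1: {1: 3, 3: 1, 4: 2},
    2: {1: 1, 2: 2, 4: 3},
    3: {1: 3, 2: 3, 3: 1, 4: 5},
    4: {2: 1, 3: 3, 4: 5},
}

def mean_vote(votes):
    """Mean over every course this student rated."""
    return sum(votes.values()) / len(votes)

def pearson(a, i):
    """Correlation weight, summed over courses both students rated."""
    common = a.keys() & i.keys()
    ma, mi = mean_vote(a), mean_vote(i)
    num = sum((a[j] - ma) * (i[j] - mi) for j in common)
    den = math.sqrt(sum((a[j] - ma) ** 2 for j in common)
                    * sum((i[j] - mi) ** 2 for j in common))
    return num / den if den else 0.0

def vector_similarity(a, i):
    """Cosine weight; each norm runs over all of that student's votes."""
    common = a.keys() & i.keys()
    num = sum(a[j] * i[j] for j in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in i.values())))
    return num / den if den else 0.0

def predict(ratings, active, item, weight):
    # Assumption: hold out the active student's own vote on the target item.
    a = {j: v for j, v in ratings[active].items() if j != item}
    mean_a = mean_vote(a)
    total, norm = 0.0, 0.0
    for user, votes in ratings.items():
        if user == active or item not in votes:
            continue  # only students who rated the target course contribute
        w = weight(a, votes)
        total += w * (votes[item] - mean_vote(votes))
        norm += abs(w)
    # Fall back to the active student's mean if no neighbor carries weight.
    return mean_a + total / norm if norm else mean_a

print(predict(ratings, active=4, item=2, weight=pearson))
print(predict(ratings, active=4, item=2, weight=vector_similarity))
```

Passing the weight function as an argument keeps the prediction formula identical for both methods, so any difference between the two printed values comes only from how the student-to-student weights are computed.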