Item-based collaborative filtering

Item-based collaborative filtering is a model-based algorithm for making recommendations. In the algorithm, the similarities between different items in the dataset are calculated by using one of a number of similarity measures, and then these similarity values are used to predict ratings for user-item pairs not present in the dataset.

Similarities between items

The similarity values between items are measured by observing all the users who have rated both the items. As shown in the diagram below, the similarity between two items is dependent upon the ratings given to the items by users who have rated both of them:

Itembased collaborative filtering

Similarity measures

There are a number of different mathematical formulations that can be used to calculate the similarity between two items. As can be seen in the formulae below, each formula includes terms summed over the set of common users U.

Cosine-based similarity

Also known as vector-based similarity, this formulation views two items and their ratings as vectors, and defines the similarity between them as the angle between these vectors:

Cosine-based similarity

Pearson (correlation)-based similarity

This similarity measure is based on how much the ratings by common users for a pair of items deviate from average ratings for those items:

Pearson similarity

Adjusted cosine similarity

This similarity measurement is a modified form of vector-based similarity where we take into the fact that different users have different ratings schemes; in other words, some users might rate items highly in general, and others might give items lower ratings as a preference. To remove this drawback from vector-based similarity, we subtract average ratings for each user from each user's rating for the pair of items in question:

Adjusted cosine similarity

From model to predictions

Once we make a model using one of the similarity measures described above, we can predict the rating for any user-item pair by using the idea of weighted sum. First we take all the items similar to our target item, and from those similar items, we pick items which the active user has rated. We weight the user's rating for each of these items by the similarity between that and the target item. Finally, we scale the prediction by the sum of similarities to get a reasonable value for the predicted rating:

Weighted sum

Our implementation

We implemented item-based collaborative filtering using these parameters:

Challenges

We tried item-based collaborative filtering on the Movielens dataset, but as the Results page shows, it didn't perform very well in testing. In particular, we isolated two main problems, which were mainly due to the sparsity of the data:

References

[1] M.Deshpande and G. Karypis. Item-based top-n recommendation algorithms. ACM Trans. Inf. Syst., 22(1):143-177, 2004.

[2] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Reidl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International World Wide Web Conference, pages 285-295, 2001.