Movie Recommendations via Item-Item Collaborative Filtering. You are provided with real data (the MovieLens dataset) of user ratings for different movies. A readme file describes the data format. In this project, you will implement the item-item collaborative filtering algorithm that we discussed in class. The high-level steps are as follows:
a) Construct the profile of each item (i.e., movie). At a minimum, you should use the ratings given by each user for that item. Optionally, you can creatively use other information (e.g., genre information for each movie and tag information given by users for each movie). If you use this additional information, you should explain your methodology in the submitted report.
b) Compute the similarity score for all item-item (i.e., movie-movie) pairs. You will employ the centered cosine similarity metric that we discussed in class.
c) Compute the neighborhood set N for each item (i.e., movie). You will select the movies that have the highest similarity scores with the given movie. Please employ a neighborhood of size 5. Break ties using lexicographic ordering over movie-ids.
d) Estimate the ratings of the users who didn't rate this item (i.e., movie) using the neighborhood set. Repeat for each item.
e) Compute the recommended items (movies) for each user. Pick the top-5 movies with the highest estimated ratings. Break ties using lexicographic ordering over movie-ids. Your program should output the top-5 recommendations for each user.
Please write Python code.
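Steps (a) and (b) can be sketched as follows. This is a minimal illustration, not a reference solution: it assumes the ratings have already been loaded as `(user_id, movie_id, rating)` tuples (the actual file layout is described in the readme and is not reproduced here), and the function names `build_item_matrix` and `centered_cosine_similarity` are hypothetical. Each movie's profile is its row of user ratings; centering subtracts the movie's mean rating and treats missing ratings as 0 afterwards.

```python
import numpy as np

def build_item_matrix(ratings):
    """Step (a): one row per movie, one column per user, NaN where unrated.

    `ratings` is assumed to be an iterable of (user_id, movie_id, rating).
    """
    ratings = list(ratings)
    movie_ids = sorted({m for _, m, _ in ratings})
    user_ids = sorted({u for u, _, _ in ratings})
    m_index = {m: i for i, m in enumerate(movie_ids)}
    u_index = {u: j for j, u in enumerate(user_ids)}
    matrix = np.full((len(movie_ids), len(user_ids)), np.nan)
    for u, m, r in ratings:
        matrix[m_index[m], u_index[u]] = r
    return matrix, movie_ids, user_ids

def centered_cosine_similarity(matrix):
    """Step (b): cosine similarity between mean-centered movie rows."""
    rated = ~np.isnan(matrix)
    counts = rated.sum(axis=1, keepdims=True)
    sums = np.where(rated, matrix, 0.0).sum(axis=1, keepdims=True)
    means = sums / np.maximum(counts, 1)  # mean over users who rated the movie
    centered = np.where(rated, matrix - means, 0.0)  # unrated entries become 0
    norms = np.linalg.norm(centered, axis=1)
    denom = np.outer(norms, norms)
    dots = centered @ centered.T
    # Movies whose centered row is all zeros get similarity 0 (assumption).
    return np.divide(dots, denom, out=np.zeros_like(dots), where=denom > 0)
```

The result is a symmetric movie-by-movie matrix whose rows and columns follow the order of `movie_ids`. A movie whose ratings are all identical has a zero centered vector, so this sketch assigns it similarity 0 to everything, including itself.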
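Step (c) can be sketched as below, given a similarity matrix `sim` whose rows and columns follow a list `movie_ids`. The function name `top_k_neighbors` is hypothetical. One assumption is worth flagging: "lexicographic ordering over movie-ids" is interpreted here as comparing the IDs as strings, so `"10"` sorts before `"2"`. If numeric order is intended, replace `str(other)` with the numeric ID.

```python
import numpy as np

def top_k_neighbors(sim, movie_ids, k=5):
    """Return {movie_id: [neighbor ids]} with the k most similar other movies.

    Ties in similarity are broken by the string form of the movie ID
    (an assumed reading of "lexicographic ordering").
    """
    neighbors = {}
    for i, movie in enumerate(movie_ids):
        candidates = [
            (-sim[i, j], str(other), other)  # negate so higher similarity sorts first
            for j, other in enumerate(movie_ids)
            if j != i  # a movie is not its own neighbor
        ]
        candidates.sort(key=lambda t: (t[0], t[1]))
        neighbors[movie] = [other for _, _, other in candidates[:k]]
    return neighbors
```

Sorting every row costs O(n log n) per movie, which is simple to verify. For a large catalog, a partial selection such as `numpy.argpartition` would be faster, but it needs extra care to keep the tie-breaking rule.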
| movieId_1 | movieId_2 | movieId_3 | movieId_4 | movieId_5 | movieId_6 | movieId_7 | movieId_8 | movieId_9 | movieId_10 |
movieId_1 | 1 | 0.1431 | 0.07688 | 0.008127 | 0.09366 | 0.01457 | 0.1098 | 0.005078 | -0.05931 | 0.02976 |
movieId_2 | 0.1431 | 1 | 0.02305 | 0.04271 | 0.009544 | -0.00369 | 0.1035 | 0.1233 | 0.05253 | 0.193 |
movieId_3 | 0.07688 | 0.02305 | 1 | 0.01552 | 0.2279 | -0.00042 | 0.02262 | 0.1398 | 0.1029 | 0.02342 |
movieId_4 | 0.008127 | 0.04271 | 0.01552 | 1 | 0.09151 | -0.02098 | 0.1853 | 0.1497 | -0.07277 | 0.03346 |
movieId_5 | 0.09366 | 0.009544 | 0.2279 | 0.09151 | 1 | 0.1149 | 0.005068 | 0.1279 | 0.08839 | 0.01301 |
movieId_6 | 0.01457 | -0.00369 | -0.00042 | -0.02098 | 0.1149 | 1 | -0.00661 | 0.005829 | 0.04757 | -0.01539 |
movieId_7 | 0.1098 | 0.1035 | 0.02262 | 0.1853 | 0.005068 | -0.00661 | 1 | 0.0919 | 0.04296 | 0.1564 |
movieId_8 | 0.005078 | 0.1233 | 0.1398 | 0.1497 | 0.1279 | 0.005829 | 0.09197 | 1 | 0.07215 | -0.01944 |
movieId_9 | -0.05931 | 0.05253 | 0.1029 | -0.07277 | 0.08839 | 0.04757 | 0.04296 | 0.07215 | 1 | 0.02011 |
movieId_10 | 0.02976 | 0.193 | 0.02342 | 0.03346 | 0.01301 | -0.01539 | 0.1564 | -0.01944 | 0.02011 | 1 |
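Reading the table above as a movie-movie similarity matrix and restricting attention to the ten movies shown, the five largest off-diagonal entries in the movieId_1 row are movieId_2 (0.1431), movieId_7 (0.1098), movieId_5 (0.09366), movieId_3 (0.07688), and movieId_10 (0.02976). Such a matrix feeds steps (d) and (e), sketched below under stated assumptions: the estimate is a similarity-weighted average of the user's ratings on the neighbors, neighbors with non-positive similarity are ignored (a simplification of this sketch, not something the problem statement specifies), and ties are broken by the string form of the movie ID. The function names are hypothetical.

```python
import numpy as np

def estimate_ratings(matrix, sim, movie_ids, neighbors):
    """Step (d): fill in estimates for (movie, user) pairs with no rating.

    matrix: movies x users, NaN where unrated.
    neighbors: {movie_id: [neighbor movie ids]}.
    Returns an array of the same shape; NaN where no estimate is possible
    or where the user already rated the movie.
    """
    index = {m: i for i, m in enumerate(movie_ids)}
    estimates = np.full(matrix.shape, np.nan)
    for i, movie in enumerate(movie_ids):
        for u in range(matrix.shape[1]):
            if not np.isnan(matrix[i, u]):
                continue  # the user already rated this movie
            num = den = 0.0
            for other in neighbors[movie]:
                j = index[other]
                w = sim[i, j]
                # Assumption: use only positively similar neighbors the user rated.
                if w > 0 and not np.isnan(matrix[j, u]):
                    num += w * matrix[j, u]
                    den += w
            if den > 0:
                estimates[i, u] = num / den
    return estimates

def recommend(estimates, movie_ids, user_ids, n=5):
    """Step (e): top-n movies per user by estimated rating."""
    recs = {}
    for u, user in enumerate(user_ids):
        scored = [
            (-estimates[i, u], str(m), m)
            for i, m in enumerate(movie_ids)
            if not np.isnan(estimates[i, u])
        ]
        scored.sort(key=lambda t: (t[0], t[1]))  # rating descending, then ID as string
        recs[user] = [m for _, _, m in scored[:n]]
    return recs
```

A user receives fewer than five recommendations when fewer than five of their unrated movies have at least one usable neighbor. How to handle that case (for example, falling back to a movie's mean rating) is left open here.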
