> Berrenur Saylam, Raphaël Fournier-S’niehotta and Lionel Tabourier
The dataset under examination here is MovieLens 1M (see http://grouplens.org/datasets/movielens/). MovieLens is a project created by the GroupLens research group in 1997 for personalised movie recommendations. This dataset gathers the ratings of 6,040 users on 3,900 movies. Additional information about the users includes age, gender and occupation. Additional information about the movies includes title and genre.
This dataset can be seen as a (weighted) bipartite graph of users and films. Our analysis focuses on whether users review similar movies. To this end, we project the bipartite graph onto users, so that two users share a link if they have reviewed at least one film in common. However, as the average number of reviews per user is very high, such a graph is nearly complete. We therefore filter links according to the Jaccard index of similarity: pairs of users are ranked according to the similarity of the sets of films that they have reviewed. This short movie shows how the degree distribution of the projection evolves as the level of filtering increases (10%, 20%, ...): it moves from the distribution of a nearly complete graph to a “traditional” heterogeneous distribution, close to a power law.
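The projection and filtering steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `reviews` dictionary, the function names, and the reading of "x% filtering" as "drop the x% of links with the lowest Jaccard index" are all assumptions made for illustration. The Jaccard index of two users is taken as the size of the intersection of their film sets divided by the size of the union.

```python
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def project_on_users(reviews):
    """Link every pair of users sharing at least one film.

    Returns a dict mapping (user, user) pairs to their Jaccard index.
    """
    links = {}
    for u, v in combinations(sorted(reviews), 2):
        if reviews[u] & reviews[v]:
            links[(u, v)] = jaccard(reviews[u], reviews[v])
    return links


def filter_links(links, fraction):
    """Drop the given fraction of links with the lowest Jaccard index.

    Assumption: this is one plausible reading of 'level of filtering'.
    """
    ranked = sorted(links, key=links.get)  # ascending similarity
    n_drop = int(len(ranked) * fraction)
    return {pair: links[pair] for pair in ranked[n_drop:]}


def degree_distribution(links):
    """Map each degree to the number of users having it.

    Users left with no link after filtering are not counted.
    """
    degree = Counter()
    for u, v in links:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


# Hypothetical toy data: user -> set of reviewed films.
reviews = {
    "u1": {"A", "B", "C"},
    "u2": {"A", "B"},
    "u3": {"B", "C", "D"},
    "u4": {"D"},
}

links = project_on_users(reviews)
kept = filter_links(links, 0.5)
distribution = degree_distribution(kept)
```

On the real data, the pairwise loop over 6,040 users is quadratic in the number of users, so a practical version would likely iterate over films and their reviewers instead; the sketch favours readability over speed.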
The long-term purpose of this work is to describe the environment of a user in the recommendation system, and to evaluate how it can be divided into communities with similar tastes.