Predicting interactions between individuals with structural and dynamical information

Thibaud Arnoux, Lionel Tabourier, Matthieu Latapy

In Journal of Interdisciplinary Methodologies and Issues in Sciences, 2019

Capturing both structural and temporal features of interactions is crucial in many real-world situations like studies of contact between individuals. Using the link stream formalism to model data, we address here the activity prediction problem: we predict the number of links that will occur during a given time period between each pair of nodes. To do this, we take benefit from the temporal and structural information captured by link streams. We design and implement a modular supervised learning method to make prediction, and we study the key elements influencing its performances. We then introduce classes of node pairs, which improves prediction quality and increases diversity


Investigating the Lack of Diversity in User Behavior: the Case of Musical Content on Online Platforms.

Rémy Poulain, Fabien Tarissan

In Information Processing & Management, 57(2), Elsevier, 2020

Whether to deal with issues related to information ranking (e.g. search engines) or content recommendation (on social networks, for instance), algorithms are at the core of processes that select which information is made visible. Such algorithmic choices have a strong impact on users’ activity de facto, and therefore on their access to information. This raises the question of how to measure the quality of the choices algorithms make and their impact on users. As a first step in that direction, this paper presents a framework with which to analyze the diversity of information accessed by users in the context of musical content. The approach adopted centers on the representation of user activity through a tripartite graph that maps users to products and products to categories. In turn, conducting random walks in this structure makes it possible to analyze how categories catch users’ attention and how this attention is distributed. Building upon this distribution, we propose a new index referred to as the (calibrated) herfindahl diversity, which is aimed at quantifying the extent to which this distribution is diverse and representative of existing categories. To the best of our knowledge, this paper is the first to connect the output of random walks on graphs with diversity indexes. We demonstrate the benefit of such an approach by applying our index to two datasets that record user activity on online platforms involving musical content. The results are threefold. First, we show that our index can discriminate between different user behaviors. Second, we shed some light on a saturation phenomenon in the diversity of users’ attention. Finally, we show that the lack of diversity observed in the datasets derives from exogenous factors related to the heterogeneous popularity of music styles, as opposed to internal factors such as recurrent user behaviors.


Role of the Website Structure in the Diversity of Browsing Behaviors

Pedro Ramaciotti Morales, Lionel Tabourier, Sylvain Ung, and Christophe Prieur.

In Proceedings of the 30th ACM Conference on Hypertext and Social Media, pp. 133-142. ACM, 2019.

The quantitative measurement of the diversity of information consumption has emerged as a prominent tool in the examination of relevant phenomena such as filter bubbles. This paper proposes an analysis of the diversity of the navigation of users inside a website through the analysis of server log files. The methodology, guided and illustrated by a case study, but easily applicable to other cases, establishes relations between types of users’ behavior, site structure, and diversity of web browsing. Using the navigation paths of sessions reconstructed from the log file, the proposed methodology offers three main insights: 1) it reveals diversification patterns associated with the page network structure, 2) it relates human browsing characteristics (such as multi-tabbing or click frequency) with the degree of diversity, and 3) it helps identifying diversification patterns specific to subsets of users. These results are in turn useful in the analysis of recommender systems and in the design of websites when there are diversity-related goals or constrains.


Approximating the Temporal Neighbourhood Function of Large Temporal Graphs

Pierluigi Crescenzi, Clémence Magnien and Andrea Marino

Algorithms 2019, 12(10), 211 (Special Issue Algorithm Engineering: Towards Practically Efficient Solutions to Combinatorial Problems)

Temporal networks are graphs in which edges have temporal labels, specifying their starting times and their traversal times. Several notions of distances between two nodes in a temporal network can be analyzed, by referring, for example, to the earliest arrival time or to the latest starting time of a temporal path connecting the two nodes. In this paper we mostly refer to the notion of temporal reachability by using the earliest arrival time. In particular, we first show how the sketch approach, which has been already used in the case of classical graphs, can be applied to the case of temporal networks in order to approximately compute the sizes of the temporal cones of a temporal network. By making use of this approach, we subsequently show how we can approximate the temporal neighborhood function (that is, the number of pairs of nodes reachable from one another in a given time interval) of large temporal networks in a few seconds. Finally, we apply our algorithm in order to analyze and compare the behavior of 25 public transportation temporal networks. Our results can be easily adapted to the case in which we want to refer to the notion of distance based on the latest starting time.


Culture et (in)dépendance – Les enjeux de lindépendance dans les industries culturelles

Olivier Alexandre, Sophie Noël and Aurélie Pinto

ICCA – Industries culturelles, création, numérique, 2017

« Indépendant », « alternatif », « indie », « underground », « avant-garde », « de création »… Depuis les années 1970, la revendication d’indépendance a pris une importance grandissante dans les univers de production culturelle. Qu’elle se rapporte à des contenus, des méthodes de travail ou des dispositifs de médiation, cette revendication propose une alternative à la domination des groupes et des productions mainstream. Son succès conduit cependant à s’interroger sur la cohérence même d’une notion progressivement transformée en label de qualité. À travers douze contributions traitant de l’édition, du cinéma, de la musique, des médias et de la vulgarisation scientifique, cet ouvrage montre en quoi l’indépendance relève d’une construction sociale tributaire de son environnement institutionnel et marchand. Des ondes aux écrans, de l’Europe aux États-Unis, des managers aux artistes, il met en évidence le balancement entre artisanat de création et recherche d’une structuration économique pérenne. En mettant à distance la dénonciation ritualisée de l’hégémonie des majors et autres « grands groupes » et en s’appuyant sur des terrains ancrés dans différents contextes nationaux, ce livre fait le pari d’une approche transversale pour mieux saisir la manière dont l’indépendance irrigue et structure des filières trop souvent envisagées de manière cloisonnée. Il éclaire ainsi une catégorie de référence des industries culturelles paradoxalement peu étudiée par les sciences sociales, et permet de saisir l’évolution des rapports de force dans des secteurs confrontés à une rationalisation économique et à des mutations technologiques de grande ampleur.


La sainte famille des Cahiers du cinéma

Olivier Alexandre

Vrin, Philosophie et cinéma, 2018

Plus célèbre revue de cinéma au monde, les « Cahiers » occupent une place singulière dans le domaine de la critique. De crises en renaissances, ils continuent d’incarner un passé élevé au rang de mythe. Leur capacité à marier les contraires, entre gloire et marginalité, sens aigu de l’histoire et rendezvous manqués, révèle la part tragique du critique : ce travailleur sans métier, auteur sans profession, ni cinéaste ni enseignant, pas tout à fait journaliste ni complétement écrivain. À partir d’une enquête auprès de collaborateurs passés par les Cahiers du cinéma au cours des 50 dernières années, ce livre propose une réponse à cette question laissée en suspens depuis leur fondation : qu’est-ce qu’un critique?


Qualified personalities: Sociology of the French Media Government from Cinema to the Digital Era

Olivier Alexandre

Chapter in Reconceptualising Film Policies, 2017

The nature of French audiovisual sector is determined by a layering of policies, created at various periods of time. A public policy system has been continuously developed and adapted since the 1950s, mostly focusing on the support to and defence of the artistic and moral quality of film and television programmes. This institutional system has relied on ‘qualified personalities’ emanating from diverse sectors such as cinema, television, arts, culture, education, administration and the political world. The chapter presents a sociological analysis of the French model matrix. It focuses on the revolving-door system and the policy-making personnel that have enforced a stable regulatory frame for audiovisual industries. The rise of digital operators and executives – more internationalised and engineering-solution oriented – is currently destabilising this ecosystem. There is an important generational, cultural, ideological and linguistic gap between the French ‘Media government’ and the management teams of the new players.


A general graph-based framework for top-N recommendation using content, temporal and trust information

Armel Jacques Nzekon Nzeko’o, Maurice Tchuente and Matthieu Latapy

Journal of Interdisciplinary Methodologies and Issues in Sciences, 2019

Recommending appropriate items to users is crucial in many e-commerce platforms that containimplicit data as users’ browsing, purchasing and streaming history. One common approach con-sists in selecting the N most relevant items to each user, for a given N, which is called top-Nrecommendation. To do so, recommender systems rely on various kinds of information, like itemand user features, past interest of users for items, browsing history and trust between users. How-ever, they often use only one or two such pieces of information, which limits their performance.In this paper, we design and implement GraFC2T2, a general graph-based framework to easilycombine and compare various kinds of side information for top-N recommendation. It encodescontent-based features, temporal and trust information into a complex graph, and uses personal-ized PageRank on this graph to perform recommendation. We conduct experiments on Epinionsand Ciao datasets, and compare obtained performances using F1-score, Hit ratio and MAP eval-uation metrics, to systems based on matrix factorization and deep learning. This shows that ourframework is convenient for such explorations, and that combining different kinds of informationindeed improves recommendation in general.


Spreading dynamics in a cattle trade network: Size, speed, typical profile and consequences on epidemic control strategies

Aurore Payen, Lionel Tabourier and Matthieu Latapy

PLOS ONE, 2019

Infections can spread among livestock notably because infected animals can be brought to uncontaminated holdings, therefore exposing a new group of susceptible animals to the dis- ease. As a consequence, the structure and dynamics of animal trade networks is a major focus of interest to control zoonosis. We investigate the impact of the chronology of animal trades on the dynamics of the process. Precisely, in the context of a basic SI model spread- ing, we measure on the French database of bovine transfers to what extent a snapshot- based analysis of the cattle trade networks overestimates the epidemic risks. We bring into light that an analysis taking into account the chronology of interactions would give a much more accurate assessment of both the size and speed of the process. For this purpose, we model data as a temporal network that we analyze using the link stream formalism in order to mix structural and temporal aspects. We also show that in this dataset, a basic SI spread- ing comes down in most cases to a simple two-phases scenario: a waiting period, with few contacts and low activity, followed by a linear growth of the number of infected holdings. Using this portrait of the spreading process, we identify efficient strategies to control a potential outbreak, based on the identification of specific elements of the link stream which have a higher probability to be involved in a spreading process.


Degree-based Outlier Detection within IP Traffic Modelled as a Link Stream (extended version)

Audrey Wilmet, Tiphaine Viard, Matthieu Latapy and Robin Lamarche-Perrin

Computer Networks, 2019

This paper aims at precisely detecting and identifying anomalous events in IP traffic. To this end, we adopt the link stream formalism which properly captures temporal and structural features of the data. Within this framework, we focus on finding anomalous behaviours with respect to the degree of IP addresses over time. Due to diversity in IP profiles, this feature is typically distributed heterogeneously, preventing us to directly find anomalies. To deal with this challenge, we design a method to detect outliers as well as precisely identify their cause in a sequence of similar heterogeneous distributions. We apply it to several MAWI captures of IP traffic and we show that it succeeds in detecting relevant patterns in terms of anomalous network activity.


Pattern Matching in Link Streams: a Token-based Approach

Clément Bertrand, Hanna Klaudel, Matthieu Latapy et Frédéric Peschanski

Petri Nets, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many ap- plications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algo- rithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical per- formances.


Combining path-constrained random walks to recover link weights in heterogeneous information networks

Hong-Lan Botterman and Robin Lamarche-Perrin

CompleNet, 2019

Heterogeneous information networks (HIN) are abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims at recovering the exact weight of the incident link to these two nodes, knowing some other links present in the HIN. Actually, this weight is approximated by a linear combination of probabilities, results of path-constrained random walks i.e., random walks where the walker is forced to follow only a specific sequence of node types and edge types which is commonly called a meta path, performed on the HIN. This method is general enough to compute the link weight between any types of nodes. Experiments on Twitter data show the applicability of the method.


Multidimensional Outlier Detection in Interaction Data: Application to Political Communication on Twitter

Audrey Wilmet and Robin Lamarche-Perrin

CompleNet, 2019

We introduce a method which aims at getting a better understanding of how millions of interactions may result in global events. Given a set of dimensions and a context, we find different types of outliers: a user during a given hour which is abnormal compared to its usual behavior, a relationship between two users which is abnormal compared to all other relationships, etc. We apply our method on a set of retweets related to the 2017 French presidential election and show that one can build interesting insights regarding political organization on Twitter.


RankMerging: a supervised learning-to-rank framework to predict links in large social networks

Lionel Tabourier, Daniel F. Bernardes, Anne-Sophie Libert and Renaud Lambiotte

Machine Learning, 2019

Uncovering unknown or missing links in social networks is a difficult task because of their sparsity and because links may represent different types of relationships, characterized by different structural patterns. In this paper, we define a simple yet efficient supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. We illustrate our method on three different kinds of social networks and show that it substantially improves the performances of unsupervised methods of ranking as well as standard supervised combination strategies. We also describe various properties of RankMerging, such as its computational complexity, its robustness to feature selection and parameter estimation and discuss its area of relevance: the prediction of an adjustable number of links on large networks.


Listing k-cliques in Sparse Real-World Graphs

Maximilien Danisch, Oana Balalau and Mauro Sozio

WWW, 2018

Motivated by recent studies in the data mining community whichrequire to efficiently list allk-cliques, we revisit the iconic algorithmof Chiba and Nishizeki and develop the most efficient parallel algo-rithm for such a problem. Our theoretical analysis provides the bestasymptotic upper bound on the running time of our algorithm forthe case when the input graph is sparse. Our experimental evalua-tion on large real-world graphs shows that our parallel algorithm isfaster than state-of-the-art algorithms, while boasting an excellentdegree of parallelism. In particular, we are able to list allk-cliques(for anyk) in graphs containing up to tens of millions of edges aswell as all10-cliques in graphs containing billions of edges, withina few minutes and a few hours respectively. Finally, we show howour algorithm can be employed as an effective subroutine for find-ing thek-clique core decomposition and an approximatek-clique densest subgraphs in very large real-world graphs.


L’analyse des opinions politiques sur Twitter : Défis et opportunités d’une approche multi-échelle

Marta Severo and Robin Lamarche-Perrin

Revue française de sociologie: Big Data, Sociétés et Sciences Sociales, 2018

Des blogs et forums aux pages Facebook et comptes Twitter, le récent déluge des données numériques du web a fortement affecté la recherche en sciences sociales. Cette nouvelle catégorie d’information, utile à l’extraction des opinions politiques, se présente comme une alternative aux techniques traditionnelles telles que les sondages. Premièrement, en réalisant un état de l’art des études de l’opinion s’appuyant sur les données Twitter, cet article vise à mettre en relation les méthodes d’analyse utilisées dans ces études et les définitions de l’opinion politique qui y sont suggérées. Deuxièmement, cet article étudie la faisabilité de réaliser des analyses multi-échelles en sciences sociales concernant l’étude de l’opinion politique en exposant les mérites de plusieurs méthodes, allant des méthodes orientées contenus aux méthodes orientées interactions, de l’analyse statistique à l’analyse sémantique, des approches supervisées aux approches non supervisées. Le résultat de notre démarche est ainsi d’identifier les tendances futures de la recherche en sciences sociales concernant l?étude de l’opinion politique.


An information-theoretic framework for the lossy compression of link streams

Robin Lamarche-Perrin

Theoretical Computer Science, 2018

Graph compression is a data analysis technique that consists in the replacement of parts of a graph by more concise structural patterns in order to reduce its description length. It notably provides interesting exploration tools for the study of real, large-scale, and complex graphs which cannot be grasped at first glance. This article proposes a framework for the compression of temporal graphs, that is for the compression of graphs that evolve with time. This framework first builds on a simple and limited scheme, exploiting structural equivalence for the lossless compression of static graphs, then generalises it to the lossy compression of link streams, a recent formalism for the study of temporal graphs. Such generalisation builds on the natural extension of (bidimensional) relational data by the addition of a third temporal dimension. Moreover, we introduce an information-theoretic measure to quantify and to control the information that is lost during compression, as well as an algebraic characterisation of the space of possible compression patterns to enhance the expressiveness of the initial compression scheme. These contributions lead to the definition of a combinatorial optimisation problem, that is the Lossy Multistream Compression Problem, for which we provide an exact algorithm.


Comparaison des méthodes de classification pour l’identification des noeuds importants dans les graphes dynamiques

Marwan Ghanem

Rencontres jeunes chercheurs en RI, 2019

De nos jours, nous nous intéressons à la détection d’entités importantes, ceci peut être des mots-clés importants dans un document ou Twitter, ou des individus importants dans un réseau de mouvement. Nous pouvons modéliser ces données sous la forme d’un graphe dynamique et utiliser des métriques de centralité telle que la centralité de proximité temporelle. Malheureusement, cela peut être coûteux. Dans ce travail, nous comparons la précision de plusieurs méthodes de classification supervisée, les unes par rapport aux autres, à la détection de ces nœuds importants. Sur seize jeux de données de natures différentes, nous montrons que ces méthodes réussissent à différencier les nœuds importants de nœuds insignifiants. Nous montrons également que prendre en compte la nature des données diminue la qualité de résultats. Enfin, nous examinons le temps du calcul de chacune de ces méthodes contre le temps du calcul de méthodes exact.