Combining path-constrained random walks to recover link weights in heterogeneous information networks

Hong-Lan Botterman and Robin Lamarche-Perrin

CompleNet, 2019

Heterogeneous information networks (HIN) are abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims to recover the exact weight of the link between these two nodes, knowing some other links present in the HIN. To do so, this weight is approximated by a linear combination of probabilities obtained from path-constrained random walks performed on the HIN, i.e., random walks in which the walker is forced to follow a specific sequence of node types and edge types, commonly called a meta path. This method is general enough to compute the link weight between any types of nodes. Experiments on Twitter data show the applicability of the method.
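
The idea of combining path-constrained random walks can be illustrated on a toy graph. This is a minimal, hypothetical sketch: the graph, the edge and node types, the two meta paths and the weights `theta` are invented for illustration, and the weights are fixed by hand rather than fitted on the known links as the method would do.

```python
from collections import defaultdict

# Hypothetical toy HIN: node -> list of (edge_type, neighbour). All names are illustrative.
graph = {
    "u1": [("uses", "h1"), ("uses", "h2"), ("follows", "u2")],
    "u2": [("uses", "h1"), ("follows", "u1")],
    "h1": [("used_by", "u1"), ("used_by", "u2")],
    "h2": [("used_by", "u1")],
}
node_type = {"u1": "user", "u2": "user", "h1": "hashtag", "h2": "hashtag"}

def pcrw(graph, node_type, source, meta_path):
    """Distribution of a walker starting at `source` and constrained to follow
    `meta_path`, a list of (edge_type, target_node_type) steps. At each step the
    walker moves uniformly among the neighbours allowed by the meta path;
    probability mass with no allowed neighbour is dropped."""
    dist = {source: 1.0}
    for edge_type, target_type in meta_path:
        nxt = defaultdict(float)
        for node, prob in dist.items():
            allowed = [v for et, v in graph.get(node, [])
                       if et == edge_type and node_type[v] == target_type]
            for v in allowed:
                nxt[v] += prob / len(allowed)
        dist = dict(nxt)
    return dist

def combined_score(graph, node_type, source, target, meta_paths, theta):
    """Linear combination sum_i theta_i * P_i(source -> target) over the meta paths."""
    return sum(w * pcrw(graph, node_type, source, mp).get(target, 0.0)
               for mp, w in zip(meta_paths, theta))

# Meta paths: user -uses-> hashtag -used_by-> user, and user -follows-> user.
co_hashtag = [("uses", "hashtag"), ("used_by", "user")]
follows = [("follows", "user")]
theta = [0.6, 0.4]  # hypothetical weights; in practice they would be fitted on known links

print(pcrw(graph, node_type, "u1", co_hashtag))  # {'u1': 0.75, 'u2': 0.25}
print(round(combined_score(graph, node_type, "u1", "u2", [co_hashtag, follows], theta), 3))  # 0.55
```

Each meta path contributes one feature (a walk probability), which keeps the model linear and easy to fit once the probabilities are computed.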

Multidimensional Outlier Detection in Interaction Data: Application to Political Communication on Twitter

Audrey Wilmet and Robin Lamarche-Perrin

CompleNet, 2019

We introduce a method that aims to better understand how millions of interactions may result in global events. Given a set of dimensions and a context, we find different types of outliers: a user whose activity during a given hour is abnormal compared to their usual behavior, a relationship between two users that is abnormal compared to all other relationships, etc. We apply our method to a set of retweets related to the 2017 French presidential election and show that it yields interesting insights into political organization on Twitter.
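
One simple way to flag the first kind of outlier (a user during a given hour, compared with that user's usual behavior) is a per-user z-score on hourly interaction counts. This is a stand-in chosen for illustration, not necessarily the measure used in the paper; the `(user, hour)` event format, the data and the threshold are assumptions.

```python
from collections import Counter
from statistics import mean, pstdev

def user_hour_outliers(events, threshold=2.5):
    """events: iterable of (user, hour) pairs, one per interaction (e.g. a retweet).
    Flags (user, hour) cells whose count lies more than `threshold` standard
    deviations above that user's own mean over all observed hours."""
    counts = Counter(events)                  # (user, hour) -> number of interactions
    hours = sorted({h for _, h in counts})    # every hour seen anywhere in the data
    outliers = []
    for user in sorted({u for u, _ in counts}):
        series = [counts[(user, h)] for h in hours]  # Counter yields 0 for missing cells
        mu, sigma = mean(series), pstdev(series)
        if sigma == 0:
            continue  # perfectly regular user: nothing stands out
        for h, c in zip(hours, series):
            z = (c - mu) / sigma
            if z > threshold:
                outliers.append((user, h, round(z, 2)))
    return outliers

# Hypothetical data: user "a" retweets once per hour except for a burst at hour 5,
# user "b" retweets twice every hour.
events = ([("a", h) for h in range(10) if h != 5] + [("a", 5)] * 20
          + [("b", h) for h in range(10)] * 2)
print(user_hour_outliers(events))  # [('a', 5, 3.0)]
```

The context here is the user's own history; comparing a relationship with all other relationships would instead normalize against the population of user pairs.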

RankMerging: a supervised learning-to-rank framework to predict links in large social networks

Lionel Tabourier, Daniel F. Bernardes, Anne-Sophie Libert and Renaud Lambiotte

Machine Learning, 2019

Uncovering unknown or missing links in social networks is a difficult task because of their sparsity and because links may represent different types of relationships, characterized by different structural patterns. In this paper, we define a simple yet efficient supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. We illustrate our method on three different kinds of social networks and show that it substantially improves the performance of unsupervised ranking methods as well as of standard supervised combination strategies. We also describe various properties of RankMerging, such as its computational complexity and its robustness to feature selection and parameter estimation, and discuss its area of relevance: the prediction of an adjustable number of links on large networks.
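
To illustrate the general idea of combining several unsupervised rankings with supervision, the sketch below greedily merges rankings using known links. It is a simplified, hypothetical procedure in the spirit of the description, not the exact RankMerging algorithm; the function name, the window size and the toy data are assumptions.

```python
def greedy_merge(rankings, true_links, n, window=2):
    """Merge rankings (lists of candidate links, best first) into one list of
    length at most n. At each step, the ranking whose next `window` unselected
    candidates contain the most known true links wins (ties go to the earlier
    ranking), and those candidates are appended to the merged list."""
    merged, chosen = [], set()
    pos = [0] * len(rankings)  # first possibly-unselected index in each ranking

    def next_window(i):
        items, j = [], pos[i]
        while j < len(rankings[i]) and len(items) < window:
            if rankings[i][j] not in chosen:
                items.append(rankings[i][j])
            j += 1
        return items

    while len(merged) < n:
        windows = [next_window(i) for i in range(len(rankings))]
        # -1 marks an exhausted ranking so it is never preferred
        scores = [sum(c in true_links for c in w) if w else -1 for w in windows]
        best = max(range(len(rankings)), key=scores.__getitem__)
        if scores[best] < 0:
            break  # every ranking is exhausted
        for c in windows[best][: n - len(merged)]:
            merged.append(c)
            chosen.add(c)
        while pos[best] < len(rankings[best]) and rankings[best][pos[best]] in chosen:
            pos[best] += 1
    return merged

# Hypothetical rankings of candidate node pairs from two unsupervised scores,
# plus the set of links known to exist (the supervision).
by_common_neighbours = ["ab", "cd", "ef", "gh"]
by_paths = ["ef", "gh", "ab", "ij"]
known = {"ab", "ef", "gh"}
print(greedy_merge([by_common_neighbours, by_paths], known, n=4))  # ['ef', 'gh', 'ab', 'cd']
```

The length `n` of the merged list plays the role of the adjustable number of predicted links.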

An information-theoretic framework for the lossy compression of link streams

Robin Lamarche-Perrin

Theoretical Computer Science, 2019

Graph compression is a data analysis technique that consists of replacing parts of a graph by more concise structural patterns in order to reduce its description length. It notably provides useful exploration tools for the study of real, large-scale, and complex graphs which cannot be grasped at first glance. This article proposes a framework for the compression of temporal graphs, that is, graphs that evolve with time. This framework first builds on a simple and limited scheme, exploiting structural equivalence for the lossless compression of static graphs, then generalises it to the lossy compression of link streams, a recent formalism for the study of temporal graphs. This generalisation builds on the natural extension of (bidimensional) relational data by the addition of a third, temporal dimension. Moreover, we introduce an information-theoretic measure to quantify and control the information that is lost during compression, as well as an algebraic characterisation of the space of possible compression patterns to enhance the expressiveness of the initial compression scheme. These contributions lead to the definition of a combinatorial optimisation problem, the Lossy Multistream Compression Problem, for which we provide an exact algorithm.
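
As a rough illustration of how information lost during compression can be quantified, the sketch below treats a link stream as counts over (u, v, t) cells, summarises each block of a partition by spreading its mass uniformly over its cells, and measures the Kullback-Leibler divergence between the original and compressed distributions. The KL divergence, the uniform-within-block model and the cell representation are assumptions made for this sketch; the paper's exact measure and its algebra of compression patterns are not reproduced.

```python
from math import log2

def compression_loss(counts, blocks):
    """counts: {(u, v, t): number of interactions} over all cells of the stream.
    blocks: a partition of those cells; each block is summarised by its total
    mass spread uniformly over its cells. Returns the KL divergence, in bits,
    between the original and the compressed distributions."""
    total = sum(counts.values())
    loss = 0.0
    for block in blocks:
        q = sum(counts[c] for c in block) / total / len(block)  # compressed prob. per cell
        for c in block:
            p = counts[c] / total
            if p > 0:  # 0 * log(0 / q) is taken as 0
                loss += p * log2(p / q)
    return loss

# Hypothetical link stream: interactions between a, b and c over two time steps.
counts = {("a", "b", 1): 2, ("a", "b", 2): 2, ("a", "c", 1): 4, ("a", "c", 2): 0}
singletons = [[c] for c in counts]               # no compression at all
by_pair = [[("a", "b", 1), ("a", "b", 2)],       # merge the two time steps of each pair
           [("a", "c", 1), ("a", "c", 2)]]
print(compression_loss(counts, singletons))  # 0.0
print(compression_loss(counts, by_pair))     # 0.5
```

Merging the two steps of the a-b pair costs nothing because that pair is regular in time, whereas merging the bursty a-c pair loses half a bit.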

Comparaison des méthodes de classification pour l’identification des noeuds importants dans les graphes dynamiques

Marwan Ghanem

Rencontres jeunes chercheurs en RI, 2019

There is growing interest in detecting important entities: these may be important keywords in a document or on Twitter, or important individuals in a mobility network. Such data can be modelled as a dynamic graph, and centrality metrics such as temporal closeness centrality can then be used. Unfortunately, computing them can be costly. In this work, we compare the accuracy of several supervised classification methods against one another at detecting these important nodes. On sixteen datasets of different natures, we show that these methods succeed in distinguishing important nodes from insignificant ones. We also show that taking the nature of the data into account decreases the quality of the results. Finally, we compare the computation time of each of these methods with that of exact methods.
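
For reference, the sketch below shows the kind of exact computation the classifiers are compared against: earliest-arrival times along time-respecting paths, summed harmonically into a temporal closeness score. Temporal closeness has several variants in the literature; the strictly-increasing-time rule, the 1/latency formula and the toy contacts are assumptions of this sketch.

```python
import math

def earliest_arrival(contacts, source, t0):
    """contacts: undirected instantaneous interactions (u, v, t). A path must use
    strictly increasing timestamps, all later than the start time t0.
    Returns {node: earliest arrival time} for the nodes reachable from source."""
    arrival = {source: t0}
    for u, v, t in sorted(contacts, key=lambda c: c[2]):
        for a, b in ((u, v), (v, u)):  # contacts are undirected
            if arrival.get(a, math.inf) < t < arrival.get(b, math.inf):
                arrival[b] = t
    return arrival

def temporal_closeness(contacts, source, t0):
    """Harmonic temporal closeness: sum of 1 / latency over reachable nodes."""
    arrival = earliest_arrival(contacts, source, t0)
    return sum(1.0 / (t - t0) for node, t in arrival.items() if node != source)

# Hypothetical contacts: (node, node, time)
contacts = [("a", "b", 1), ("b", "c", 2), ("c", "d", 2), ("a", "d", 5)]
print(earliest_arrival(contacts, "a", 0))  # {'a': 0, 'b': 1, 'c': 2, 'd': 5}
print(temporal_closeness(contacts, "a", 0))
```

Node d is reached only at time 5: the contact c-d at time 2 cannot be used because c itself is only reached at time 2. Repeating this for every node and every start time is what makes the exact approach costly.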

Easy-Mention: a model-driven mention recommendation heuristic to boost your tweet popularity

Soumajit Pramanik, Mohit Sharma, Maximilien Danisch, Qinna Wang, Jean‑Loup Guillaume, Bivas Mitra

International Journal of Data Science and Analytics, vol. 7 (2), 2018

This paper investigates the role of mentions in tweet propagation. We propose a novel tweet propagation model, SIR MF, based on a multiplex network framework, which makes it possible to analyze the effects of mentioning on the final retweet count. The basic building blocks of this model are supported by a comprehensive study of multiple real datasets, and simulations of the model show good agreement with the empirically observed tweet popularity. Studies and experiments also reveal that follower count, retweet rate and profile similarity are important factors for gaining tweet popularity, and help to better understand the impact of mention strategies on the retweet count. Interestingly, we experimentally identify a critical retweet rate regulating the role of mentions in tweet popularity. Finally, our data-driven simulations demonstrate that the proposed mention recommendation heuristic, Easy-Mention, outperforms the benchmark Whom-To-Mention algorithm.

A Modular Overlapping Community Detection Algorithm: Investigating the « From Local to Global » Approach

Maximilien Danisch, Noé Gaumont, Jean‑Loup Guillaume

16th Cologne-Twente Workshop on Graphs and Combinatorial Optimization, 2018

We propose an overlapping community detection algorithm following a “from local to global” approach: our algorithm finds local communities one by one by repeatedly optimizing a quality function that measures the quality of a community. Then, as some extracted local communities can be very similar to each other, a cleaning procedure is applied to obtain the global overlapping community structure. Our algorithm depends on three modules: (i) a quality function, (ii) an optimization heuristic and (iii) a cleaning procedure. Various such modules can be plugged in independently. We show that, using default modules, our algorithm improves over a state-of-the-art method on some real-world graphs with ground-truth communities. In the future, we would like to study which combination of modules performs best in practice and to parallelize our code.
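
The three-module structure can be sketched with interchangeable functions. In this hypothetical example the quality function is conductance, the optimization heuristic is greedy growth from each seed node, and the cleaning procedure drops communities whose Jaccard similarity to an already kept one exceeds a threshold; these choices are assumptions and may differ from the paper's default modules.

```python
def conductance(graph, community):
    """Edges leaving the community divided by the smaller of the two volumes
    (sums of degrees); lower values mean a better-separated community."""
    cut = sum(1 for u in community for v in graph[u] if v not in community)
    vol_in = sum(len(graph[u]) for u in community)
    vol_out = sum(len(graph[u]) for u in graph) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else 1.0

def grow(graph, seed, quality=conductance):
    """Optimization heuristic: greedily add the neighbouring node that most
    lowers the quality score, and stop when no neighbour improves it."""
    community = {seed}
    best = quality(graph, community)
    while True:
        frontier = {v for u in community for v in graph[u]} - community
        if not frontier:
            return community
        score, v = min((quality(graph, community | {v}), v) for v in frontier)
        if score >= best:
            return community
        community.add(v)
        best = score

def clean(communities, max_jaccard=0.5):
    """Cleaning procedure: drop a community too similar (Jaccard index) to one already kept."""
    kept = []
    for c in communities:
        if all(len(c & k) / len(c | k) <= max_jaccard for k in kept):
            kept.append(c)
    return kept

def local_to_global(graph, quality=conductance, max_jaccard=0.5):
    local = [frozenset(grow(graph, seed, quality)) for seed in sorted(graph)]
    return clean(local, max_jaccard)

# Hypothetical graph: two triangles joined by the edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
print(local_to_global(graph))  # [frozenset({1, 2, 3}), frozenset({4, 5, 6})]
```

Because each local community is grown independently, nothing prevents a node from belonging to several kept communities, which is how overlaps arise.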

Pattern Matching in Link Streams: a Token-based Approach

Clément Bertrand, Hanna Klaudel, Matthieu Latapy and Frédéric Peschanski

Petri Nets, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many applications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algorithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical performance.

Listing k-cliques in Sparse Real-World Graphs

Maximilien Danisch, Oana Balalau and Mauro Sozio

WWW, 2018

Motivated by recent studies in the data mining community which require efficiently listing all k-cliques, we revisit the iconic algorithm of Chiba and Nishizeki and develop the most efficient parallel algorithm for such a problem. Our theoretical analysis provides the best asymptotic upper bound on the running time of our algorithm for the case when the input graph is sparse. Our experimental evaluation on large real-world graphs shows that our parallel algorithm is faster than state-of-the-art algorithms, while boasting an excellent degree of parallelism. In particular, we are able to list all k-cliques (for any k) in graphs containing up to tens of millions of edges, as well as all 10-cliques in graphs containing billions of edges, within a few minutes and a few hours respectively. Finally, we show how our algorithm can be employed as an effective subroutine for finding the k-clique core decomposition and approximate k-clique densest subgraphs in very large real-world graphs.
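
The core ordering trick can be sketched sequentially: orient each edge from lower to higher rank in a total order of the nodes, so that every k-clique is listed exactly once (in increasing rank), then recursively intersect out-neighbourhoods. This sketch uses a degeneracy ordering as an assumption; any total order gives correctness, and the ordering only affects speed. It omits the parallelism and the data structures the paper relies on.

```python
def degeneracy_order(graph):
    """Repeatedly remove a node of minimum remaining degree (ties broken by label)."""
    remaining = {u: set(vs) for u, vs in graph.items()}
    order = []
    while remaining:
        u = min(remaining, key=lambda n: (len(remaining[n]), n))
        order.append(u)
        for v in remaining.pop(u):
            remaining[v].discard(u)
    return order

def list_k_cliques(graph, k):
    """List every k-clique exactly once (k >= 1) using an edge orientation."""
    rank = {u: i for i, u in enumerate(degeneracy_order(graph))}
    # DAG: keep only edges pointing to higher-ranked nodes
    out = {u: {v for v in graph[u] if rank[v] > rank[u]} for u in graph}
    cliques = []

    def extend(clique, candidates):
        if len(clique) == k:
            cliques.append(tuple(clique))
            return
        if len(clique) + len(candidates) < k:
            return  # not enough candidates left to complete a k-clique
        for v in sorted(candidates, key=rank.get):
            extend(clique + [v], candidates & out[v])

    extend([], set(graph))
    return cliques

# Hypothetical graph: a 4-clique {1, 2, 3, 4} plus a pendant edge 4-5.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
print(len(list_k_cliques(graph, 3)))  # 4
print(list_k_cliques(graph, 4))       # [(1, 2, 3, 4)]
```

With a degeneracy ordering, every out-neighbourhood has size at most the graph's degeneracy, which is small in sparse real-world graphs; this is what keeps the recursive intersections cheap.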

La sainte famille des Cahiers du cinéma

Olivier Alexandre

Vrin, Philosophie et cinéma, 2018

The world's most famous film magazine, the Cahiers occupy a singular place in the field of criticism. Through crises and revivals, they continue to embody a past raised to the status of myth. Their ability to reconcile opposites, between glory and marginality, a keen sense of history and missed opportunities, reveals the tragic side of the critic: a worker without a trade, an author without a profession, neither filmmaker nor teacher, not quite a journalist nor entirely a writer. Drawing on a survey of contributors who have passed through the Cahiers du cinéma over the past 50 years, this book offers an answer to a question left open since the magazine's founding: what is a critic?

L’analyse des opinions politiques sur Twitter : Défis et opportunités d’une approche multi-échelle

Marta Severo and Robin Lamarche-Perrin

Revue française de sociologie: Big Data, Sociétés et Sciences Sociales, 2018

From blogs and forums to Facebook pages and Twitter accounts, the recent deluge of digital data from the web has strongly affected social science research. This new category of information, useful for extracting political opinions, presents itself as an alternative to traditional techniques such as opinion polls. First, by reviewing the state of the art of opinion studies based on Twitter data, this article aims to relate the analysis methods used in these studies to the definitions of political opinion they suggest. Second, this article examines the feasibility of multi-scale analyses in the social sciences for the study of political opinion, by setting out the merits of several methods, ranging from content-oriented to interaction-oriented methods, from statistical to semantic analysis, and from supervised to unsupervised approaches. The outcome of our approach is thus to identify future trends in social science research on the study of political opinion.

Caractériser l’analogie entre automates cellulaires déterministes et systèmes physiques

Lionel Tabourier

Lato Sensu, Revue de la Société de Philosophie des Sciences, vol. 5, no. 2, 2018

Wolfram's classification of deterministic cellular automata rests on the analogy between the dynamical behavior of the automata and that of physical systems undergoing a phase transition. To assess the scientific value of this long-debated classification, one must examine the characteristics of this analogy. Here we establish which elements present in phase transitions have no equivalent in the domain of automata. We then discuss the notion of the potential of an analogy by comparing it with two other examples from the literature.

Centrality metrics in dynamic networks: a comparison study

Marwan Ghanem, Clémence Magnien and Fabien Tarissan

IEEE Transactions on Network Science and Engineering, 2018

For a long time, researchers have worked on defining different metrics able to characterize the importance of nodes in static networks. Recently, researchers have introduced extensions that take the dynamics of networks into account. These extensions study the time evolution of node importance, an important question that has so far received little attention in the context of temporal networks. They follow different approaches for evaluating a node’s importance at a given time, and the value of each approach remains difficult to assess. To study this question in more depth, in this paper we compare a method we recently introduced with three other existing methods. We use several datasets of different natures, and show and explain how these methods capture different notions of importance. We also show that in some cases it might be meaningless to try to identify nodes that are globally important. Finally, we highlight the role of inactive nodes, which can still be important as relays for future communications.

Quantifying the diversity in users activity: an example study on online music platforms

Rémy Poulain and Fabien Tarissan

SNAMS, 2018

Whether through information ranking (e.g. search engines) or content recommendation (on social networks for instance), algorithms are at the core of the processes that select which information is made visible. These algorithmic choices in turn have a strong impact on users' activity and therefore on their access to information. This raises the question of measuring the quality of the choices made by algorithms and their impact on users. As a first step in that direction, this paper presents a framework to analyze the diversity of the information accessed by users. By depicting the activity of users as a tripartite graph mapping users to products and products to categories, we analyze how categories catch users' attention and in particular how this attention is distributed. We then propose the (calibrated) Herfindahl diversity score as a metric quantifying the extent to which this distribution is diverse and representative of the existing categories. To validate this approach, we study a dataset recording the activity of users on online music platforms. We show that our score makes it possible to discriminate between very specific categories that capture dense and coherent sub-groups of listeners, and more generic categories that are distributed over a wider range of users. In addition, we highlight the effect of listening volume on users' attention and reveal a saturation effect above a certain threshold.
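
The uncalibrated part of the score can be sketched as follows: walk the tripartite graph from a user's listens to categories to obtain the user's attention distribution, then take the inverse Herfindahl index (one over the sum of squared shares). The equal split of a listen among a product's categories, the toy data and the function names are assumptions; the calibration step described in the paper is not reproduced here.

```python
from collections import defaultdict

def attention_distribution(listens, product_categories):
    """listens: products a user listened to (repetitions count).
    product_categories: product -> list of categories. Each listen is split
    equally among its product's categories (an assumption of this sketch).
    Returns the share of the user's attention received by each category."""
    weights = defaultdict(float)
    for product in listens:
        cats = product_categories[product]
        for c in cats:
            weights[c] += 1.0 / len(cats)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def herfindahl_diversity(shares):
    """Inverse Herfindahl index 1 / sum(p^2): ranges from 1 (all attention on one
    category) to the number of categories (attention spread evenly)."""
    return 1.0 / sum(p * p for p in shares.values())

# Hypothetical tripartite data: users -> tracks -> genres.
product_categories = {"t1": ["jazz"], "t2": ["rock"], "t3": ["jazz", "rock"], "t4": ["pop"]}
users = {
    "focused": ["t1"] * 5,
    "balanced": ["t1", "t2", "t4"] * 2,
    "mixed": ["t3", "t3"],
}
for user, listens in users.items():
    shares = attention_distribution(listens, product_categories)
    print(user, round(herfindahl_diversity(shares), 2))
```

The inverse form reads as an "effective number of categories", which makes scores easy to compare across users with different listening volumes.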

Pattern Matching in Link Streams: Timed-Automata with Finite Memory

Clément Bertrand, Frédéric Peschanski, Hanna Klaudel and Matthieu Latapy

Scientific Annals of Computer Science, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many applications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algorithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical performance.

Degree-based Outliers Detection within IP Traffic Modelled as a Link Stream

Audrey Wilmet, Tiphaine Viard, Matthieu Latapy and Robin Lamarche-Perrin

TMA Conference 2018, Vienna

Precise detection and identification of anomalous events in IP traffic are crucial in many applications. This paper addresses this task by adopting the link stream formalism, which properly captures the temporal and structural features of the data. Within this framework, we focus on finding anomalous behaviours through the degree of IP addresses over time. Due to the diversity of IP profiles, this feature is typically distributed heterogeneously, which makes anomalies hard to find. To deal with this challenge, we design a method to detect outliers, as well as precisely identify their cause, in a sequence of similar heterogeneous distributions. We apply it to a MAWI capture of IP traffic and show that it succeeds at detecting relevant patterns in terms of anomalous network activity.
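
The input of such a method, a sequence of degree distributions, can be built from a link stream as in the sketch below, which counts the distinct neighbours of each IP address within fixed time slices. Slicing the stream into windows of width `delta` is an assumption made for illustration, the addresses are hypothetical (documentation range), and the outlier detection itself is not reproduced.

```python
from collections import defaultdict

def degree_series(links, delta):
    """links: instantaneous interactions (t, u, v), e.g. packets between two IP
    addresses. Returns {k: {node: degree}}, where degree counts the distinct
    neighbours a node interacts with during the time slice [k*delta, (k+1)*delta)."""
    neighbours = defaultdict(lambda: defaultdict(set))
    for t, u, v in links:
        k = int(t // delta)
        neighbours[k][u].add(v)
        neighbours[k][v].add(u)
    return {k: {node: len(ns) for node, ns in slice_.items()}
            for k, slice_ in sorted(neighbours.items())}

# Hypothetical packets: (time in seconds, source, destination).
links = [(0.5, "192.0.2.1", "192.0.2.2"),
         (1.2, "192.0.2.1", "192.0.2.3"),
         (1.7, "192.0.2.1", "192.0.2.4"),
         (1.9, "192.0.2.1", "192.0.2.3")]
for k, degrees in degree_series(links, delta=1.0).items():
    print(k, degrees)
```

Repeated packets between the same pair within a slice do not raise the degree, so a sudden jump in degree signals contact with many distinct addresses (as in a scan) rather than heavy traffic with a single peer.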

OLCPM: An Online Framework for Detecting Overlapping Communities in Dynamic Social Networks

Souâad Boudebza, Rémy Cazabet, Faiçal Azouaou and Omar Noual

Computer Communications, Elsevier, In press, 2018.

Community structure is one of the most prominent features of complex networks. Detecting community structure is of great importance for providing insights into the structure and functionalities of a network. Most proposals focus on static networks. However, finding communities in a dynamic network is even more challenging, especially when communities overlap with each other. In this article, we present an online algorithm, called OLCPM, based on clique percolation and label propagation methods. OLCPM can detect overlapping communities and works on temporal networks with a fine granularity. By locally updating the community structure, OLCPM delivers a significant improvement in running time compared with previous clique percolation techniques. The experimental results on both synthetic and real-world networks illustrate the effectiveness of the method.

Enumerating maximal cliques in link streams with durations

Tiphaine Viard, Clémence Magnien, and Matthieu Latapy

Information Processing Letters 133 (2018), pp. 44-48

Link streams model interactions over time, and a clique in a link stream is defined as a set of nodes and a time interval such that all pairs of nodes in this set interact permanently during this time interval. This notion was recently introduced for the case where interactions are instantaneous. We generalize it to the case of interactions with durations and show that the instantaneous case is actually a special case of the case with durations. We propose an algorithm to detect maximal cliques that improves on our previous one for instantaneous link streams, and performs better than state-of-the-art algorithms in several cases of interest.
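
To make the definition concrete, the sketch below checks whether a given set of nodes and a time interval form a clique in a link stream with durations. It assumes links are given as (t1, t2, u, v) tuples with closed intervals, and it only tests the definition: it neither checks maximality nor enumerates maximal cliques, which is what the paper's algorithm does. The function names and the toy stream are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def covers(intervals, b, e):
    """True if the union of closed intervals [t1, t2] contains all of [b, e]."""
    reach = None  # right end of the covered stretch starting at b
    for t1, t2 in sorted(intervals):
        if t2 < b:
            continue  # ends before the window of interest
        if t1 > (b if reach is None else reach):
            return False  # a gap appears before e is reached
        reach = t2 if reach is None else max(reach, t2)
        if reach >= e:
            return True
    return False

def is_clique(stream, nodes, b, e):
    """stream: links (t1, t2, u, v), meaning u and v interact from t1 to t2.
    (nodes, [b, e]) is a clique if every pair of nodes interacts throughout [b, e].
    Maximality is not checked."""
    intervals = defaultdict(list)
    for t1, t2, u, v in stream:
        intervals[frozenset((u, v))].append((t1, t2))
    return all(covers(intervals[frozenset(pair)], b, e)
               for pair in combinations(nodes, 2))

# Hypothetical stream; the a-c interaction is split into two touching links.
stream = [(0, 10, "a", "b"), (2, 8, "b", "c"), (1, 5, "a", "c"), (5, 9, "a", "c")]
print(is_clique(stream, ["a", "b", "c"], 2, 8))  # True
print(is_clique(stream, ["a", "b", "c"], 1, 8))  # False: b and c only start at 2
```

Merging overlapping or touching intervals per pair is what lets consecutive links of the same pair count as one permanent interaction.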

Discovering Patterns of Interest in IP Traffic Using Cliques in Bipartite Link Streams

Tiphaine Viard, Raphaël Fournier-S’niehotta, Clémence Magnien and Matthieu Latapy

International Conference on Complex Networks (CompleNet), 2018.

Studying IP traffic is crucial for many applications. We focus here on the detection of (structurally and temporally) dense sequences of interactions that may indicate botnets or coordinated network scans. More precisely, we model a MAWI capture of IP traffic as a link stream, i.e. a sequence of interactions (t1, t2, u, v) meaning that devices u and v exchanged packets from time t1 to time t2. This traffic is captured on a single router and therefore has a bipartite structure: links occur only between nodes in two disjoint sets. We design a method for finding interesting bipartite cliques in such link streams, i.e. two sets of nodes and a time interval such that all nodes in the first set are linked to all nodes in the second set throughout the time interval. We then explore the bipartite cliques present in the considered trace. Comparison with the MAWILab classification of anomalous IP addresses shows that the cliques found succeed in detecting anomalous network activity.