Intrinsically dynamic communities from evolving, directed network data

Bivas Mitra, Lionel Tabourier and Camille Roth

Computer Networks, Vol. 56(3), 2012.

Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. We assume that commu- nities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phe- nomena—such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed data- sets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. We eventually introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection—taking into account the fact that various empir- ical contexts may call for distinct ‘community’ definitions and detection criteria.

Download

File Diffusion in a Dynamic Peer-to-peer Network

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand

Proceedings of the  Mining Social Network Dynamic 2012 Workshop (MSND), In conjunction with the international conference World Wide Web WWW 2012, Lyon, France

Many studies have been made on diffusion in the field of epidemiology, and in the last few years, the development of social networking has induced new types of diffusion. In this paper, we focus on file diffusion on a peer-to-peer dynamic network using eDonkey protocol. On this network, we observe a linear behavior of the actual file diffusion. This result is interesting, because most diffusion models exhibit exponential behaviors. In this paper, we propose a new model of diffusion, based on the SI (Susceptible / Infected) model, which produces results close to the linear behavior of the observed diffusion. We then justify the linearity of this model, and we study its behavior in more details.

Download

Quantifying Paedophile Activity in a Large P2P System

Matthieu Latapy, Clémence Magnien et Raphaël Fournier

in Information Processing and Management, Volume 49, Issue 1, January 2013, Pages 248–263

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Download

Impact of power-law topology on IP-level routing dynamics: simulation results

Amélie Medem, Clémence Magnien and Fabien Tarissan

Fourth International Workshop on Network Science for Communication Networks (NetSciCom), 2012

This paper focuses on the Internet IP-level routing topology and proposes relevant explanations to its apparent dynamics.We first represent this topology as a power-law random graph. Then, we incorporate to the graph two well known factors responsible for the observed dynamics, which are load balancing and route evolution. Finally, we simulate on the graph traceroute-like measurements. Repeating the process many times, we obtain several graph instances that we use to model the dynamics. Our results show that we are able to capture on power-law graphs the dynamic behaviors observed on the Internet. We find that the results on power-law graphs, while qualitatively similar to the one of  Erdös-Rényi  random graphs, highly differ quantitatively; for instance, the rate of discovery of new nodes in power-law graphs is extremely low compared to the rate in Erdös-Rényi graphs.

Download

Internal links and pairs as a new tool for the analysis of bipartite complex networks

Oussama Allali, Lionel Tabourier, Clémence Magnien, Matthieu Latapy

Social Network Analysis and Mining, March 2013, Volume 3, Issue 1, pp 85-91.

Many real-world complex networks are best modeled as bipartite (or 2-mode) graphs, where nodes are divided into two sets with links connecting one side to the other. However, there is currently a lack of methods to analyze properly such graphs as most existing measures and methods are suited to classical graphs. A usual but limited approach consists in deriving 1-mode graphs (called projections) from the underlying bipartite structure, though it causes important loss of information and data storage issues. We introduce here internal links and pairs as a new notion useful for a bipartite analysis, and which gives insights on the information lost by projecting the bipartite graph. We illustrate the relevance of theses concepts on several real-world instances illustrating how it enables to discriminate behaviors among various cases when we compare them to a benchmark of random graphs. Then, we show that we can draw benefit from this concept for both modeling complex networks and storing them in a compact format.

Download

Directedness of Information Flow in Mobile Phone Communication Networks

Fernando Peruani and Lionel Tabourier

PLoS ONE, Vol 6(12), 2011

Without having direct access to the information that is being exchanged, traces of information flow can be obtained by looking at temporal sequences of user interactions. These sequences can be represented as causality trees whose statistics result from a complex interplay between the topology of the underlying (social) network and the time correlations among the communications. Here, we study causality trees in mobile-phone data, which can be represented as a dynamical directed network. This representation of the data reveals the existence of super-spreaders and super-receivers. We show that the tree statistics, respectively the information spreading process, are extremely sensitive to the in-out degree correlation exhibited by the users. We also learn that a given information, e.g., a rumor, would require users to retransmit it for more than 30 hours in order to cover a macroscopic fraction of the system. Our analysis indicates that topological node-node correlations of the underlying social network, while allowing the existence of information loops, they also promote information spreading. Temporal correlations, and therefore causality effects, are only visible as local phenomena and during short time scales. Consequently, the very idea that there is (intentional) information spreading beyond a small vicinity is called into question. These results are obtained through a combination of theory and data analysis techniques.

Download

Détection visuelle dévénements dans des grands réseaux dinteraction dynamiques. Application à lInternet

Bénédicte Le Grand et Matthieu Latapy

Atelier EGC 2011 Visualisation et Extraction de Connaissances, Brest, France, Janvier 2011.

L’objectif des travaux présentés dans ce papier est de faciliter la détection visuelle d’événements dans des réseaux d’interaction dynamiques de grande taille. Deux méthodes de visualisation classiques et «exhaustives» ont été étudiées, qui repré-sentent l’évolution des liens du réseau au fil du temps. Les limites liées au facteur d’échelle nous ont conduits à proposer deux métaphores restreintes au suivi des noeuds du réseau. Les forces, les limites et la complémentarité de ces quatre métaphores nous ont permis de déga-ger une ébauche de méthodologie de détection d’événements dans la dynamique de grands réseaux d’interaction. Les visualisations et la méthodologie présentées dans cet article sont génériques et appli-cables à tout type de noeuds et de liens ; elles sont ici appliquées pour illustration à un sous-ensemble du réseau Internet.

Download

Extracting and Visualizing Tree-Like Structures from Concept Lattices

Cassio Melo, Benedicte Le Grand, Marie-Aude Aufaure, and Anastazia Berezianos

IV’2011 conference on Information Visualization, London, July 2011.

Concept lattices built with Formal Concept Analysis are usually represented by Hasse diagrams illustrating the groupings of objects described by common attributes. Hasse diagrams display the relations of partial order between concepts in a hierarchical fashion, where each concept may have several parent concepts. Lattice visualization becomes a problem as the number of clusters grows significantly with the number of objects and attributes. Interpreting the lattice through a direct visualization of the line diagram rapidly becomes impossible and more synthetic representations are needed. In this work we propose several methods to enhance the readability of concept lattices firstly though colouring and distortion techniques, and secondly by extracting and visualizing trees derived from concept lattices structures.

Download

Citations among blogs in a hierarchy of communities: method and case study

Abdelhamid Salah brahim, Bénédicte Le Grand, Lionel Tabourier, Matthieu Latapy

Journal of Computational Science, Vol 2(3), 2011

How does the structure of a network (e.g. its organization into groups or communities) impact the interaction among its nodes? In this paper we propose a generic methodology to study the correlation between complex networks interactions and their community structure. We illustrate it on a blog network and focus on citation links. We first define a homophily probability evaluating the tendency of blogs to ite blogs from the same communities. We then introduce the notion of community distance to capture if a blog cites (or is cited by) blogs distant or not from its community. We analyze the distribution of distances corresponding to each citation link, and use it to build maps of relevant communities which help interpreting blogs interactions. This community-oriented approach allows to study citation links at various abstraction levels, and conversely, enable us to characterize communities with regard to their citation behaviour.

Download

Extraction hiérarchique de fenêtres de temps basée sur la structure communautaire

Thomas Aynaud and Jean-Loup Guillaume

in Proceedings of MARAMI 2011

Dans cet article nous décrivons une méthode de décomposition du temps en fenêtres de temps dans un graphe dynamique. Une particularité de la méthode est que le résultat est un regroupement hiérarchique : les fenêtres de temps sont elles-mêmes susceptibles d’en contenir. En outre, les fenêtres n’ont pas besoin d’être contiguës ce qui permet par exemple de détecter une structure se répétant. De plus, chaque fenêtre est associée à une décomposition en communautés représentant la structure topologique du réseau durant cette fenêtre. Nous appliquons ensuite cette méthode à trois graphes de terrain dynamiques ayant des caractéristiques différentes pour montrer que les fenêtres identifiées correspondent bien à des phénomènes observables. In this paper, we describe a way to cluster the time in time windows in a dynamic network. The result is a tree and thus time windows can themselves contain smaller ones. Moreover, the windows do not have to be consecutive and this allows for instance to detect repeated structure. Each window is also associated to a community decomposition that represents the topological structure of the network during this window. We then apply the method to three dynamic networks to show that observed time windows correspond to observable phenomena.

Download

Phénomènes de diffusion dans les réseaux dynamiques : simulation et modélisation

Alice Albano

JFGG 2011

Les phénomènes de diffusion sont présents dans de nombreux contextes: diffusion d’épidémies, de virus informatiques, d’information dans des réseaux sociaux, etc. Bien que les réseaux où se produit la diffusion soient souvent dynamiques, cette dynamique n’est pas prise en compte dans la plupart des modèles existants. L’objectif de ces travaux est de proposer des modèles de diffusion, et d’étudier l’impact de la dynamique du réseau sur la diffusion.

Download

Internal links prediction: a new approach for predicting links in bipartite graphs

Oussama Allali, Clémence Magnien and Matthieu Latapy

Dynamic Networks and Knowledge Discovery, special issue of Intelligent Data Analysis, 7 (1), 5-25, 2013

Many real-world complex networks, like actor-movie or le-provider relations, have a bipartite nature and evolve over time. Predicting links that will appear in them is one of the main approach to understand their dynamics. Only few works address the bipartite case, though, despite its high practical interest and the specic challenges it raises. We dene in this paper the notion of internal links in bipartite graphs and propose a link prediction method based on them. We thoroughly describe the method and its variations, and experimentally compare it to a basic collaborative ltering approach. We present results obtained for a typical practical case. We reach the conclusion that our method performs very well, and we study in details how its parameters may inuence obtained results.

Download

Multi-Step Community Detection and Hierarchical Time Segmentation in Evolving Networks

Thomas Aynaud and Jean-Loup Guillaume

proceedings of the Fifth SNA-KDD Workshop Social Network Mining and Analysis, in conjunction with the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2011)

Many complex systems composed of interacting objects  like social networks or the web can be modeled as graphs. They can usually be divided in dense sub-graphs with few links between them, called communities and detecting this underlying community structure may have a major impact in the understanding of these systems. We focus here on evolving graphs, for which the usual approach is to represent the state of the system at different time steps and to compute communities independently on the graph obtained at each time step. We propose in this paper to use a different framework: instead of detecting communities on each time step, we detect a unique decomposition in communities that is relevant for (almost) every time step during a given period called the time window.  We propose a definition of this new decomposition and two algorithms to detect it quickly. We validate both the approach and the algorithms on three evolving networks of different kinds showing that the quality loss at each time step is very low despite the constraint of maximization on several time steps. Since the time window length is a crucial parameter of our technique, we also propose an unsupervised hierarchical clustering algorithm to build automatically a hierarchical time segmentation into time windows. This clustering relies on a new similarity measure based on community structure. We show that it is very efficient in detecting meaningful windows.

Download

A Real-World Spreading Experiment in the Blogosphere

Adrien Friggeri, Jean-Philippe Cointet and Matthieu Latapy

Complex Systems 19, 2011

We designed an experiment to observe a spreading phenomenon in the blogosphere. This experiment relies on a small applet that participants copy on their own web page. We present the obtained dataset, which we freely provide for study, and conduct basic analysis. We conclude that, despite the classical assumption, in this experiment famous blogs do not necessarily act as super spreaders.

Download

Post-processing hierarchical community structures: Quality improvements and multi-scale view

Pascal Pons, Matthieu Latapy

Theoretical Computer Science (TCS) 412(8-10): 892-900 (2011)

Dense sub-graphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Most existing community detection algorithms produce a hierarchical structure of communities and seek a partition into communities that optimizes a given quality function. We propose new methods to improve the results of any of these algorithms. First we show how to optimize a general class of additive quality functions (containing the modularity, the performance, and a new similarity-based quality function which we propose) over a larger set of partitions than the classical methods. Moreover, we define new multi-scale quality functions which make it possible to detect different scales at which meaningful community structures appear, while classical approaches find only one partition.

Download

A Radar for the Internet

Matthieu Latapy, Clémence Magnien and Frédéric Ouédraogo

Complex Systems, 20 (1), 23-30, 2011.

Mapping the internet’s topology is a challenge in itself, and studying its dynamics is even more difficult. Achieving this would however provide key information on the nature of the internet, crucial for modeling and simulation. Moreover, detecting anomalies in this dynamics is a key issue for security. We introduce here a new measurement approach which makes it possible to capture internet dynamics at a scale of a few minutes in a radar-like manner. By conducting and analyzing large-scale measurements of this kind, we rigorously and automatically detect events in the observed dynamics, which is totally out of reach of previous approaches.

Download

Link prediction in bipartite graphs using internal links and weighted projection

Oussama Allali, Clémence Magnien and Matthieu Latapy

Proceedings of the third International Workshop on Network Science for Communication Networks (Netscicom 2011), In conjunction with IEEE Infocom 2011.

Many real-world complex networks, like client-product or file-provider relations, have a bipartite nature and evolve during time. Predicting links that will appear in them is one of the main approach to understand their dynamics. Only few works address the bipartite case, though, despite its high practical interest and the specific challenges it raises. We define in this paper the notion of internal links in bipartite graphs and propose a link prediction method based on them. We describe the method and experimentally compare it to a basic collaborative filtering approach. We present results obtained for two typical practical cases. We reach the conclusion that our method performs very well, and that internal links play an important role in bipartite graphs and their dynamics.

Download

Quantifying paedophile queries in a large P2P system

Matthieu Latapy, Clémence Magnien and Raphaël Fournier

IEEE Infocom Mini-Conference 2011, Shanghai

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25 % of queries are paedophile, and that more than 0.2 % of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Download

Optimisation locale multi-niveaux de la modularité

Thomas Aynaud, Vincent Blondel, Jean-Loup Guillaume and Renaud Lambiotte

in Partitionnement de graphe : optimisation et applications, Traité IC2, Hermes-Lavoisier 2011

Dans ce chapitre, nous présentons une méthode gloutonne pour optimiser la modularité d’un graphe. Cette méthode de partionnement permet de traiter avec une excellente précision des systèmes de taille inégalée, allant jusqu’à plusieurs milliards de liens. Notre algorithme a de surcroît l’avantage de ne pas être limité à l’optimisation de la modularité puisqu’il peut être généralisé à d’autres fonctions de qualité, et de découvrir des communautés à différentes échelles. Les performances de l’algorithme sont évaluées sur des graphes artificiels pour lesquels la structure communautaire est connue, ainsi que sur des graphes de terrain réels.

Long range community detection

Thomas Aynaud and Jean-Loup Guillaume

Latin-American Workshop on Dynamic Networks (LAWDN), Buenos Aires, 2010

Complex networks can usually be divided in dense subnetworks called communities. In evolving networks, the usual way to detect communities is to find several partitions independently, one for each time step. However, this generally causes troubles when trying to track communities from one time step to the next. We propose here a new method to detect only one decomposition in communities that is good for (almost) every time step. We show that this unique partition can be computed with a modification of the Louvain method and that the loss of quality at each time step is generally low despite the constraint of global maximization. We also show that some specific modifications of the networks topology can be identified using this unique partition in the case of the Internet topology.

Download