Armel Jacques Nzekon Nzeko’o, Maurice Tchuente, Matthieu Latapy
In WEBIST 2017
Time Weight Content-Based Extensions of Temporal Graphs for Personalized Recommendation

Recommender systems are an answer to information overload on the web. They filter and present to customer, a small subset of items that he is most likely to be interested in. Since user’s interests may change over time, accurately capturing these dynamics is important, though challenging. The Session-based Temporal Graph (STG) has been proposed by Xiang et al. to provide temporal recommendations by combining long- and short-term preferences. Later, Yu et al. have introduced an extension called Topic-STG, which takes into account topics extracted from tweets’ textual information. Recently, we pushed the idea further and proposed Content-based STG. However, in all these frameworks, the importance of links does not depend on their arrival time, which is a strong limitation: at any given time, purchases made last week should have a greater influence than purchases made a year ago. In this paper, we address this problem by proposing Time Weight Content-based STG, in which we assign a time-decreasing weight to edges. Using Time-Averaged Hit Ratio, we show that this approach outperforms all previous ones in real-world situations.

In J Complex Netw (2015) 4 (1): 15-37
Efficient and simple generation of random simple connected graphs with prescribed degree sequence

We address here the problem of generating random graphs uniformly from the set of simple connected graphs having a prescribed degree sequence. Our goal is to provide an algorithm designed for practical use both because of its ability to generate very large graphs (efficiency) and because it is easy to implement (simplicity). We focus on a family of heuristics for which we introduce optimality conditions, and show how this optimality can be reached in practice. We then propose a different approach, specifically designed for real-world degree distributions, which outperforms the first one. Based on a conjecture which we argue rigorously and which was confirmed by strong empirical evidence, we finally reduce the best asymptotic complexity bound known so far.

Remy Cazabet, Pierre Borgnat, Pablo jensen
In ESANN 2017
Using Degree Constrained Gravity Null-Models to understand the structure of journeys’ networks in Bicycle Sharing Systems

Bicycle Sharing Systems are now ubiquitous in large cities around the world. In most of these systems, journeys’ data can be ex- tracted, providing rich information to better understand it. Recent works have used network based-machine learning, and in particular space-corrected node clustering, to analyse such datasets. In this paper, we show that spatial-null models used in previous methods have a systematic bias, and we propose a degree-contrained null-model to improve the results. We finally apply the proposed method on the BSS of a city.

Remy Cazabet, Pierre Borgnat, Pablo Jensen
In Complenet 2017
Enhancing Space-Aware Community Detection Using Degree Constrained Spatial Null Model

Null models have many applications on networks, from testing the significance of observations to the conception of algorithms such as community detection. They usually preserve some network properties , such as degree distribution. Recently, some null-models have been proposed for spatial networks, and applied to the community detection problem. In this article, we propose a new null-model adapted to spatial networks, that, unlike previous ones, preserves both the spatial structure and the degrees of nodes. We show the efficacy of this null-model in the community detection case both on synthetic and collected networks.

Jean Creusefond, Remy Cazabet
In Complenet 2017
Characterising inter and intra-community interactions in link streams using temporal motifs

The analysis of dynamic networks has received a lot of attention in recent years, thanks to the greater availability of suitable datasets. One way to analyze such dataset is to study temporal motifs in link streams , i.e. sequences of links for which we can assume causality. In this article, we study the relationship between temporal motifs and communities , another important topic of complex networks. Through experiments on several real-world networks, with synthetic and ground truth community partitions, we identify motifs that are overrepresented at the frontier –or inside of– communities.

Matthieu Latapy, Elie Rotenberg, Christophe Crespelle, Fabien Tarissan
In Complex Systems, volume 26, number 1 (2017)
Rigorous Measurement of the Internet Degree Distribution

The degree distribution of the internet, i.e. the fraction of routers with k links for any k, is its most studied property. It has a crucial influence on network robustness, spreading phenomena, and protocol design. In practice, however, this distribution is observed on partial, biased and erroneous maps. This raises serious concerns about the true knowledge we actually have of this key property. Here, we design and run a drastically new measurement approach for the reliable estimation of the degree distribution of the internet, without resorting to any map. It consists in sampling random core routers and precisely estimating their degree with probes sent from many monitors scattered over the internet. Our measurement shows that the true degree distribution significantly differs from classical assumptions: it is heterogeneous but it decreases sharply, in a way incompatible with a heavy-tailed power law.

Noé Gaumont, Clémence Magnien and Matthieu Latapy
In Social Network Analysis and Mining (2016) 6: 87. doi:10.1007/s13278-016-0396-z
Finding remarkably dense sequences of contacts in link streams

A link stream is a set of quadruplets (b, e, u, v) meaning that a link exists between u and v from time b to time e. Link streams model many real-world situations like contacts between individuals, connections between devices, and others. Much work is currently devoted to the generalization of classical graph and network concepts to link streams. We argue that the density is a valuable notion for understanding and characterizing links streams. We propose a method to capture specific groups of links that are structurally and temporally densely connected and show that they are meaningful for the description of link streams. To find such groups, we use classical graph community detection algorithms, and we assess obtained groups. We apply our method to several real-world contact traces (captured by sensors) and demonstrate the relevance of the obtained structures.

Marwan Ghanem, Olivier Fourmaux, Fabien Tarissan and Takumi Miyoshi
In The 18th Asia-Pacific Network Operations and Management Symposium (APNOMS'16), Kanazawa, Japan, 2016.
P2PTV Multi-channel Peers Analysis

After being the support of the data and voice convergence, the Internet has become one of the main video providers such as TV-stream. As an  alternative to limited or expensive technologies, P2PTV has turned out to be a promising support for such applications. This infrastructure strongly relies on the overlay composed by the peers that consume and diffuse video contents at the same time. Understanding the dynamical properties of this overlay, and in particular how the users switch from one overlay to another, appears to be a key aspect if one wants to improve the quality of P2PTV. In this paper, we investigate the question of relying on non-invasive measurement techniques to track the presence of users on several channels of P2PTV. Using two datasets obtained by using network measurement on P2PTV infrastructure, we show that such an approach contains sufficient information to track the presence of users on several channels. Besides, exploiting the view provided by sliding time windows, we are able to refine the analysis and track users that switch from one channel to another, leading to the detection of super-peers and providing explanations of the different roles they can play in the infrastructure. In addition, by comparing the results obtained on the two datasets, we show how such analyses can shed some light on the evolution of the infrastructure policy.

Lionel Tabourier, Anne-Sophie Libert and Renaud Lambiotte
In EPJ Data Science (2016) 5: 1
Predicting links in ego-networks using temporal information

Link prediction appears as a central problem of network science, as it calls for unfolding the mechanisms that govern the micro-dynamics of the network. In this work, we are interested in ego-networks, that is the mere information of interactions of a node to its neighbors, in the context of social relationships. As the structural information is very poor, we rely on another source of information to predict links among egos’ neighbors: the timing of interactions. We define several features to capture different kinds of temporal information and apply machine learning methods to combine these various features and improve the quality of the prediction. We demonstrate the efficiency of this temporal approach on a cellphone interaction dataset, pointing out features which prove themselves to perform well in this context, in particular the temporal profile of interactions and elapsed time between contacts.

Keun-Woo Lim, Stefano Secci, Lionel Tabourier and Badis Tebbani
In Computer Communications, 2016, vol. 95, p. 82-94
Characterizing and predicting mobile application usage

In this paper, we propose data clustering techniques to predict temporal characteristics of data consumption behavior of different mobile applications via wireless communications. While most of the research on mobile data analytics focuses on the analysis of call data records and mobility traces, our analysis concentrates on mobile application usages, to characterize them and predict their behavior. We exploit mobile application usage logs provided by a Wi-Fi local area network service provider to characterize temporal behavior of mobile applications. More specifically, we generate daily profiles of “what” types of mobile applications users access and “when” users access them. From these profiles, we create usage classes of mobile applications via aggregation of similar profiles depending on data consumption rate, using three clustering techniques that we compare. Furthermore, we show that we can utilize these classes to analyze and predict future usages of each mobile application through progressive comparison using distance and similarity comparison techniques. Finally, we also detect and exploit outlying behavior in application usage profiles and discuss methods to efficiently predict them.

Raphaël Tackx, Jean-Loup Guillaume and Fabien Tarissan
In IEEE Ninth International Conference on Research Challenges in Information Science (RCIS'15), Athènes, Greece, 2015
Revealing intricate properties of communities in the bipartite structure of online social networks

Many real-world networks based on human activities exhibit a bipartite structure. Although bipartite graphs seem appropriate to analyse and model their properties, it has been shown that standard metrics fail to reproduce intricate patterns observed in real networks. In particular, the overlapping of the neighbourhood of communities is difficult to capture precisely. In this work, we tackle this issue by analysing the structure of 4 real-world networks coming from online social activities. We first analyse their structure using standard metrics. Surprisingly, the clustering coefficient turns out to be less relevant than the redundancy coefficient to account for overlapping patterns. We then propose new metrics, namely the dispersion and the monopoly coefficients, and show that they help refining the study of bipartite overlaps. Finally, we compare the results obtained on real networks with the ones obtained on random bipartite models. This shows that the patterns captured by the redundancy and the dispersion coefficients are strongly related to the real nature of the observed overlaps.

Fabien Tarissan and Raphaëlle Nollez-Goldbach
In 28th International Conference on Legal Knowledge and Information Systems (JURIX'15), Braga, Portugal, 2015.
Temporal properties of legal decision networks: a case study from the International Criminal Court

Many studies have proposed to apply artificial intelligence techniques to legal networks, whether it be for highlighting legal reasoning, resolving conflict or extracting information from legal databases. In this context, a new line of research has recently emerged which consists in considering legal decisions as elements of complex networks and conduct a structural analysis of the relations between the decisions. It has proved to be efficient for detecting important decisions in legal rulings. In this paper, we follow this approach and propose to extend structural analyses with temporal properties. We define in particular the notion of relative in-degree, temporal distance and average longevity and use those metrics to rank the legal decisions of the two first trials of the International Criminal Court. The results presented in this paper highlight non trivial temporal properties of those legal networks, such as the presence of decisions with an unexpected high longevity, and show the relevance of the proposed relative in-degree property to detect landmark decisions. We validate the outcomes by confronting the results to the one obtained with the standard in-degree property and provide juridical explanations of the decisions identified as important by our approach.

Noé Gaumont, Tiphaine Viard, Raphaël Fournier-S’niehotta, Qinna Wang and Matthieu Latapy
In Complex Networks VII: Proceedings of the 2016 Workshop on Complex Networks.
Analysis of the temporal and structural features of threads in a mailing-list

A link stream is a collection of triplets (t,u,v) indicating that an interaction occurred between u and v at time t. Link streams model many real-world situations like email exchanges between individuals, connections between devices, and others. Much work is currently devoted to the generalization of classical graph and network concepts to link streams. In this paper, we generalize the existing notions of intra-community density and inter-community density. We focus on emails exchanges in the Debian mailing list, and show that threads of emails, like communities in graphs, are dense subsets loosely connected from a link stream perspective.

Soumajit Pramanik, Qinna Wang, Maximilien Danisch, Mohit Sharma, Sumanth Bandi, Jean-Loup Guillaume, Stéphane Raux and Bivas Mitra
6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015
Augmenter les retweets sur Twitter : comment tirer parti des mentions ?

Alors que Twitter est devenu incontournable, la propagation des tweets et hashtags est toujours largement incomprise. Le propagation d’information sur Twitter est principalement due aux retweets et aux mentions mais, alors que les retweets ne permettent d’atteindre que les abonnés d’un individu, les mentions permettent d’atteindre n’importe qui directement. De nombreuses études ont montré que les mentions sont largement utilisées sur Twitter, mais surtout qu’elles sont fondamentales pour augmenter la popularité des tweets et hashtags. Des méthodes automatiques pour choisir les bons utilisateurs à mentionner pourraient donc permettre d’augmenter la visibilité des tweets. Dans cet article nous proposons un système de recommandation de mentions en temps réel pour aug- menter la popularité d’un tweet. Ce système est basé sur un modèle de propagation de tweet dans un graphe multiplexe construit à partir d’une étude de données réelles. Il permet de clairement faire la différence entre les propagations dues aux mentions et celles dues aux abonnements. Les simulations du modèle donnent des résultats similaires aux observations empiriques et sont également fondées sur des résultats analytiques. En utilisant ces différents résultats nous proposons une stratégie de recom- mandation effective et une application Twitter associée.

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand
6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015
Déplier la structure communautaire d’un réseau en mesurant la proximité aux représentants de communauté

Nous proposons un algorithme pour déplier la structure communautaire des grands graphes de terrain. L’algorithme est basé sur la détection de la communauté de chaque représentant communau- taire : nœud contenu dans une seule communauté et important en son sein. Cette détection est faite avec une approche à base de mesure de proximité développée récemment. Par comparaison avec d’autres méthodes de l’état de l’art nous montrons que notre algorithme a des performances équivalentes voire meilleures et est capable de traiter les plus grands graphes de terrain.

Emmanuel Orsini
6ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2015.
Détection de communautés dans les flots de liens par optimisation de la modularité

L’article qui suit propose de donner un sens à la modularité dans les flots de liens et ainsi de bénéficier de certaines de ses propriétés, et des heuristiques qui l’optimisent. Cette no- tion de modularité aboutira après quelques simplifications à un algorithme capable de calculer une partition sur un jeu de données de 400 000 emails. Pour ce faire on construira une nou- velle modélisation où le temps est complètement continu, sur laquelle la modularité se définit naturellement et de manière pertinente. Cette modélisation apporte une nouvelle interpretation des réseaux dynamiques, qui se veut suffisamment générale pour s’adapter à différents types de données.

Raphaël Fournier and Matthieu Latapy
International Journal of Internet Science ARTICLE IN PRES S 2015, 10 (1), ISSN 1662-5544
Temporal Patterns of Pedophile Activity in a P2P Network: First Insights about User Profiles from Big Data

Recent studies have shown that child abuse material is shared through peer-to-peer (P2P) networks, which allow users to exchange files without a central server. Obtaining knowledge on the extent of this activity has major consequences for child protection, policy making and Internet regulation. Previous works have developed tools and analyses to provide overall figures in temporally-limited measurements. Offenders' behavior is mostly studied through small-scale interviews and there is few information on the times at which they engage in such activity. Here we show that the proportion of search-engine queries for pedophile content gradually has grown by a factor of almost 3 in three years. We also find that during the day, certain hours are, on average, privileged by seekers. Our results demonstrate that P2P networks are actively used to search for pedophile content and we find new and large-scale results on pedophile offenders' profile, indicating that a substantial proportion is well-integrated into family life and professional work activities.

Tiphaine Viard, Matthieu Latapy and Clémence Magnien
Theoretical Computer Science (TCS), Volume 609, Part 1, 4 January 2016, Pages 245–252. DOI: 10.1016/j.tcs.2015.09.030
Computing maximal cliques in link streams

A link stream is a collection of triplets (t, u, v) indicating that an interaction occurred between u and v at time t. We generalize the classical notion of cliques in graphs to such link streams: for a given ∆, a ∆-clique is a set of nodes and a time interval such that all pairs of nodes in this set interact at least once during each sub-interval of duration ∆. We propose an algorithm to enumerate all maximal (in terms of nodes or time interval) cliques of a link stream, and illustrate its practical relevance on a real-world contact trace.

Clémence Magnien and Fabien Tarissan.
In proceedings of the International Symposium on Foundations and Applications of Big Data Analytics (FAB), in conjunction with ASONAM, 2015.
Time Evolution of the Importance of Nodes in dynamic Networks

For a long time now, researchers have worked on defining different metrics able to characterize the importance of nodes in networks. Among them,
centrality measures have proved to be pertinent as they relate the position of a node in the structure to its ability to diffuse an information efficiently. The case of dynamic networks, in which nodes and links appear and disappear over time, led the community to propose extensions of those classical measures. Yet, they do not investigate the fact that the network structure evolves and that node importance may evolve accordingly. In the present paper, we propose temporal extensions of notions of centrality, which take into account the paths existing at any given time, in order to study the time
evolution of nodes’ importance in dynamic networks. We apply this to two datasets and show that the importance of nodes does indeed vary greatly with time. We also show that in some cases it might be meaningless to try to identify nodes that are consistently important over time, thus strengthening the interest of temporal extensions of centrality measures.

Nicolas Dugué, Anthony Perez, Maximilien Danisch, Florian Bridoux, Amélie Daviau, Tennessy Kolubako, Simon Munier and Hugo Durbano.
IEEE/ACM International Conference on Advances in Social Network Analysis and Mining (ASONAM), 2015, Paris. (Demo track paper)
A reliable and evolutive web application to detect social capitalists

On Twitter, social capitalists use dedicated hashtags and mutual subscriptions to each other in order to gain followers and to be retweeted. Their methods are successful enough to make them appear as influent users. Indeed, applications dedicated to the influence measurement such as Klout and Kred give high scores to most of these users. Meanwhile, their high number of retweets and followers are not due to the relevance of the content they tweet, but to their social capitalism techniques. In order to be able to detect these users, we train a classifier using a dataset of social capitalists and regular users. We then implement this classifier in a web application that we call DDP. DDP allows users to test whether a Twitter account is a social capitalist or not and to visualize the data we use to make the prediction. DDP allows administrator to crawl data from a lot of users automatically. Furthermore, administrators can manually label Twitter accounts as social capitalists or regular users to add them into the dataset. Finally, administrators can train new classifiers in order to take into account the new Twitter accounts added to the dataset, and thus making evolve the classifier with these new recently collected data. The web application is thus a way to collect data, make evolve the knowledge about social capitalists and to keep detecting them efficiently.

Tiphaine Viard, Matthieu Latapy, Clémence Magnien
First International Workshop on Dynamics in Networks (DyNo), in conjunction with ASONAM, 2015.
Revealing contact patterns among high-school students using maximal cliques in link streams

Interaction traces between humans are usually rich in information concerning the patterns and habits of individuals. Such datasets have been recently made available, and more and more researchers address the new questions raised by this data. A link stream is a sequence of triplets (t, u, v) indicating that an interaction occurred between u and v at time t, and as such is a natural representation of these data. We generalize the classical notion of cliques in graphs to such link streams: for a given ∆, a ∆-clique is a set of nodes and a time interval such that all pairs of nodes in this set interact at least every ∆ during this time interval. We proceed to compute the maximal ∆-cliques on a real-world dataset of contact among students, and show how it can bring new interpretation to patterns of contact.

François Queyroi, Sergey Kirgizov
Information Processing Letters, Volume 115, Issue 9, September 2015, Pages 689–693 http://www.sciencedirect.com/science/article/pii/S0020019015000678
Suppression Distance Computation for Hierarchical Clusterings

We discuss  the computation  of a  distance between  two hierarchical clusterings of the  same set. It is defined as  the minimum number of  elements that  have to  be removed so  the remaining  clusterings are  equal. The problem of distance  computing was extensively studied for partitions. We prove it can be  solved in polynomial time in the case of hierarchies  as it gives  birth to a  class of perfect  graphs. We  also  propose an  algorithm  based on  recursively computing  maximum assignments.

Tiphaine Viard, Matthieu Latapy et Clémence Magnien
ALGOTEL 2015 -- 17èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications, juin 2015, Beaune, France
Calcul de cliques maximales dans les flots de liens

Un flot de liens est une séquence de triplets (t,u,v), signifiant que u et v ont interagi au temps t. Nous généralisons la notion de cliques à ces flots de liens : pour un delta donné, une delta-clique est un ensemble de nœuds et un intervalle de temps, tels que toutes les paires de nœuds dans cet ensemble interagissent au moins tous les delta sur cet intervalle. Nous proposons un premier algorithme permettant d’énumérer les delta-cliques dans un flot de liens.

Soumajit Pramanik, Maximilien Danisch, Qinna Wang and Bivas Mitra
[extended abstract] Twitter for Research, 1st International & Interdisciplinary Conference, 2015
An empirical approach towards an efficient “whom to mention?” Twitter app

We developed a Twitter app to suggest users to mention in a tweet in order to maximise the spread of an information. Users that are popular, active on Twitter and interested in the content of the tweet are targeted. The problem is mapped to the knapsack problem, the length of the screen name of a user being an important variable. Collected data (who retweets among the suggested users and features of these users) will be used to improve the app and theory/models of information spread on Online Social Networks. The application is available at: http://bit.ly/1BKZURE

Christophe Crespelle, Matthieu Latapy, Ha Duong Phan.
Discrete Applied Mathematics, Volume 195, 20 November 2015, Pages 59–73
On the Termination of Some Biclique Operators on Multipartite Graphs.

We define a new graph operator, called the weak-factor graph, which comes from the context of complex network modelling. The weak-factor operator is close to the well-known clique-graph operator but it rather operates in terms of bicliques in a multipartite graph. We address the problem of the termination of the series of graphs obtained by iteratively applying the weak-factor operator starting from a given input graph. As for the clique-graph operator, it turns out that some graphs give rise to series that do not terminate. Therefore, we design a slight variation of the weak-factor operator, called clean-factor, and prove that its associated series terminates for all input graphs. In addition, we show that the multipartite graph on which the series terminates has a very nice combinatorial structure: we exhibit a bijection between its vertices and the chains of the inclusion order on the intersections of the maximal cliques of the input graph.

Noé Gaumont, François Queyroi, Clémence Magnien et Matthieu Latapy.

In Complex Networks VI (pp. 57-64). Springer International Publishing. 2015

Expected Nodes: a quality function for the detection of link communities

Many studies use community detection algorithms in order to understand complex networks. Most papers study node communities, i.e. groups of nodes, which may or may not overlap. A widely used measure to evaluate the quality of a community structure is the modularity. However, sometimes it is also relevant to study link partitions rather than node partitions. In order to evaluate a link partition, we propose a new quality function: Expected Nodes. Our function is based on the same inspiration as the modularity and compares, for a given link group, the number of incident nodes to the expected one. In this short note, we discuss the advantages and drawbacks of our quality function compared to other ones on synthetics graphs. We show that Expected Nodes is able to pass some fundamental sanity criteria and is the one that best identifies the most relevant partition in a more realistic context.

Noé Gaumont, François Queyroi
ALGOTEL 2014 -- 16èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications, Jun 2014, Le Bois-Plage-en-Ré, France. pp.1-4
Partitionnement des Liens d’un Graphe : Critères et Mesures

La recherche de communautés chevauchantes est un enjeu important pour l'analyse des réseaux complexes. Une piste souvent envisagée est la recherche d'un partitionnement des arêtes du graphe. L'évaluation de cette décomposition tient cependant rarement compte du fait que les communautés recherchées correspondent à des groupes d'arêtes. Nous discutons dans ce papier l'utilisation de nouveaux critères pouvant répondre à ce problème. Nous proposons de comparer le nombre de sommets incidents à un groupe d'arêtes au nombre attendu dans un graphe aléatoire. Un optimum local de la mesure dérivée de ce concept peut être obtenu par un algorithme glouton. Nous présentons les premiers résultats obtenus à travers une analyse de la mesure et des tests empiriques.

François Queyroi
5ème conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Paris, 2014.
∆-Duplication of Time-Varying Graphs

Nous présentons une transformation de graphes temporels, appelée ∆-duplication, permettant de réduire l’hétérogénéité temporelle dans l’analyse de réseaux dynamiques. Au lieu de construire une séquence d’instantanées à partir d’un découpage global du temps, nous utilisons une approche centrée sur les individus en considérant un sommet sur plusieurs sessions i.e. des périodes durant lesquelles il se connecte au moins tous les ∆. Cette note décrit précisément le concept de ∆-duplication et fournit des pistes quant à son utilisation pour l’analyse de réseaux complexes. En particulier, nous proposons une généralisation du concept de k-cores aux graphes temporels.

Lionel Tabourier, Anne-Sophie Libert et Renaud Lambiotte
EGC 2015, 15ème conférence internationale sur l'extraction et la gestion des connaissances
RankMerging: Apprentissage supervisé de classements pour la prédiction de liens dans les grands réseaux sociaux

Trouver les liens manquants dans un grand réseau social est une tâche difficile, car ces réseaux sont peu denses, et les liens peuvent correspondre à des environnements structurels variés. Dans cet article, nous décrivons RankMerging, une méthode d’apprentissage supervisé simple pour combiner l’information obtenue par différentes méthodes de classement. Afin d’illustrer son intérêt, nous l’appliquons à un réseau d’utilisateurs de téléphones portables, pour montrer comment un opérateur peut détecter des liens entre les clients de ses concurrents. Nous montrons que RankMerging surpasse les méthodes à disposition pour prédire un nombre variable de liens dans un grand graphe épars.

Tackx Raphaël, Maximilien Danisch et Fabien Tarissan
In Acte de la 5ème Conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI'14), Paris, France, 2014.
Structures biparties et communautés recouvrantes des graphes de terrains

De nombreux réseaux rencontrés en pratique se prêtent naturellement à la formalisation sous forme de graphes pour analyser et modéliser leur structure. Cette représentation plate des réseaux s’est montrée cependant peu efficace pour rendre compte de propriétés importantes et non triviales liées à la structure bipartie des réseaux. Des travaux récents ont montré notamment que des propriétés de recouvrements semblaient être présentes dans la plupart des réseaux réels et qu’elles permettaient de mieux expliquer des propriétés observées sur dans les graphes simples. Le présent travail entend poursuivre cette problématique en étudiant les propriétés liées aux recouvrements dans les structures communautaire des réseaux sociaux. Nous conduisons pour cela une étude basée sur le réseau des pages et catégories WIKIPEDIA et nous montrons notamment que parmi les métriques proposées récemment pour rendre compte de ces recouvrements complexes entre communautés, le coefficient de redondance était plus pertinent que le populaire coefficient de clustering biparti étudié généralement en pratique.

Fabien Tarissan
In ISCS 2014: Interdisciplinary Symposium on Complex Systems, Emergence, Complexity and Computation, 14:309-318, Springer, 2014.
Comparing overlapping properties of real bipartite networks

Many real-world networks lend themselves to the use of graphs for analysing and modelling their structure. But such a simple representation has proven to miss some important and non trivial properties hidden in the bipartite structure of the networks. Recent papers have shown that overlapping properties seem to be present in bipartite networks and that it could explain better the properties observed in simple graphs. This work intends to investigate this question by studying two proposed metrics to account for overlapping structures in bipartite networks. The study, conducted on four dataset stemming from very different contexts (computer science, juridical science and social science), shows that the most popular metrics, the clustering coefficient, turns out to be less relevant that the recent redundancy coefficient to analyse intricate overlapping properties of real networks.

Romain Hollanders, Daniel Bernardes, Bivas Mitra, Raphael Jungers, Jean-Charles Delvenne, Fabien Tarissan.
In Journal of Network Science, 2(3):341-266, Cambridge University Press, 2014.
Data-driven traffic and diffusion modeling in peer-to-peer networks : A real case study.

Peer-to-peer (p2p) systems have driven a lot of attention in the past decade as they have become a major source of Internet traffic. The amount of data flowing through the p2p network is huge and hence challenging both to comprehend and to control. In this work, we take advantage of a new and rich dataset recording p2p activity at a remarkable scale to address these difficult problems. After extracting the relevant and measurable properties of the network from the data, we develop two models that aim to make the link between the low-level properties of the network, such as the proportion of peers that do not share content (i.e., free riders) or the distribution of the files among the peers, and its high-level properties, such as the Quality of Service or the diffusion of content, which are of interest for supervision and control purposes. We observe a significant agreement between the high-level properties measured on the real data and on the synthetic data generated by our models, which is encouraging for our models to be used in practice as large-scale prediction tools. Relying on them, we demonstrate that spending efforts to reduce the amount of free-riders indeed helps to improve the availability of files on the network. We observe however a saturation of this phenomenon after 65% of free-riders.

Matthieu Latapy, Elie Rotenberg, Christophe Crespelle, Fabien Tarissan
13th IFIP International Conference on Networking - Networking 2014, 2014, Trondheim, Norway. IEEE, pp.1-9
Measuring the Degree Distribution of Routers in the Core Internet

Most current models of the internet rely on knowledge of the degree distribution of its core routers, which plays a key role for simulation purposes. In practice, this distribution is usually observed directly on maps known to be partial, biased and erroneous. This raises serious concerns on the true knowledge one may have of this key property. Here, we design an original measurement approach targeting reliable estimation of the degree distribution of core routers, without resorting to any map. It consists in sampling random core routers and precisely estimate their degree thanks to probes sent from many distributed monitors. We run and assess a large-scale measurement following this approach, carefully controlling and correcting bias and errors encountered in practice. The estimate we obtain is much more reliable than previous knowledge, and it shows that the true degree distribution is very different from all current assumptions.

Fabien Tarissan, Elie Rotenberg, Matthieu Latapy, Christophe Crespelle
IEEE 22nd International Symposium on Modeling Analysis and Simulation of Computer and Telecomunication Systems (MASCOTS'14), At Paris, France
UDP Ping: a dedicated tool for improving measurements of the Internet topology

The classical approach for Internet topology measurement consists in distributively collecting as much data as possible and merging it into one single piece of topology on which are conducted subsequent analysis. Although this approach may seem reasonable, in most cases network measurements performed in this way suffer from some or all of the following limitations: they give only partial views of the networks under concern, these views may be intrinsically biased, and they contain erroneous data due to the measurement tools. Here we present a new tool, named UDP Ping , that relies on a very different approach for the measurement of the Internet topology. Its basic principle is to measure the interface of a given target directed toward a monitor which sends the measurement probe. We demonstrate how to use it to deploy real world-wide measurements that provide reliable (i.e. bias and error free) knowledge of the Internet topology, namely the degree distribution of routers in the core Internet in our example.

Maximilien Danisch, Nicolas Dugué, Anthony Perez

MARAMI 2014

Prendre en compte le capitalisme social dans la mesure de l’influence sur Twitter

L’influence sur Twitter est un sujet particulièrement discuté avec l’explosion de l’utilisation de ce service de micro-blogging. En effet, afin de fouiller efficacement dans la masse de tweets produite par les millions d’utilisateurs de Twitter, de déterminer les tendances et les informations pertinentes, il est important de pouvoir détecter les utilisateurs influents. Ainsi, plusieurs outils fournissant un score d’influence ont été proposés et font référence. Cependant, les algorithmes utilisés par les sociétés qui les développent restent secrets. Dans des travaux récents, il a été montré que des comptes automatiques peuvent obtenir des scores élevés sans raison. De façon à étendre et compléter ces travaux, nous montrons que ces outils sont incapables de distinguer les utilisateurs réels de ceux appelés capitalistes sociaux, qui obtiennent à tort des scores d’influence élevés. Afin de résoudre ce problème, nous définissons un classifieur qui réalise cet objectif et rétablit ainsi des scores réalistes pour les capitalistes sociaux. Pour réaliser ce classifieur, nous avons réuni un jeu de données contenant des exemples de capitalistes sociaux et d’utilisateurs réguliers du réseau ainsi que leurs informations de profils et d’utilisation. Pour finir, nous avons développé une application en ligne qui utilise ce classifieur.

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand
Proceedings of the 8th IEEE International Conference on Research Challenges in Information Science (RCIS 2014)
On the Use of Intrinsic Time Scale for Dynamic Community Detection and Visualization in Social Networks

The analysis of social networks is a challenging research area, in particular because of their dynamic features. In this paper, we study such evolving graphs through the evolution of their community structure. More specifically, we build on existing approaches for the identification of stable communities over time. This paper presents two contributions. We first propose a new way to compute such stable communities, using a different time scale, called intrinsic time. This intrinsic time is related to the dynamics of the graph (e.g., in terms of link appearance or disappearance) and independent from traditional (extrinsic) time units, like the second. We then show how visualization both at intrinsic and extrinsic time scales can help validating and interpreting the obtained communities. Our results are illustrated on a social network made of contacts among the participants of the 2006 edition of the Infocom conference.

Lionel Tabourier, Anne-Sophie Libert, and Renaud Lambiotte
2014, DyNakII, 2nd International Workshop on Dynamic Networks and Knowledge Discovery (PKDD 2014 workshop)
RankMerging: Learning to rank in large-scale social networks

In this work, we consider the issue of unveiling unknown links in a social network, one of the difficulties of this problem being the small number of unobserved links in comparison of the total number of pairs of nodes. We define a simple supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. As an illustration, we apply the method to the case of a cell phone service provider, which uses the network among its contractors as a learning set to discover links existing among users of its competitors. We show that our method substantially improves the performance of unsupervised metrics of classification. Finally, we discuss how it can be used with additional sources of data, including temporal or semantic information

Qinna Wang

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

Link Prediction and Threads in Email Networks

We tackle the problem of predicting future links in dynamic networks. For this, we work with the Debian Mailing Lists. In this dataset, a user can post a question to the debian list and other users can reply it by email forming a thread. We show that the number of threads shared in the past between users is a better feature to predict future email exchanges than classical features, like the number of common neighbors. We also show that the structure of a thread do not match the traditional definition of a community, particularly a thread does not have many triangles and has many outgoing connections. While the number of shared (detected) communities is also a better feature to predict future email exchanges than traditional features, is not as good as the number of shared threads. We believe our work should raise interests in characterizing and detecting thread-like structures in dynamic networks.

Maximilien Danisch, Nicolas Dugué and Anthony Perez

2014 International Conference on Behavioral, Economic, and Socio-Cultural Computing (BESC 2014)

On the importance of considering social capitalism when measuring influence on Twitter

Influence on Twitter has drawn a lot of attention these past few years since this microblogging service is used to share, seek or debate about any kind of information. Several tools providing so-called influential scores have thus been proposed. However, the algorithms behind them are kept secret and it is not clear how they consider influence. Yet, many users rely on such tools to evaluate and even try to improve their influence in the Twitter network. In a recent work, it has been shown that automatic accounts can obtain high influential scores with no intuitive reason. Extending and completing this work, we show that such measures fail at distinguishing so-called social capitalists from real, truthful users. This enlights the fact that actual scores do not seem to consider the way followers and interactions are obtained on the network. To overcome this issue, we define a classifier that discriminates social capitalists from truthful users. To that aim, we crawled the Twitter network to gather examples of certified social capitalists and regular users and obtained features related to the profile and behavior of each user. We then use such a classifier to balance Klout’s score to adjust influential scores. We also developed an application that allows using our classifier online. We believe our work should raise the question of the legitimacy of influence on Twitter, and lead to significant improvements in the way it is measured.

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

2014 International Conference on Data Science and Advanced Analytics (DSAA2014)

Learning a Proximity Measure to Complete a Community

In large-scale online complex networks (Wikipedia, Facebook, Twitter, etc.) finding nodes related to a specific topic is a strategic research subject. This article focuses on two central notions in this context: communities (groups of highly connected nodes) and proximity measures (indicating whether nodes are topologically close). We propose a parameterized proximity measure which, given a set of nodes belonging to a community, learns the optimal parameters and identifies the other nodes of this community, called multi-ego-centered community as it is centered on a set of nodes. We validate our results on a large dataset of categorized Wikipedia pages and on benchmarks, we also show that our approach performs better than existing ones. Our main contributions are (i) a new ergonomic parametrized proximity measure, (ii) the automatic tuning of the proximity’s parameters and (iii) the unsupervised detection of community boundaries.

Darko Obradović, Maximilien Danisch

The 6th International Conference on Computational Aspects of Social Networks (CASON2014)

Direct Generation of Random Graphs Exactly Realising a Prescribed Degree Sequence

This paper intends to extend the possibilites available to researchers for the evaluation of directed networks with the use of randomly generated graphs. The direct generation of a simple network with a prescribed degree sequence still seems to be an open issue, since the prominent configuration model usually does not realise the degree distribution exactly. We propose such an algorithm using a heuristic for node prioritisation. We demonstrate that the algorithm samples approximately uniformly. In comparison to the switching Markov Chain Monte Carlo algorithms, the direct generation of edges allows an easy modification of the linking behaviour in the random graph, introducing for example degree correlations, mixing patterns or community structure. That way, more specific random graphs can be generated (non-uniformly) in order to test hypotheses on the question, whether specific network features are due to a specific linking behaviour only. Or it can be used to generate series of synthetic benchmark networks with a specific community structure, including hierarchies and overlaps.

Raphaël Fournier, Maximilien Danisch

IEEE International Conference on Research Challenges in Information Science 2014 (RCIS2014)

Mining bipartite graphs to improve semantic paedophile activity detection

Peer-to-peer (P2P) networks are popular to exchange large volumes of data through the Internet. Pedophile activity is a very important topic for our society and some< works have recently attempted to gauge the extent of pedophile exchanges on P2P networks. A key issue is to obtain an efficient detection tool, which may decide if a sequence of keywords is related to the topic or not. We propose to use social network analysis in a large dataset from a P2P network to improve a state-of-the-art filter for pedophile queries. We obtain queries and thus combinations of words which are not tagged by the filter but should be. We also perform some experiments to explore if the original four categories of paedophile queries were to be found by topological measures only.

Raphaël Fournier, Thibault Cholez, Matthieu Latapy, Isabelle Chrisment, Clémence Magnien, Olivier Festor and Ivan Daniloff

Soc. Sci. 2014, 3(3), 314-325

Comparing Pedophile Activity in Different P2P Systems

Peer-to-peer (P2P) systems are widely used to exchange content over the Internet. Knowledge of pedophile activity in such networks remains limited, despite having important social consequences. Moreover, though there are different P2P systems in use, previous academic works on this topic focused on one system at a time and their results are not directly comparable. We design a methodology for comparing KAD and eDonkey, two P2P systems among the most prominent ones and with different anonymity levels. We monitor two eDonkey servers and the KAD network during several days and record hundreds of thousands of keyword-based queries. We detect pedophile-related queries with a previously validated tool and we propose, for the first time, a large-scale comparison of pedophile activity in two different P2P systems. We conclude that there are significantly fewer pedophile queries in KAD than in eDonkey (approximately 0.09% vs. 0.25%).

James Abello and François Queyroi
Social Network Analysis and Mining, Springer, 2014, 4 (1), pp.191
Network Decomposition into Fixed Points of Degree Peeling

Degree peeling is used to study complex networks. It is a decomposition of the network into vertex groups of increasing minimum degree. However, the peeling value of a vertex is non-local in this context since it relies on the number of connections the vertex has to groups above it. We explore a different way to decompose a network into edge layers such that the local peeling value of the vertices on each layer does not depend on their non-local connections with the other layers. This corresponds to the decomposition of a graph into subgraphs that are invariant with respect to degree peeling, i.e. they are fixed points. We introduce a general method to partition the edges of an arbitrary graph into fixed points of degree peeling, called the iterative-edge-core decomposition. Information from this decomposition is used to formulate a novel notion of vertex diversity based on Shannon’s entropy. We illustrate the usefulness of this decomposition on a variety of social networks including weighted graphs. Our method can be used as a preprocessing step for community detection and graph visualization.

Tiphaine Viard and Matthieu Latapy
NetSciCom 2014
Identifying Roles in an IP Network with Temporal and Structural Density

Captures of IP traffic contain much information on very different kinds of activities like file transfers, users interacting with remote systems, automatic backups, or distributed computations. Identifying such activities is crucial for an appropriate analysis, modeling and monitoring of the traffic. We propose here a notion of density that captures both temporal and structural features of interactions, and generalizes the classical notion of clustering coefficient. We use it to point out important differences between distinct parts of the traffic, and to identify interesting nodes and groups of nodes in terms of roles in the network.

Élie Rotenberg, Christophe Crespelle and Matthieu Latapy
NetSciCom 2014
Measuring Routing Tables in the Internet

The most basic function of an Internet router is to decide, for a given packet, which of its interfaces it will use to forward it to its next hop. To do so, routers maintain a routing table, in which they look up for a prefix of the destination address. The routing table associates an interface of the router to this prefix, and this interface is used to forward the packet. We explore here a new measurement method based upon distributed UDP probing to estimate this routing table for Internet routers.

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand

in "Complex Networks", pages 76-111, H. Cherifi (ed), Cambridge Scholars Publishing, 2014

Multi-ego-centred communities

The community structure of a graph is defined in various ways in the literature:

  • partition, where nodes can belong to only one community. This vision is unrealistic and may lead to poor results because most nodes belong to several communities in real-world networks;
  • overlapping community structure, which is the most natural view, but is often very difficult to identify in practice due to the complex structure of real-world networks and the huge potential number of such communities;
  • egocentered community structure which focuses on individual nodes' communities and seems to be a good compromise.

In this chapter, the third vision is investigated; a new proximity measure based on opinion dynamics is proposed to score and select nodes according to their proximity to a node of interest. We call it the carryover opinion. In addition to be parameter-free, the carryover opinion can be calculated in a very time-efficient way and can thus be used in very large graphs.
We also go further in the idea of egocentered communities by introducing the new concept of multi-egocentered communities, i.e., focusing on the communities of a set of nodes rather than of a single node.
A key idea is that, although one node generally belongs to numerous communities, e.g., friends, colleagues, family, a small set of appropriate nodes can fully characterize a single community. We also show how to unfold all egocentered communities of a given node using this notion of multi-egocentered community.

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand
Social Network Analysis and Mining, Springer, 2014, 4 (1), pp.180.
Multi-ego-centered communities in practice

We propose here a framework to unfold the ego-centered community structure of a given node in a network. The framework is not based on the optimization of a quality function, but on the study of the irregularity of the decrease of a proximity measure. It is a practical use of the notion of multi-ego-centered community and we validate the pertinence of the approach on benchmarks and a real-world network of wikipedia pages.

Matthieu Latapy, Assia Hamzaoui, Clémence Magnien

Journal of Complex Networks 2(1): 38-59, 2014

Detecting events in the dynamics of ego-centred measurements of the Internet topology

Detecting events such as major routing changes or congestions in the dynamics of the Internet topology is an important but challenging task. We explore here an empirical approach based on a notion of statistically significant events. It consists in identifying properties of graph dynamics which exhibit a homogeneous distribution with outliers, corresponding to events. We apply this approach to ego-centred measurements of the  Internet topology (views obtained from a single monitor) and show that it succeeds in detecting meaningful events. Finally, we give some hints for   the interpretation of detected events in terms of network operations.

Daniel Bernardes, Matthieu Latapy, Fabien Tarissan
In Journal of Social Network Analysis and Mining, 3(4):1195-1208,Springer, 2013.
Inadequacy of SIR Model to Reproduce Key Properties of Real-world Spreading Phenomena: Experiments on a Large-scale P2P System

Understanding the spread of information on complex networks is a key issue from a theoretical and applied perspective. Despite the effort in developing theoretical models for this phenomenon, gauging them with large-scale real-world data remains an important challenge due to the scarcity of open, extensive and detailed data. In this paper, we explain how traces of peer-to-peer file sharing may be used to this goal. We reconstruct the underlying social network of peers sharing content and perform simulations on it to assess the relevance of the standard SIR model to mimic key properties of real spreading cascades. First we examine the impact of the network topology on observed properties. Then we turn to the evaluation of two heterogeneous extensions of the SIR model. Finally we improve the social network reconstruction, introducing an affinity index between peers, and simulate a SIR model which integrates this new feature. We conclude that the simple, homogeneous model is insufficient to mimic real spreading cascades. Moreover, none of the natural extensions of the model we considered, which take into account extra topological properties, yielded satisfying results in our context. This raises an alert against the careless, widespread use of this model.

Aurélie Faure de Pebeyre, Fabien Tarissan et Julien Sopena
IFIP Wireless Days conference (WD'13), Valencia, Spain, 2013
On the relevance of the edge-Markovian evolving graph model for real mobile networks

The development of wireless devices led the scientific community to focus more and more on systems of interaction composed of moving entities. In this context, different models have been proposed in an attempt to capture properties of the observed dynamics. Among those models, the edge-Markovian evolving graph model is appealing since it enables to highlight temporal dependencies in the evolution of the graphs. This model relies on two parameters accounting respectively for the creation and suppression of links in the graph. Thus it assumes that these two parameters are sufficient to characterise the dynamics during all the evolution of the graph. In this paper, we test this hypothesis by confronting the model to 6 datasets recording real traces of evolving networks. In particular, we study the proportion of created and deleted links over the time. The results show that 5 of the 6 case studies present an heterogeneous distribution of those fractions which contradicts the underlying hypothesis of the model. Besides, in order to understand the importance this might have as regard structural properties of real networks, we also study the impact the Markovian model has on the mean degree over the time. It turns out that even in the suitable case, the model fails to reproduce correctly this property which indicates its inadequacy for even more complex properties of real evolving networks

Dimitri Papadimitriou, Davide Careglio, Fabien Tarissan and Piet Demeester
Proceedings of the International Workshop on Reliable Networks Design and Modeling (RNDM'13), 2013 (Invited paper)
Method of reliability and availability analysis – From the dynamic properties of routing and forwarding paths

Confronted to the increasing dynamic of Internet routing system and its underlying topology, we propose a reliability and availability analysis method based on the characterization of the dynamic properties (in particular, the stability properties) of routing paths and their corresponding forwarding paths. The key driver underlying this method is that transient but frequent changes in the spatio-temporal properties of routing paths can affect the performance and operating conditions of the corresponding forwarding paths; hence, their reliability. The results obtained by means of this method verify that, although the main cause of instability results from the forwarding plane dynamics, a second order effect relates forwarding and routing path instability events. Applying our analysis method reveals that the dominant source (main cause) of instability originates indeed from the forwarding plane. This result which confirms previous studies from 2003 further corroborates the assumption that the dynamic properties of routing system are mainly driven by its adaptation to the forwarding system (adaptive routing). However, even if the likelihood of forwarding instability becomes the prominent behavior (cause), about 50% of them induce routing path instability whereas the corresponding forwarding path remains unstable. This observation suggests that 50% of the reactive decisions performed by the BGP routing system (reactive routing) tend to further delay convergence of the forwarding paths. In turn, this observation provides the first indication that simple causal effects can’t explain anymore the occurrence of instability. Moreover, more elaborated analysis techniques (such as the one proposed in this paper) are required to explain the inter-dependent routing and forwarding paths dynamics which affects their reliability and availability.

Émilie Coupechoux and Fabien Tarissan
4ème Journées Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI'13), 2013
Un modèle pour les graphes bipartis aléatoires avec redondance

Current models of random graphs do not capture all the properties observed in realworld networks. In particular, two cliques in such models generally do not have more than one node in common, while it is intuitive that, in social networks for instance, two friends have more than one interest in common. The model presented here aims at capturing this kind of property. More precisely, we present a model for random tripartite graphs such that the bipartite projection has degree and redundancy distributions close to those of a given bipartite graph.

Aurélie Faure de Pebeyre, Fabien Tarissan and Julien Sopena
In 4ème Journées Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI'13), 2013
Évaluation du modèle évolutif par arête-markovienne pour reproduire la dynamique des réseaux mobiles

L’avènement des équipements mobiles a amené la communauté scientifique à étudier plus intensément les systèmes d’interactions formés par des entités en mouvement. Dans ce contexte, plusieurs modèles ont été proposés pour tenter de capturer les propriétés dynamiques de tels systèmes. Parmi ceux-ci, le modèle de graphes évolutifs à arêtes markoviennes est attirant en ce qu’il met en avant les dépendances temporelles dans un graphe dynamique. Ce modèle repose sur l’identification de deux paramètres régissant respectivement l’apparition et la disparition des liens dans le graphe et fait donc l’hypothèse que ces deux paramètres sont suffisants pour caractériser cette dynamique sur l’ensemble de la durée de vie du graphe. Dans cet article nous testons la pertinence de cette hypothèse par rapport à 6 jeux de données réelles. Pour se faire, nous avons étudié la fraction de liens créés et supprimés au cours du temps. Les résultats montrent que dans 5 cas sur les 6 étudiés, la répartition de ces fractions est hétérogène, ce qui contredit l’hypothèse faite par le modèle. De plus, nous avons regardé l’impact que le modèle markovien avait sur le degré moyen des nœuds au cours du temps. Il s’avère que même dans le jeu de données favorable au modèle, ce dernier échoue à rendre compte du comportement des réseaux dynamiques de façon satisfaisante.

Romain Campigotto, Jean-Loup Guillaume and Massoud Seifi
Proceedings of the 5th IEEE/ACM International Conference on Advances in Social Networks and Mining (ASONAM 2013). Niagara Falls, Canada.
The Power of Consensus: Random Graphs Have No Communities

Communities are a powerful tool to describe the structure of complex networks. Algorithms aiming at maximizing a quality function called modularity have been shown to effectively compute the community structure. However, some problems remain: in particular, it is possible to find high modularity partitions in graph without any community structure, in particular random graphs. In this paper, we study the notion of consensual communities and show that they do not exist in random graphs. For that, we exhibit a phase transition based on the strength of consensus: below a given threshold, all the nodes belongs to the same consensual community; above this threshold, each node is in its own consensual community.

Clémence Magnien, Amélie Medem, Sergey Kirgizov, Fabien Tarissan
Networking Science, 4 (1-4), p. 24-33, 2013
Towards realistic modeling of IP-level routing topology dynamics

Many works have studied the Internet topology, but few have investigated the question of how it evolves over time. This paper focuses on the Internet routing IP-level topology and proposes a first step towards realistic modeling of its dynamics. We study periodic measurements of routing trees from a single monitor to a fixed destination set and identify invariant properties of its dynamics. Based on those observations, we then propose a model for the underlying mechanisms of the topology dynamics. Our model remains simple as it only incorporates load-balancing phenomena and routing changes. By extensive simulations,  we show that, despite its simplicity, this model effectively captures the observed behaviors, thus providing key insights of relevant mechanisms governing the Internet routing dynamics. Besides, by confronting simulations over different kinds of topology, we also provide insights of which structural properties play a key role to explain the properties of the observed dynamics, which therefore strengthens the relevance of our model.

Fabien Tarissan and Raphaëlle Nollez-Goldbach
ISCS 2013: Interdisciplinary Symposium on Complex Systems, Emergence, Complexity and Computation, 8:225-264, Springer, 2013.
The network of the International Criminal Court decisions as a complex system

Many real-world networks lend themselves to the use of graphs for analysing and modeling their structure. This approach has proved to be very useful for a wide variety of networks stemming from very different fields. Yet, only few papers focused their attention on legal networks. This paper intends precisely to remedy this situation by analysing a major legal network by means of complex system methods. The network under investigation is the network composed by decisions taken by the International Criminal Court since its creation. We first model the network by a simple directed graph in which nodes are the decisions and links represent citations between decisions. Our analysis shows that standard properties shared by common real networks are also present in this network. Then we turn to studying the network by means of bipartite graphs that involve both decisions and articles of law. We show that this two-level structure presents several non trivial properties and we show evidences of the relevance of the bipartite representation to explain properties observed in the graph of citations.

Alice Albano, Jean-Loup Guillaume, Sébastien Heymann and Bénédicte Le Grand
Proceedings of the 2013 IEEE/ACM International Conference on Advances  n Social Networks Analysis and Mining (ASONAM 2013), Niagara Falls, Canada
A Matter of Time – Intrinsic or Extrinsic – for Diffusion in Evolving Complex Networks

Diffusion phenomena occur in many kinds of real-world complex networks, e.g., biological, information or social networks. Because of this diversity, several types of diffusion models have been proposed in the literature: epidemiological models, threshold models, innovation adoption models, among others. Many studies aim at investigating diffusion as an evolving phenomenon but mostly occurring on static networks, and much remains to be done to understand diffusion on evolving networks. In order to study the impact of graph dynamics on diffusion, we propose in this paper an innovative approach based on a notion of intrinsic time, where the time unit corresponds to the appearance of a new link in the graph. This original notion of time allows us to isolate somehow the diffusion phenomenon from the evolution of the network. The objective is to compare the diffusion features observed with this intrinsic time concept from those obtained with traditional (extrinsic) time, based on seconds. The comparison of these time concepts is easily understandable yet completely new in the study of diffusion phenomena. We experiment our approach on synthetic graphs, as well as on a dataset extracted from the Github sofware sharing platform.

Fabien Tarissan, Bruno Quoitin, Pascal Mérindol, Benoit Donnet, Matthieu Latapy et Jean-Jacques Pansiot
In Journal of Computer Networks, 57(11):2331-2347, Elsevier, 2013.
Towards a Bipartite Graph Modeling of the Internet Topology

Modeling the properties of the Internet topology aims at generating large scale artificial IP networks that mimic properties of real ones for simulation purposes. Current models typ- ically consider the Internet as a simple graph where edges are point-to-point connections between routers. This approach does not take into account point-to-multipoint connec- tions that exist at lower layers in the network, e.g. layer-2 clouds, such as Ethernet switches or MPLS networks. Instead, such physical point-to-multipoint connections are modeled as several logical IP level point-to-point connections. In this paper, we rely on recent developments in topology discovery based on IGMP probing that allows for revealing part of the network’s layer-2 structure. We take advantage of this additional knowledge for proposing an Internet model based on bipartite graphs considering both point-to-point and point-to-multipoint connections. Our model remains simple: it only takes as input the node degree sequence for both layer-2 and layer-3 nodes, randomly generates a bipartite graph respecting those distri- butions, and then derives the corresponding layer-3 topology. We show that, despite the simplicity of our model, realistic network properties, such as high local density, emerge naturally. This is in contrast with the now common belief that such properties can only appear with more intricate models or if explicitly injected in random models. Besides, we also provide evidences of how the analysis performed at the bipartite level might shed light on important properties of the real network structure. Finally, we propose and evaluate a bipartite graph generator based on our model that only takes two synthetic node degree distributions as input.

Maximilien Danisch, Jean-Loup Guillaume and Bénédicte Le Grand
CompleNet 2013, Berlin
Unfolding ego-centered community structures with “a similarity approach”

We propose a framework to unfold the ego-centered community structure of a given node in a network. The framework is not based on the optimization of a quality function, but on the study of the irregularity of the decrease of a similarity measure. It is a practical use of the notion of multi-ego-centered community and we validate the pertinence of the approach on a real-world network of wikipedia pages.

Dimitri Papadimitriou, Davide Careglio, Fabien Tarissan and Piet Demeester
Proceedings of the the 9th International Conference on Design of Reliable Communication Networks (DRCN), Budapest, Hungary, 2013
Internet routing paths stability model and relation to forwarding paths

Analysis of real datasets to characterize the local stability properties of the Internet routing paths suggests that extending the route selection criteria to account for such property would not increase the routing path length. Nevertheless, even if selecting a more stable routing path could be considered as valuable from a routing perspective, it does not necessarily imply that the associated forwarding path would be more stable. Hence, if the dynamics of the Internet routing and forwarding system show different properties, then one can not straightforwardly derive the one from the other. If this assumption is verified, then the relationship between the stability of the forwarding path (followed by the traffic) and the corresponding routing path as selected by the path-vector routing algorithm requires further characterization. For this purpose, we locally relate, i.e., at the router level, the stability properties of routing path with the corresponding forwarding path. The proposed stability model and measurement results verify this assumption and show that, although the main cause of instability results from the forwarding plane, a second order effect relates forwarding and routing path instability events. This observation provides the first indication that differential stability can safely be taken into account as part of the route selection process.

J. Whitbeck and M. Dias de Amorim and V. Conan and J.-L. Guillaume
The 18th Annual International Conference on Mobile Computing and Networking, Mobicom'12, pp. 377-388
Temporal Reachability Graphs

While a natural fit for modeling and understanding mobile networks, time-varying graphs remain poorly understood. Indeed, many of the usual concepts of static graphs have no obvious counterpart in time-varying ones. In this paper, we introduce the notion of temporal reachability graphs. A (tau, sigma)-reachability graph is a time-varying directed graph derived from an existing connectivity graph. An edge exists from one node to another in the reachability graph at time t if there exists a journey (i.e., a spatiotemporal path) in the connectivity graph from the first node to the second, leaving after t, with a positive edge traversal time  tau, and arriving within a maximum delay sigma. We make three contributions. First, we develop the theoretical framework around temporal reachability graphs. Second, we harness our theoretical findings to propose an algorithm for their efficient computation. Finally, we demonstrate the analytic power of the temporal reachability graph concept by applying it to synthetic and real-life data sets. On top of defining clear upper bounds on communication capabilities, reachability graphs highlight asymmetric communication opportunities and offloading potential.

M. Danisch, J.-L. Guillaume and B. Le Grand
Int. J. of Web Based Communities, Vol. 9, No. 3, pp. 299-322, 2012
Towards multi-ego-centered communities: a node similarity approach

The community structure of a graph is defined in various ways in the literature: (i) Partition, where nodes can belong to only one community. This vision is unrealistic and may lead to poor results because most nodes belong to several communities in real-world networks. (ii) Overlapping community structure, which is the most natural view, but is often very difficult to identify in practice due to the complex structure of real-world networks, and the huge number of such possible communities. (iii) Ego-centered community which focuses on individual nodes' communities and seems to be a good compromise. In this paper we investigate the third vision; we propose a new similarity measure between nodes based on opinion dynamics to unfold ego-centered communities. We call it the carryover opinion. In addition to be parameter-free, the carryover opinion can be calculated in a very time-efficient way and can thus be used in huge graphs. We also go further in the idea of ego-centered communities by introducing the new concept of multi-ego-centered communities, i.e., focusing on the communities of a set of nodes rather than of a single node. A key idea is that, although one node generally belongs to numerous communities, a small set of appropriate nodes can fully characterize a single community.

Abdelhamid Salah Brahim, Bénédicte Le Grand and Matthieu Latapy
Parallel Processing Letters 22(1): (2012)
Diffusion Cascades: Spreading Phenomena in Blog Network Communities

A diffusion cascade occurs when information spreads from one node to the rest of the network through a succession of diffusion events. So far diffusion phenomena have been mostly considered at a macroscopic scale i.e. by studying all nodes of the network. We give a complementary way to analyse network interactions by considering the problem at different scales. To that purpose, we use the community structure of the network to characterize diffusion between nodes (and between communities) and to identify interactions behaviour patterns.

Xiaomin Wang, Matthieu Latapy, Michèle Soria
International Journal of Computer Networks & Communications (IJCNC), May 2012, Volume 4. Number 3
Deciding on the type of the degree distribution of a graph from traceroute-like measurements

The degree distribution of the Internet topology is considered as one of its main properties. However, it is only known through a measurement procedure which gives a biased estimate. This measurement may in first approximation be modeled by a BFS (Breadth-First Search) tree. We explore here our ability to infer the type (Poisson or power-law) of the degree distribution from such a limited knowledge. We design procedures which estimate the degree distribution of a graph from a BFS of it, and show experimentally (on models and real-world data) that this approach succeeds in making the difference between Poisson and power-law degree distributions.

Daniel F. Bernardes, Matthieu Latapy, Fabien Tarissan
Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), Istanbul, Turkey
Relevance of SIR Model for Real-world Spreading Phenomena: Experiments on a Large-scale P2P System

Understanding the spread of information on complex networks is a key issue from a theoretical and applied perspective. Despite the effort in developing theoretical models for this phenomenon, gauging them with large-scale real-world data remains an important challenge due to the scarcity of open, extensive and detailed data. In this paper, we explain how traces of peer-to-peer file sharing may be used to this goal. We also perform simulations to assess the relevance of the standard SIR model to mimic key properties of spreading cascade. We examine the impact of the network topology on observed properties and finally turn to the evaluation of two heterogeneous versions of the SIR model. We conclude that all the models tested failed to reproduce key properties of such cascades: typically real spreading cascades are relatively “elongated” compared to simulated ones. We have also observed some interesting similarities common to all SIR models tested.

Sébastien Heymann, Matthieu Latapy, Clémence Magnien
Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), Istanbul, Turkey
Outskewer: Using Skewness to Spot Outliers in Samples and Time Series

Finding outliers in datasets is a classical problem of high interest for (dynamic) social network analysis. However, most methods rely on assumptions which are rarely met in practice, such as prior knowledge of some outliers or about normal behavior. We propose here Outskewer, a new approach based on the notion of skewness (a measure of the symmetry of a distribution) and its evolution when extremal values are removed one by one. Our method is easy to set up, it requires no prior knowledge on the system, and it may be used on-line. We illustrate its performance on two data sets representative of many use-cases: evolution of ego-centered views of the internet topology, and logs of queries entered into a search engine.

Tabourier, L. and Stoica, A. and Peruani, F.
Communication Systems and Networks (COMSNETS), 2012 Fourth International Conference on
How to detect causality effects on large dynamical communication networks: A case study

Here we propose a set of dynamical measures to detect causality effects on communication datasets. Using appropriate comparison models, we are able to enumerate patterns containing causality relationships. This approach is illustrated on a large cellphone call dataset: we show that specific patterns such as short chain-like trees and directed loops are more frequent in real networks than in comparison models at short time scales. We argue that these patterns - which involve a node and its close neighborhood - constitute indirect evidence of active spreading of information only at a local level. This suggests that mobile phone networks are used almost exclusively to communicate information to a closed group of individuals. Furthermore, our study reveals that the bursty activity of the callers promotes larger patterns at small time scales.

Massoud Seifi, Jean-Loup Guillaume, Ivan Junier, Jean-Baptiste Rouquier and Svilen Iskrov
Proceedings of the 3rd Workshop on Complex Networks (CompleNet 2012), Melbourne, Florida
Stable community cores in complex networks

Complex networks are generally composed of dense sub-networks called communities. Many algorithms have been proposed to automatically detect such communities. However, they are often unstable and behave non-deterministically. We propose here to use this non-determinism in order to compute groups of nodes on which community detection algorithms agree most of the time.We show that these groups of nodes, called community cores, are more similar to Ground Truth than communities in real and arti cial networks. Furthermore, we show that in contrary to the classical approaches, we can reveal the absence of community structure in random graphs.

Massoud Seifi and Jean-Loup Guillaume
Proceedings of the Mining Social Network Dynamic 2012 Workshop (MSND), In conjunction with the international conference World Wide Web WWW 2012, Lyon, France, pp. 1173-1180
Community Cores in Evolving Networks

Community structure is a key property of complex networks. Many algorithms have been proposed to automatically detect communities in static networks but few studies have considered the detection and tracking of communities in an evolving network. Tracking the evolution of a given community over time requires a clustering algorithm that produces stable clusters. However, most community detection algorithms are very unstable and therefore unusable for evolving networks. In this paper, we apply the methodology proposed in [14] to detect what we call community cores in evolving networks. We show that cores are much more stable than "classical" communities and that we can overcome the disadvantages of the stabilized methods.

Bivas Mitra, Lionel Tabourier and Camille Roth
Computer Networks, Vol. 56(3), 2012.
Intrinsically dynamic communities from evolving, directed network data

Community finding algorithms for networks have recently been extended to dynamic data. Most of these recent methods aim at exhibiting community partitions from successive graph snapshots and thereafter connecting or smoothing these partitions using clever time-dependent features and sampling techniques. These approaches are nonetheless achieving longitudinal rather than dynamic community detection. We assume that commu- nities are fundamentally defined by the repetition of interactions among a set of nodes over time. According to this definition, analyzing the data by considering successive snapshots induces a significant loss of information: we suggest that it blurs essentially dynamic phe- nomena—such as communities based on repeated inter-temporal interactions, nodes switching from a community to another across time, or the possibility that a community survives while its members are being integrally replaced over a longer time period. We propose a formalism which aims at tackling this issue in the context of time-directed data- sets (such as citation networks), and present several illustrations on both empirical and synthetic dynamic networks. We eventually introduce intrinsically dynamic metrics to qualify temporal community structure and emphasize their possible role as an estimator of the quality of the community detection—taking into account the fact that various empir- ical contexts may call for distinct ‘community’ definitions and detection criteria.

Alice Albano, Jean-Loup Guillaume, and Bénédicte Le Grand
Proceedings of the  Mining Social Network Dynamic 2012 Workshop (MSND), In conjunction with the international conference World Wide Web WWW 2012, Lyon, France
File Diffusion in a Dynamic Peer-to-peer Network

Many studies have been made on diffusion in the field of epidemiology, and in the last few years, the development of social networking has induced new types of diffusion. In this paper, we focus on file diffusion on a peer-to-peer dynamic network using eDonkey protocol. On this network, we observe a linear behavior of the actual file diffusion. This result is interesting, because most diffusion models exhibit exponential behaviors. In this paper, we propose a new model of diffusion, based on the SI (Susceptible / Infected) model, which produces results close to the linear behavior of the observed diffusion. We then justify the linearity of this model, and we study its behavior in more details.

Matthieu Latapy, Clémence Magnien et Raphaël Fournier
in Information Processing and Management, Volume 49, Issue 1, January 2013, Pages 248–263
Quantifying Paedophile Activity in a Large P2P System

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25% of queries are paedophile, and that more than 0.2% of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Amélie Medem, Clémence Magnien and Fabien Tarissan
Fourth International Workshop on Network Science for Communication Networks (NetSciCom), 2012
Impact of power-law topology on IP-level routing dynamics: simulation results

This paper focuses on the Internet IP-level routing topology and proposes relevant explanations to its apparent dynamics.We first represent this topology as a power-law random graph. Then, we incorporate to the graph two well known factors responsible for the observed dynamics, which are load balancing and route evolution. Finally, we simulate on the graph traceroute-like measurements. Repeating the process many times, we obtain several graph instances that we use to model the dynamics. Our results show that we are able to capture on power-law graphs the dynamic behaviors observed on the Internet. We find that the results on power-law graphs, while qualitatively similar to the one of  Erdös-Rényi  random graphs, highly differ quantitatively; for instance, the rate of discovery of new nodes in power-law graphs is extremely low compared to the rate in Erdös-Rényi graphs.

Oussama Allali, Lionel Tabourier, Clémence Magnien, Matthieu Latapy
Social Network Analysis and Mining, March 2013, Volume 3, Issue 1, pp 85-91.
Internal links and pairs as a new tool for the analysis of bipartite complex networks

Many real-world complex networks are best modeled as bipartite (or 2-mode) graphs, where nodes are divided into two sets with links connecting one side to the other. However, there is currently a lack of methods to analyze properly such graphs as most existing measures and methods are suited to classical graphs. A usual but limited approach consists in deriving 1-mode graphs (called projections) from the underlying bipartite structure, though it causes important loss of information and data storage issues. We introduce here internal links and pairs as a new notion useful for a bipartite analysis, and which gives insights on the information lost by projecting the bipartite graph. We illustrate the relevance of theses concepts on several real-world instances illustrating how it enables to discriminate behaviors among various cases when we compare them to a benchmark of random graphs. Then, we show that we can draw benefit from this concept for both modeling complex networks and storing them in a compact format.

Fernando Peruani and Lionel Tabourier
PLoS ONE, Vol 6(12), 2011
Directedness of Information Flow in Mobile Phone Communication Networks

Without having direct access to the information that is being exchanged, traces of information flow can be obtained by looking at temporal sequences of user interactions. These sequences can be represented as causality trees whose statistics result from a complex interplay between the topology of the underlying (social) network and the time correlations among the communications. Here, we study causality trees in mobile-phone data, which can be represented as a dynamical directed network. This representation of the data reveals the existence of super-spreaders and super-receivers. We show that the tree statistics, respectively the information spreading process, are extremely sensitive to the in-out degree correlation exhibited by the users. We also learn that a given information, e.g., a rumor, would require users to retransmit it for more than 30 hours in order to cover a macroscopic fraction of the system. Our analysis indicates that topological node-node correlations of the underlying social network, while allowing the existence of information loops, they also promote information spreading. Temporal correlations, and therefore causality effects, are only visible as local phenomena and during short time scales. Consequently, the very idea that there is (intentional) information spreading beyond a small vicinity is called into question. These results are obtained through a combination of theory and data analysis techniques.

Bénédicte Le Grand et Matthieu Latapy
Atelier EGC 2011 Visualisation et Extraction de Connaissances, Brest, France, Janvier 2011.
Détection visuelle d’événements dans des grands réseaux d’interaction dynamiques. Application à l’Internet

L’objectif des travaux présentés dans ce papier est de faciliter la détection visuelle d’événements dans des réseaux d’interaction dynamiques de grande taille.
Deux méthodes de visualisation classiques et «exhaustives» ont été étudiées, qui repré-sentent l’évolution des liens du réseau au fil du temps. Les limites liées au facteur d’échelle nous ont conduits à proposer deux métaphores restreintes au suivi des noeuds du réseau. Les forces, les limites et la complémentarité de ces quatre métaphores nous ont permis de déga-ger une ébauche de méthodologie de détection d’événements dans la dynamique de grands réseaux d’interaction.
Les visualisations et la méthodologie présentées dans cet article sont génériques et appli-cables à tout type de noeuds et de liens ; elles sont ici appliquées pour illustration à un sous-ensemble du réseau Internet.

Cassio Melo, Benedicte Le Grand, Marie-Aude Aufaure, and Anastazia Berezianos
IV’2011 conference on Information Visualization, London, July 2011.
Extracting and Visualizing Tree-Like Structures from Concept Lattices

Concept lattices built with Formal Concept Analysis are usually represented by Hasse diagrams illustrating the groupings of objects described by common attributes. Hasse diagrams display the relations of partial order between concepts in a hierarchical fashion, where each concept may have several parent concepts. Lattice visualization becomes a problem as the number of clusters grows significantly with the number of objects and attributes. Interpreting the lattice through a direct visualization of the line diagram rapidly becomes impossible and more synthetic representations are needed. In this work we propose several methods to enhance the readability of concept lattices firstly though colouring and distortion techniques, and secondly by extracting and visualizing trees derived from concept lattices structures.

Abdelhamid Salah brahim, Bénédicte Le Grand, Lionel Tabourier, Matthieu Latapy
Journal of Computational Science, Vol 2(3), 2011
Citations among blogs in a hierarchy of communities: method and case study

How does the structure of a network (e.g. its organization into groups or communities) impact the interaction among its nodes? In this paper we propose a generic methodology to study the correlation between complex networks interactions and their community structure. We illustrate it on a blog network and focus on citation links. We first define a homophily probability evaluating the tendency of blogs to ite blogs from the same communities. We then introduce the notion of community distance to capture if a blog cites (or is cited by) blogs distant or not from its community. We analyze the distribution of distances corresponding to each citation link, and use it to build maps of relevant communities which help interpreting blogs interactions. This community-oriented approach allows to study citation links at various abstraction levels, and conversely, enable us to characterize communities with regard to their citation behaviour.

Thomas Aynaud and Jean-Loup Guillaume
in Proceedings of MARAMI 2011
Extraction hiérarchique de fenêtres de temps basée sur la structure communautaire

Dans cet article nous décrivons une méthode de décomposition du temps en fenêtres de temps dans un graphe dynamique. Une particularité de la méthode est que le résultat est un regroupement hiérarchique : les fenêtres de temps sont elles-mêmes susceptibles d'en contenir. En outre, les fenêtres n'ont pas besoin d'être contiguës ce qui permet par exemple de détecter une structure se répétant. De plus, chaque fenêtre est associée à une décomposition en communautés représentant la structure topologique du réseau durant cette fenêtre. Nous appliquons ensuite cette méthode à trois graphes de terrain dynamiques ayant des caractéristiques différentes pour montrer que les fenêtres identifiées correspondent bien à des phénomènes observables. In this paper, we describe a way to cluster the time in time windows in a dynamic network. The result is a tree and thus time windows can themselves contain smaller ones. Moreover, the windows do not have to be consecutive and this allows for instance to detect repeated structure. Each window is also associated to a community decomposition that represents the topological structure of the network during this window. We then apply the method to three dynamic networks to show that observed time windows correspond to observable phenomena.

Alice Albano
JFGG 2011
Phénomènes de diffusion dans les réseaux dynamiques : simulation et modélisation

Les phénomènes de diffusion sont présents dans de nombreux contextes: diffusion d’épidémies, de virus informatiques, d’information dans des réseaux sociaux, etc. Bien que les réseaux où se produit la diffusion soient souvent dynamiques, cette dynamique n’est pas prise en compte dans la plupart des modèles existants. L’objectif de ces travaux est de proposer des modèles de diffusion, et d’étudier l’impact de la dynamique du réseau sur la diffusion.

Oussama Allali, Clémence Magnien and Matthieu Latapy
Dynamic Networks and Knowledge Discovery, special issue of Intelligent Data Analysis, 7 (1), 5-25, 2013
Internal links prediction: a new approach for predicting links in bipartite graphs

Many real-world complex networks, like actor-movie or file-provider relations, have a bipartite nature and evolve over time. Predicting links that will appear in them is one of the main approach to understand their dynamics. Only few works address the bipartite case, though, despite its high practical interest and the specific challenges it raises. We define in this paper the notion of internal links in bipartite graphs and propose a link prediction method based on them. We thoroughly describe the method and its variations, and experimentally compare it to a basic collaborative filtering approach. We present results obtained for a typical practical case. We reach the conclusion that our method performs very well, and we study in details how its parameters may influence obtained results.

Thomas Aynaud and Jean-Loup Guillaume
proceedings of the Fifth SNA-KDD Workshop Social Network Mining and Analysis, in conjunction with the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2011)
Multi-Step Community Detection and Hierarchical Time Segmentation in Evolving Networks

Many complex systems composed of interacting objects  like social networks or the web can be modeled as graphs. They can usually be divided in dense sub-graphs with few links between them, called communities and detecting this underlying community structure may have a major impact in the understanding of these systems. We focus here on evolving graphs, for which the usual approach is to represent the state of the system at different time steps and to compute communities independently on the graph obtained at each time step. We propose in this paper to use a different framework: instead of detecting communities on each time step, we detect a unique decomposition in communities that is relevant for (almost) every time step during a given period called the time window.  We propose a definition of this new decomposition and two algorithms to detect it quickly. We validate both the approach and the algorithms on three evolving networks of different kinds showing that the quality loss at each time step is very low despite the constraint of maximization on several time steps. Since the time window length is a crucial parameter of our technique, we also propose an unsupervised hierarchical clustering algorithm to build automatically a hierarchical time segmentation into time windows. This clustering relies on a new similarity measure based on community structure. We show that it is very efficient in detecting meaningful windows.

Adrien Friggeri, Jean-Philippe Cointet and Matthieu Latapy
Complex Systems 19, 2011
A Real-World Spreading Experiment in the Blogosphere

We designed an experiment to observe a spreading phenomenon in the blogosphere. This experiment relies on a small applet that participants copy on their own web page. We present the obtained dataset, which we freely provide for study, and conduct basic analysis. We conclude that, despite the classical assumption, in this experiment famous blogs do not necessarily act as super spreaders.

Pascal Pons, Matthieu Latapy
Theoretical Computer Science (TCS) 412(8-10): 892-900 (2011)
Post-processing hierarchical community structures: Quality improvements and multi-scale view

Dense sub-graphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Most existing community detection algorithms produce a hierarchical structure of communities and seek a partition into communities that optimizes a given quality function. We propose new methods to improve the results of any of these algorithms. First we show how to optimize a general class of additive quality functions (containing the modularity, the performance, and a new similarity-based quality function which we propose) over a larger set of partitions than the classical methods. Moreover, we define new multi-scale quality functions which make it possible to detect different scales at which meaningful community structures appear, while classical approaches find only one partition.

Matthieu Latapy, Clémence Magnien and Frédéric Ouédraogo
Complex Systems, 20 (1), 23-30, 2011.
A Radar for the Internet

Mapping the internet's topology is a challenge in itself, and studying its dynamics is even more difficult. Achieving this would however provide key information on the nature of the internet, crucial for modeling and simulation. Moreover, detecting anomalies in this dynamics is a key issue for security. We introduce here a new measurement approach which makes it possible to capture internet dynamics at a scale of a few minutes in a radar-like manner. By conducting and analyzing large-scale measurements of this kind, we rigorously and automatically detect events in the observed dynamics, which is totally out of reach of previous approaches.

Oussama Allali, Clémence Magnien and Matthieu Latapy
Proceedings of the third International Workshop on Network Science for Communication Networks (Netscicom 2011), In conjunction with IEEE Infocom 2011.
Link prediction in bipartite graphs using internal links and weighted projection

Many real-world complex networks, like client-product or file-provider relations, have a bipartite nature and evolve during time. Predicting links that will appear in them is one of the main approach to understand their dynamics. Only few works address the bipartite case, though, despite its high practical interest and the specific challenges it raises. We define in this paper the notion of internal links in bipartite graphs and propose a link prediction method based on them. We describe the method and experimentally compare it to a basic collaborative filtering approach. We present results obtained for two typical practical cases. We reach the conclusion that our method performs very well, and that internal links play an important role in bipartite graphs and their dynamics.

Matthieu Latapy, Clémence Magnien and Raphaël Fournier
Quantifying paedophile queries in a large P2P system

Increasing knowledge of paedophile activity in P2P systems is a crucial societal concern, with important consequences on child protection, policy making, and internet regulation. Because of a lack of traces of P2P exchanges and rigorous analysis methodology, however, current knowledge of this activity remains very limited. We consider here a widely used P2P system, eDonkey, and focus on two key statistics: the fraction of paedophile queries entered in the system and the fraction of users who entered such queries. We collect hundreds of millions of keyword-based queries; we design a paedophile query detection tool for which we establish false positive and false negative rates using assessment by experts; with this tool and these rates, we then estimate the fraction of paedophile queries in our data; finally, we design and apply methods for quantifying users who entered such queries. We conclude that approximately 0.25 % of queries are paedophile, and that more than 0.2 % of users enter such queries. These statistics are by far the most precise and reliable ever obtained in this domain.

Thomas Aynaud, Vincent Blondel, Jean-Loup Guillaume and Renaud Lambiotte
in Partitionnement de graphe : optimisation et applications, Traité IC2, Hermes-Lavoisier 2011
Optimisation locale multi-niveaux de la modularité

Dans ce chapitre, nous présentons une méthode gloutonne pour optimiser la modularité d'un graphe. Cette méthode de partionnement permet de traiter avec une excellente précision des systèmes de taille inégalée, allant jusqu'à plusieurs milliards de liens. Notre algorithme a de surcroît l'avantage de ne pas être limité à l'optimisation de la modularité puisqu'il peut être généralisé à d'autres fonctions de qualité, et de découvrir des communautés à différentes échelles. Les performances de l'algorithme sont évaluées sur des graphes artificiels pour lesquels la structure communautaire est connue, ainsi que sur des graphes de terrain réels.

Thomas Aynaud and Jean-Loup Guillaume
Latin-American Workshop on Dynamic Networks (LAWDN), Buenos Aires, 2010
Long range community detection

Complex networks can usually be divided in dense subnetworks called communities. In evolving networks, the usual way to detect communities is to find several partitions independently, one for each time step. However, this generally causes troubles when trying to track communities from one time step to the next. We propose here a new method to detect only one decomposition in communities that is good for (almost) every time step. We show that this unique partition can be computed with a modification of the Louvain method and that the loss of quality at each time step is generally low despite the constraint of global maximization. We also show that some specific modifications of the networks topology can be identified using this unique partition in the case of the Internet topology.

Thomas Aynaud and Jean-Loup Guillaume
Journée thématique Fouille de grands graphes, en conjonction avec la première conférence sur les Modèles et l’Analyse des Réseaux : Approches Mathématiques et Informatique (MARAMI), Toulouse, France, 2010
Détection de communautés à long terme dans les graphes dynamiques

La plupart des graphes de terrain peuvent être décomposés en sous graphes denses appelés communautés. Habituellement, dans des graphes dynamiques, les communautés sont détectées pour chaque instant indépendamment ce qui pose de nombreux problèmes tels que la stabilité ou le suivi de des communautés entre deux décompositions successives. Nous proposons ici une méthode pour trouver une partition unique, de qualité, couvrant une longue période. Cette décomposition peut être trouvée efficacement via une adaptation de la méthode de Louvain et la perte de qualité à chaque instant due à la contrainte de détecter des communautés globales s’avère assez faible.

Matthieu Latapy, Thi Ha Duong Phan, Christophe Crespelle, Thanh Qui Nguyen
COCOA 2010
Termination of Multipartite Graph Series Arising from Complex Network Modelling

An intense activity is nowadays devoted to the definition of models capturing the properties of complex networks. Among the most promising approaches, it has been proposed to model these graphs via their clique incidence bipartite graphs. However, this approach has, until now, severe limitations resulting from its incapacity to reproduce a key property of this object: the overlapping nature of cliques in complex networks. In order to get rid of these limitations we propose to encode the structure of clique overlaps in a network thanks to a process consisting in iteratively factorising the maximal bicliques between the upper level and the other levels of a multipartite graph. We show that the most natural definition of this factorising process leads to infinite series for some instances. Our main result is to design a restriction of this process that terminates for any arbitrary graph. Moreover, we show that the resulting multipartite graph has remarkable combinatorial properties and is closely related to another fundamental combinatorial object. Finally, we show that, in practice, this multipartite graph is computationally tractable and has a size that makes it suitable for complex network modelling.

Christophe Crespelle and Fabien Tarissan
in Complex Networks, special issue of Computer Communications, 34-5, pp. 635-648 (2011), DOI: 10.1016/j.comcom.2010.06.006
Evaluation of a new method for measuring the internet degree distribution: Simulation results

Many contributions rely on the degree distribution of the Internet topology. However, current knowledge of this property is based on biased and erroneous measurements and is subject to much debate. Recently, a new approach, referred to as the Neighborhood Flooding method, was proposed to avoid issues raised by classical measurements. It aims at measuring the neighborhood of Internet core routers by sending traceroute probes from many monitors distributed in the Internet towards a given target router. In this paper, we investigate the accuracy of this method with simulations. Our results show that Neighborhood Flooding is free from the bias highlighted in the classical approach and is able to observe properly the exact degree of a vast majority of nodes in the core of the network. We show how the quality of the estimation depends on the number of monitors used and we carefully examine the influence of parameters of the simulations on our results. We also point out some limitations of the Neighborhood Flooding method and discuss their impact on the observed distribution.

Thomas Aynaud and Jean-Loup Guillaume
Proceedings of International Workshop on Dynamic Networks (WDN), in conjunction with WiOpt 2010, pages 508-514
Static community detection algorithms for evolving networks

Complex networks can often be divided in dense sub-networks called communities. We study, using a partition edit distance, how three community detection algorithms transform their outputs if the input network is sligthly modified. The instabilities appear to be important and we propose a modification of one algorithm to stabilize it and to allow the tracking of the communities in a dynamic network. This modification has one parameter which is a tradeoff between stability and quality. The resulting algorithm appears to be very effective. We finally use it on a dynamic network of blogs.

Abdelhamid Salah Brahim, Bénédicte Le Grand and Matthieu Latapy
IEEE ICC 2010 workshop "SocNets", Cape Town, South Africa, May 2010
Some Insight on Dynamics of Posts and Citations in Different Blog Communities

This paper explores new approaches and methods to characterize post and citation dynamics in different blog communities. In particular, evolution of post popularity over time is studied, as well as information spreading cascades. This methodology goes beyond traditional approaches by defining classes of dynamic behaviors based on topological features of the post network, and by investigating the impact of topical communities on post popularity dynamics and on information spreading cascades. This methodology has been applied to a corpus of active French blogs monitored during 4 months.

Frédéric Ouédraogo, Clémence Magnien
Computer Communications, 34, 670-679, 2011
Impact of Sources and Destinations on the Observed Properties of the Internet Topology

Maps of the internet topology are generally obtained by measuring the routes from a given set of sources to a given set of destinations (with tools such as traceroute). It has been shown that this approach misses some links and nodes. Worse, in some cases it can induce a bias in the obtained data, i.e. the properties of the obtained maps are significantly different from those of the real topology. In order to reduce this bias, the general approach consists in increasing the number of sources. Some works have studied the relevance of this approach. Most of them have used theoretical results, or simulations on network models. Some papers have used real data obtained from actual measurement procedures to evaluate the importance of the number of sources and destinations, but no work to our knowledge has studied extensively the importance of the choice of sources or destinations. Here, we use real data from internet topology measurements to study this question: by comparing partial measurements to our complete data, we can evaluate the impact of adding sources or destinations on the observed properties.

We show that the number of sources and destinations used plays a role in the observed properties, but that their choice, and not only their number, also has a strong influence on the observations. We then study common statistics used to describe the internet topology, and show that they behave differently: some can be trusted once the number of sources and destinations are not too small, while others are difficult to evaluate.

Alina Stoica, Zbigniew Smoreda, Christophe Prieur and Jean-Loup Guillaume
Proceedings of the Workshop on the Analysis of Mobile Phone Networks, satellite workshop to NetSci 2010
Age, Gender and Communication Networks

In this paper, we address some sociological  and topological issues associated with mobile phone communication. Based on a dataset of a few million users, we use customers' age and gender information  to study relation between these parameters and the average behavior of users in terms of number of calls, number of SMS and calls duration. We  also study the dataset from a networking point of view: we define different profiles based on the topological properties of the  personal network of each individual and study the relations between these profiles and the age of customers

Thomas Aynaud and Jean-Loup Guillaume
in Technique et Science Informatiques, vol. 30/2, pp. 137-154, 2011
Structure multi-échelle de grands graphes de terrain

Most complex networks can be divided into dense sub-graphs called communities. These communities may also be divided recursively producing a hierarchical structure of communities, summarized in a tree named dendrogram. In this article we analyze this structure extracted from several complex networks. First we study the shape of the tree and, in particular, communities imbrications. Then we show that an excessive decomposition of communities can result in meaningless communities. We propose a couple of approaches to solve this problem.

Assia Hamzaoui, Matthieu Latapy and Clémence Magnien
Proceedings of International Workshop on Dynamic Networks (WDN), in conjunction with WiOpt 2010
Detecting Events in the Dynamics of Ego-centered Measurements of the Internet Topology

Detecting events such as major routing changes or congestions in the dynamics of the internet topology is an important but challenging task. We explore here a top-down approach based on a notion of statistically significant events. It consists in identifying statistics which exhibit a homogeneous distribution with outliers, which correspond to events. We apply this approach to ego-centerd measurements of the internet topology (views obtained from a single monitor) and show that it succeeds in detecting meaningful events. Finally, we give some hints for the interpretation of such events in terms of network events.

Lamia Benamara and Clémence Magnien
Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.
Estimating properties in dynamic systems: the case of churn in P2P networks

In many systems, such as P2P systems, the dynamicity of participating elements, or churn, has a strong impact. As a consequence, many efforts have been made to characterize it, and in particular to capture the session length distribution. However in most cases, estimating it rigorously is difficult. One of the reasons is that, because the observation window is by definition finite, parts of the sessions that begin before the window and/or end after it are missed. This induces a bias. Although it tends to decrease when the observation window length increases, it is difficult to quantify its importance, or how fast it decreases.

Here, we introduce a general methodology that allows us to know if the observation window is long enough to characterize a given property. This methodology is not specific to one study case and may be applied to any property in a dynamic system. We apply this methodology to the study of session lengths in a massive measurement of P2P activity in the eDonkey system. We show that the measurement needs to last for at least one week in order to obtain representative results. We also show that our methodology allows us to precisely characterize the shape of the session length distribution.

Christophe Crespelle, Matthieu Latapy, Elie Rotenberg
Proceedings of the third International Workshop on Network Science for Communication Networks (NetSciCom 2010), In conjunction with IEEE Infocom 2010.
Rigorous measurement of IP-level neighborhood of Internet core routers

Many contributions use the degree distribution of IP-level internet topology. However, current knowledge of this property relies on biased and erroneous measurements, and so it is subject to much debate. We introduce here a new approach, dedicated to the core of the internet, which avoids the issues raised by classical measurements. It is based on the measurement of IP-level neighborhood of internet core routers, for which we design and implement a rigorous method. It consists in sending traceroute probes from many monitors distributed in the internet towards a given target router and carefully selecting the relevant information in collected data. Using simulations, we provide strong evidence of the accuracy of our approach. We then conduct real-world measurements illustrating the practical effectiveness of our method. This constitutes a significant step towards reliable knowledge of the IP-level degree distribution of the core of the internet.

Massoud Seifi, Jean-loup Guillaume, Matthieu Latapy and Bénédicte Le Grand
Proceedings of the 8th Workshop on Visualization and Knowledge Extraction (EGC 2010)
Interactive multiscale visualization of huge graphs: application to a network of weblogs

De nombreux réseaux du monde réel peuvent être modélisés par des grands graphes. Réduire la complexité d'un graphe de manière à ce qu'il puisse être facilement interprété par l'oeil humain est une aide précieuse pour comprendre et analyser ce type de données. Nous proposons une méthodologie de visualisation interactive multi-échelle de grands graphes basée sur une classification des sommets qui nous permet de représenter ces graphes de manière lisible et interprétable. Nous appliquons notre méthodologie à un réseau de blogs francophones.

Clémence Magnien, Matthieu Latapy and Jean-Loup Guillaume
ACM Computing Surveys, 43 (3), 2011. Extended abstract published in LNCS, proceedings of the 8-th International Conference On Principles Of Distributed Systems OPODIS'04, 2004, Grenoble, France (title: Comparison of Failures and Attacks on Random and Scale-free Networks)
Impact of Random Failures and Attacks on Poisson and Power-Law Random Networks

It appeared recently that the underlying degree distribution of a network may play a crucial role concerning its robustness. Empiric and analytic results have been obtained, based on asymptotic and mean-field approximations. Previous work insisted on the fact that power-law degree distributions induce a high resilience to random failure but a high sensitivity to some attack strategies, while Poisson degree distributions are quite sensitive in both cases. Then, much work has been done to extend these results. We focus here on these basic results, with the aim of deepening significantly our understanding of their origin and their limitations. We review in detail previous contributions and we give full proofs in a unified framework, in which the approximations on which these results rely are well identified. We then add to these known results a set of new ones aimed at enlightening some important aspects. We also provide extensive and rigorous experiments which make it possible to evaluate the relevance of the analytic results. We reach the conclusion that, even if it is clear that the basic results of the field are true and important, they are in practice much less striking than generally thought. The differences between random failures and attacks are not so huge and can be explained with simple facts. Likewise, the differences in the behaviors induced by power-law and Poisson distributions are not as striking as often claimed.

Guillaume Valadon, Clémence Magnien and Ryuji Wakikawa
Mobile IPv6 Deployments: Graph-based Analysis and practical Guidelines

The Mobile IPv6 protocol is a major solution to supply mobility services on the Internet. Many networking vendors have already implemented it in their operating systems and equipments. Moreover, it was recently selected to provide permanent IP addresses to end users of WiMAX and 3GPP2. Mobile IPv6 relies on a specific router called the home agent that hides location changes of the mobile nodes from the rest of the Internet. To do so, the mobile nodes' traffic must flow through the home agent. This mandatory deviation produces longer paths and higher communication delays. In order to solve these problems, we describe a new approach to address deployments of Mobile IPv6 based on graph theory and could be applied to any operator's network. In particular, we use notions of centrality in graphs to quantify increases of communication distances induced by dogleg routing and identify relevant home agents locations. We evaluate this approach using real-world network topologies and show that the obtained Mobile IPv6 performance could be close to direct paths ones. The proposed algorithm is generic and can be used to achieve efficient deployments of Mobile IPv6 as well as Home Agent Migration: a new Mobile IPv6 architecture using several distributed home agents.

Fabien Tarissan, Matthieu Latapy and Christophe Prieur
In Proceedings of the IEEE International Workshop on Network Science For Communication Networks (NetSciCom'09)
Efficient Measurement of Complex Networks Using Link Queries

Complex networks are at the core of an intense research activity. However, in most cases, intricate and costly measurement procedures are needed to explore their structure. In some cases, these measurements rely on link queries: given two nodes, it is possible to test the existence of a link between them. These tests may be costly, and thus minimizing their number while maximizing the number of discovered links is a key issue. This is a challenging task, though, as initially no information is known on the network. This paper studies this problem: we observe that properties classically observed on real-world complex networks give hints for their efficient measurement; we derive simple principles and several measurement strategies based on this, and experimentally evaluate their efficiency on real-world cases. In order to do so, we introduce methods to evaluate the efficiency of strategies. We also explore the bias that different measurement strategies may induce.

Frédéric Aidouni, Matthieu Latapy and Clémence Magnien
Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), May 29, 2009, Rome, Italy
Ten weeks in the life of an eDonkey server

This paper presents a capture of the queries managed by an eDonkey server during almost 10 weeks, leading to the observation of almost 9 billion messages involving almost 90 million users and more than 275 million distinct files. Acquisition and management of such data raises several challenges, which we discuss as well as the solutions we developed. We obtain a very rich dataset, orders of magnitude larger than previously avalaible ones, which we provide for public use. We finally present basic analysis of the obtained data, which already gives evidence of non-trivial features.

Clémence Magnien, Frédéric Ouedraogo, Guillaume Valadon, Matthieu Latapy
Fourth International Conference on Internet Monitoring and Protection (ICIMP 2009), May 24-28, 2009, Venice, Italy
Fast dynamics in Internet topology: preliminary observations and explanations

By focusing on what can be observed by running traceroute-like measurements at a high frequency from a single monitor to a fixed destination set, we show that the observed view of the topology is constantly evolving at a pace much higher than expected. Repeated measurements discover new IP addresses at a constant rate, for long period of times (up to several months). In order to provide explanations, we study this phenomenon both at the IP, and at the Autonomous System levels. We show that this renewal of IP addresses is partially caused by a BGP routing dynamics, altering paths between existing ASes. Furthermore, we conjecture that an intra AS routing dynamics is another cause of this phenomenon.

Oussama Allali, Matthieu Latapy and Clémence Magnien
Sixth International Workshop on Hot Topics in Peer-to-Peer Systems (Hot-P2P 2009), May 29, 2009, Rome, Italy
Measurement of eDonkey Activity with Distributed Honeypots

Collecting information about user activity in peer-to-peer systems is a key but challenging task. We describe here a distributed platform for doing so on the eDonkey network, relying on a group of honeypot peers which claim to have certain files and log queries they receive for these files. We then conduct some measurements with typical scenarios and use the obtained data to analyze the impact of key parameters like measurement duration, number of honeypots involved, and number of advertised files. This illustrates both the possible uses of our measurement system, and the kind of data one may collect using it.

Bénédicte Le Grand, Marie-Aude Aufaure et Michel Soto
Conférence EGC 2009 (Extraction et Gestion des Connaissances), Strasbourg, France, 27-30 janvier 2009
Empreintes conceptuelles et spatiales pour la caractérisation des réseaux sociaux

In this paper, Formal Concept Analysis and Galois lattices are used for the analysis of complex datasets, online social networks in particular. Lattice-inspired statistics computed on the objects of the lattice provide their "conceptual distribution". An experimentation conducted on four social networks' samples shows how these statistics may be used to characterize these networks and filter them automatically.

Matthieu Latapy
Theoretical Computer Science (TCS) 407 (1-3), pages 458-473, 2008
Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs

Finding, counting and/or listing triangles (three vertices with three edges) in massive graphs are natural fundamental problems, which received recently much attention because of their importance in complex network analysis. We provide here a detailed survey of proposed main-memory solutions to these problems, in an unified way. We note that previous authors paid surprisingly little attention to space complexity of main-memory solutions, despite its both fundamental and practical interest. We therefore detail space complexities of known algorithms and discuss their implications. We also present new algorithms which are time optimal for triangle listing and beats previous algorithms concerning space needs. They have the additional advantage of performing better on power-law graphs, which we also detail. We finally show with an experimental study that these two algorithms perform very well in practice, allowing to handle cases which were previously out of reach.

Matthieu Latapy, Clémence Magnien and Frédéric Ouédraogo
Proceedings of ADN'08: 1st International Workshop on Analysis of Dynamic Networks, in conjunction with IEEE ICDM 2008
A Radar for the Internet

In contrast with most internet topology measurement research, our concern here is not to obtain a map as complete and precise as possible of the whole internet. Instead, we claim that each machine's view of this topology, which we call ego-centered view, is an object worth of study in itself. We design and implement an ego-centered measurement tool, and perform radar-like measurements consisting of repeated measurements of such views of the internet topology. We conduct long-term (several weeks) and high-speed (one round every few minutes) measurements of this kind from more than one hundred monitors, and we provide the obtained data. We also show that these data may be used to detect events in the dynamics of internet topology.

Clémence Magnien, Matthieu Latapy and Michel Habib
ACM Journal of Experimental Algorithmics (JEA), 13, 2009
Fast Computation of Empirically Tight Bounds for the Diameter of Massive Graphs

The diameter of a graph is among its most basic parameters. Since a few years, it moreover became a key issue to compute it for massive graphs in the context of complex network analysis. However, known algorithms, including the ones producing approximate values, have too high a time and/or space complexity to be used in such cases. We propose here a new approach relying on very simple and fast algorithms that compute (upper and lower) bounds for the diameter. We show empirically that, on various real-world cases representative of complex networks studied in the literature, the obtained bounds are very tight (and even equal in some cases). This leads to rigorous and very accurate estimations of the actual diameter in cases which were previously untractable in practice.

Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre
J. Stat. Mech. (october 2008) P10008
Fast unfolding of communities in large networks

We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection method in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2.6 million customers and by analyzing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad-hoc modular networks

Pierre Borgnat, Eric Fleury, Jean-Loup Guillaume, Céline Robardet and Antoine Scherrer
Computer Networks 52 (2008), pp. 2842-2858
Description and simulation of dynamic mobility networks

During the last decade, the study of large scale complex networks has attracted a substantial amount of attention and works from several domains: sociology, biology, computer science, epidemiology. Most of such complex networks are inherently dynamic, with new vertices and links appearing while some old ones disappear. Until recently, the dynamic of these networks was less studied and there is a strong need for dynamic network models in order to sustain protocol performance evaluations and fundamental analyzes in all the research domains listed above. We propose in this paper a novel framework for the study of dynamic mobility networks. We address the characterization of dynamics by proposing an in-depth description and analysis of two real-world data sets. We show in particular that links creation and deletion processes are independent of other graph properties and that such networks exhibit a large number of possible configurations, from sparse to dense. From those observations, we propose simple yet very accurate models that allow to generate random mobility graphs with similar temporal behavior as the one observed in experimental data.

Pierre Borgnat, Eric Fleury, Jean-Loup Guillaume, Clémence Magnien, Céline Robardet et Antoine Scherrer
Proceedings of NATO Advanced Study Institute on Mining Massive Data Sets for Security, IOS Press, 2008
Evolving networks

Most real networks often evolve through time: changes of topology can occur if some nodes and/or edges appear and/or disappear, and the types or weights of nodes and edges can also change even if the topology stays static. Mobile devices with wireless capabilities (mobile phones, laptops, etc.) are a typical example of evolving networks where nodes or users are spread in the environment and connections between users can only occur if they are near each other. This whois- near-whom network evolves every time users move and communication services (such as the spread of any information) will deeply rely on the mobility and on the characteristics of the underlying network. This paper presents some recent results concerning the characterization of the dynamics of complex networks through three different angles: evolution of some parameters on snapshots of the network, parameters describing the evolution itself, and intermediate approaches consisting in the study of specific phenomena or users of interest through time.

Matthieu Latapy, Clémence Magnien and Nathalie Del Vecchio
Social Networks (2008), vol. 30, no1, pp. 31-48
Basic Notions for the Analysis of Large Two-mode Networks

Many large real-world networks actually have a 2-mode nature: their nodes may be separated into two classes, the links being between nodes of different classes only. Despite this, and despite the fact that many ad-hoc tools have been designed for the study of special cases, very few exist to analyse (describe, extract relevant information) such networks in a systematic way. We propose here an extension of the most basic notions used nowadays to analyse large 1-mode networks (the classical case) to the 2-mode case. To achieve this, we introduce a set of simple statistics, which we discuss by comparing their values on a representative set of real-world networks and on their random versions. This makes it possible to evaluate their relevance in capturing properties of interest in 2-mode networks.

Fabien Viger, Brice Augustin, Xavier Cuvellier, Clémence Magnien, Matthieu Latapy, Timur Friedman, and Renata Teixeira
Computer Networks 52-5 (2008), pp. 998-1018. Extended abstract published in the proceedings of the 6-th Internet Measurement Conference IMC'06, 2006, Rio de Janeiro, Brazil
Detection, Understanding, and Prevention of Traceroute Measurement Artifacts

Traceroute is widely used: from the diagnosis of network problems to the assemblage of internet maps. Unfortunately, there are a number of problems with traceroute methodology, which lead to the inference of erroneous routes. This paper studies particular structures arising in nearly all traceroute measurements. We characterize them as "loops", "cycles", and "diamonds". We identify load balancing as a possible cause for the appearance of false loops, cycles and diamonds, i.e., artifacts that do not represent the internet topology. We provide a new publicly-available traceroute, called Paris traceroute, which, by controlling the packet header contents, provides a truer picture of the actual routes that packets follow. We performed measurements, from the perspective of a single source tracing towards multiple destinations, and Paris traceroute allowed us to show that many of the particular structures we observe are indeed traceroute measurement artifacts.

Matthieu Latapy and Clémence Magnien
Infocom'08 Proceedings, Phoenix, USA
Complex Network Measurements: Estimating the Relevance of Observed Properties

Complex networks, modeled as large graphs, received much attention during these last years. However, data on such networks is only available through intricate measurement procedures. Until recently, most studies assumed that these procedures eventually lead to samples large enough to be representative of the whole, at least concerning some key properties. This has crucial impact on network modeling and simulation, which rely on these properties. Recent contributions proved that this approach may be misleading, but no solution has been proposed. We provide here the first practical way to distinguish between cases where it is indeed misleading, and cases where the observed properties may be trusted. It consists in studying how the properties of interest evolve when the sample grows, and in particular whether they reach a steady state or not. In order to illustrate this method and to demonstrate its relevance, we apply it to data-sets on complex network measurements that are representative of the ones commonly used. The obtained results show that the method fulfills its goals very well. We moreover identify some properties which seem easier to evaluate in practice, thus opening interesting perspectives.

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, Cristobald de Kerchove and Renaud Lambiotte
Phys. Rev. E 77, 036114 (2008)
Local leaders in random networks

We consider local leaders in random uncorrelated networks, i.e., nodes whose degree is higher than or equal to the degree of all their neighbors. An analytical expression is found for the probability for a node of degree k to be a local leader. This quantity is shown to exhibit a transition from a situation where high-degree nodes are local leaders to a situation where they are not, when the tail of the degree distribution behaves like the power law ~k^gamma_c with gamma_c=3. Theoretical results are verified by computer simulations, and the importance of finite-size effects is discussed.

Vincent D. Blondel, Jean-Loup Guillaume, Julien M. Hendrickx, and Raphaël M. Jungers
Phys. Rev. E 76, 066101 (2007)
Distance distribution in random graphs and application to network exploration

We consider the problem of determining the proportion of edges that are discovered in an Erdős–Rényi graph when one constructs all shortest paths from a given source node to all other nodes. This problem is equivalent to the one of determining the proportion of edges connecting nodes that are at identical distance from the source node. The evolution of this quantity with the probability of existence of the edges exhibits intriguing oscillatory behavior. In order to perform our analysis, we introduce a different way of computing the distribution of distances between nodes. Our method outperforms previous similar analyses and leads to estimates that coincide remarkably well with numerical simulations. It allows us to characterize the phase transitions appearing when the connectivity probability varies.

Rémi Dorat, Matthieu Latapy, Bernard Conein and Nicolas Auray
Annals of Telecommunications (2007), vol. 62, no3-4, pp. 325-349
Multi-level analysis of an interaction network between individuals in a mailing-list

It is well known now that most real-world complex networks have some properties which make them very different from random networks. In the case of interactions between authors of messages in a mailing-list, however, a multi-level structure may be responsible for some of these properties. We propose here a rigorous but simple formalism to investigate this question, and we apply it to an archive of the Debian user mailing-list. This leads to the identification of some properties which may indeed be explained this way, and of some properties which need deeper analysis.

Jérémie Leguay, Matthieu Latapy, Timur Friedman, Kavé Salamatian
Computer Networks 51, pages 2067-2087, 2007. Extended abstract published in LNCS, proceedings of the 4-th IFIP international conference on Networking, 2005, Waterloo, Canada
Describing and simulating routes on the Internet

This contribution deals with actual routes followed by packets on the internet at IP level. We first propose a set of statistical properties to analyse such routes, which brings detailed information on them. We then use the obtained results to suggest and evaluate methods for generating artificial routes suitable for simulation purposes. This also makes it possible to evaluate various network models. This work is based on large data sets provided mainly by CAIDA's skitter infrastructure.

Ravi Kumar and Matthieu Latapy
Theoretical Computer Science, special issue on Complex Networks, Volume 355, Number 1, 6 April 2006
Theoretical Computer Science special issue on Complex Networks – foreword

Jean-Loup Guillaume and Matthieu Latapy
Physica A 371, pages 795-813, 2006. Extended abstract published in LNCS, proceedings of the 1-st international workshop on Combinatorial and Algorithmic Aspects of Networking CAAN'04, 2004, Banff, Canada
Bipartite graphs as Models of Complex Networks

It appeared recently that the classical random graph model used to represent real-world complex networks does not capture their main properties. Since then, various attempts have been made to provide accurate models. We study here the first model which achieves the following challenges: it produces graphs which have the three main wanted properties (clustering, degree distribution, average distance), it is based on some real-world observations, and it is sufficiently simple to make it possible to prove its main properties. This model consists in sampling a random bipartite graph with prescribed degree distribution. Indeed, we show that any complex network can be viewed as a bipartite graph with some specific characteristics, and that its main properties can be viewed as consequences of this underlying structure. We also propose a growing model based on this observation.

Anne-Ruxandra Carvunis, Matthieu Latapy, Annick Lesne, Clémence Magnien and Laurent Pezard
Physica A 367, pages 585-612, 2006
Dynamics of three-state excitable units on Poisson vs power-law random networks

The influence of the network topology on the dynamics of systems of coupled excitable units is studied numerically and demonstrates a lower dynamical variability for power-law networks than for Poisson ones. This effect which reflects a robust collective excitable behavior is however lower than that observed for diffusion processes or network robustness. Instead, the presence (and number) of triangles and larger loops in the networks appears as a parameter with strong influence on the considered dynamics.

Pascal Pons and Matthieu Latapy
Journal of Graph Algorithms and Applications (JGAA) vol. 10, no. 2, pages 191-218, 2006. Extended abstract published in LNCS, proceedings of the 20-th International Symposium on Computer and Information Sciences ISCIS'05, 2005, Istambul, Turquie
Computing communities in large networks using random walks

Dense subgraphs of sparse graphs (communities), which appear in most real-world complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn²) and space O(n²) in the worst case, and in time O(n² log n) and space O(n²) in most real-world cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.

Stevens Le Blond, Jean-Loup Guillaume and Matthieu Latapy
LNCS, proceedings of the 4-th International workshop on Peer-to-Peer Systems IPTPS'05, 2005, Ithaka, New York, USA
Clustering in P2P exchanges and consequences on performances

We propose here an analysis of a rich dataset which gives an exhaustive and dynamic view of the exchanges processed in a running eDonkey system. We focus on correlation in term of data exchanged by peers having provided or queried at least one data in common. We introduce a method to capture these correlations (namely the data clustering), and study it in detail. We then use it to propose a very simple and efficient way to group data into clusters and show the impact of this underlying structure on search in typical P2P systems. Finally, we use these results to evaluate the relevance and limitations of a model proposed in a previous publication. We indicate some realistic values for the parameters of this model, and discuss some possible improvements.

Pierre Fraigniaud, Philippe Gauron and Matthieu Latapy
LNCS, proceedings of the 11-th international conference Euro-Par, 2005, Lisbonne, Portugal
Combining the use of clustering and scale-free nature of user exchanges into a simple and efficient P2P system

It appeared recently that user interests in a P2P system possess clustering properties that may be used to reduce significantly the amount of traffic of flooding-based search strategies. It was also observed that they possess scale-free properties that may be used for the design of efficient routing-based search strategies. In this paper, we show that the combination of these two properties make it possible to design an efficient and simple fully decentralized search strategy. Further, simulations processed on real-world traces show that other unidentified properties hidden in actual queries make our protocol even more efficient, performing searches in logarithmic expected number of steps.

Jean-Loup Guillaume and Matthieu Latapy
Complex Systems 16, pages 83-94, 2005
Complex Network Metrology

In order to study some complex networks like the Internet, the Web, social networks or biological networks, one first has to explore them. This gives a partial and biased view of the real object, which is generally assumed to be representative of the whole. However, up to now nobody knows how and how much the measure influences the results. Using the example of the Internet and a rough model of its exploration process, we show that the way a given complex network is explored may strongly influence the observed properties. This leads us to argue for the necessity of developing a science of metrology of complex networks. Its aim would be to study how the partial and biased view of a network relates to the properties of the whole network.

Jean-Loup Guillaume, Matthieu Latapy and Damien Magoni
Computer Networks 50, pages 3197-3224, 2006. Extended abstract published in the proceedings of the 24-th IEEE international conference Infocom'05, 2005, Miami, USA
Relevance of Massively Distributed Explorations of the Internet Topology: Qualitative Results

Internet maps are generally constructed using the traceroute tool from a few sources to many destinations. It appeared recently that this exploration process gives a partial and biased view of the real topology, which leads to the idea of increasing the number of sources to improve the quality of the maps. In this paper, we present a set of experiments we have conducted to evaluate the relevance of this approach. It appears that the statistical properties of the underlying network have a strong influence on the quality of the obtained maps, which can be improved using massively distributed explorations. Conversely, some statistical properties are very robust, and so the known values for the Internet may be considered as reliable. We validate our analysis using real-world data and experiments, and we discuss its implications.

Fabien Viger and Matthieu Latapy
Extended abstract published in LNCS, proceedings of the 11-th international conference on Computing and Combinatorics COCOON'05, 2005, Kunming, Yunnan, Chine
Random generation of large connected simple graphs with prescribed degree distribution

We address here the problem of generating random graphs uniformly from the set of simple connected graphs having a prescribed degree sequence. Our goal is to provide an algorithm suitable for practical use both because of its ability to generate very large graphs (efficiency) and because it is easy to implement (simplicity). We focus on a family of heuristics for which we introduce optimality conditions, and show how this optimality can be reached in practice. We then propose a different approach, specifically designed for real-world degree distributions, which outperforms the first one. Based on a conjecture which we discuss rigorously and study empirically, we finally reduce the best asymptotic complexity bound known so far.

Jean-Loup Guillaume, Matthieu Latapy and Stevens Le-Blond
LNCS, proceedings of the 6-th International Workshop on Distributed Computing IWDC'04, 2004, Kolkata, India
Statistical analysis of a P2P query graph based on degrees and their time-evolution

Despite their crucial impact on the performances of P2P systems, very few is known on peers behaviors in such networks. We propose here a study of these behaviors in a running environment using a semicentralised P2P system (eDonkey). To achieve this, we use a trace of the queries made to a large server managing up to fifty thousands peers simultaneously, and a few thousands queries per second. We analyse these data using complex network methods, and focus in particular on the degrees, their correlations, and their time-evolution. Results show a large variety of observed phenomena, including the variety of peers behaviors and heterogeneity of data queries, which should be taken into account when designing P2P systems.

Jean-Loup Guillaume and Matthieu Latapy
Information Processing Letters (IPL) 90:5, pages 215-221, 2004
Bipartite Structure of all Complex Networks

The analysis and modelling of various complex networks has received much attention in the last few years. Some such networks display a natural bipartite structure: two kinds of nodes coexist with links only between nodes of different kinds. This bipartite structure has not been deeply studied until now, mainly because it appeared to be specific to only a few complex networks. However, we show here that all complex networks can be viewed as bipartite structures sharing some important statistics, like degree distributions. The basic properties of complex networks can be viewed as consequences of this underlying bipartite structure. This leads us to propose the first simple and intuitive model for complex networks which captures the main properties met in practice.

Jean-Loup Guillaume, Matthieu Latapy and Laurent Viennot
LNCS, proceedings of the 3-rd international conference Web-Age Information Management WAIM'02, 2002, Beijing, Chine. Abstract published in the proceedings of the 11-th international conference World Wide Web WWW'02, 2002, Honolulu, Hawaï
Efficient and Simple Encodings for the Web Graph

In this paper, we propose a set of simple and efficient methods based on standard, free and widely available tools, to store and manipulate large sets of URLs and large parts of the Web graph. Our aim is both to store efficiently the URLs list and the graph in order to manage all the computations in a computer central memory. We also want to make the conversion between URLs and their identifiers as fast as possible, and to obtain all the successors of an URL in the Web graph efficiently. The methods we propose make it possible to obtain a good compromise between these two challenges, and make it possible to manipulate large parts of the Web graph.