Randomized reference models for temporal networks

Christian Lyngby Vestergaard

mercredi 18 décembre 2019 à 10h30, salle 24-25-405, Jussieu

Many dynamical systems can be successfully analyzed by representing them as networks. Empirically measured networks and dynamic processes that take place in these situations show heterogeneous, non-Markovian, and intrinsically correlated topologies and dynamics. This makes their analysis particularly challenging. Randomized reference models (RRMs) have emerged as a general and versatile toolbox for studying such systems. Defined as ensembles of random networks with given features constrained to match those of an input (empirical) network, they may for example be used to identify important features of empirical networks and their effects on dynamical processes unfolding in the network. RRMs are typically implemented as procedures that reshuffle an empirical network, making them very generally applicable. However, the effects of most shuffling procedures on network features remain poorly understood, rendering their use non-trivial and susceptible to misinterpretation. Here we propose a unified framework for classifying and understanding microcanonical RRMs (MRRMs). Focusing on temporal networks, we use this framework to build a taxonomy of MRRMs that proposes a canonical naming convention, classifies them, and deduces their effects on a range of important network features. We furthermore show that certain classes of compatible MRRMs may be applied in sequential composition to generate over a hundred new MRRMs from the existing ones surveyed in this article. We provide two tutorials showing applications of the MRRM framework to empirical temporal networks: 1) to analyze how different features of a network affect other features and 2) to analyze how such features affect a dynamic process in the network. We finally survey applications of MRRMs found in literature. Our taxonomy provides a reference for the use of MRRMs, and the theoretical foundations laid here may further serve as a base for the development of a principled and automatized way to generate and apply randomized reference models for the study of networked systems.

Mesurer le degré de polarisation de l’espace médiatique et politique sur YouTube : structure des chaînes dabonnés, similarité daudience des fans et diversité des recommandations algorithmiques.

Bilel Benbouzid

8 Octobre, 2019, 11:00hrs. Salle 25-26/105, Jussieu.

Cette présentation vise à rendre compte des résultats d’une recherche sur l’espace politique et médiatique de la plateforme YouTube. Deux questions principales motivent notre étude qui se situe encore dans une phase exploratoire.  Tout d’abord, YouTube n’étant pas un réseau social en tant que tel – c’est à la fois un réseau d’utilisateurs (chaque utilisateur est une chaîne), un espace de stockage auquel on a accès via un moteur de recherche et une plateforme de streaming qui dépend en grande partie d’un système de recommandation – comment rendre compte de sa structure et catégoriser les chaînes qui la composent? Deuxièmement, nous cherchons à mesurer le degré de polarisation de l’espace médiatique et politique sur YouTube : l’analyse relationnelle des chaînes révèle-t-elle une structure équivalente à celle de l’écosystème médiatique révélé par le Médialab de Sciences Po à partir d’une analyse des réseaux d’hyperliens des sites d’information et leur dynamique de circulation sur Twitter (Cardon, 2019) ? Ou bien, au contraire, les relations entre des chaînes sur YouTube fait-elle apparaître une structure spécifique à l’architecture de la plateforme ? Pour répondre à cette question, il faut analyser YouTube du point de vue des différents modes de mise en visibilité des chaînes, qu’ils soient humains ou algorithmique. C’est pourquoi, nous proposons de présenter la structuration des chaînes sur YouTube en trois dimensions qui sont autant de manières différentes, mais complémentaires de rendre compte de la polarisation : le réseau social qui correspond au réseau des chaînes abonnées ou recommandées par les chaînes elles-mêmes ; le réseau des chaînes qui partagent des communautés de fans ; et le réseau de chaînes formé par les vidéos recommandées conjointement par l’algorithme. Ces trois dimensions donnent à voir les facettes multiples de YouTube. Alors que la première permet de comprendre la manière dont les humains se recommandent les chaînes entre eux, la seconde montre les publics partagés par les chaînes, la troisième rend compte de l’espace médiatique et politique formé par une machine (la recommandation algorithmique). Dans cette phase exploratoire, nous avons comparé ces trois dimensions selon la diversité des contenus qu’elles valorisent (une catégorisation des chaînes a été réalisée manuellement). Dans cette présentation nous montrerons 1) les réseaux de chaînes qui se donnent à voir selon les trois dimensions, 2) les matrices de modularité en fonction des catégories de chaîne et 3) une comparaison de la distribution des catégories de chaines selon les trois dimensions à partir des résultats des expériences de marche aléatoire et des calculs de la perplexité qui leur sont associés. Nous montrons que les réseaux de chaînes produits par les humains (les réseaux de chaînes amis et celui des publics de fan) polarisent moins que le réseau produit par la recommandation algorithmique. Nous montrons aussi que si bulle de filtre il y a, elle ne se situe pas où elle est attendue : c’est moins les contenus à caractère sensationnels et « complotistes » qui sont les plus recommandés par la plateforme (ce qui est souvent dénoncé dans le débat public), mais les chaines de médias traditionnels – un des effets sans doute de l’action menée par YouTube faire évoluer le système de recommandation. Ces résultats invitent à discuter de la nature de l’espace public que construit YouTube dans ce contexte de problème de désinformation. YouTube semble accorder un soutien particulier aux contenus des professionnels de l’information, bien plus qu’à celui des chaînes alternatives. Dans cette discussion, nous proposons de débattre de la nature de la « démocratie YouTube » au travers des interprétations des métriques et des modèles d’analyse de réseaux.

Qualified personalities: Sociology of the French Media Government from Cinema to the Digital Era

Olivier Alexandre

Chapter in Reconceptualising Film Policies, 2017

The nature of French audiovisual sector is determined by a layering of policies, created at various periods of time. A public policy system has been continuously developed and adapted since the 1950s, mostly focusing on the support to and defence of the artistic and moral quality of film and television programmes. This institutional system has relied on ‘qualified personalities’ emanating from diverse sectors such as cinema, television, arts, culture, education, administration and the political world. The chapter presents a sociological analysis of the French model matrix. It focuses on the revolving-door system and the policy-making personnel that have enforced a stable regulatory frame for audiovisual industries. The rise of digital operators and executives – more internationalised and engineering-solution oriented – is currently destabilising this ecosystem. There is an important generational, cultural, ideological and linguistic gap between the French ‘Media government’ and the management teams of the new players.

Download

A general graph-based framework for top-N recommendation using content, temporal and trust information

Armel Jacques Nzekon Nzeko’o, Maurice Tchuente and Matthieu Latapy

Journal of Interdisciplinary Methodologies and Issues in Sciences, 2019

Recommending appropriate items to users is crucial in many e-commerce platforms that containimplicit data as users’ browsing, purchasing and streaming history. One common approach con-sists in selecting the N most relevant items to each user, for a given N, which is called top-Nrecommendation. To do so, recommender systems rely on various kinds of information, like itemand user features, past interest of users for items, browsing history and trust between users. How-ever, they often use only one or two such pieces of information, which limits their performance.In this paper, we design and implement GraFC2T2, a general graph-based framework to easilycombine and compare various kinds of side information for top-N recommendation. It encodescontent-based features, temporal and trust information into a complex graph, and uses personal-ized PageRank on this graph to perform recommendation. We conduct experiments on Epinionsand Ciao datasets, and compare obtained performances using F1-score, Hit ratio and MAP eval-uation metrics, to systems based on matrix factorization and deep learning. This shows that ourframework is convenient for such explorations, and that combining different kinds of informationindeed improves recommendation in general.

Download

Spreading dynamics in a cattle trade network: Size, speed, typical profile and consequences on epidemic control strategies

Aurore Payen, Lionel Tabourier and Matthieu Latapy

PLOS ONE, 2019

Infections can spread among livestock notably because infected animals can be brought to uncontaminated holdings, therefore exposing a new group of susceptible animals to the dis- ease. As a consequence, the structure and dynamics of animal trade networks is a major focus of interest to control zoonosis. We investigate the impact of the chronology of animal trades on the dynamics of the process. Precisely, in the context of a basic SI model spread- ing, we measure on the French database of bovine transfers to what extent a snapshot- based analysis of the cattle trade networks overestimates the epidemic risks. We bring into light that an analysis taking into account the chronology of interactions would give a much more accurate assessment of both the size and speed of the process. For this purpose, we model data as a temporal network that we analyze using the link stream formalism in order to mix structural and temporal aspects. We also show that in this dataset, a basic SI spread- ing comes down in most cases to a simple two-phases scenario: a waiting period, with few contacts and low activity, followed by a linear growth of the number of infected holdings. Using this portrait of the spreading process, we identify efficient strategies to control a potential outbreak, based on the identification of specific elements of the link stream which have a higher probability to be involved in a spreading process.

Download

OTMedia+ : Graphes et Propagation d’information

Nicolas Hervé

9 Juillet, 2019, 11:00hrs. Salle 26-00/332, Jussieu.

OTMedia (Observatoire TransMedia) est une plateforme logicielle dédiée aux projets de recherche qui permet d’analyser de grandes quantités de données diverses, multimodales, transmédia liées à l’actualité française et francophone. OTMedia collecte, traite et indexe en permanence des milliers de flux provenant de la télévision, de la radio, du Web, de la presse, des agences de presse et de Twitter. Dans le contexte de ce projet, nous souhaiterions étudier la propagation d’informations et d’images sur le Web en utilisant la théorie des graphes pour nous aider à extraire les caractéristiques/indicateurs pour décrire les événements médiatiques.

Degree-based Outlier Detection within IP Traffic Modelled as a Link Stream (extended version)

Audrey Wilmet, Tiphaine Viard, Matthieu Latapy and Robin Lamarche-Perrin

Computer Networks, 2019

This paper aims at precisely detecting and identifying anomalous events in IP traffic. To this end, we adopt the link stream formalism which properly captures temporal and structural features of the data. Within this framework, we focus on finding anomalous behaviours with respect to the degree of IP addresses over time. Due to diversity in IP profiles, this feature is typically distributed heterogeneously, preventing us to directly find anomalies. To deal with this challenge, we design a method to detect outliers as well as precisely identify their cause in a sequence of similar heterogeneous distributions. We apply it to several MAWI captures of IP traffic and we show that it succeeds in detecting relevant patterns in terms of anomalous network activity.

Download

Drawing and Visualising Event-Based Dynamic Graphs.

Daniel Archambault

May 27th, 11h Room 24-25-405. UPMC – Sorbonne Université. 4 Place de Jussieu, 75005 Paris.

One of the most important types of data in data science is the graph or network. Networks encode relationships between entities:  people in social network, genes in biological network, and many others forms of data.  These networks are often dynamic and consist of a set of events — edges/nodes with individual timestamps.  In the complex network literature, these networks are often referred to as temporal networks.  As an example, a post to a social media service creates an edge existing at a specific time and a series of posts is a series of such events.  However, the majority of dynamic graph visualisations use the timeslice, a series of snapshots of the network at given times, as a basis for visualisation. In this talk, I present two recent approaches for event-based network visualisation:  DynNoSlice and the Plaid. DynNoSlice is a method for embedding these networks directly in the 2D+t space-time cube along with methods to explore the contents of the cube.  The Plaid is an interactive system for visualising long in time dynamic networks and interaction provenance through interactive timeslicing.

Combining path-constrained random walks to recover link weights in heterogeneous information networks

Hong-Lan Botterman and Robin Lamarche-Perrin

CompleNet, 2019

Heterogeneous information networks (HIN) are abstract representations of systems composed of multiple types of entities and their relations. Given a pair of nodes in a HIN, this work aims at recovering the exact weight of the incident link to these two nodes, knowing some other links present in the HIN. Actually, this weight is approximated by a linear combination of probabilities, results of path-constrained random walks i.e., random walks where the walker is forced to follow only a specific sequence of node types and edge types which is commonly called a meta path, performed on the HIN. This method is general enough to compute the link weight between any types of nodes. Experiments on Twitter data show the applicability of the method.

Download

Multidimensional Outlier Detection in Interaction Data: Application to Political Communication on Twitter

Audrey Wilmet and Robin Lamarche-Perrin

CompleNet, 2019

We introduce a method which aims at getting a better understanding of how millions of interactions may result in global events. Given a set of dimensions and a context, we find different types of outliers: a user during a given hour which is abnormal compared to its usual behavior, a relationship between two users which is abnormal compared to all other relationships, etc. We apply our method on a set of retweets related to the 2017 French presidential election and show that one can build interesting insights regarding political organization on Twitter.

Download

RankMerging: a supervised learning-to-rank framework to predict links in large social networks

Lionel Tabourier, Daniel F. Bernardes, Anne-Sophie Libert and Renaud Lambiotte

Machine Learning, 2019

Uncovering unknown or missing links in social networks is a difficult task because of their sparsity and because links may represent different types of relationships, characterized by different structural patterns. In this paper, we define a simple yet efficient supervised learning-to-rank framework, called RankMerging, which aims at combining information provided by various unsupervised rankings. We illustrate our method on three different kinds of social networks and show that it substantially improves the performances of unsupervised methods of ranking as well as standard supervised combination strategies. We also describe various properties of RankMerging, such as its computational complexity, its robustness to feature selection and parameter estimation and discuss its area of relevance: the prediction of an adjustable number of links on large networks.

Download

Applications through human mobility lens

Vsevolod Salnikov

jeudi 4 avril 2019, 14h, salle 26-00/332, LIP6, Sorbonne Université

Slides

In this talk I will present various data-oriented projects we have done recently. The general line will focus on human mobility sensing and different applications of such datasets from more theoretical ones towards extremely applied, which are on the border of research and commercial activities.Moreover we will discuss different stages: from data collection towards models and application as well as the ‘in-the-field’ validation of model predictions. I will propose few ways of data collection, which permitted to get impressive and reliable datasets with almost no cost. These datasets are already used for studies, but I would be also happy to discuss various applications and ways to collaborate!

Modélisation du contrôle des utilisateurs sur leurs données personnelles

Pablo Rauzy

Vendredi 12 avril 2019, 11h, salle 25-26/105, LIP6, Sorbonne Université

Du point de vue d’un utilisateur ou d’une utilisatrice d’un système d’informations, la privacy correspond au contrôle qu’il ou elle peut exercer sur ses données personnelles dans ce système. Cette vision de la privacy est essentielle si l’on veut contribuer au développement de technologies émancipatrices, c’est à dire aux services de leurs utilisateurs et utilisatrices seulement. L’étude et l’évaluation rigoureuse de la privacy offerte par un système nécessite donc une caractérisation formelle de ce contrôle. Nous proposons un cadre formel basé sur des capacités qui permet de spécifier et de raisonner sur ce contrôle et ses propriétés. Nous verrons au travers d’exemples que cela permet notamment la comparaison de mises en oeuvre alternatives d’un même système (un réseau social basique dont nous comparons trois implémentations possibles), et donc la possibilité d’étudier et d’optimiser la privacy dès la phase de conception.

An information-theoretic framework for the lossy compression of link streams

Robin Lamarche-Perrin

Theoretical Computer Science, 2019

Graph compression is a data analysis technique that consists in the replacement of parts of a graph by more concise structural patterns in order to reduce its description length. It notably provides interesting exploration tools for the study of real, large-scale, and complex graphs which cannot be grasped at first glance. This article proposes a framework for the compression of temporal graphs, that is for the compression of graphs that evolve with time. This framework first builds on a simple and limited scheme, exploiting structural equivalence for the lossless compression of static graphs, then generalises it to the lossy compression of link streams, a recent formalism for the study of temporal graphs. Such generalisation builds on the natural extension of (bidimensional) relational data by the addition of a third temporal dimension. Moreover, we introduce an information-theoretic measure to quantify and to control the information that is lost during compression, as well as an algebraic characterisation of the space of possible compression patterns to enhance the expressiveness of the initial compression scheme. These contributions lead to the definition of a combinatorial optimisation problem, that is the Lossy Multistream Compression Problem, for which we provide an exact algorithm.

Comparaison des méthodes de classification pour l’identification des noeuds importants dans les graphes dynamiques

Marwan Ghanem

Rencontres jeunes chercheurs en RI, 2019

De nos jours, nous nous intéressons à la détection d’entités importantes, ceci peut être des mots-clés importants dans un document ou Twitter, ou des individus importants dans un réseau de mouvement. Nous pouvons modéliser ces données sous la forme d’un graphe dynamique et utiliser des métriques de centralité telle que la centralité de proximité temporelle. Malheureusement, cela peut être coûteux. Dans ce travail, nous comparons la précision de plusieurs méthodes de classification supervisée, les unes par rapport aux autres, à la détection de ces nœuds importants. Sur seize jeux de données de natures différentes, nous montrons que ces méthodes réussissent à différencier les nœuds importants de nœuds insignifiants. Nous montrons également que prendre en compte la nature des données diminue la qualité de résultats. Enfin, nous examinons le temps du calcul de chacune de ces méthodes contre le temps du calcul de méthodes exact.

Download

Neighbour-distinguishing decompositions of graphs

Mohammed SENHAJI

Vendredi 15 mars 2019, 14hrs, salle 25-26/105, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

The main question that we explore was introduced by Karonski, Luczak and Thomason in 2004 : Can we weight the edges of a graph G , with weights 1 ,2 , and 3 , such that any two of adjacent vertices of G are distinguished by the sum of their incident weights ? This question later becomes the famous 1-2-3 Conjecture.In this presentation we explore several variants of the 1-2-3 Conjecture, and their links with locally irregular decompositions. We are interested in both optimisation results and algorithmic problems. We first introduce an equitable version of the neighbour-sum-distinguishing edge-weightings, that is a variant where we require every edge weight to be used the same number of times up to a difference of 1. After that we explore how neighbour-sum-distinguishing weightings behave if we require sums of neighbouring vertices to differ by at least 2. Namely, we present results on the smallest maximal weight needed to construct such weightings for some classes of graphs, and study some algorithmic aspects of this problem. Due to the links between neighbour-sum-distinguishing edge weightings and locally irregular decompositions, we also explore the locally irregular index of subcubic graphs, along with other variants of the locally irregular decomposition problem. Finally, we present a more general work toward a general theory unifying neighbour-sum-distinguishing edge-weightings and locally irregular decompositions.

Minorities in Networks

Claudia Wagner

Lundi 28 janvier 2018, 11hrs, salle 24-25/405, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

Networks are the infrastructure of our social and professional life andalso of modern information systems where billions of documents andentities are interlinked. However, not all nodes are equal in thesenetworks. Often we observe attributes (e.g. gender or ethnicity) thatdefine the group membership of a node. In this talk I will explore therole of minorities in social networks and information networks, provideempirical evidence for the disadvantage of minorities and discussfactors that may place minorities at a disadvantage.

What graphs can contribute to a more transparent artificial intelligence

Tiphaine Viard

January 17th 2019, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

AI and machine learning are commonly described as « black boxes » that are efficient, but opaque. While complete opacity would be an exageration, it is true that many methods for explainability rely on forms of retro-engineering: we try to infer the model from its (partial, intermediary, final) results. These methods are typically based on large-scale, efficient matrix manipulation. Graphs and their extensions have shown to be visualisable and interpretable, even at large scales. In their classical formulation, they are also very similar to matrices. However, few to no machine learning method explored what graphs could contribute to its models.  This is partly due to the fact that graph computations have long been expensive, typically having polynomial running times, which is incompatible with the scale of data in most of today’s machine learning applications. However, the situation has changed: (i) the impact of AI on society makes it no longer acceptable to favour efficiency despite transparency, and (ii) recent advances in algorithmic methods on graphs demonstrates that due to the nature of real-world graphs, even some NP-hard problems become tractable. The aim of this talk is to explore this avenue of research. We will discuss the state-of-the art in learning from graph data, present some recent results showing that structure-based features indeed have the potential to make machine learning more transparent at no extra cost, and finally we will discuss future tracks of research.