Randomized reference models for temporal networks

Christian Lyngby Vestergaard

mercredi 18 décembre 2019 à 10h30, salle 24-25-405, Jussieu

Many dynamical systems can be successfully analyzed by representing them as networks. Empirically measured networks and dynamic processes that take place in these situations show heterogeneous, non-Markovian, and intrinsically correlated topologies and dynamics. This makes their analysis particularly challenging. Randomized reference models (RRMs) have emerged as a general and versatile toolbox for studying such systems. Defined as ensembles of random networks with given features constrained to match those of an input (empirical) network, they may for example be used to identify important features of empirical networks and their effects on dynamical processes unfolding in the network. RRMs are typically implemented as procedures that reshuffle an empirical network, making them very generally applicable. However, the effects of most shuffling procedures on network features remain poorly understood, rendering their use non-trivial and susceptible to misinterpretation. Here we propose a unified framework for classifying and understanding microcanonical RRMs (MRRMs). Focusing on temporal networks, we use this framework to build a taxonomy of MRRMs that proposes a canonical naming convention, classifies them, and deduces their effects on a range of important network features. We furthermore show that certain classes of compatible MRRMs may be applied in sequential composition to generate over a hundred new MRRMs from the existing ones surveyed in this article. We provide two tutorials showing applications of the MRRM framework to empirical temporal networks: 1) to analyze how different features of a network affect other features and 2) to analyze how such features affect a dynamic process in the network. We finally survey applications of MRRMs found in literature. Our taxonomy provides a reference for the use of MRRMs, and the theoretical foundations laid here may further serve as a base for the development of a principled and automatized way to generate and apply randomized reference models for the study of networked systems.

Mesurer le degré de polarisation de l’espace médiatique et politique sur YouTube : structure des chaînes dabonnés, similarité daudience des fans et diversité des recommandations algorithmiques.

Bilel Benbouzid

8 Octobre, 2019, 11:00hrs. Salle 25-26/105, Jussieu.

Cette présentation vise à rendre compte des résultats d’une recherche sur l’espace politique et médiatique de la plateforme YouTube. Deux questions principales motivent notre étude qui se situe encore dans une phase exploratoire.  Tout d’abord, YouTube n’étant pas un réseau social en tant que tel – c’est à la fois un réseau d’utilisateurs (chaque utilisateur est une chaîne), un espace de stockage auquel on a accès via un moteur de recherche et une plateforme de streaming qui dépend en grande partie d’un système de recommandation – comment rendre compte de sa structure et catégoriser les chaînes qui la composent? Deuxièmement, nous cherchons à mesurer le degré de polarisation de l’espace médiatique et politique sur YouTube : l’analyse relationnelle des chaînes révèle-t-elle une structure équivalente à celle de l’écosystème médiatique révélé par le Médialab de Sciences Po à partir d’une analyse des réseaux d’hyperliens des sites d’information et leur dynamique de circulation sur Twitter (Cardon, 2019) ? Ou bien, au contraire, les relations entre des chaînes sur YouTube fait-elle apparaître une structure spécifique à l’architecture de la plateforme ? Pour répondre à cette question, il faut analyser YouTube du point de vue des différents modes de mise en visibilité des chaînes, qu’ils soient humains ou algorithmique. C’est pourquoi, nous proposons de présenter la structuration des chaînes sur YouTube en trois dimensions qui sont autant de manières différentes, mais complémentaires de rendre compte de la polarisation : le réseau social qui correspond au réseau des chaînes abonnées ou recommandées par les chaînes elles-mêmes ; le réseau des chaînes qui partagent des communautés de fans ; et le réseau de chaînes formé par les vidéos recommandées conjointement par l’algorithme. Ces trois dimensions donnent à voir les facettes multiples de YouTube. Alors que la première permet de comprendre la manière dont les humains se recommandent les chaînes entre eux, la seconde montre les publics partagés par les chaînes, la troisième rend compte de l’espace médiatique et politique formé par une machine (la recommandation algorithmique). Dans cette phase exploratoire, nous avons comparé ces trois dimensions selon la diversité des contenus qu’elles valorisent (une catégorisation des chaînes a été réalisée manuellement). Dans cette présentation nous montrerons 1) les réseaux de chaînes qui se donnent à voir selon les trois dimensions, 2) les matrices de modularité en fonction des catégories de chaîne et 3) une comparaison de la distribution des catégories de chaines selon les trois dimensions à partir des résultats des expériences de marche aléatoire et des calculs de la perplexité qui leur sont associés. Nous montrons que les réseaux de chaînes produits par les humains (les réseaux de chaînes amis et celui des publics de fan) polarisent moins que le réseau produit par la recommandation algorithmique. Nous montrons aussi que si bulle de filtre il y a, elle ne se situe pas où elle est attendue : c’est moins les contenus à caractère sensationnels et « complotistes » qui sont les plus recommandés par la plateforme (ce qui est souvent dénoncé dans le débat public), mais les chaines de médias traditionnels – un des effets sans doute de l’action menée par YouTube faire évoluer le système de recommandation. Ces résultats invitent à discuter de la nature de l’espace public que construit YouTube dans ce contexte de problème de désinformation. YouTube semble accorder un soutien particulier aux contenus des professionnels de l’information, bien plus qu’à celui des chaînes alternatives. Dans cette discussion, nous proposons de débattre de la nature de la « démocratie YouTube » au travers des interprétations des métriques et des modèles d’analyse de réseaux.

OTMedia+ : Graphes et Propagation d’information

Nicolas Hervé

9 Juillet, 2019, 11:00hrs. Salle 26-00/332, Jussieu.

OTMedia (Observatoire TransMedia) est une plateforme logicielle dédiée aux projets de recherche qui permet d’analyser de grandes quantités de données diverses, multimodales, transmédia liées à l’actualité française et francophone. OTMedia collecte, traite et indexe en permanence des milliers de flux provenant de la télévision, de la radio, du Web, de la presse, des agences de presse et de Twitter. Dans le contexte de ce projet, nous souhaiterions étudier la propagation d’informations et d’images sur le Web en utilisant la théorie des graphes pour nous aider à extraire les caractéristiques/indicateurs pour décrire les événements médiatiques.

Drawing and Visualising Event-Based Dynamic Graphs.

Daniel Archambault

May 27th, 11h Room 24-25-405. UPMC – Sorbonne Université. 4 Place de Jussieu, 75005 Paris.

One of the most important types of data in data science is the graph or network. Networks encode relationships between entities:  people in social network, genes in biological network, and many others forms of data.  These networks are often dynamic and consist of a set of events — edges/nodes with individual timestamps.  In the complex network literature, these networks are often referred to as temporal networks.  As an example, a post to a social media service creates an edge existing at a specific time and a series of posts is a series of such events.  However, the majority of dynamic graph visualisations use the timeslice, a series of snapshots of the network at given times, as a basis for visualisation. In this talk, I present two recent approaches for event-based network visualisation:  DynNoSlice and the Plaid. DynNoSlice is a method for embedding these networks directly in the 2D+t space-time cube along with methods to explore the contents of the cube.  The Plaid is an interactive system for visualising long in time dynamic networks and interaction provenance through interactive timeslicing.

Applications through human mobility lens

Vsevolod Salnikov

jeudi 4 avril 2019, 14h, salle 26-00/332, LIP6, Sorbonne Université

Slides

In this talk I will present various data-oriented projects we have done recently. The general line will focus on human mobility sensing and different applications of such datasets from more theoretical ones towards extremely applied, which are on the border of research and commercial activities.Moreover we will discuss different stages: from data collection towards models and application as well as the ‘in-the-field’ validation of model predictions. I will propose few ways of data collection, which permitted to get impressive and reliable datasets with almost no cost. These datasets are already used for studies, but I would be also happy to discuss various applications and ways to collaborate!

Modélisation du contrôle des utilisateurs sur leurs données personnelles

Pablo Rauzy

Vendredi 12 avril 2019, 11h, salle 25-26/105, LIP6, Sorbonne Université

Du point de vue d’un utilisateur ou d’une utilisatrice d’un système d’informations, la privacy correspond au contrôle qu’il ou elle peut exercer sur ses données personnelles dans ce système. Cette vision de la privacy est essentielle si l’on veut contribuer au développement de technologies émancipatrices, c’est à dire aux services de leurs utilisateurs et utilisatrices seulement. L’étude et l’évaluation rigoureuse de la privacy offerte par un système nécessite donc une caractérisation formelle de ce contrôle. Nous proposons un cadre formel basé sur des capacités qui permet de spécifier et de raisonner sur ce contrôle et ses propriétés. Nous verrons au travers d’exemples que cela permet notamment la comparaison de mises en oeuvre alternatives d’un même système (un réseau social basique dont nous comparons trois implémentations possibles), et donc la possibilité d’étudier et d’optimiser la privacy dès la phase de conception.

Neighbour-distinguishing decompositions of graphs

Mohammed SENHAJI

Vendredi 15 mars 2019, 14hrs, salle 25-26/105, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

The main question that we explore was introduced by Karonski, Luczak and Thomason in 2004 : Can we weight the edges of a graph G , with weights 1 ,2 , and 3 , such that any two of adjacent vertices of G are distinguished by the sum of their incident weights ? This question later becomes the famous 1-2-3 Conjecture.In this presentation we explore several variants of the 1-2-3 Conjecture, and their links with locally irregular decompositions. We are interested in both optimisation results and algorithmic problems. We first introduce an equitable version of the neighbour-sum-distinguishing edge-weightings, that is a variant where we require every edge weight to be used the same number of times up to a difference of 1. After that we explore how neighbour-sum-distinguishing weightings behave if we require sums of neighbouring vertices to differ by at least 2. Namely, we present results on the smallest maximal weight needed to construct such weightings for some classes of graphs, and study some algorithmic aspects of this problem. Due to the links between neighbour-sum-distinguishing edge weightings and locally irregular decompositions, we also explore the locally irregular index of subcubic graphs, along with other variants of the locally irregular decomposition problem. Finally, we present a more general work toward a general theory unifying neighbour-sum-distinguishing edge-weightings and locally irregular decompositions.

Minorities in Networks

Claudia Wagner

Lundi 28 janvier 2018, 11hrs, salle 24-25/405, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

Networks are the infrastructure of our social and professional life andalso of modern information systems where billions of documents andentities are interlinked. However, not all nodes are equal in thesenetworks. Often we observe attributes (e.g. gender or ethnicity) thatdefine the group membership of a node. In this talk I will explore therole of minorities in social networks and information networks, provideempirical evidence for the disadvantage of minorities and discussfactors that may place minorities at a disadvantage.

What graphs can contribute to a more transparent artificial intelligence

Tiphaine Viard

January 17th 2019, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

AI and machine learning are commonly described as « black boxes » that are efficient, but opaque. While complete opacity would be an exageration, it is true that many methods for explainability rely on forms of retro-engineering: we try to infer the model from its (partial, intermediary, final) results. These methods are typically based on large-scale, efficient matrix manipulation. Graphs and their extensions have shown to be visualisable and interpretable, even at large scales. In their classical formulation, they are also very similar to matrices. However, few to no machine learning method explored what graphs could contribute to its models.  This is partly due to the fact that graph computations have long been expensive, typically having polynomial running times, which is incompatible with the scale of data in most of today’s machine learning applications. However, the situation has changed: (i) the impact of AI on society makes it no longer acceptable to favour efficiency despite transparency, and (ii) recent advances in algorithmic methods on graphs demonstrates that due to the nature of real-world graphs, even some NP-hard problems become tractable. The aim of this talk is to explore this avenue of research. We will discuss the state-of-the art in learning from graph data, present some recent results showing that structure-based features indeed have the potential to make machine learning more transparent at no extra cost, and finally we will discuss future tracks of research.

Mining the Integrated Connectedness of Biomedical Systems

Prof. Dr. Natasa Przulj

December 7th 2018, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

We are faced with a flood of molecular and clinical data. Various bio-molecules interact in a cell to perform biological function, forming large, complex systems. Large-scale patient-specific omics datasets are increasingly becoming available, providing heterogeneous, but complementary information about cells, tissues and diseases. The challenge is how to mine these interacting, complex, complementary data systems to answer fundamental biological and medical questions.  Dealing with them is nontrivial, because many questions we ask to answer from them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions. We develop methods for extracting new biomedical knowledge from the wiring patterns of systems-level, heterogeneous, networked biomedical data.  Our methods link the patterns in molecular networks and the multi-scale network organization with biological function.  In this way, we translate the information hidden in the wiring patterns into domain-specific knowledge. In addition, we introduce a versatile data fusion (integration) framework that can effectively integrate the information obtained from mining molecular networks with patient-specific somatic mutation data and drug chemical data to address key challenges in precision medicine: better stratification of patients, prediction of driver genes in cancer, and re-purposing of approved drugs to particular patients and patient groups. Our new methods stem from novel network science approaches coupled with graph-regularized non-negative matrix tri-factorization, a machine learning technique for dimensionality reduction and co-clustering of heterogeneous datasets. We utilize our new framework to develop methodologies for performing other related tasks, including disease re-classification from modern, heterogeneous molecular level data, inferring new Gene Ontology relationships, and aligning multiple molecular networks.

From a static to a dynamic analysis of complex networks (soutenance HDR)

Lionel Tabourier

September 24th 2018, 11am, room 25-26/105, Jussieu

Slides

Contacts entre individus, interactions sociales, transactions économiques ou encore machines échangeant des paquets d’information, tous ces systèmes ont en commun d’être constitués d’éléments en interaction et dépourvus de coordination par un « cerveau central ». Par conséquent, la structure de leurs interactions résultent de processus décentralisés, qui sont souvent mal connus. Depuis les années 90, il a été mis en évidence que la représentation en graphe de tels systèmes amenait à la découverte de propriétés communes, cela a permis l’utilisation de méthodes transverses pour les décrire et en comprendre les mécanismes sous-jacents. Ces études ont ensuite évolué pour constituer un champ de recherche à part entière : l’analyse de réseaux complexes. Parce qu’elles sont simples et qu’il existe un important volume de connaissance en théorie et en algorithmique de graphes, les représentations en graphes de tels systèmes en interaction ont mené à d’importants succès. Cependant, l’accès généralisé à des jeux de données en ligne a également mis en évidence la nécessité de prendre en compte l’aspect fondamentalement dynamique des données d’interaction. Mon travail de recherche touche à plusieurs aspects de l’évolution d’une représentation statique à une représentation dynamique de telles données. Celui-ci est organisé en trois axes distincts : le premier concerne la description de processus dynamiques sur des réseaux évoluant dans le temps, et plus précisément les phénomènes de diffusion. Le second axe se rapporte au problème de la prévision d’interactions dans un réseau temporel. Enfin, le troisième s’interroge sur la modélisation de la structure des interactions au moyen de réseaux aléatoires qui imitent la structure des données réelles.

Two ways you did not know mobile networks could be useful

Marco Fiore

Monday, September 24th 2018,  4pm, room 24-25/405

Slides

Mobile networks provide support to a variety of communication-based services that are steadily changing our lives. However, they are also pervasive infrastructures that can be used in unconventional ways unrelated to communication. Specifically, mobile networks can be seen as large-scale remote sensing platform capable of providing fine-grained information about a large (and increasing) fraction of the worldwide population. In this talk, I will discuss two case studies of remote sensing based on mobile networks: land use mapmaking, i.e., the detection of arrangements and activities in a target geographical region, and population density estimation, i.e., the monitoring of dwelling units and people presence. Solutions to both these problems have important applications in, e.g., urban planning and transportations, and can benefit from approaches that take advantage of the mobile network infrastructure.

Examining Supreme Court Networks to Understand its Operation

Patrick Doreian

Friday, June 22th 2018, 14h, salle 26-00/101

The nature of the Supreme Court, its decisions, the principle of Stare Decisis are described. This is followed by a listing of the Supreme Court networks that are considered. This includes: the citation network of decisions citing earlier decisions (1789-2001); coherent clusters of decisions obtained by examining the extent to which they are co-cited by later decisions; signed two-mode networks featuring Justices and their votes for annual terms of the Court; one-mode signed networks of justices with the extent to which they vote with or against each other; and signed networks of Courts overturning decisions of prior Courts or their own decisions. Strategies for analyzing these networks are presented along with the subsequent results. Throughout, the network results are linked to substantive topics and constitutional principles. A long-term research agenda is presented.

Hierarchical Clustering Beyond the Worst-Case

Vincent Cohen-Addad

Wednesday, May 16th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Hiererachical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, Dasgupta recently proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective. In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic block model (HSBM), and show that in certain regimes the SVD approach of McSherry combined with specific linkage methods results in a clustering that give an O(1) approximation to Dasguptas cost function. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.

Applications of Graph Sampling

Shweta Jain

Thursday, May 3th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Large graphs have become commonplace making many of the traditional graph-theoretic algorithms infeasible. Moreover, sometimes we don’t have access to the whole graph. This has led us to revisit graph algorithms and necessitated graph sampling. In this talk, I will explore 2 applications of graph sampling. In the first application, called TurnShadow, we propose a method for efficient counting of k-cliques. It uses the classic result called Turn’s theorem to give provable error bounds. We also do extensive evaluation of the method on real-world graph instances and demonstrate that it is fast and extremely accurate. In the second application, we propose a method called SADDLES to estimate the degree distribution when we are given only limited access to the graph, and accessing the graph is costly. We assume we have access to uniform at random (u.a.r ) vertex, u.a.r neighbor of a vertex and the degree of the vertex. We compare SADDLES with many other state-of-the-art methods and demonstrate that SADDLES requires far fewer samples to achieve the same degree of accuracy.    

Story Forest: Organizing Massive News Documents via AI and Natural Language Processing

Di Niu

Friday, May 4th, 2018, 11h, salle 26-00/428, Campus Jussieu

Slides

I will describe our recent experience of implementing a news content organization system in collaboration with Tencent that can discover hot events from vast streams of breaking news and connect events into stories for easy viewing. Our real-world system has distinct requirements in contrast to previous studies on document topic modeling and detection, in that 1) an event does not only contain articles of a similar topic, but is a cluster of documents that report exactly the same physical incidence; 2) we must evolve news stories in a logical and online manner. In solving these challenges, we propose Story Forest, a state-of-the-art news content organization system based on artificial intelligence and natural language processing. I will briefly describe the key enabling technologies in Story Forest, including identifying the relationship between text objects, e.g., whether they talk about the same event or whether one article is a follow-up of another, based on deep learning. Our system has been deployed in Tencent QQ Browser mobile app.

When Diversity Is Needed… But Not Expected!

Sylvain Castagnos

Thursday, May 17th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Recent studies have highlighted the correlation between users’ satisfaction and diversity within recommenders, especially the fact that diversity increases users’ confidence when choosing an item. Understanding the reasons of this positive impact on recommenders is now becoming crucial. Based on this assumption, we designed a user study that focuses on the utility of this new dimension, as well as its perceived qualities. This study has been conducted on 250 users and it compared 5 recommendation approaches, based on collaborative filtering, content-based filtering and popularity, along with various degrees of diversity. Results show that, when recommendations are made explicit, diversity may reduce users’ acceptance rate. However, it helps increasing users’ satisfaction. Moreover, this study highlights the need to build users’ preference models that are diverse enough, so as to generate good recommendations.

Using network analysis to identify generic legal trends in European Human Rights Law

Henrik Palmer Olsen

Jeudi 16 Novembre 2017, 14h, salle 26-00/332, Campus Jussieu

This presentation explores how the complete collected set of judgments from the European Court of Human Rights can be presented as a network of case to references, and how this network can be analysed via the use of already known network analysis approaches. The presentation will first show how network analysis can be used to calculate the main legal content of a case (i.e. which legal right the case is mostly concerned with and for which the case is therefore a precedent). It will then move on to showing how, Page-rank of a case in the total network of cases can be compared to the Page-rank of the same case in its local network and how this comparison can be used to further calculate generic centrality in the overall network. In using this novel approach, it is possible to identify cases that have strong precedent, not for a specific right, but for more generic legal content that the Court uses in dealing with applications from citizens.

Gendarmes, Voleurs et Topologie algébrique

David Ellison

Jeudi 15 Juin 2017 à 11h, Salle 24-25/405, Campus Jussieu

Le jeu du gendarme et du voleur, introduit par Alain Quilliot dans sa thèse en 1978, est un jeu à deux joueurs sur un graphe. Le gendarme commence en choisissant son point de départ sur un sommet du graphe ; puis le voleur choisit le sien. Ensuite, ils se déplacent chacun leur tour le long des arêtes du graphe. La question est de savoir si le gendarme a une stratégie qui lui permet d’attraper le voleur. Dans le cas contraire, la question devient : combien faut-il de gendarmes pour attraper le voleur ? Quilliot a démontré dans sa thèse qu’un seul gendarme suffit à attraper le voleur si et seulement si le graphe est démontable, c’est-à-dire si et seulement si on peut le réduire à un seul sommet en retirant successivement des sommets où le voleur peut être coincé. Il s’ensuit que les graphes démontables correspondent à la classe d’homotopie du point, et que certains invariants homotopiques, comme les groupes d’homologie, permettent de découvrir des propriétés structurelles des graphes où le gendarme peut attraper le voleur.