Neighbour-distinguishing decompositions of graphs

Mohammed SENHAJI

Vendredi 15 mars 2019, 14hrs, salle 25-26/105, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

The main question that we explore was introduced by Karonski, Luczak and Thomason in 2004 : Can we weight the edges of a graph G , with weights 1 ,2 , and 3 , such that any two of adjacent vertices of G are distinguished by the sum of their incident weights ? This question later becomes the famous 1-2-3 Conjecture.In this presentation we explore several variants of the 1-2-3 Conjecture, and their links with locally irregular decompositions. We are interested in both optimisation results and algorithmic problems. We first introduce an equitable version of the neighbour-sum-distinguishing edge-weightings, that is a variant where we require every edge weight to be used the same number of times up to a difference of 1. After that we explore how neighbour-sum-distinguishing weightings behave if we require sums of neighbouring vertices to differ by at least 2. Namely, we present results on the smallest maximal weight needed to construct such weightings for some classes of graphs, and study some algorithmic aspects of this problem. Due to the links between neighbour-sum-distinguishing edge weightings and locally irregular decompositions, we also explore the locally irregular index of subcubic graphs, along with other variants of the locally irregular decomposition problem. Finally, we present a more general work toward a general theory unifying neighbour-sum-distinguishing edge-weightings and locally irregular decompositions.

Minorities in Networks

Claudia Wagner

Lundi 28 janvier 2018, 11hrs, salle 24-25/405, LIP6, UPMC. 4 Place Jussieu, 75005, Paris.

Networks are the infrastructure of our social and professional life andalso of modern information systems where billions of documents andentities are interlinked. However, not all nodes are equal in thesenetworks. Often we observe attributes (e.g. gender or ethnicity) thatdefine the group membership of a node. In this talk I will explore therole of minorities in social networks and information networks, provideempirical evidence for the disadvantage of minorities and discussfactors that may place minorities at a disadvantage.

What graphs can contribute to a more transparent artificial intelligence

Tiphaine Viard

January 17th 2019, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

AI and machine learning are commonly described as « black boxes » that are efficient, but opaque. While complete opacity would be an exageration, it is true that many methods for explainability rely on forms of retro-engineering: we try to infer the model from its (partial, intermediary, final) results. These methods are typically based on large-scale, efficient matrix manipulation. Graphs and their extensions have shown to be visualisable and interpretable, even at large scales. In their classical formulation, they are also very similar to matrices. However, few to no machine learning method explored what graphs could contribute to its models.  This is partly due to the fact that graph computations have long been expensive, typically having polynomial running times, which is incompatible with the scale of data in most of today’s machine learning applications. However, the situation has changed: (i) the impact of AI on society makes it no longer acceptable to favour efficiency despite transparency, and (ii) recent advances in algorithmic methods on graphs demonstrates that due to the nature of real-world graphs, even some NP-hard problems become tractable. The aim of this talk is to explore this avenue of research. We will discuss the state-of-the art in learning from graph data, present some recent results showing that structure-based features indeed have the potential to make machine learning more transparent at no extra cost, and finally we will discuss future tracks of research.

Analyzing the dynamics of information diffusion in social networks

Ayan Kumar Bhomwick

Wednesday 19th 2018

Slides

No abstract available.

Easy-Mention: a model-driven mention recommendation heuristic to boost your tweet popularity

Soumajit Pramanik, Mohit Sharma, Maximilien Danisch, Qinna Wang, Jean‑Loup Guillaume, Bivas Mitra

International Journal of Data Science and Analytics, vol. 7 (2), 2018

This paper investigates the role of mentions on tweet propagation. We propose a novel tweet propagation model SIR MF based on a multiplex network framework which allows to analyze the effects of mentioning on final retweet count. The basic bricks of this model are supported by a comprehensive study of multiple real datasets, and simulations of the model show a nice agreement with the empirically observed tweet popularity. Studies and experiments also reveal that follower count, retweet rate and profile similarity are important factors for gaining tweet popularity and allow to better understand the impact of the mention strategies on the retweet count. Interestingly, we experimentally identify a critical retweet rate regulating the role of mention on the tweet popularity. Finally, our data-driven simulations demonstrate that the proposed mention recommendation heuristic Easy-Mention outperforms the benchmark Whom-To-Mention algorithm.

Download

A Modular Overlapping Community Detection Algorithm: Investigating the « From Local to Global » Approach

Maximilien Danisch, Noé Gaumont, Jean‑Loup Guillaume

16th Cologne-Twente Workshop on Graphs and Combinatorial Optimization, 2018

We propose an overlapping community detection algorithm following a “from local to global approach”: our algorithm finds local communities one by one by repetitively optimizing a quality function that measures the quality of a community. Then, as some extracted local communities can be very similar to each-other, a cleaning procedure is applied to obtain the global overlapping community structure. Our algorithm depends on three modules: (i) a quality function, (ii) an optimization heuristic and (iii) a cleaning procedure. Various such modules can be independently plugged in. We show that, using default modules, our algorithm improves over a state-of-the-art method on some real-world graphs with ground truth communities. In the future we would like to study which combination of modules performs best in practice and make our code parallel.

Download

Pattern Matching in Link Streams: a Token-based Approach

Clément Bertrand, Hanna Klaudel, Matthieu Latapy et Frédéric Peschanski

Petri Nets, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many ap- plications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algo- rithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical per- formances.

Listing k-cliques in Sparse Real-World Graphs

Maximilien Danisch, Oana Balalau and Mauro Sozio

WWW, 2018

Motivated by recent studies in the data mining community whichrequire to efficiently list allk-cliques, we revisit the iconic algorithmof Chiba and Nishizeki and develop the most efficient parallel algo-rithm for such a problem. Our theoretical analysis provides the bestasymptotic upper bound on the running time of our algorithm forthe case when the input graph is sparse. Our experimental evalua-tion on large real-world graphs shows that our parallel algorithm isfaster than state-of-the-art algorithms, while boasting an excellentdegree of parallelism. In particular, we are able to list allk-cliques(for anyk) in graphs containing up to tens of millions of edges aswell as all10-cliques in graphs containing billions of edges, withina few minutes and a few hours respectively. Finally, we show howour algorithm can be employed as an effective subroutine for find-ing thek-clique core decomposition and an approximatek-clique densest subgraphs in very large real-world graphs.

La sainte famille des Cahiers du cinéma

Olivier Alexandre

Vrin, Philosophie et cinéma, 2018

Plus célèbre revue de cinéma au monde, les « Cahiers » occupent une place singulière dans le domaine de la critique. De crises en renaissances, ils continuent d’incarner un passé élevé au rang de mythe. Leur capacité à marier les contraires, entre gloire et marginalité, sens aigu de l’histoire et rendezvous manqués, révèle la part tragique du critique : ce travailleur sans métier, auteur sans profession, ni cinéaste ni enseignant, pas tout à fait journaliste ni complétement écrivain. À partir d’une enquête auprès de collaborateurs passés par les Cahiers du cinéma au cours des 50 dernières années, ce livre propose une réponse à cette question laissée en suspens depuis leur fondation : qu’est-ce qu’un critique?

L’analyse des opinions politiques sur Twitter : Défis et opportunités d’une approche multi-échelle

Marta Severo and Robin Lamarche-Perrin

Revue française de sociologie: Big Data, Sociétés et Sciences Sociales, 2018

Des blogs et forums aux pages Facebook et comptes Twitter, le récent déluge des données numériques du web a fortement affecté la recherche en sciences sociales. Cette nouvelle catégorie d’information, utile à l’extraction des opinions politiques, se présente comme une alternative aux techniques traditionnelles telles que les sondages. Premièrement, en réalisant un état de l’art des études de l’opinion s’appuyant sur les données Twitter, cet article vise à mettre en relation les méthodes d’analyse utilisées dans ces études et les définitions de l’opinion politique qui y sont suggérées. Deuxièmement, cet article étudie la faisabilité de réaliser des analyses multi-échelles en sciences sociales concernant l’étude de l’opinion politique en exposant les mérites de plusieurs méthodes, allant des méthodes orientées contenus aux méthodes orientées interactions, de l’analyse statistique à l’analyse sémantique, des approches supervisées aux approches non supervisées. Le résultat de notre démarche est ainsi d’identifier les tendances futures de la recherche en sciences sociales concernant l?étude de l’opinion politique.

Caractériser lanalogie entre automates cellulaires déterministes et systèmes physiques

Lionel Tabourier

Lato Sensu, Revue de la Société de Philosophie des Sciences, Vol.5, n°2, 2018

La classification de Wolfram des automates cellulaires déterministes repose sur l’analogie entre le comportement dynamique des automates et celui de systèmes physiques au cours d’une transition de phase. Pour évaluer la valeur scientifique de la classification, longuement débattue, on doit s’interroger sur les caractéristiques de cette analogie. Nous établissons ici quels éléments, présents dans les transitions de phase, n’ont pas d’équivalent dans le domaine des automates. Ensuite, nous discutons la notion de potentiel d’une analogie en la comparant à deux autres exemples de la littérature.

Centrality metrics in dynamic networks: a comparison study

Marwan Ghanem, Clémence Magnien and Fabien Tarissan

IEEE Transactions on network science and engineering, 2018

For a long time, researchers have worked on defining different metrics able to characterize the importance of nodes in static networks. Recently, researchers have introduced extensions that consider the dynamics of networks. These extensions study the time-evolution of the importance of nodes, which is an important question that has yet received little attention in the context of temporal networks. They follow different approaches for evaluating a node’s importance at a given time and the value of each approach remains difficult to assess. In order to study this question more in depth, we compare in this paper a method we recently introduced to three other existing methods. We use several datasets of different nature, and show and explain how these methods capture different notions of importance. We also show that in some cases it might be meaningless to try to identify nodes that are globally important. Finally, we highlight the role of inactive nodes, that still can be important as a relay for future communications.

Download

Quantifying the diversity in users activity: an example study on online music platforms

Poulain Rémy and Tarissan Fabien

SNAMS, 2018

Whether it be through a problematic related to information ranking (e.g. search engines) or content recommendation (on social networks for instance), algorithms are at the core of processes selecting which information is made visible. Those algorithmic choices have in turn a strong impact on users activity and therefore on their access to information. This raises the question of measuring the quality of the choices made by algorithms and their impact on the users. As a first step into that direction, this paper presents a framework to analyze the diversity of the information accessed by the users. By depicting the activity of the users as a tripartite graph mapping users to products and products to categories, we analyze how categories catch users attention and in particular how this attention is distributed. We then propose the \emph{(calibrated) herfindahl diversity} score as a metric quantifying the extent to which this distribution is diverse and representative of the existing categories. In order to validate this approach, we study a dataset recording the activity of users on online music platforms. We show that our score enables to discriminate between very specific categories that capture dense and coherent sub-groups of listeners, and more generic categories that are distributed on a wider range of users. Besides, we highlight the effect of the volume of listening on users attention and reveal a saturation effect above a certain threshold.

Download

Mining the Integrated Connectedness of Biomedical Systems

Prof. Dr. Natasa Przulj

December 7th 2018, 14:00. Salle 24-25/405, LIP6 – UMPC, Sorbonne Université. 4 Place Jussieu, 75005 Paris.

We are faced with a flood of molecular and clinical data. Various bio-molecules interact in a cell to perform biological function, forming large, complex systems. Large-scale patient-specific omics datasets are increasingly becoming available, providing heterogeneous, but complementary information about cells, tissues and diseases. The challenge is how to mine these interacting, complex, complementary data systems to answer fundamental biological and medical questions.  Dealing with them is nontrivial, because many questions we ask to answer from them fall into the category of computationally intractable problems, necessitating the development of heuristic methods for finding approximate solutions. We develop methods for extracting new biomedical knowledge from the wiring patterns of systems-level, heterogeneous, networked biomedical data.  Our methods link the patterns in molecular networks and the multi-scale network organization with biological function.  In this way, we translate the information hidden in the wiring patterns into domain-specific knowledge. In addition, we introduce a versatile data fusion (integration) framework that can effectively integrate the information obtained from mining molecular networks with patient-specific somatic mutation data and drug chemical data to address key challenges in precision medicine: better stratification of patients, prediction of driver genes in cancer, and re-purposing of approved drugs to particular patients and patient groups. Our new methods stem from novel network science approaches coupled with graph-regularized non-negative matrix tri-factorization, a machine learning technique for dimensionality reduction and co-clustering of heterogeneous datasets. We utilize our new framework to develop methodologies for performing other related tasks, including disease re-classification from modern, heterogeneous molecular level data, inferring new Gene Ontology relationships, and aligning multiple molecular networks.

From a static to a dynamic analysis of complex networks (soutenance HDR)

Lionel Tabourier

September 24th 2018, 11am, room 25-26/105, Jussieu

Slides

Contacts entre individus, interactions sociales, transactions économiques ou encore machines échangeant des paquets d’information, tous ces systèmes ont en commun d’être constitués d’éléments en interaction et dépourvus de coordination par un « cerveau central ». Par conséquent, la structure de leurs interactions résultent de processus décentralisés, qui sont souvent mal connus. Depuis les années 90, il a été mis en évidence que la représentation en graphe de tels systèmes amenait à la découverte de propriétés communes, cela a permis l’utilisation de méthodes transverses pour les décrire et en comprendre les mécanismes sous-jacents. Ces études ont ensuite évolué pour constituer un champ de recherche à part entière : l’analyse de réseaux complexes. Parce qu’elles sont simples et qu’il existe un important volume de connaissance en théorie et en algorithmique de graphes, les représentations en graphes de tels systèmes en interaction ont mené à d’importants succès. Cependant, l’accès généralisé à des jeux de données en ligne a également mis en évidence la nécessité de prendre en compte l’aspect fondamentalement dynamique des données d’interaction. Mon travail de recherche touche à plusieurs aspects de l’évolution d’une représentation statique à une représentation dynamique de telles données. Celui-ci est organisé en trois axes distincts : le premier concerne la description de processus dynamiques sur des réseaux évoluant dans le temps, et plus précisément les phénomènes de diffusion. Le second axe se rapporte au problème de la prévision d’interactions dans un réseau temporel. Enfin, le troisième s’interroge sur la modélisation de la structure des interactions au moyen de réseaux aléatoires qui imitent la structure des données réelles.

Two ways you did not know mobile networks could be useful

Marco Fiore

Monday, September 24th 2018,  4pm, room 24-25/405

Slides

Mobile networks provide support to a variety of communication-based services that are steadily changing our lives. However, they are also pervasive infrastructures that can be used in unconventional ways unrelated to communication. Specifically, mobile networks can be seen as large-scale remote sensing platform capable of providing fine-grained information about a large (and increasing) fraction of the worldwide population. In this talk, I will discuss two case studies of remote sensing based on mobile networks: land use mapmaking, i.e., the detection of arrangements and activities in a target geographical region, and population density estimation, i.e., the monitoring of dwelling units and people presence. Solutions to both these problems have important applications in, e.g., urban planning and transportations, and can benefit from approaches that take advantage of the mobile network infrastructure.

Pattern Matching in Link Streams: Timed-Automata with Finite Memory

Clément Bertrand, Frédéric Peschanski, Hanna Klaudel and Matthieu Latapy

Scientific Annals of Computer Science, 2018

Link streams model the dynamics of interactions in complex distributed systems as sequences of links (interactions) occurring at a given time. Detecting patterns in such sequences is crucial for many applications but it raises several challenges. In particular, there is no generic approach for the specification and detection of link stream patterns in a way similar to regular expressions and automata for text patterns. To address this, we propose a novel automata framework integrating both timed constraints and finite memory together with a recognition algorithm. The algorithm uses structures similar to tokens in high-level Petri nets and includes non-determinism and concurrency. We illustrate the use of our framework in real-world cases and evaluate its practical performances.

Download

Degree-based Outliers Detection within IP Traffic Modelled as a Link Stream

Audrey Wilmet, Tiphaine Viard, Matthieu Latapy and Robin Lamarche-Perrin

TMA Conference 2018, Vienna

Precise detection and identification of anomalous events in IP traffic are crucial in many applications. This paper intends to address this task by adopting the link stream formalism which properly captures temporal and structural features of the data. Within this framework we focus on finding anomalous behaviours with the degree of IP addresses over time. Due to diversity in IP profiles, this feature is typically distributed heterogeneously, preventing us to find anomalies. To deal with this challenge, we design a method to detect outliers as well as precisely identify their cause in a sequence of similar heterogeneous distributions. We apply it to a MAWI capture of IP traffic and we show that it succeeds at detecting relevant patterns in terms of anomalous network activity.Degree-based Outliers Detection within IP Traffic Modelled as a Link Stream

Download

Examining Supreme Court Networks to Understand its Operation

Patrick Doreian

Friday, June 22th 2018, 14h, salle 26-00/101

The nature of the Supreme Court, its decisions, the principle of Stare Decisis are described. This is followed by a listing of the Supreme Court networks that are considered. This includes: the citation network of decisions citing earlier decisions (1789-2001); coherent clusters of decisions obtained by examining the extent to which they are co-cited by later decisions; signed two-mode networks featuring Justices and their votes for annual terms of the Court; one-mode signed networks of justices with the extent to which they vote with or against each other; and signed networks of Courts overturning decisions of prior Courts or their own decisions. Strategies for analyzing these networks are presented along with the subsequent results. Throughout, the network results are linked to substantive topics and constitutional principles. A long-term research agenda is presented.

Hierarchical Clustering Beyond the Worst-Case

Vincent Cohen-Addad

Wednesday, May 16th, 2018, 11h, salle 26-00/101, Campus Jussieu

Slides

Hiererachical clustering, that is computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, Dasgupta recently proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective. In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic block model (HSBM), and show that in certain regimes the SVD approach of McSherry combined with specific linkage methods results in a clustering that give an O(1) approximation to Dasguptas cost function. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heurstics and also results in a better classification accuracy when the underlying problem was that of multi-class classification.