Réseaux sociaux égocentriquesActive learning on networks : you must label all nodes. You can ask the true labels of 42 nodes. Choose wisely.

Jean-Baptiste Rouquier

28 janvier 2010


In many networks, vertices have hidden attributes that are correlated with the networks topology. For instance, in social networks, people are more likely to be friends if they are demographically similar. In food webs, predators typically eat prey of lower body mass. We explore a setting in which the networks topology is known, but these attributes are not. If each vertex can be queried, learning the value of its hidden attributes but only at some cost then we need an algorithm which chooses which vertex to query next, in order to learn as much as possible about the attributes of the remaining vertices. We assume that the network is generated by a probabilistic model, but we make no assumptions about the assortativity or disassortativity of the network. We then query the vertex with the largest mutual information between its type and that of the others (a well-known approach in active learning) or with the largest average agreement between two independent samples of the Gibbs distribution which agree on its type. We test these approaches on two networks with known attributes, the Karate Club network and a food web of species in the Weddell Sea. In several cases, we found that the average agreement algorithm performs better than mutual information, and both perform better than simpler heuristics. The algorithms appear to explore the network intelligently, first querying vertices at the centers of communities, and then vertices along the boundaries between communities.