LNCS, proceedings of the 3rd International Conference on Web-Age Information Management (WAIM'02), 2002, Beijing, China. Abstract published in the proceedings of the 11th International World Wide Web Conference (WWW'02), 2002, Honolulu, Hawaii.
In this paper, we propose a set of simple and efficient methods, based on standard, free and widely available tools, to store and manipulate large sets of URLs and large parts of the Web graph. Our aim is to store both the URL list and the graph compactly enough that all computations can be done in a computer's main memory. We also want to make the conversion between URLs and their identifiers as fast as possible, and to retrieve all the successors of a URL in the Web graph efficiently. The methods we propose achieve a good compromise between these two goals, compact storage and fast access, and make it possible to manipulate large parts of the Web graph.
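The abstract does not spell out the encodings, so the following is only an illustrative sketch built on two common techniques. It is not necessarily the paper's exact scheme. In this sketch, a URL's identifier is its rank in the lexicographically sorted URL list. The list is stored with front coding in fixed-size blocks: each URL is kept as the length of the prefix it shares with the previous URL plus the remaining suffix. Successor lists are kept in compressed sparse row (CSR) arrays. The names `UrlTable`, `WebGraph` and `block_size` are hypothetical.

```python
from array import array
from bisect import bisect_right


def _common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class UrlTable:
    """Hypothetical URL <-> identifier table (sketch, not the paper's code).

    Identifier = rank of the URL in lexicographic order. URLs are front-coded
    in blocks of `block_size`; only the first URL of each block is stored whole.
    """

    def __init__(self, urls, block_size=8):
        sorted_urls = sorted(set(urls))
        self.block_size = block_size
        self.size = len(sorted_urls)
        self.heads = []   # first URL of each block, stored in full
        self.blocks = []  # per block: list of (shared_prefix_len, suffix)
        for start in range(0, self.size, block_size):
            chunk = sorted_urls[start:start + block_size]
            self.heads.append(chunk[0])
            coded, prev = [], chunk[0]
            for url in chunk[1:]:
                k = _common_prefix_len(prev, url)
                coded.append((k, url[k:]))
                prev = url
            self.blocks.append(coded)

    def _decode_block(self, b):
        """Rebuild the full URLs of block b from its front-coded entries."""
        prev = self.heads[b]
        out = [prev]
        for k, suffix in self.blocks[b]:
            prev = prev[:k] + suffix
            out.append(prev)
        return out

    def url_of(self, ident):
        """Identifier -> URL: locate the block by division, then decode it."""
        if not 0 <= ident < self.size:
            raise IndexError(ident)
        b, offset = divmod(ident, self.block_size)
        return self._decode_block(b)[offset]

    def id_of(self, url):
        """URL -> identifier: binary search on block heads, then scan one block."""
        b = bisect_right(self.heads, url) - 1
        if b < 0:
            raise KeyError(url)
        for offset, candidate in enumerate(self._decode_block(b)):
            if candidate == url:
                return b * self.block_size + offset
        raise KeyError(url)


class WebGraph:
    """Successor lists in CSR form: successors of v are
    targets[offsets[v]:offsets[v + 1]]."""

    def __init__(self, n, edges):
        succ = [[] for _ in range(n)]
        for src, dst in edges:
            succ[src].append(dst)
        self.offsets = array("I", [0])  # unsigned ints (4 bytes on common platforms)
        self.targets = array("I")
        for lst in succ:
            self.targets.extend(sorted(lst))
            self.offsets.append(len(self.targets))

    def successors(self, ident):
        return list(self.targets[self.offsets[ident]:self.offsets[ident + 1]])


# Usage: build both structures and list the successors of one URL.
urls = ["http://a.org/", "http://a.org/x.html", "http://b.org/", "http://a.org/y.html"]
links = [("http://a.org/", "http://a.org/x.html"),
         ("http://a.org/", "http://b.org/"),
         ("http://a.org/x.html", "http://a.org/y.html")]
table = UrlTable(urls, block_size=2)
graph = WebGraph(table.size, [(table.id_of(s), table.id_of(d)) for s, d in links])
print([table.url_of(v) for v in graph.successors(table.id_of("http://a.org/"))])
# prints ['http://a.org/x.html', 'http://b.org/']
```

In this sketch, a larger `block_size` saves more space because more URLs share a stored prefix. The cost is that each conversion decodes more entries. This is one simple knob for trading compactness against conversion speed.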