Review of "Improving Search in Peer-to-Peer Networks"
David Deschenes
April 24, 2003
Advanced Topics in Networking

Summary
In this paper [2], Beverly Yang and Hector Garcia-Molina present three distributed search algorithms for use in loosely structured peer-to-peer networks such as Gnutella. Each of the algorithms is predicated on the simple goal of reducing the number of hosts that process a given query. The algorithms are compared to existing approaches, and several experiments show that they provide better user experiences while also reducing the load on the network.

The three algorithms presented by the authors are iterative deepening, directed BFS, and local indices. In the iterative deepening approach, multiple breadth-first searches are initiated with successively larger depth limits until the query is satisfied or a maximum depth has been reached. Additionally, the authors introduce messages that reduce the processing cost for hosts that would otherwise see the same query multiple times. The directed BFS approach, on the other hand, makes use of heuristics to determine the subset of neighbors to which a query should be forwarded. The heuristics the authors experimented with include selecting the neighbor that has returned the highest number of results in the past and selecting the neighbor that has forwarded the largest number of messages thus far. Finally, when using the local indices approach, hosts in the network replicate the file indices of those hosts within a specified radius. Accordingly, the authors describe the messages that enable hosts to share their indices.
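To make the iterative deepening idea concrete, here is a minimal sketch of my own, not the authors' implementation. It assumes the network is an in-memory adjacency dictionary and that each host's shared files are a set; the names `bfs_to_depth`, `iterative_deepening`, `policy`, and `min_results` are hypothetical. It also restarts the breadth-first search from scratch at each depth limit, so it omits the extra messages the authors use to avoid reprocessing a query at hosts that have already seen it.

```python
from collections import deque


def bfs_to_depth(graph, files, source, wanted, depth_limit):
    """Return the hosts within depth_limit hops of source that hold `wanted`."""
    seen = {source}
    frontier = deque([(source, 0)])
    hits = set()
    while frontier:
        host, depth = frontier.popleft()
        # The querying host does not count as a result for its own query.
        if host != source and wanted in files.get(host, ()):
            hits.add(host)
        # Only forward the query if the depth limit has not been reached.
        if depth < depth_limit:
            for neighbor in graph.get(host, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return hits


def iterative_deepening(graph, files, source, wanted, policy, min_results=1):
    """Search with successively larger depth limits taken from `policy`.

    Stops as soon as at least `min_results` hosts answer, or when the last
    (maximum) depth in `policy` has been tried. Returns (hits, depth_used).
    """
    hits, depth = set(), None
    for depth in policy:
        hits = bfs_to_depth(graph, files, source, wanted, depth)
        if len(hits) >= min_results:
            break
    return hits, depth


# Hypothetical six-host network, purely for illustration.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "F"],
    "E": ["C"],
    "F": ["D"],
}
files = {"D": {"song.mp3"}, "F": {"song.mp3"}}

one_hit = iterative_deepening(graph, files, "A", "song.mp3", policy=[1, 2, 3])
two_hits = iterative_deepening(
    graph, files, "A", "song.mp3", policy=[1, 2, 3], min_results=2
)
```

In this toy network a query from host A that needs one result stops at depth 2 (host D answers), while the same query requiring two results continues to depth 3 to reach host F. The point of the approach is visible here: a query that is easy to satisfy never reaches the hosts beyond the depth at which it was satisfied.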

Evaluations of the three algorithms were based upon a dataset that was collected from the Gnutella network. The authors focused on determining the load generated by an algorithm (average aggregate bandwidth and average aggregate processing cost) as well as the quality of the user experience (number of results, satisfaction, and time to satisfaction). The authors found that:

Review
This paper is very well written and presents some really interesting approaches to solving the scalability problems of broadcast search in unstructured P2P networks. The authors do an especially good job of determining a model for evaluating their algorithms. Unlike [1], in which the authors assume that every query is issued in hopes of finding a single file, the authors here assume that a query may be issued in hopes of finding a number of different or similar files.

Unfortunately, with the exception of directed BFS, each of the algorithms presented would require changes to the protocol used by the network. Considering the difficulty in convincing numerous vendors to move in such a direction simultaneously, such approaches might not be very useful in practice. Furthermore, I find it interesting that the authors don't look at the caching of results as a way to enable local indices. Doing so might provide insights into how the local indices approach could be accomplished without requiring changes to the network protocol.

Also, it might have been interesting to see how the algorithms performed on frequent queries compared to infrequent ones. Had they done so, the authors might have found that the algorithms bias performance towards frequent queries, resulting in worse user experiences for those issuing infrequent ones. Balancing the performance of a network with respect to a gradient of query frequencies is something that was addressed in [1].

Finally, the authors suggest that the search algorithms presented might be useful to unstructured networks in general. However, they only provide evaluations with respect to Gnutella. It might behoove the authors to evaluate their algorithms with respect to a number of network topologies and file distributions, much as the authors did in [1].

Citations

[1] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and Replication in Unstructured Peer-to-Peer Networks. In Proceedings of ACM SIGMETRICS, 2002.

[2] Beverly Yang and Hector Garcia-Molina. Improving Search in Peer-to-Peer Networks. In Proceedings of ICDCS, July 2002.