Summary
In this paper [2], Beverly Yang and Hector Garcia-Molina present three distributed search algorithms for use in loosely structured peer-to-peer networks such as Gnutella. Each of the algorithms is predicated on the simple goal of reducing the number of hosts that process a given query. The algorithms are compared to existing approaches, and several experiments show that they provide better user experiences while also reducing the load on the network.
The three algorithms presented by the authors are iterative deepening, directed BFS, and local indices. In the iterative deepening approach, multiple breadth-first searches are initiated with successively larger depth limits until the query is satisfied or a maximum depth has been reached. Additionally, the authors introduce messages that reduce the processing costs for hosts that might otherwise see the query message multiple times. The directed BFS approach, on the other hand, uses heuristics to determine the subset of neighbors to which a query should be forwarded. Among the heuristics the authors experimented with were: select the neighbor that has returned the highest number of results in the past, and select the neighbor that has forwarded the largest number of messages thus far. Finally, under the local indices approach, hosts in the network replicate the file indices of those hosts within a specified radius. Accordingly, the authors describe the messages that enable hosts to share their indices.
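To make the first two approaches concrete, here is a minimal Python sketch of iterative deepening and of the "most results in the past" heuristic for directed BFS. It is an illustration under stated assumptions, not the authors' implementation: the names (graph, files, policy, past_results) and the in-memory adjacency-list model are hypothetical, and each round reruns the bounded BFS from the source instead of using the messages the authors introduce to avoid reprocessing at hosts that have already seen the query.

```python
from collections import deque


def bfs_within_depth(graph, files, source, wanted, depth_limit):
    """Return the hosts within depth_limit hops of source that hold `wanted`."""
    seen = {source}
    frontier = deque([(source, 0)])
    hits = set()
    while frontier:
        host, depth = frontier.popleft()
        # The querying host itself is not counted as a result.
        if host != source and wanted in files.get(host, ()):
            hits.add(host)
        # Forward the query only while below the depth limit.
        if depth < depth_limit:
            for neighbor in graph.get(host, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return hits


def iterative_deepening(graph, files, source, wanted, policy, min_results=1):
    """Run bounded BFS rounds with increasing depth limits taken from `policy`.

    Stops at the first depth that yields at least `min_results` hits, or
    after the last depth in the policy. Simplification (assumption): every
    round restarts from the source.
    """
    hits = set()
    used_depth = None
    for depth_limit in policy:
        used_depth = depth_limit
        hits = bfs_within_depth(graph, files, source, wanted, depth_limit)
        if len(hits) >= min_results:
            break
    return hits, used_depth


def pick_neighbor(neighbors, past_results):
    """Directed BFS heuristic: choose the neighbor with the most past results."""
    return max(neighbors, key=lambda n: past_results.get(n, 0))


# Hypothetical five-host network: a -> b -> d -> e, and a -> c.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}
files = {"d": {"song"}, "e": {"song"}}
found, depth = iterative_deepening(graph, files, "a", "song", policy=[1, 2, 3])
```

In this toy network the query is satisfied at depth 2, so the hosts three hops away never process it, which is the kind of saving the approach aims for.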
Evaluations of the three algorithms were based upon a dataset that was collected from the Gnutella network. The authors focused on determining the load generated by an algorithm (average aggregate bandwidth and average aggregate processing cost) as well as the quality of the user experience (number of results, satisfaction, and time to satisfaction). The authors found that:
2. Under the directed BFS approach, the heuristic that selects the neighbor that has returned the highest number of results in the past produces the highest probability of satisfaction for a query. Also, even the most costly heuristics under the directed BFS approach required roughly 73% less processing cost and 65% less bandwidth than did BFS.
3. The recommended configuration of the local indices approach achieved cost savings of 61% with respect to bandwidth and 49% with respect to processing costs, while requiring only a small index size.
Unfortunately, the algorithms the authors present (except directed BFS) would each require changes to the protocol used by the network. Considering the difficulty of convincing numerous vendors to move in such a direction simultaneously, such approaches might not be very useful in practice. Furthermore, I find it interesting that the authors don't look at the caching of results as a way to enable local indices. Doing so might provide insights into how the local indices approach could be accomplished without requiring changes to the network protocol.
Also, it might have been interesting to see how the algorithms' performance on frequent queries compared to their performance on infrequent queries. Had they done so, the authors might have found that the algorithms bias performance towards frequent queries, resulting in worse user experiences for those issuing infrequent ones. Balancing the performance of a network with respect to a gradient of query frequencies is something that was addressed in [1].
Finally, the authors suggest that the search algorithms presented might be useful to unstructured networks in general. However, they only provide evaluations with respect to Gnutella. It might behoove the authors to evaluate their algorithms with respect to a number of network topologies and file distributions, much as the authors of [1] did.
Citations
[1] Qin Lv, Pei Cao, Edith Cohen, Kai Li, and Scott Shenker. Search and Replication in Unstructured Peer-to-Peer Networks. In Proceedings of ACM SIGMETRICS, 2002.
[2] Beverly Yang and Hector Garcia-Molina. Improving Search in Peer-to-Peer Networks. In Proceedings of ICDCS, July 2002.