Replication Strategies in P2P networks
Kiran Komaravolu

Peer-to-peer networks may be classified into centralized and decentralized systems. Centralized systems such as Napster have a central directory to which users can submit queries. The more popular ones are decentralized systems, where the nodes form an ad-hoc overlay network.

Decentralized systems may be further classified into structured and unstructured systems. Structured systems, such as CAN and Chord, have a close coupling between the P2P topology and the location of data. The advantage of these systems is that searches are more accurate and faster. However, such systems are very costly to build, and since the structure of ad-hoc networks changes rapidly, building them may not be feasible.

Unstructured systems, such as Gnutella, Kazaa, and Morpheus, are those in which the topology of the system is unrelated to the location of the data. These systems are very easy to build, but searches on them are not very accurate. Searching in unstructured systems is blind: a node that holds no data related to a query may still receive it. Gnutella, for example, searches by flooding. To improve searching, we must reduce the number of nodes that need to be probed before a query is resolved. FastTrack-based clients (Kazaa, Morpheus) designate high-bandwidth clients and replicate the indices of other nodes on them; as a result, a single probe emulates the behavior of probing multiple nodes.
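To make the cost of blind search concrete, here is a minimal sketch of a flooding search with a hop limit. It is an illustration only, not Gnutella's actual protocol: the function name `flood_search`, the adjacency-dictionary representation of the overlay, and the `ttl` parameter are all assumptions made for this example.

```python
from collections import deque


def flood_search(graph, holders, start, ttl):
    """Flood a query from `start`, forwarding it at most `ttl` hops.

    graph   -- dict mapping each node to a list of its neighbors (assumed format)
    holders -- set of nodes that hold the requested item
    Returns (found, probed), where `probed` counts every node that
    handled the query, including the originating node.
    """
    visited = {start}
    frontier = deque([(start, 0)])
    probed = 0
    found = False
    while frontier:
        node, hops = frontier.popleft()
        probed += 1                      # this node receives and checks the query
        if node in holders:
            found = True                 # the flood itself is not cut short
        if hops == ttl:
            continue                     # hop limit reached, do not forward
        for neighbor in graph[node]:
            if neighbor not in visited:  # each node handles the query once
                visited.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return found, probed
```

Most of the probed nodes in such a flood hold nothing relevant to the query, which is why replicating items or indices on more nodes reduces the number of probes needed.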

The authors suggest three ways of replicating these indices: uniform, proportional, and square-root replication. In uniform replication, every item is replicated equally, whereas in proportional replication, more popular items (those queried more often) receive more replicas.

Proportional replication: $P_i = C_i Q_i / \sum_j C_j Q_j$
Uniform replication: $P_i = C_i / \sum_j C_j$

Square-root replication: $P_i = C_i \sqrt{Q_i} / \sum_j C_j \sqrt{Q_j}$

$P_i$ = resources allocated to item $i$
$C_i$ = size of item $i$
$Q_i$ = query rate for item $i$
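The three allocation rules above can be computed directly from the item sizes and query rates. The following is a small sketch; the function name `allocate` and the strategy labels are chosen for this example and do not come from the paper.

```python
import math


def allocate(sizes, query_rates, strategy):
    """Return the share of resources given to each item.

    sizes       -- item sizes (the C values)
    query_rates -- item query rates (the Q values)
    strategy    -- "uniform", "proportional", or "square_root"
    """
    if strategy == "uniform":
        weights = list(sizes)
    elif strategy == "proportional":
        weights = [c * q for c, q in zip(sizes, query_rates)]
    elif strategy == "square_root":
        weights = [c * math.sqrt(q) for c, q in zip(sizes, query_rates)]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    # Normalize so that the shares sum to 1.
    return [w / total for w in weights]
```

For three equally sized items with query rates 16, 4, and 4, the proportional rule gives the popular item two thirds of the resources, the uniform rule gives it one third, and the square-root rule gives it one half.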

Uniform and proportional replication are two extremes of a large family of replication strategies. Surprisingly, both yield almost the same search size for successful queries. Proportional allocation makes popular items easier to find but rare items harder to find, so its overall search size is almost the same as that of uniform allocation. Square-root allocation, which falls between the uniform and proportional strategies, minimizes the expected search size of successful queries.

The article was somewhat complex, but its results are simple. The authors present a simple model for replication. It would be interesting to know which of these strategies (or a different one) real-world systems use (the authors mention that these systems use implicit strategies). Overall, it was a pretty good paper.
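The comparison of search sizes can be checked numerically under a simplified model. The sketch below assumes unit-size items, strictly positive query rates, and random probing, so that the expected search size for an item is inversely proportional to its share of the replicas; constant factors such as per-node storage are dropped. These modeling choices and the function names are assumptions made for this illustration.

```python
def shares(query_rates, exponent):
    """Share p_i proportional to q_i ** exponent, for unit-size items.

    exponent 0 is uniform, 0.5 is square-root, 1 is proportional.
    """
    weights = [q ** exponent for q in query_rates]
    total = sum(weights)
    return [w / total for w in weights]


def expected_search_size(query_rates, exponent):
    """Query-weighted average of 1 / p_i, in relative units (assumed model)."""
    total_rate = sum(query_rates)
    p = shares(query_rates, exponent)
    return sum((q / total_rate) / p_i for q, p_i in zip(query_rates, p))


rates = [16, 4, 4]  # hypothetical query rates for three items
for name, exponent in [("uniform", 0), ("square-root", 0.5), ("proportional", 1)]:
    print(f"{name}: {expected_search_size(rates, exponent):.3f}")
# prints:
# uniform: 3.000
# square-root: 2.667
# proportional: 3.000
```

In this model, uniform and proportional allocation both give an expected search size equal to the number of items, while square-root allocation gives a smaller value whenever the query rates differ.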