David Manura

2002-01-16

CSE-498 (Adv. Networks)

 

Free Riding on Gnutella—A Review

 

Although the paper [1] makes some positive contributions, it is not recommended for publication.  The paper does provide clear, quantitative evidence of uneven file sharing in a Gnutella network: 70% of Gnutella users share no files, and nearly 50% of all observed responses are returned by the top 1% of sharing hosts.  However, the statistical methods used to argue about the distribution of free-riding by domain could be flawed, various conclusions are weakly supported, and the rationale for the paper (that the observed level of free-riding could cause Gnutella to “collapse”) is wholly unsubstantiated by experiment or argument.

 

In summary, the authors monitor Gnutella messages passing through a single peer in a Gnutella network to determine the distribution of the total number of files shared per peer and the distribution of queries responded to per peer.  A small percentage of peers are found to share a large percentage of the files, and a small percentage of peers are found to generate a large percentage of the responses to queries.  These variables are also measured by domain, and the high correlation coefficients on graphs of total shared files and total queries responded to vs. peers in each domain are used to argue that “free-riders” (peers that share few or unpopular files) are distributed fairly uniformly across domains.

 

Why is it useful to know whether there is an even distribution of peers across domains?  Further, it has not been shown that there is an even distribution of sharing/responding across domains.  An attempt has been made only to show that the distribution is even across domains having the same number of peers.

 

Experiment and hypothesis should be more cleanly separated into distinct sections.  The experimental section (3) should avoid debatable statements, interpretation of results, and speculation.

 

The interpretation of the reported correlation coefficients may be seriously flawed.  One problem is that Figures 3a-4b each have most points closely distributed (somewhat randomly) around (0, 0), but each also has one or two outliers.  Irrespective of the exact location of these outliers, the fact that these few points lie far from the main cluster causes the R² value to increase.  To consider an extreme example, if a single outlier existed at (N, N) on each of the plots, R² would approach 1 as N approached infinity, however scattered the remaining points were.  The deviations on Figure 3b are not acknowledged and are obscured by the outlier.
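
The effect of a lone outlier on R² can be checked numerically.  The following is only a minimal sketch, not a reanalysis of the paper's data: the 200 cluster points are synthetic, drawn independently and uniformly from the unit square (an assumption made purely for illustration), and a single outlier is then added at (N, N), far from the cluster in both coordinates.  The names `r_squared` and `r_squared_with_outlier` are hypothetical helpers.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Synthetic, uncorrelated cluster near the origin (illustrative data only).
rng = random.Random(0)  # fixed seed so the sketch is reproducible
cluster_x = [rng.uniform(0, 1) for _ in range(200)]
cluster_y = [rng.uniform(0, 1) for _ in range(200)]


def r_squared_with_outlier(n):
    """R^2 of the cluster plus one extra point at (n, n)."""
    return r_squared(cluster_x + [n], cluster_y + [n])


baseline = r_squared(cluster_x, cluster_y)
print("no outlier:", round(baseline, 4))
for n in (10, 100, 1000):
    print("outlier at", (n, n), "->", round(r_squared_with_outlier(n), 4))
```

Under these assumptions the cluster alone shows essentially no correlation, while R² climbs toward 1 as the outlier moves away, even though the relationship among the other 200 points is unchanged.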

 

In essence, the reader would like to know how the ratio of files shared (or query responses) to peer count varies with peer count.  Obviously, the confidence interval on this ratio would be smaller for the larger domains, but this difference in certainty is not visible with the choice of display format.
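
To make the point about certainty concrete, here is a rough sketch of how the confidence interval on a per-domain mean narrows with domain size.  It assumes that peers within a domain are independent and share a common standard deviation in files shared, and it uses a normal approximation; the value 50 for that standard deviation and the helper name `ci_half_width` are hypothetical, not taken from the paper.  Given the heavy-tailed sharing the paper reports, the normal approximation is itself questionable for small domains, so this should be read as an illustration of the 1/sqrt(n) trend only.

```python
import math


def ci_half_width(per_peer_std, peer_count, z=1.96):
    """Approximate 95% CI half-width for a domain's mean files shared per peer.

    Normal approximation: z * sigma / sqrt(n), assuming independent peers.
    """
    return z * per_peer_std / math.sqrt(peer_count)


# Hypothetical per-peer standard deviation, chosen only for illustration.
PER_PEER_STD = 50.0

for peers in (10, 100, 1000, 10000):
    print(peers, "peers -> +/-", round(ci_half_width(PER_PEER_STD, peers), 2))
```

A plot of the ratio against peer count with error bars of this kind would show directly which domains deviate from the average by more than chance allows.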

 

It may be that 70% of Gnutella users share no files, but what is the impact of this?  What if 70% of Gnutella users ran few or no queries?  If web browsers and web servers were viewed as peers, then a large fraction of those peers (i.e. the web browsers) would be seen as sharing no files, yet the web is still successful. 

 

How are results affected when monitoring at one or a few network locations rather than globally? 

 

“Quality” should be defined.  It is stated that Figure 5 indicates little relationship between quantity and quality.  In fact, Figure 5 shows that the peers that share a lot of files have a less-than-proportional number of query responses.  That is, the hit rate per file (which is one measure of quality) is negatively correlated with the number of files shared.  A large collection of random files would be expected to result in an increased number of coincidental hits; that is, hits that are not valuable to the person who ran the query.

 

These are just some of the problems.  To be accepted, the paper should be placed on a firmer footing.  In particular, a justification for the value of the research—why the measured disparity in file sharing will degrade system performance to unusable levels—should be provided, as well as stronger justification for the conclusions based on the experimental evidence.

 

[1] Eytan Adar and Bernardo A. Huberman.  “Free Riding on Gnutella.”  Internet Ecologies Area, Xerox Palo Alto Research Center.  http://citeseer.nj.nec.com/adar00free.html