**Data Documentation:** IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2011)

**VASP-S: A Volumetric Analysis and Statistical Model for Predicting Steric Influences on Protein-Ligand Binding Specificity**

*Brian Y. Chen, Soutir Bandyopadhyay*

**Supplemental Data**

This supplemental material provides the details of the data described in the paper.

**SUPPLEMENT 1: Log-normal distributions approximate the distribution of fragment volumes better than the other distributions considered**

The following histograms indicate the distribution of fragment volumes between pairs of cavities derived from structures in the trypsin (left) and enolase (right) subfamilies. Fragments with volumes near zero dominated, though both distributions exhibited a heavy positive tail.

The following two graphs superpose a histogram of the logarithm of fragment volumes on the best-fitting Gaussian distribution. The log-transformed fragment volumes were closely Gaussian, indicating that the fragment volumes themselves follow a log-normal distribution.
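The log-transform check described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: the function name `fit_lognormal` is hypothetical, and the volumes are synthetic values drawn from a known log-normal distribution rather than measured fragment volumes.

```python
import math
import random
import statistics

def fit_lognormal(volumes):
    """Fit a Gaussian to log(volume) and return its (mu, sigma).

    If the volumes are log-normal, their logarithms are Gaussian, so the
    sample mean and standard deviation of the logs describe the fit.
    """
    logs = [math.log(v) for v in volumes if v > 0]  # log is undefined at 0
    return statistics.fmean(logs), statistics.stdev(logs)

# Synthetic stand-in for fragment volumes (assumption: NOT data from the paper).
rng = random.Random(0)
volumes = [rng.lognormvariate(1.0, 0.8) for _ in range(5000)]

mu, sigma = fit_lognormal(volumes)
print(f"fitted mu={mu:.3f}, sigma={sigma:.3f}")
```

On real data, the fitted `mu` and `sigma` would define the Gaussian curve drawn over the histogram of log volumes.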

The following graphs plot quantiles of the observed fragment volumes (trypsin and enolase) against quantiles of the best-fitting theoretical distributions. The straight diagonal line is the line x = y. If a theoretical distribution fit the sample data perfectly, its quantile-quantile plot would follow the diagonal exactly. Because the fit is imperfect and biological data are noisy, we evaluate the quality of each fit by how closely its plot adheres to the diagonal. Comparing all distributions, the log-normal distribution is clearly the best fit of those considered.

The theoretical distributions considered were the Gamma distribution, the Generalized Extreme Value distribution, the log-normal distribution (a Gaussian fit to the logarithm of the volumes), the Pareto distribution, and the Weibull distribution.
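A quantile-quantile comparison of this kind can be sketched as follows. The sketch covers only two of the candidates, the log-normal and Pareto distributions, because their quantile functions have closed forms. It is a hedged illustration rather than the procedure used in the paper: the helper names, the plotting positions (i + 0.5)/n, the maximum-likelihood Pareto fit, and the root-mean-square distance from the diagonal are all assumptions introduced here, and the sample is synthetic. The paper judges the fit visually; the single summary number is only a convenience.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def plotting_positions(n):
    """Probabilities at which the sorted sample is compared to the model."""
    return [(i + 0.5) / n for i in range(n)]

def lognormal_quantiles(sample, probs):
    """Quantiles of the log-normal fitted via mean/stdev of log(sample)."""
    logs = [math.log(x) for x in sample]
    mu, sigma = fmean(logs), stdev(logs)
    std_normal = NormalDist()
    return [math.exp(mu + sigma * std_normal.inv_cdf(p)) for p in probs]

def pareto_quantiles(sample, probs):
    """Quantiles of the Pareto distribution fitted by maximum likelihood."""
    xm = min(sample)                                   # scale estimate
    alpha = len(sample) / sum(math.log(x / xm) for x in sample)  # shape
    return [xm * (1.0 - p) ** (-1.0 / alpha) for p in probs]

def qq_deviation(sample, quantile_fn):
    """RMS distance of the Q-Q points from the diagonal x = y."""
    observed = sorted(sample)
    theoretical = quantile_fn(observed, plotting_positions(len(observed)))
    return math.sqrt(fmean((o - t) ** 2 for o, t in zip(observed, theoretical)))

# Synthetic stand-in for fragment volumes (assumption: NOT data from the paper).
rng = random.Random(1)
sample = [rng.lognormvariate(1.0, 0.8) for _ in range(2000)]

dev_lognormal = qq_deviation(sample, lognormal_quantiles)
dev_pareto = qq_deviation(sample, pareto_quantiles)
print(f"log-normal deviation: {dev_lognormal:.3g}")
print(f"Pareto deviation:     {dev_pareto:.3g}")
```

A smaller deviation means the Q-Q points lie closer to the diagonal. Plotting `observed` against `theoretical` for each candidate would reproduce the style of graph described above.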

Quantiles of the observed trypsin fragment volumes versus quantiles of the best-fitting theoretical distributions:

Quantiles of the observed enolase fragment volumes versus quantiles of the best-fitting theoretical distributions: