Assessing the Impact of Sparsification on LSI Performance

Full Paper (6 pages)
April Kontostathis, William M. Pottenger, and Brian D. Davison

Abstract
We describe an approach to information retrieval using Latent Semantic Indexing (LSI) that directly manipulates the values in the Singular Value Decomposition (SVD) matrices. We convert the dense term-by-dimension matrix into a sparse matrix by removing a fixed percentage of its values. We present retrieval and runtime performance results on seven collections. These results show that removing up to 70% of the values in the term-by-dimension matrix gives similar or improved retrieval performance compared to LSI, while reducing memory requirements and query response time. Removing 90% of the values significantly reduces memory requirements and dramatically improves query response time. At that level, retrieval performance degrades slightly for the smaller collections but improves by 60% on the large collection we tested.
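The following is a minimal NumPy sketch of the sparsification idea, not the authors' implementation. It rests on several assumptions that the abstract does not specify. The "term-by-dimension matrix" is taken to be T_k S_k, the rank-k left singular vectors scaled by the singular values. The "fixed percentage of the values" removed is taken to be the entries with the smallest absolute value. Documents are scored with the inner product (q^T T_k S_k) D_k^T, which equals q^T A_k when nothing is removed. The function names `lsi_term_by_dimension`, `sparsify`, and `score` are hypothetical.

```python
import numpy as np

def lsi_term_by_dimension(A, k):
    """Rank-k truncated SVD of a term-by-document matrix A (terms x docs).

    Returns (T_k S_k, s_k, D_k). Using T_k S_k as the "term-by-dimension"
    matrix is an assumption of this sketch.
    """
    T, s, Vt = np.linalg.svd(A, full_matrices=False)
    Tk, sk, Dk = T[:, :k], s[:k], Vt[:k, :].T
    return Tk * sk, sk, Dk  # broadcasting scales column j of T_k by s_j

def sparsify(M, fraction):
    """Zero out `fraction` of the entries of M with the smallest |value|.

    Magnitude-based removal is an assumed selection rule.
    """
    n_remove = int(round(fraction * M.size))
    out = M.flatten()  # flatten() always returns a copy
    if n_remove == 0:
        return out.reshape(M.shape)
    # Indices of the n_remove smallest-magnitude entries (unordered).
    idx = np.argpartition(np.abs(out), n_remove - 1)[:n_remove]
    out[idx] = 0.0
    return out.reshape(M.shape)

def score(q, TS, Dk):
    """Score every document for term-space query q as (q^T TS) D_k^T."""
    return (q @ TS) @ Dk.T

# Usage: a toy 50-term x 20-document collection with k = 5.
rng = np.random.default_rng(0)
A = rng.random((50, 20))
TS, sk, Dk = lsi_term_by_dimension(A, 5)
q = np.zeros(50)
q[[1, 4, 7]] = 1.0                          # query containing terms 1, 4, 7
TS70 = sparsify(TS, 0.7)                    # keep 30% of the values
ranking = np.argsort(-score(q, TS70, Dk))   # best-matching documents first
```

In this sketch the sparsified matrix stays dense, so it only illustrates which values are dropped. The memory and query-time savings described above would come from storing the result in a sparse format (for example `scipy.sparse.csr_matrix`). The query-side product q^T (T_k S_k) then touches only the retained entries.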

In Proceedings of The Grace Hopper Celebration of Women in Computing, Chicago, October 2004.



Last modified: 2 August 2004
Brian D. Davison