- CSE 345/445 WWW Search Engines was previously taught in Spring 2007 and Fall 2004.
- Some early history of WWW search engines.
- A chapter on text analysis is available from the book *Modeling the Internet and the Web*, by Baldi, Frasconi, and Smyth (Wiley, 2003).
- Learn about Zipf's Law.

- Get math definitions.

- Soundex as used in the census records of the National Archives
- Soundex history and improvements
- Check out an online matrix reference manual.
- A simple Java demo graphing eigenvectors
- Some more formal discussion of eigenvectors can be found here and here, or in almost any linear algebra textbook.
- Sample eigenvector code (power method) from Dr. Dobb's Journal.
- Information and specifications for building crawlers that obey robots.txt. Here's another useful page.
- The Cornell SMART stop list
- The Porter stemmer
- The WARC (Web ARChive) file format
- An excellent tutorial on Zipf, Power-laws, and Pareto
- A new IR textbook: *Search Engines: Information Retrieval in Practice*, by Croft, Metzler, and Strohman.
- A new web graph book: *A Course on the Web Graph*, by Anthony Bonato

