Search engine resources
Useful links
- CSE 345/445 WWW Search Engines was previously taught in Spring 2007 and Fall 2004.
- Some early history of WWW search engines.
- A chapter on text analysis is available from the book Modeling the Internet and the Web, by Baldi, Frasconi, and Smyth (Wiley, 2003).
- Learn about Zipf's Law.
- Get math definitions.
- Soundex, as used in census records of the National Archives
- Soundex history and improvements
- Check out an online matrix reference manual.
- A simple Java demo graphing eigenvectors
- More formal discussions of eigenvectors can be found here and here, or in almost any textbook on linear algebra.
- Sample eigenvector code (power method) from Dr. Dobb's Journal.
- Information and specifications for building crawlers that obey robots.txt. Here's another useful page.
- The Cornell SMART stop list
- The Porter stemmer
- The WARC (Web ARChive) file format
- An excellent tutorial on Zipf, Power-laws, and Pareto
- A new IR textbook: Search Engines: Information Retrieval in Practice, by Croft, Metzler, and Strohman.
- A new web graph book: A Course on the Web Graph, by Anthony Bonato
This page is http://www.cse.lehigh.edu/~brian/course/2009/searchengines/resources.html
Last revised: 1 February 2009.