WWW Search Engines Fall 2004
Sample student-submitted questions for exam #2.
===============================================
[Questions marked with an * are considered good ones by the instructor.]

1.* What is link nepotism and why is it a problem?
2. Explain the primary difference between Hubs and Authorities in the context of link analysis.
3. What is the "random surfer"? In what analysis technique was it introduced, and why is it an important factor?
4.* Name two downsides to the PageRank algorithm.
5.* The topic drift problem means that the most highly ranked authorities and hubs tend not to be about the original topic. What causes this problem? What do you suggest to solve it?
6.* Describe the HITS algorithm in pseudocode.
7.* Explain the Rank-and-File algorithm. Suggest some modifications to the procedure and mention their implications.
8. What are the three manifestations of authoring idioms?
9.* Give two advantages and two disadvantages of HITS when compared to PageRank.
10.* What is link nepotism and why is it a problem on the web?
11.* Define co-citation and bibliographic coupling.
12. What is the difference between a web link graph and a citation graph?
13.* Explain the multi-host nepotistic problem of "Clique Attacks". Why do HITS and PageRank fall prey to this, and how can they mitigate the problem?
14.* In the paper "What's New on the Web? The Evolution of the Web from a Search Engine Perspective", the authors focus on link-structure evolution. Briefly explain what this is, as well as why it is important for search engines.
15.* What is the key difference between PageRank and HITS? Discuss their strengths and weaknesses.
16. What is the graph model of HITS? What is the problem with such a graph model? Give one solution that addresses it.
17.* Describe the system of Bharat and Henzinger.
18.* Describe Davison's approach to nepotistic links.
19.* Suggest three different improvements to the original Google PageRank algorithm.
20. What is an eigenvector, as it applies to our subject?
21.* Give two examples of differences between PageRank and HITS.
22.* When can apparent link nepotism be unintentional, and what are ways to avoid overvaluing such nepotistic links?