Sample student-generated questions for Spring 2007.
==========================================================================
1 Define synonymy and polysemy and give an example of each.
2 List several different types of queries handled by today's search engines.
3 Why is compressing the posting list important? What do you gain from compressing the posting list?
4 What is a recommender system? Give some examples of types of recommender systems and point out their advantages and disadvantages.
5 Explain precision and recall. Why is there a tradeoff between the two?
6 What is Collaborative Filtering? What are some of the challenges involved in making a Collaborative Filtering system?
7 What is Zipf's Law, and what is its significance for search (specifically, web search)?
8 What are the motivations for clustering, and how do bottom-up and top-down clustering differ?
9 State the difference between lossless compression and lossy compression and give several examples of each technique.
10 Define precision and recall and describe one method used to measure precision vs. recall.
11 With term queries, what are the benefits and shortcomings of using phrases?
12 What are two ways to solve the problem of non-uniform spelling of terms?
13 What are the advantages and drawbacks of personalized search compared to normal search?
14 What could be a good compression scheme for term frequencies and document numbers?
15 Why is determining result relevance such a difficult question?
16 Why are stop lists not necessarily a good solution for determining what to index and what not to index?
17 Storage
   H(barber) = 010010
   H(banana) = 001000
   H(today)  = 000010
   H(glass)  = 100010
   H(tomato) = 010000
   (a) Compute the signature of the query “I like the glass tomato” given the signatures above.
   (b) Are there any unwanted terms caught in this signature? If so, which ones and why?
   (c) Given the five terms above, build a suffix tree containing all terms.
18 Fully describe bottom-up and top-down clustering.
19 What is the importance of a page's rank?
20 How can a webpage that is to be submitted to a search engine be optimized for search?
21 Describe the benefits of using a prefix/suffix tree as well as the disadvantages, and give one example where a prefix/suffix tree may be useful.
22 Explain the process of generating an inverted file, and why it is so frequently used.
23 What is stemming, and what are some of the advantages and disadvantages associated with it?
24 What is meant by Nearest Neighbor Classification, and what are its pros and cons?
25 Explain k-means with hard assignment.
26 How are Self Organizing Maps different from k-means?
27 Give a brief overview of the functioning of a search engine.
28 What's the difference between a Web directory like Yahoo and a Web search engine like Google?
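
For question 17 (a) and (b), an answer can be checked by machine with a sketch like the one below. It assumes superimposed coding, where a query signature is the bitwise OR of its terms' signatures, and it assumes that query words with no listed signature (“I”, “like”, “the”) contribute no bits. The function names are illustrative and do not come from the course material.

```python
# Term signatures exactly as listed in question 17 (6-bit strings).
SIGNATURES = {
    "barber": "010010",
    "banana": "001000",
    "today": "000010",
    "glass": "100010",
    "tomato": "010000",
}


def query_signature(query, signatures=SIGNATURES):
    """Bitwise-OR the signatures of the query words that have one."""
    combined = 0
    for word in query.lower().split():
        # Assumption: words without a listed signature set no bits.
        if word in signatures:
            combined |= int(signatures[word], 2)
    return combined


def matching_terms(query_sig, signatures=SIGNATURES):
    """Return the terms whose set bits all appear in the query signature."""
    matches = []
    for term, bits in signatures.items():
        term_sig = int(bits, 2)
        if (term_sig & query_sig) == term_sig:
            matches.append(term)
    return sorted(matches)


sig = query_signature("I like the glass tomato")
print(format(sig, "06b"))   # query signature as a 6-bit string
print(matching_terms(sig))  # every term the signature cannot rule out
```

Any term reported by `matching_terms` that does not occur in the query is a false match of the kind part (b) asks about.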