Chad Hogg

Review of "Mining the Semantic Web: Requirements for Machine Learning" by Ciravegna and Chapman

This paper describes the technological challenges that must be overcome in order to use machine learning techniques effectively to create semantic markup automatically for existing web documents. The authors review an existing system, Armadillo, that attempts to perform this task but is limited in scope. The paper specifies two primary requirements: the capability to scale to the size of the web, and the capability to handle heterogeneous data formats with minimal user intervention.

There are several distracting grammatical errors in the paper that should have been caught in proofreading, but the writing is generally quite legible. A bigger problem is the authors' constant use of abbreviations for short phrases that do not need them; the reader must repeatedly expand tokens such as "SM" (Semantic Web) and "ML" (Machine Learning). Furthermore, the abbreviation "CS" is used twice without being defined anywhere that I noticed. There are also a few places where the authors' meaning is not clear, including the paradox "... unstructured documents such as semi-structured [documents] ..." and the mis-definition "... Web dynamicity, pages are often built or modified in a careless way". I have not been able to find any formal definition of the term "dynamicity", but it most likely refers to the dynamic nature of the web, not to the carelessness with which its components are created.

Although the citation style is annoying, the reference list appears to be relatively complete. However, Ciravegna's paper describing the (LP)2 algorithm is notably absent; finding information about the algorithm required careful searching.

I am not a semantic web researcher, nor am I an expert in knowledge representation. Despite this, the content of this paper seems rather obvious.
Anyone who is performing data mining at the scale of the web should be intimately familiar with the size of the data and the challenges it entails. The problem of heterogeneity may be less well studied, but I recall discussing such issues before, most likely in response to a section of Chakrabarti's textbook. Furthermore, the title of the paper does not seem to describe its content accurately: the task discussed here is not the mining of the semantic web, but rather the mining of the existing web to create the semantic web. I would not accept this paper as anything more than a technical report.