Editors' Introduction to "Document Image Understanding and Retrieval," (Junichi Kanai & Henry S. Baird, Guest Editors) special issue of Computer Vision and Image Understanding journal, Vol. 70, No. 3, June 1998.

Vast archives of information, handwritten and machine printed on paper, have accumulated over centuries. Advances in computer and communication technologies now offer drastically improved ways to store, retrieve, and distribute their contents. Billions of paper documents wait to be made accessible via electronic media.

Document image understanding and retrieval research seeks to discover methods for automatically extracting and organizing information from handwritten and machine printed paper documents containing text, line drawings, maps, music scores, etc. Its characteristic problems include some of the earliest attacked by computer-vision pioneers. The field has long been distinguished by close and productive ties between the academic and commercial communities. Today, document analysis research supports a viable industry which, stimulated by the growing demand for digital archives, the proliferation of inexpensive personal document scanners, and the ubiquity of fax machines, is poised for rapid growth. But the performance of these technologies still lags far behind human abilities. Many technical problems, critically important on both theoretical and practical grounds, remain open.

We are pleased to offer a collection of state-of-the-art papers touching on topics of current research interest. We begin with Doermann's up-to-date critical survey of the literature on document image retrieval, which reveals the rich interplay between the document analysis and information retrieval research communities. One example of this genre is the strikingly versatile language-independent text categorization system described by Bayer, Kressel, Mogg-Schneider, and Renz. Chen and Bloomberg show that English-language textual document images can be summarized without any resort to image pattern recognition (this won the Outstanding Paper award at the 1997 IAPR International Conference on Document Analysis and Recognition). Such surprising instances of non-trivial yet "OCR-free" document processing may be harbingers of a new generation of architectures for document analysis systems.

Document images are usually compressed before being exchanged and archived. It is sometimes possible to analyze compressed document images without fully decompressing them. Spitz demonstrates that non-trivial characteristics, such as skew angles and specially designed logos, can be extracted directly and extremely rapidly from images compressed by the CCITT Group III and IV methods. Kia, Doermann, Rosenfeld, and Chellappa provide a compression technique for document images which explicitly enables such "compressed-domain" processing, as one of several improvements to a symbolic-compression system.
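For readers unfamiliar with the idea, a toy sketch (ours, not Spitz's algorithm) may convey the flavor of compressed-domain processing. Assuming a simple run-length representation of scanlines, per-row ink counts and a projection profile can be computed directly from the runs, without ever expanding them back into pixel arrays:

```python
# Illustrative sketch only (not Spitz's method): each scanline is a
# list of (value, run_length) pairs, with value 1 meaning black.

def row_black_count(runs):
    """Black pixels in one run-length-encoded scanline."""
    return sum(length for value, length in runs if value == 1)

def projection_profile(rle_rows):
    """Horizontal projection profile, computed from the runs alone."""
    return [row_black_count(r) for r in rle_rows]

rle_page = [
    [(0, 3), (1, 5), (0, 2)],   # row 0: 5 black pixels
    [(0, 1), (1, 2), (0, 7)],   # row 1: 2 black pixels
    [(0, 10)],                  # row 2: blank
]
print(projection_profile(rle_page))  # [5, 2, 0]
```

Projection profiles of this kind are one classical input to skew estimation, which hints at why run-based representations lend themselves to fast analysis.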

The great variety of geometric arrangements of text blocks on printed pages poses daunting challenges. Antonacopoulos' fast "white-space tiling" method copes well with an unusually wide range of skewed, non-rectangular layouts. Kise and Sato, motivated by similar goals and similarly choosing to analyze the white background, show that methods based on area Voronoi diagrams are also effective -- an example of the continuing relevance of computational geometry to document analysis.
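The underlying intuition can be illustrated with a toy point-Voronoi computation (a deliberate simplification, not Kise and Sato's area Voronoi algorithm): assign each cell of a grid to its nearest connected-component centroid, so that the boundaries between regions become candidate separators between blocks. The centroids below are hypothetical:

```python
# Toy discrete Voronoi partition: label every grid cell with the index
# of its nearest centroid; region boundaries approximate Voronoi edges.
import math

def nearest_centroid(cell, centroids):
    return min(range(len(centroids)),
               key=lambda i: math.dist(cell, centroids[i]))

def discrete_voronoi(width, height, centroids):
    return {(x, y): nearest_centroid((x, y), centroids)
            for x in range(width) for y in range(height)}

centroids = [(1, 1), (8, 1)]          # e.g. centroids of two text blocks
labels = discrete_voronoi(10, 3, centroids)
print(labels[(0, 0)], labels[(9, 2)])  # 0 1
```

Area Voronoi diagrams generalize this from isolated points to the full extents of connected components, which is what makes them suitable for real page images.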

Of course the generic problem of segmentation -- the partitioning of complex images into regions which we can more easily recognize or analyze further -- pervades the field. Hu and Yan attempt to segment handwriting in off-line (static) images into individual characters. Hidden Markov model (HMM) techniques, having been applied with notable success in speech recognition, are increasingly being adapted to selected sub-problems in document analysis. Knerr, Augustin, Baret, and Price apply HMMs to word recognition in handwritten checks.
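At the heart of such HMM recognizers is the Viterbi dynamic program, which recovers the most likely hidden state sequence for a sequence of observations. The following minimal decoder is a generic textbook sketch; its two states and all probabilities are hypothetical and are not drawn from Knerr et al.'s check-reading system:

```python
# Minimal Viterbi decoder for a discrete-observation HMM.
# All model parameters below are invented for illustration.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    V = [{s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}]
    for o in obs[1:]:
        V.append({})
        for s in states:
            prob, path = max(
                (V[-2][ps][0] * trans_p[ps][s] * emit_p[s][o],
                 V[-2][ps][1] + [s]) for ps in states)
            V[-1][s] = (prob, path)
    return max(V[-1].values())[1]

# Hypothetical two-state stroke model ("letter" vs. "ligature"):
states = ("letter", "ligature")
start = {"letter": 0.7, "ligature": 0.3}
trans = {"letter": {"letter": 0.6, "ligature": 0.4},
         "ligature": {"letter": 0.7, "ligature": 0.3}}
emit = {"letter": {"tall": 0.6, "flat": 0.4},
        "ligature": {"tall": 0.1, "flat": 0.9}}
print(viterbi(["tall", "flat", "tall"], states, start, trans, emit))
# → ['letter', 'ligature', 'letter']
```

Real handwriting recognizers replace these toy discrete observations with feature vectors extracted from the word image, but the decoding machinery is the same.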

The robust detection of "graphical primitives" such as straight lines and circular arcs is an inescapable subtask in graphics recognition. Wenyin and Dori present a painstaking study of software-engineering aspects arising in algorithms for this purpose. Ogier, Mullot, Labiche, and Lecourtier give an architectural tour of a complete system for the knowledge-guided interpretation of city maps, putting to use some general principles of human visual perception. This is one of many experiments within our field -- still cautiously exploratory -- in the exploitation of cognitive science.

These eleven papers were subjected to the exhaustive CVIU process of review and revision. We enthusiastically thank our twenty-seven highly professional referees for their admirable devotion to their anonymous duties, and the authors for responding to the referees' advice gracefully and thoroughly. Finally, but no less ardently, we are grateful to Editor-in-Chief Avi Kak for his kind invitation to us to assemble these papers, and to him and Karen Rado and other CVIU Editorial Office staff for their unfailing support and advice during our protracted labor.

We hope that this Special Issue will stimulate greater understanding, mutual interest, and collaboration between the computer vision and document image analysis research communities. Only a decade ago these now-divergent communities were unified. We continue to share our most strongly held aspiration: to build machines able to infer complete and highly accurate interpretations of the contents of complex images -- whether scenes of the 3-D physical world or the "visible speech" of 2-D documents.
