Indexing of Documents
the process of expressing the main subject or theme of a document's text in terms of an information retrieval language. Indexing makes it possible to retrieve a text from among many others. Either an entire document or a part of one can be indexed. Text headings are often used for indexing. Secondary subjects and themes are omitted; consequently, a text in which the subject or theme of a retrieval request is not primary will not be located in retrieval.
There are two basic types of indexing: classification and coordinate. With classification indexing, or classifying, the texts are included in an appropriate class (one or several) depending on their content. All texts with basically the same semantic content are brought together. The index number of this class is assigned to each text within it, and the number then serves as its search specification.
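Classification indexing can be illustrated with a minimal sketch. The class index numbers and text identifiers below are hypothetical, and the function name `build_class_file` is an assumption introduced only for this illustration. Each text receives one or several class numbers, and retrieval amounts to looking up a class number.

```python
from collections import defaultdict

# Hypothetical class index numbers and text identifiers, invented for illustration.
assignments = {
    "text-1": ["621.3"],
    "text-2": ["621.3", "004"],
    "text-3": ["004"],
}


def build_class_file(assignments):
    """Group texts under each class index number assigned to them."""
    class_file = defaultdict(list)
    for text_id, class_numbers in assignments.items():
        for number in class_numbers:
            class_file[number].append(text_id)
    return dict(class_file)


class_file = build_class_file(assignments)

# The class number acts as the search specification for every text in the class.
print(class_file["004"])
```

A text assigned to several classes, such as `text-2` here, is found under each of its class numbers.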
In coordinate indexing, the basic semantic content of the text is expressed by a list of significant words selected either from the text itself or its headings or from a special normative dictionary. In the first instance, such lexical units are termed key words, and in the second, descriptors. Each key word or descriptor designates a class that potentially includes all the texts that have the word in the basic semantic content.
Logically combining the classes designated by all the words that together express the basic semantic content of the text yields a complex class. This complex class is designated by a list of key words or descriptors, and the list serves as the search specification for the given text or, in the case of a request, as the expression of its semantic content in an information retrieval language. Thus, with coordinate indexing, the semantic content of the text is expressed by an indication of its coordinates in a certain n-dimensional space.
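One way to picture the n-dimensional space is to treat every descriptor in the vocabulary as an axis, so that a text or a request becomes a vector of zeros and ones. The sketch below assumes a hypothetical four-term vocabulary; the names `coordinates` and `matches` are illustrative and do not come from the source.

```python
# Hypothetical descriptor vocabulary; each descriptor is one axis of the space.
vocabulary = ["alloy", "corrosion", "steel", "welding"]


def coordinates(descriptors, vocabulary):
    """Return a 0/1 vector with one coordinate per vocabulary term."""
    assigned = set(descriptors)
    return [1 if term in assigned else 0 for term in vocabulary]


def matches(request_vector, text_vector):
    """A text satisfies a request if it carries every descriptor the request names."""
    return all(t >= r for r, t in zip(request_vector, text_vector))


text = coordinates(["steel", "corrosion"], vocabulary)
request = coordinates(["corrosion"], vocabulary)

print(text)                    # [0, 1, 1, 0]
print(matches(request, text))  # True
```

The same function encodes both texts and requests, which reflects the point that one list of descriptors serves as a search specification in either role.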
A variation of coordinate indexing is permutation, or cyclical, indexing, which uses the key words found in the headings of texts. Each key word of a heading is entered in turn, together with its context, in a retrieval column in which the key words are arranged in alphabetical order.
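A permuted index of this kind can be sketched in a few lines. The stop-word list, the "/" marker for the point where the heading wraps around, and the function name `kwic_index` are assumptions made for this illustration; actual permuted indexes differ in layout details.

```python
# Assumed list of insignificant words that do not become key words.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_index(titles):
    """Return (key word, rotated heading) entries sorted by key word."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            if word.lower() in STOP_WORDS:
                continue
            # Rotate the heading so the key word leads; "/" marks the wrap point.
            rotated = " ".join(words[i:] + ["/"] + words[:i]) if i else title
            entries.append((word.lower(), rotated))
    return sorted(entries)


titles = [
    "Automatic Indexing of Documents",
    "Fundamentals of Information Retrieval",
]
for key, line in kwic_index(titles):
    print(f"{key:<14}{line}")
```

Every significant word of a heading produces its own entry, so a heading can be found under any of its key words while the rest of the heading stays visible as context.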
More complex information retrieval languages have been developed on the basis of coordinate indexing. The basic advantage of coordinate indexing over classification indexing is that texts can be retrieved without difficulty by any logical combination of key words or descriptors.
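The flexibility with respect to retrieval logic can be shown with a small sketch in which each key word maps to the set of texts that carry it, so that conjunction, disjunction, and negation reduce to ordinary set operations. The texts, key words, and the name `build_postings` are hypothetical.

```python
from collections import defaultdict

# Hypothetical texts and the key words assigned to each of them.
indexed_texts = {
    1: {"steel", "corrosion"},
    2: {"steel", "welding"},
    3: {"aluminum", "corrosion"},
}


def build_postings(indexed_texts):
    """Map each key word to the set of texts indexed with it."""
    postings = defaultdict(set)
    for text_id, key_words in indexed_texts.items():
        for word in key_words:
            postings[word].add(text_id)
    return postings


postings = build_postings(indexed_texts)

both = postings["steel"] & postings["corrosion"]       # steel AND corrosion
either = postings["steel"] | postings["corrosion"]     # steel OR corrosion
excluding = postings["corrosion"] - postings["steel"]  # corrosion AND NOT steel

print(sorted(both), sorted(either), sorted(excluding))
```

No combination of key words has to be anticipated when the texts are indexed; the request decides which classes are combined and how.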
A special type of indexing is the analysis of the semantic content of a text through its appended bibliography, which gives the authors and bibliographic descriptions of the works referred to in the text. This process is used to compile indexes of cited literature and is a very effective instrument not only for retrieving documents but also for resolving problems in science studies and forecasting.
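An index of cited literature is, in essence, the set of bibliographies turned around: instead of listing what each document cites, it lists which documents cite each work. The sketch below uses hypothetical document identifiers, and `build_citation_index` is an illustrative name rather than an established one.

```python
from collections import defaultdict

# Hypothetical documents and the works listed in their bibliographies.
bibliographies = {
    "paper-A": ["work-X", "work-Y"],
    "paper-B": ["work-Y"],
    "paper-C": ["work-X", "work-Y", "work-Z"],
}


def build_citation_index(bibliographies):
    """Invert the bibliographies: map each cited work to the documents citing it."""
    citation_index = defaultdict(list)
    for citing, cited_works in bibliographies.items():
        for cited in cited_works:
            citation_index[cited].append(citing)
    return dict(citation_index)


citation_index = build_citation_index(bibliographies)
print(citation_index["work-X"])

# Citation counts are one simple quantity that studies of science can draw on.
counts = {work: len(citing) for work, citing in citation_index.items()}
print(counts)
```

Starting from one known work, a reader can find later documents that refer to it without assigning any key words or class numbers.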
REFERENCES
Mikhailov, A. I., A. I. Chernyi, and R. S. Giliarevskii. Osnovy informatiki, 2nd ed. Moscow, 1968. Pages 179–222, 244–515.
Sharp, J. R. Some Fundamentals of Information Retrieval. London, 1965. Pages 11–120, 156–203.
Stevens, M. E. Automatic Indexing: A State-of-the-Art Report. Washington, DC, 1965. (National Bureau of Standards Monograph No. 91.)
A. I. CHERNYI