Indexing of Documents

The following article is from The Great Soviet Encyclopedia (1979). It might be outdated or ideologically biased.

Indexing of documents is the process of expressing the main subject or theme of a text in a document in the terms of an information retrieval language. Indexing makes it possible to retrieve a text from among many others. Either an entire document or a part of it can be indexed, and text headings are often used for indexing. Secondary subjects and themes are omitted; as a result, texts in which the subject or theme of a retrieval request is not primary will not be found.

There are two basic types of indexing: classification and coordinate. In classification indexing, or classifying, each text is assigned to one or several appropriate classes according to its content, so that all texts with basically the same semantic content are brought together. The index number of the class is assigned to each text within it, and the number then serves as the text's search specification.
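The classification procedure described above can be sketched roughly as follows. This is a minimal illustration, not a real classification scheme: the index numbers, class names, document identifiers, and the function names `classify` and `retrieve` are all invented for the example.

```python
from collections import defaultdict

# Hypothetical classification scheme: index number -> class name.
# The numbers and names are invented for illustration only.
SCHEME = {
    "02": "Library science",
    "51": "Mathematics",
    "62": "Engineering",
}

def classify(assignments):
    """Build a class file: index number -> list of document ids.

    `assignments` maps a document id to the index numbers an indexer
    gave it (one or several classes per text).
    """
    class_file = defaultdict(list)
    for doc_id, numbers in assignments.items():
        for number in numbers:
            if number not in SCHEME:
                raise KeyError(f"unknown index number: {number}")
            class_file[number].append(doc_id)
    return dict(class_file)

def retrieve(class_file, number):
    """Return all texts filed under one index number."""
    return class_file.get(number, [])

# Hypothetical indexer decisions.
assignments = {
    "doc1": ["02"],
    "doc2": ["51", "62"],
    "doc3": ["02", "51"],
}
class_file = classify(assignments)
print(retrieve(class_file, "02"))  # ['doc1', 'doc3']
```

Here the index number alone is the search specification: a request is answered by looking up a single class, which is why a text is found only if the indexer placed it in that class.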

In coordinate indexing, the basic semantic content of the text is expressed by a list of significant words selected either from the text itself or its headings, or from a special normative dictionary. In the first instance such lexical units are termed key words, and in the second, descriptors. Each key word or descriptor designates a class that potentially includes all the texts whose basic semantic content includes that word.
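The difference between key words and descriptors can be illustrated with a small sketch. The stop list, the normative dictionary, the sample heading, and the function names `key_words` and `descriptors` are assumptions made up for this example; a real normative dictionary would be far larger and compiled by specialists.

```python
import re

# Hypothetical stop list and normative dictionary, invented for illustration.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}
NORMATIVE_DICTIONARY = {
    "retrieval": "INFORMATION RETRIEVAL",
    "search": "INFORMATION RETRIEVAL",
    "indexing": "INDEXING",
    "computers": "COMPUTERS",
    "machines": "COMPUTERS",
}

def key_words(heading):
    """Significant words taken from the heading itself, in order, no repeats."""
    words = re.findall(r"[a-z]+", heading.lower())
    selected = []
    for word in words:
        if word not in STOP_WORDS and word not in selected:
            selected.append(word)
    return selected

def descriptors(heading):
    """Terms of the normative dictionary that the heading's words map to."""
    found = {NORMATIVE_DICTIONARY[word] for word in key_words(heading)
             if word in NORMATIVE_DICTIONARY}
    return sorted(found)

heading = "Indexing and Search of Documents on Machines"
print(key_words(heading))    # ['indexing', 'search', 'documents', 'machines']
print(descriptors(heading))  # ['COMPUTERS', 'INDEXING', 'INFORMATION RETRIEVAL']
```

In this sketch the key words are whatever significant words the heading happens to contain, while the descriptors are drawn from a fixed vocabulary, so different wordings ("search", "retrieval") lead to the same term.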

Combining the classes designated by all the words that together express the basic semantic content of the text (in effect, taking their logical intersection) creates a certain complex class. The complex class constructed in this manner is designated by a list of key words or descriptors, and this list serves as the search specification for the given text or as an expression of the semantic content of a request in an information retrieval language. Thus, with coordinate indexing, the semantic content of the text is expressed by an indication of its coordinates in a certain n-dimensional space.
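One way to picture the complex class is as the intersection of the sets of texts attached to each descriptor. The sketch below assumes that reading; the descriptors, document identifiers, and the function names `build_index` and `complex_class` are hypothetical.

```python
from collections import defaultdict

def build_index(search_specs):
    """Map each descriptor to the class (set) of texts that carry it."""
    index = defaultdict(set)
    for doc_id, terms in search_specs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def complex_class(index, terms):
    """Texts that belong to every class named in `terms` (set intersection)."""
    classes = [index.get(term, set()) for term in terms]
    if not classes:
        return set()
    return set.intersection(*classes)

# Hypothetical search specifications: text -> its list of descriptors.
search_specs = {
    "doc1": {"INDEXING", "COMPUTERS"},
    "doc2": {"INDEXING", "LIBRARIES"},
    "doc3": {"INDEXING", "COMPUTERS", "LIBRARIES"},
}
index = build_index(search_specs)
print(sorted(complex_class(index, ["INDEXING", "COMPUTERS"])))  # ['doc1', 'doc3']
```

Under this reading each descriptor acts as one axis of the n-dimensional space, and a text's coordinate on that axis is simply whether the descriptor appears in its search specification.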

A variation of coordinate indexing is permutation, or cyclical, indexing, which uses key words from the headings of texts. Each key word of a heading is entered in turn, together with its context, in a retrieval column, and the entries are arranged in alphabetical order by key word.
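A permuted index of this kind is commonly known as a key-word-in-context (KWIC) index. The sketch below shows one simple way to build it by cyclically shifting each heading; the stop list, the "/" marker for the start of the original heading, the sample headings, and the function name `permuted_index` are assumptions of the example rather than a standard layout.

```python
# Hypothetical stop list: words that never serve as key words.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

def permuted_index(headings):
    """Return (key word, rotated heading, heading number) entries,
    sorted alphabetically by key word."""
    entries = []
    for number, heading in enumerate(headings, start=1):
        words = heading.split()
        for i, word in enumerate(words):
            key = word.lower()
            if key in STOP_WORDS:
                continue
            # Cyclic shift: the key word leads, the rest of the heading
            # follows, and "/" marks where the original heading began.
            rotated = words[i:] + (["/"] + words[:i] if i else [])
            entries.append((key, " ".join(rotated), number))
    return sorted(entries)

headings = ["Indexing of Documents", "Automatic Indexing"]
for key, line, number in permuted_index(headings):
    print(f"{line}  [{number}]")
```

Every heading appears once for each of its key words, so a reader scanning the alphabetical column finds "Indexing of Documents" under both "documents" and "indexing".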

More complex information retrieval languages have been developed on the basis of coordinate indexing. The basic advantage of coordinate indexing over classification indexing is that texts can be retrieved without difficulty by any logical combination of key words or descriptors.
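To illustrate retrieval under different kinds of logic, the following sketch evaluates requests built from AND, OR, and NOT over descriptor classes. The tuple-based query notation, the sample index, and the function name `evaluate` are invented for this example and do not correspond to any particular retrieval language.

```python
def evaluate(query, index, all_docs):
    """Evaluate a nested query against descriptor -> set-of-texts classes.

    A query is either a descriptor string or a tuple:
    ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q).
    """
    if isinstance(query, str):
        return set(index.get(query, set()))
    operator, *operands = query
    if not operands:
        raise ValueError("an operator needs at least one operand")
    results = [evaluate(q, index, all_docs) for q in operands]
    if operator == "AND":
        return set.intersection(*results)
    if operator == "OR":
        return set.union(*results)
    if operator == "NOT":
        return all_docs - results[0]
    raise ValueError(f"unknown operator: {operator}")

# Hypothetical descriptor classes.
index = {
    "INDEXING": {"doc1", "doc2", "doc3"},
    "COMPUTERS": {"doc1", "doc3"},
    "LIBRARIES": {"doc2", "doc3"},
}
all_docs = {"doc1", "doc2", "doc3"}

query = ("AND", "INDEXING", ("OR", "COMPUTERS", "LIBRARIES"), ("NOT", "LIBRARIES"))
print(sorted(evaluate(query, index, all_docs)))  # ['doc1']
```

Because each descriptor class is stored independently, any combination can be formed at request time, whereas a classification scheme answers only for the classes that were fixed in advance.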

A special type of indexing is the analysis of the semantic content of a text through its appended bibliography, which gives the authors and bibliographic descriptions of the works referred to in the text. This process is used to compile indexes of cited literature and is a very effective instrument not only for retrieving documents but also for resolving problems in the study of science and in forecasting.
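An index of cited literature can be thought of as the bibliographies turned inside out: instead of listing what each text cites, it lists which texts cite each work. The sketch below assumes that simple model; the paper and work identifiers and the function names `citation_index` and `most_cited` are hypothetical.

```python
from collections import defaultdict

def citation_index(bibliographies):
    """Invert text -> cited works into cited work -> set of citing texts."""
    cited_by = defaultdict(set)
    for citing, cited_works in bibliographies.items():
        for work in cited_works:
            cited_by[work].add(citing)
    return cited_by

def most_cited(cited_by, n=1):
    """Cited works ranked by number of citing texts (ties broken by name)."""
    ranked = sorted(cited_by, key=lambda work: (-len(cited_by[work]), work))
    return ranked[:n]

# Hypothetical bibliographies: citing text -> works it refers to.
bibliographies = {
    "paperA": ["work1", "work2"],
    "paperB": ["work2"],
    "paperC": ["work2", "work3"],
}
cited_by = citation_index(bibliographies)
print(sorted(cited_by["work2"]))  # ['paperA', 'paperB', 'paperC']
print(most_cited(cited_by))       # ['work2']
```

Looking up a known work retrieves the later texts that build on it, and simple counts such as `most_cited` hint at how the same index could support studies of scientific activity.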

REFERENCES

Mikhailov, A. I., A. I. Chernyi, and R. S. Giliarevskii. Osnovy informatiki, 2nd ed. Moscow, 1968. Pages 179–222, 244–515.
Sharp, J. R. Some Fundamentals of Information Retrieval. London, 1965. Pages 11–120, 156–203.
Stevens, M. E. Automatic Indexing: A State-of-the-Art Report. Washington, DC, 1965. (National Bureau of Standards Monograph No. 91.)

A. I. CHERNYI

The Great Soviet Encyclopedia, 3rd Edition (1970-1979). © 2010 The Gale Group, Inc. All rights reserved.