Information Retrieval

information retrieval

[‚in·fər′mā·shən ri‚trē·vəl]
(computer science)
The technique and process of searching, recovering, and interpreting information from large amounts of stored data.

The process of locating, in a given set of texts (documents), all those devoted to a requested subject or containing facts or information that the user needs.

Information retrieval is accomplished by means of an information retrieval system and is performed manually or with the use of mechanization or automation. Human beings are indispensable in information retrieval. Depending on the character of the information contained in the texts output by the information retrieval system, information retrieval can be documentary, including bibliographic, or factual. Information retrieval must be distinguished from logical information processing, without which direct replies to the questions posed by a human being are impossible. In information retrieval, only the information that was input to the information retrieval system is sought, and only that information can be found.

Before a text (document) is input into an information retrieval system, its basic semantic content (theme or subject) is determined and then translated into and recorded in an information retrieval language. This entry is termed the retrieval form of the text. The same procedure is also followed when information recorded in a specific form is input to an information retrieval system. A processed request is also translated into the information retrieval language, forming a retrieval instruction. Since the retrieval forms of the texts and the retrieval instructions are recorded in one and the same language, whose expressions permit only one interpretation, it is possible to compare them formally, ignoring meaning. Specific rules (match criteria) are given for this comparison; they establish the degree of formal coincidence between the retrieval form and the retrieval instruction at which a text is considered to correspond to the information request and is therefore output.
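
The formal comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: retrieval forms and retrieval instructions are modeled as sets of descriptors, and the match criterion is taken to be the fraction of the instruction's descriptors found in the form. The function names, the threshold parameter, and the sample documents are hypothetical and do not describe any particular information retrieval system.

```python
def coincidence(retrieval_form: frozenset, retrieval_instruction: frozenset) -> float:
    """Fraction of the instruction's descriptors that occur in the retrieval form."""
    if not retrieval_instruction:
        return 0.0
    return len(retrieval_form & retrieval_instruction) / len(retrieval_instruction)


def retrieve(forms: dict, instruction: frozenset, threshold: float = 1.0) -> list:
    """Return ids of texts whose formal coincidence meets the match criterion."""
    return sorted(
        doc_id
        for doc_id, form in forms.items()
        if coincidence(form, instruction) >= threshold
    )


# Hypothetical retrieval forms: each text is reduced to a set of descriptors.
forms = {
    "doc1": frozenset({"punched cards", "sorting"}),
    "doc2": frozenset({"digital computers", "indexing", "sorting"}),
    "doc3": frozenset({"microfilm", "storage"}),
}

# Hypothetical retrieval instruction produced from a user's request.
instruction = frozenset({"sorting", "indexing"})

strict = retrieve(forms, instruction, threshold=1.0)  # every descriptor must match
loose = retrieve(forms, instruction, threshold=0.5)   # half of the descriptors suffice
print(strict, loose)
```

The comparison never looks at meaning, only at whether descriptors coincide. Lowering the threshold admits more texts into the output, which tends to raise recall at the expense of precision.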

The technical efficiency of information retrieval is characterized by two relative indicators: the precision coefficient (the ratio of the number of texts answering the information request to the total number of texts in a given output) and the recall coefficient (the ratio of the number of texts in the output that answer an information request to the total number of such texts contained in the given information retrieval system). The permissible values of these indicators depend on the specific features of the information requirements. For example, in the retrieval of patent descriptions with the aim of examining the patent’s claim to innovation, 100 percent recall (completeness) is necessary in the output; in retrieval oriented toward the ordinary researcher or engineer, a precision of about 80 percent and a recall of about 50 percent are considered very good.
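
As a worked illustration of the two indicators, the following Python sketch computes them for one output. The document identifiers are invented for the example, and the numbers are chosen so that the result reproduces the 80 percent precision and 50 percent recall figures mentioned above.

```python
def precision_recall(output: set, relevant: set) -> tuple:
    """Return (precision, recall) for one output of a retrieval system.

    output   -- texts actually delivered in answer to the request
    relevant -- all texts in the system that answer the request
    """
    hits = len(output & relevant)  # delivered texts that answer the request
    precision = hits / len(output) if output else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: 8 relevant texts are stored; 5 texts are output, 4 of them relevant.
relevant = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
output = {"d1", "d2", "d3", "d4", "x1"}

precision, recall = precision_recall(output, relevant)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
```

Here four of the five delivered texts are relevant (precision 4/5), while only four of the eight relevant texts in the system were found (recall 4/8).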

Information retrieval may be of two types: selective (or addressed) dissemination of information and retrospective retrieval. In the selective dissemination of information, information retrieval is carried out according to the constant demands of a certain number of users (subscribers); it is performed periodically (usually weekly or biweekly) and only on the body of texts entered into the information retrieval system during this time. Efficient feedback is established between the information retrieval system and the subscriber: the subscriber reports how closely each text corresponds to his request, whether he needs a copy of the complete text, and how well the text meets his information requirements. This feedback allows his requirements to be made more precise, permits the system to react quickly to changes in these requirements, and allows the work of the system to be optimized. In retrospective retrieval, the information retrieval system searches the whole stored body of texts for those containing the required information in answer to single requests.
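
The difference between the two types can be sketched as follows. This is a simplified illustration, not a description of a real system: the stored texts, their entry dates, the subscriber profiles, and the all-descriptors-must-match rule are assumptions made for the example, and the feedback loop with the subscriber is left out.

```python
from datetime import date

# Hypothetical stored texts: (id, date entered into the system, retrieval form).
collection = [
    ("t1", date(2024, 1, 2), {"indexing", "punched cards"}),
    ("t2", date(2024, 1, 9), {"indexing", "microfilm"}),
    ("t3", date(2024, 1, 10), {"magnetic recording"}),
]


def matches(form: set, instruction: set) -> bool:
    """Assumed match criterion: every descriptor of the instruction is in the form."""
    return instruction <= form


def retrospective(collection: list, instruction: set) -> list:
    """Answer a single request by searching the whole stored body of texts."""
    return [tid for tid, _, form in collection if matches(form, instruction)]


def selective_dissemination(collection: list, profiles: dict, since: date) -> dict:
    """Run each subscriber's standing profile against texts entered since the last run."""
    recent = [(tid, form) for tid, entered, form in collection if entered >= since]
    return {
        subscriber: [tid for tid, form in recent if matches(form, instruction)]
        for subscriber, instruction in profiles.items()
    }


# Hypothetical standing profiles (constant demands) of two subscribers.
profiles = {
    "subscriber_a": {"indexing"},
    "subscriber_b": {"magnetic recording"},
}

weekly = selective_dissemination(profiles=profiles, collection=collection, since=date(2024, 1, 8))
everything = retrospective(collection, {"indexing"})
print(weekly, everything)
```

The selective run sees only the texts entered during the current period, so subscriber_a receives t2 but not the older t1, whereas the retrospective search over the whole collection returns both.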

The further development of information retrieval is directed toward its mechanization and automation and makes use of punched cards for manual handling (edge-notched, slotted, and visual selection cards), keypunch machines, and digital computers, as well as special technical methods: microphotographic, magnetic, and videotape information recording.


