# information theory


## information theory

or communication theory, mathematical theory formulated principally by the American scientist Claude E. Shannon to explain aspects and problems of information and communication. The theory is not specific in all respects; for example, it proves the existence of optimum coding schemes without showing how to find them. Nevertheless, it succeeds remarkably in outlining the engineering requirements of communication systems and the limitations of such systems.

In information theory, the term *information* is used in a special sense; it is a measure of the freedom of choice with which a message is selected from the set of all possible messages. Information is thus distinct from meaning, since it is entirely possible for a string of nonsense words and a meaningful sentence to be equivalent with respect to information content.

### Measurement of Information Content

Numerically, information is measured in bits (short for *binary digits*). One bit is equivalent to the choice between two equally likely alternatives. For example, if we know that a coin is to be tossed but are unable to see it as it falls, a message telling whether the coin came up heads or tails gives us one bit of information. When there are several equally likely choices, the number of bits is equal to the logarithm of the number of choices taken to the base two. For example, if a message specifies one of sixteen equally likely choices, it is said to contain four bits of information. When the various choices are not equally probable, the situation is more complex.
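The rule for equally likely choices can be checked with a short Python sketch. The function name `bits_for_equally_likely` is illustrative and does not come from the text.

```python
import math


def bits_for_equally_likely(n_choices: int) -> float:
    """Information, in bits, gained by learning one of n equally likely choices."""
    if n_choices < 1:
        raise ValueError("need at least one choice")
    return math.log2(n_choices)


coin_bits = bits_for_equally_likely(2)      # heads or tails
sixteen_bits = bits_for_equally_likely(16)  # one of sixteen choices
print(coin_bits, sixteen_bits)  # 1.0 4.0
```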

Interestingly, the mathematical expression for information content closely resembles the expression for entropy in thermodynamics. The greater the freedom of choice in selecting a message, and hence the less predictable the message, the greater its information content and the larger its entropy. Since the information content is, in general, associated with a source that generates messages, it is often called the entropy of the source. Often, because of constraints such as grammar, a source does not use its full range of choice. A source that uses just 70% of its freedom of choice would be said to have a relative entropy of 0.7. The redundancy of such a source is defined as 100% minus the relative entropy, or, in this case, 30%. The redundancy of English is estimated to be about 50%; i.e., about half of the elements used in writing or speaking are freely chosen, and the rest are required by the structure of the language.
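As a rough illustration of relative entropy and redundancy, the sketch below computes both for a source described only by its symbol probabilities. It assumes that symbols are chosen independently, so it ignores constraints between symbols such as grammar. The function names and the two example distributions are hypothetical.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits per symbol; zero-probability symbols add nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def redundancy(probs):
    """Redundancy = 1 - relative entropy, with relative entropy = H / log2(alphabet size)."""
    max_entropy = math.log2(len(probs))  # entropy if every symbol were equally likely
    relative_entropy = entropy_bits(probs) / max_entropy
    return 1.0 - relative_entropy


uniform = [0.25, 0.25, 0.25, 0.25]  # full freedom of choice
skewed = [0.5, 0.25, 0.125, 0.125]  # some symbols favored over others

print(entropy_bits(uniform), redundancy(uniform))  # 2.0 0.0
print(entropy_bits(skewed), redundancy(skewed))    # 1.75 0.125
```

The skewed source uses 87.5% of its freedom of choice, so its redundancy is 12.5%.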

### Analysis of the Transfer of Messages through Channels

A message proceeds along a channel from the source to the receiver; information theory defines for any given channel a limiting capacity or rate at which it can carry information, expressed in bits per second. In general, it is necessary to process, or encode, information from a source before transmitting it through a given channel. For example, a human voice must be encoded before it can be transmitted by telephone. An important theorem of information theory states that if a source with a given entropy feeds information to a channel with a given capacity, and if the source entropy is less than the channel capacity, a code exists for which the frequency of errors may be reduced as low as desired. If the channel capacity is less than the source entropy, no such code exists.

The theory further shows that noise, or random disturbance of the channel, creates uncertainty as to the correspondence between the received signal and the transmitted signal. The average uncertainty in the message when the signal is known is called the equivocation. It is shown that the net effect of noise is to reduce the information capacity of the channel. However, redundancy in a message, as distinguished from redundancy in a source, makes it more likely that the message can be reconstructed at the receiver without error. For example, if something is already known as a certainty, then all messages about it give no information and are 100% redundant, and the information is thus immune to any disturbances of the channel. Using various mathematical means, Shannon was able to define channel capacity for continuous signals, such as music and speech.
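The equivocation can be made concrete for one simple noise model. The sketch below assumes a binary symmetric channel, that is, a channel that flips each transmitted bit independently with a fixed probability; this model and the function name are illustrative choices, not part of the text above.

```python
import math


def equivocation_bsc(p_one: float, flip_prob: float) -> float:
    """H(X|Y) in bits: average uncertainty about the sent bit X once the received bit Y is known.

    p_one is P(X = 1); flip_prob is the probability that the channel inverts a bit.
    """
    # Joint distribution P(X = x, Y = y).
    joint = {}
    for x, p_x in ((0, 1.0 - p_one), (1, p_one)):
        for y in (0, 1):
            p_y_given_x = flip_prob if x != y else 1.0 - flip_prob
            joint[(x, y)] = p_x * p_y_given_x

    # Marginal distribution of the received bit.
    p_y = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}

    # H(X|Y) = -sum over (x, y) of P(x, y) * log2 P(x | y).
    h = 0.0
    for (x, y), p_xy in joint.items():
        if p_xy > 0:
            h -= p_xy * math.log2(p_xy / p_y[y])
    return h


print(equivocation_bsc(0.5, 0.0))  # 0.0, a noiseless channel leaves no doubt
print(equivocation_bsc(0.5, 0.5))  # 1.0, the output says nothing about the input
```

Between these extremes the equivocation grows with the flip probability, which is the sense in which noise reduces the information the channel can deliver.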

### Bibliography

See C. E. Shannon and W. Weaver, *The Mathematical Theory of Communication* (1949); M. Mansuripur, *Introduction to Information Theory* (1987); J. Gleick, *The Information: A History, a Theory, a Flood* (2011).

## Information Theory

the mathematical discipline that studies the processes of storage, transformation, and transmission of information. Information theory is an essential part of cybernetics.

At the basis of information theory lies a definite method for measuring the quantity of information contained in given data (“messages”). Information theory proceeds from the idea that the messages designated for retention in a storage device or for transmission over a communication channel are not known in advance with complete certainty. Only the set from which these messages may be selected is known in advance and, at best, how frequently certain of these messages are selected (that is, the probability of the messages). In information theory it is shown that the “uncertainty” encountered in such circumstances admits of a quantitative expression and that precisely this expression (and not the specific nature of the messages themselves) determines the possibility of their storage and transmission.

As such a “measure of uncertainty” in information theory one uses the number of binary digits (bits) necessary to record an arbitrary message from a given source. More precisely, one looks at all possible methods for representing the messages by sequences of the symbols 0 and 1 (binary codes) that satisfy two conditions: (a) different sequences correspond to different messages and (b) upon the transcription of a certain sequence of messages into coded form this sequence must be unambiguously recoverable. Then as a measure of the uncertainty one takes the average length of the coded sequence that corresponds to the most economical method of encoding; one binary digit serves as the unit of measurement.

For example, let certain messages *x*_{1}, *x*_{2}, and *x*_{3} appear with probabilities of ½, ⅜, and ⅛, respectively. Any code that is too short, such as

*x*_{1} = 0, *x*_{2} = 1, *x*_{3} = 01

is unsuitable since it violates condition (b). Thus, the sequence 01 can denote either *x*_{1}*x*_{2} or *x*_{3}. The code

*x*_{1} = 0, *x*_{2} = 10, *x*_{3} = 11

satisfies conditions (a) and (b). To it corresponds an average length of a coded sequence equal to

$$1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{3}{8} + 2 \cdot \tfrac{1}{8} = 1.5.$$

It is not hard to see that no other code can give a smaller value, that is, the code indicated is the most economical. In accordance with our choice of a measure for uncertainty, the uncertainty of the given information source should be taken equal to 1.5 binary units.
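The two codes in this example can be compared with a small Python sketch. It lists every way a bit string can be split into codewords, which exposes the ambiguity of the short code, and computes the average length of the longer one. The helper names `parsings` and `average_length` are hypothetical, and the brute-force search is meant only for short strings and codes without empty codewords.

```python
from fractions import Fraction


def parsings(bits, code):
    """Return every way to split `bits` into codewords of `code` (message name -> codeword)."""
    if bits == "":
        return [[]]
    results = []
    for name, word in code.items():
        if bits.startswith(word):
            for rest in parsings(bits[len(word):], code):
                results.append([name] + rest)
    return results


def average_length(code, probs):
    """Expected number of binary digits per message."""
    return sum(probs[name] * len(word) for name, word in code.items())


probs = {"x1": Fraction(1, 2), "x2": Fraction(3, 8), "x3": Fraction(1, 8)}
short_code = {"x1": "0", "x2": "1", "x3": "01"}
good_code = {"x1": "0", "x2": "10", "x3": "11"}

print(parsings("01", short_code))        # [['x1', 'x2'], ['x3']]
print(parsings("010", good_code))        # [['x1', 'x2']]
print(average_length(good_code, probs))  # 3/2
```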

Here it is appropriate to note that “message,” “communication channel,” and other terms are understood very broadly in information theory. Thus, from the viewpoint of information theory, an information source is described by enumerating the set *x*_{1}, *x*_{2}, … of possible messages (which can be the words of some language, results of measurements, or television pictures) and their respective probabilities *p*_{1}, *p*_{2}, ….

There is no simple formula expressing the exact minimum *H′* of the average number of bits necessary for encoding the messages *x*_{1}, *x*_{2}, …, *x*_{n} through the probabilities *p*_{1}, *p*_{2}, …, *p*_{n} of these messages. However, the specified minimum is not less than the value

$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$

(where log_{2}*a* denotes the logarithm of the quantity *a* to base 2) and may not exceed it by more than one unit. The quantity *H* (the entropy of the set of messages) possesses simple formal properties, and for all conclusions of information theory that are of an asymptotic character, corresponding to the case *H′* → ∞, the difference between *H* and *H′* is absolutely immaterial. Accordingly, the entropy is taken as the measure of the uncertainty of the messages from a given source. In the example above, the entropy is equal to

$$H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{3}{8}\log_2\tfrac{3}{8} + \tfrac{1}{8}\log_2\tfrac{1}{8}\right) \approx 1.41.$$
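The relation between the entropy and the most economical average code length can be checked numerically. The sketch below uses Huffman's algorithm, which is not mentioned in the text but is a standard way to obtain a minimum-average-length binary code when messages are encoded one at a time. It relies on the fact that the average length equals the sum of the probabilities of all merged nodes. The function names are illustrative.

```python
import heapq
import math


def entropy_bits(probs):
    """Entropy H = -sum(p * log2 p), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def huffman_average_length(probs):
    """Average codeword length of a Huffman code for two or more messages.

    Each merge of the two least likely nodes adds one binary digit to every
    message beneath the merged node, so the merged probabilities sum to the
    average length.
    """
    heap = list(probs)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total


probs = [1 / 2, 3 / 8, 1 / 8]
h = entropy_bits(probs)
h_min = huffman_average_length(probs)
print(round(h, 4), h_min)  # 1.4056 1.5
print(h <= h_min < h + 1)  # True
```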

From the viewpoint stated, the entropy of an infinite aggregate, as a rule, turns out to be infinite. Therefore, when applied to an infinite collection it is treated differently: a certain precision level is assigned, and the concept of ε-entropy is introduced as the entropy of the information recorded with a precision of ε, if the message is a continuous quantity or function (for example, of time).

Just as with the concept of entropy, the concept of the amount of information contained in a certain random object (random quantity, random vector, or random function) relative to another is introduced at first for objects with a finite number of possible values. Then the general case is studied with the help of a limiting process. In contrast to entropy, the amount of information, for example, in a certain continuously distributed random variable relative to another continuously distributed variable, very often turns out to be finite.

The concept of a communication channel is of an extremely general nature in information theory. In essence, a communication channel is given by specifying a set of “admissible messages” at the “channel input,” a set of “output messages,” and a collection of conditional probabilities for receiving one or another message at the output for a given input message. These conditional probabilities describe the effect of “noise” distorting the transmitted information. “Connecting” any information source to the channel, one may calculate the amount of information contained in the messages at the output relative to that at the input. The upper limit of these amounts of information, taken with all admissible sources, is termed the capacity of the channel. The capacity of a channel is its fundamental information characteristic. Regardless of the effect (possibly strong) of noise in the channel, at a definite ratio of the entropy of the incoming information to the channel capacity, almost error-free transmission is possible with the correct coding.
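The definition of capacity as an upper limit over all admissible sources can be sketched for a channel with two input messages. The code below represents the channel as a table of conditional probabilities, computes the amount of information at the output relative to the input for a given source, and approximates the capacity by a simple grid search over source probabilities. The function names, the grid search, and the two example channels are assumptions made for illustration; the result is in bits per use of the channel.

```python
import math


def mutual_information(input_probs, channel):
    """Information in the output relative to the input, in bits.

    channel[x][y] is the conditional probability of output y given input x.
    """
    n_out = len(channel[0])
    p_y = [
        sum(input_probs[x] * channel[x][y] for x in range(len(input_probs)))
        for y in range(n_out)
    ]
    info = 0.0
    for x, p_x in enumerate(input_probs):
        for y in range(n_out):
            p_xy = p_x * channel[x][y]
            if p_xy > 0:
                info += p_xy * math.log2(channel[x][y] / p_y[y])
    return info


def capacity_binary_input(channel, steps=1000):
    """Approximate capacity of a two-input channel by trying P(X = 1) = 0, 1/steps, ..., 1."""
    return max(
        mutual_information([1 - k / steps, k / steps], channel)
        for k in range(steps + 1)
    )


noiseless = [[1.0, 0.0], [0.0, 1.0]]
noisy = [[0.9, 0.1], [0.1, 0.9]]  # each input is misread 10% of the time

print(capacity_binary_input(noiseless))        # 1.0
print(round(capacity_binary_input(noisy), 3))  # 0.531
```

The noisy channel carries barely more than half a bit per use, even though nine out of ten symbols arrive intact.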

Information theory searches for methods for transmitting information that are optimal with respect to speed and reliability, having established theoretical limits to the quality attainable. Clearly, information theory is of an essentially statistical character; therefore, a significant portion of its mathematical methods is derived from probability theory.

The foundations of information theory were laid in 1948–49 by the American scientist C. Shannon. The Soviet scientists A. N. Kolmogorov and A. Ia. Khinchin contributed to its theoretical branches, and V. A. Kotel’nikov, A. A. Kharkevich, and others contributed to the branches concerning applications.

### REFERENCES

Iaglom, A. M., and I. M. Iaglom. *Veroiatnost’ i informatsiia*, 2nd ed. Moscow, 1960.

Shannon, C. “Statisticheskaia teoriia peredachi elektricheskikh signalov.” In *Teoriia peredachi elektricheskikh signalov pri nalichii pomekh: Sb. perevodov*. Moscow, 1953.

Goldman, S. *Teoriia informatsii*. Moscow, 1957. (Translated from English.)

*Teoriia informatsii i ee prilozheniia: Sb. perevodov*. Moscow, 1959.

Khinchin, A. Ia. “Poniatie entropii v teorii veroiatnostei.” *Uspekhi matematicheskikh nauk*, 1953, vol. 8, issue 3.

Kolmogorov, A. N. *Teoriia peredachi informatsii*. Moscow, 1956. (Academy of Sciences of the USSR. Session on the scientific problems of the automation of production. Plenary session.)

Peterson, W. W. *Kody, ispravliaiushchie oshibki*. Moscow, 1964. (Translated from English.)

IU. V. PROKHOROV

## information theory

[‚in·fər′mā·shən ‚thē·ə·rē]

## Information theory

A branch of communication theory devoted to problems in coding. A unique feature of information theory is its use of a numerical measure of the amount of information gained when the contents of a message are learned. Information theory relies heavily on the mathematical science of probability. For this reason the term information theory is often applied loosely to other probabilistic studies in communication theory, such as signal detection, random noise, and prediction. *See* Electrical communications

In designing a one-way communication system from the standpoint of information theory, three parts are considered beyond the control of the system designer: (1) the source, which generates messages at the transmitting end of the system, (2) the destination, which ultimately receives the messages, and (3) the channel, consisting of a transmission medium or device for conveying signals from the source to the destination. The source does not usually produce messages in a form acceptable as input by the channel. The transmitting end of the system contains another device, called an encoder, which prepares the source's messages for input to the channel. Similarly the receiving end of the system will contain a decoder to convert the output of the channel into a form that is recognizable by the destination. The encoder and the decoder are the parts to be designed. In radio systems this design is essentially the choice of a modulator and a detector.

A source is called discrete if its messages are sequences of elements (letters) taken from an enumerable set of possibilities (alphabet). Thus sources producing integer data or written English are discrete. Sources which are not discrete are called continuous, for example, speech and music sources. The treatment of continuous cases is sometimes simplified by noting that a signal of finite bandwidth can be encoded into a discrete sequence of numbers.

The output of a channel need not agree with its input. For example, a channel might, for secrecy purposes, contain a cryptographic device to scramble the message. Still, if the output of the channel can be computed knowing just the input message, then the channel is called noiseless. If, however, random agents make the output unpredictable even when the input is known, then the channel is called noisy. *See* Communications scrambling, Cryptography

Many encoders first break the message into a sequence of elementary blocks; next they substitute for each block a representative code, or signal, suitable for input to the channel. Such encoders are called block encoders. For example, telegraph and teletype systems both use block encoders in which the blocks are individual letters. Entire words form the blocks of some commercial cablegram systems. It is generally impossible for a decoder to reconstruct with certainty a message received via a noisy channel. Suitable encoding, however, may make the noise tolerable.

Even when the channel is noiseless, a variety of encoding schemes exists and there is a problem of picking a good one. Of all encodings of English letters into dots and dashes, the Continental Morse encoding is nearly the fastest possible one. It achieves its speed by associating short codes with the most common letters. A noiseless binary channel (capable of transmitting two kinds of pulse 0, 1, of the same duration) provides the following example. Suppose one had to encode English text for this channel. A simple encoding might just use 27 different five-digit codes to represent word space (denoted by #), A, B, . . . , Z; say # 00000, A 00001, B 00010, C 00011, . . . , Z 11010. The word #CAB would then be encoded into 00000000110000100010. A similar encoding is used in teletype transmission; however, it places a third kind of pulse at the beginning of each code to help the decoder stay in synchronism with the encoder.
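The simple five-digit block encoding can be sketched in Python. The text lists codes only for #, A, B, and C; the sketch assumes that the remaining letters continue the same pattern, each receiving the five-digit binary form of its position in the alphabet. The names `ALPHABET`, `encode`, and `decode` are illustrative.

```python
# '#' stands for the word space, followed by the 26 letters (an assumed ordering
# that matches the four codes given in the text).
ALPHABET = "#ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def encode(text: str) -> str:
    """Replace each character by a fixed five-digit binary block."""
    return "".join(format(ALPHABET.index(ch), "05b") for ch in text)


def decode(bits: str) -> str:
    """Cut the bit string into five-digit blocks and look each one up."""
    if len(bits) % 5 != 0:
        raise ValueError("length must be a multiple of five")
    return "".join(ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


encoded = encode("#CAB")
print(encoded)          # 00000000110000100010
print(decode(encoded))  # #CAB
```

Because every block has the same length, the decoder needs no separators between letters, but it spends five digits even on the most common characters.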