Statistical Linguistics

a discipline that studies the quantitative laws of language as revealed in written texts.

Statistical linguistics assumes that certain numerical characteristics, and the functional dependencies among them, observed in a limited sample of texts are typical of the language as a whole or of one of its functional styles, such as journalistic, literary, or scholarly and scientific writing. The numerical characteristics most commonly studied, and of greatest practical value, are the relative frequencies of linguistic units such as letters, phonemes, syllables, words, and syntactic constructions; of classes of these units, such as vowels, consonants, and parts of speech; and of combinations of these units, such as sequences of n letters.
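As an illustration of the three kinds of relative frequency just listed (of units, of classes of units, and of combinations of units), the following is a minimal sketch for letters in an English sample. It assumes the vowel class is exactly "aeiou", counts letter sequences straight across word boundaries, and uses hypothetical helper names; a real study would use a large, representative corpus.

```python
from collections import Counter

VOWELS = set("aeiou")  # assumption: English vowel letters only; "y" counts as a consonant


def letters_of(text):
    """Keep only alphabetic characters, lowercased."""
    return [ch for ch in text.lower() if ch.isalpha()]


def relative_frequencies(units):
    """Map each unit to its count divided by the total number of units."""
    counts = Counter(units)
    total = sum(counts.values())
    return {unit: n / total for unit, n in counts.items()}


def letter_ngrams(letters, n):
    """Overlapping sequences of n consecutive letters (word boundaries ignored)."""
    return ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]


sample = "The cat sat on the mat."
letters = letters_of(sample)

letter_freq = relative_frequencies(letters)  # frequencies of units
class_freq = relative_frequencies(           # frequencies of classes of units
    "vowel" if ch in VOWELS else "consonant" for ch in letters
)
bigram_freq = relative_frequencies(letter_ngrams(letters, 2))  # combinations, n = 2

print(letter_freq["t"], class_freq["vowel"], bigram_freq["at"])
```

The same `relative_frequencies` helper serves all three levels; only the sequence of units passed to it changes.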

Frequency dictionaries record the frequencies of words and sometimes of word groups. One functional dependence that plays an important role in statistical linguistics is the approximate relationship between a word's frequency and its rank in a list ordered by decreasing frequency (Zipf's law). Statistical linguistics also studies the relationship between a word's frequency and its length (number of syllables), and how the number of a word's meanings changes as its frequency increases.
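The rank-frequency relationship can be illustrated with a short sketch. It assumes the common formulation of Zipf's law, f ≈ C / r^s with s close to 1, so that rank times frequency stays roughly constant; the tokenizer (lowercased runs of letters and apostrophes) and all function names are simplifying assumptions. On a toy sentence the product r·f is far from constant; the regularity only emerges in large corpora.

```python
import math
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first.

    Words are lowercased runs of letters and apostrophes (an assumption);
    ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, n) for rank, (word, n) in enumerate(ordered, start=1)]


def rank_times_frequency(table):
    """Under Zipf's law f ~ C / r, the product r * f should stay roughly constant."""
    return [rank * n for rank, _, n in table]


def fit_zipf_exponent(table):
    """Least-squares slope of log(count) on log(rank); Zipf's law predicts about -1."""
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(n) for _, _, n in table]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


table = rank_frequency("the cat and the dog and the bird")
print(table[:2])                    # [(1, 'the', 3), (2, 'and', 2)]
print(rank_times_frequency(table))  # [3, 4, 3, 4, 5]
```

Fitting the slope on a log-log scale, rather than checking r·f directly, allows for exponents that differ somewhat from 1, which is how the law is usually tested on real frequency dictionaries.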

The data collected by statistical linguistics are used to identify the stylistic features of individual authors, establish the sources of texts, decipher historical writings, and solve problems in stenography, communications theory, and information science. To obtain numerical frequencies, statistical linguistics uses the methods of mathematical statistics as well as some methods of information theory. To establish relationships among the data obtained and to select the most significant of them, it employs mathematical models based on the concepts of probability theory and mathematical linguistics.
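One information-theoretic measure often applied to such frequencies is Shannon entropy, H = -Σ p·log2 p, together with redundancy, 1 - H / log2(N) for an alphabet of N symbols. The sketch below computes only the zero-order (single-letter) entropy, which ignores context and therefore overstates the information per letter of real text; the function names and the 26-letter alphabet are assumptions for illustration.

```python
import math
from collections import Counter


def entropy_bits(units):
    """Shannon entropy H = -sum(p * log2(p)) of the units' relative frequencies, in bits per unit."""
    counts = Counter(units)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redundancy(units, alphabet_size):
    """Relative redundancy 1 - H / H_max, where H_max = log2(alphabet_size) for equiprobable symbols."""
    return 1 - entropy_bits(units) / math.log2(alphabet_size)


letters = [ch for ch in "the cat sat on the mat" if ch.isalpha()]
print(round(entropy_bits(letters), 3), round(redundancy(letters, 26), 3))  # assumption: 26-letter alphabet
```

Redundancy of this kind is what makes compressed notations such as shorthand, and error-tolerant transmission, possible.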

Statistical linguistics may be more broadly understood as the use of statistical methods for verifying linguistic hypotheses that may also be qualitative in nature.
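A qualitative hypothesis, for instance that a given word is used at the same rate in two texts, can be checked with a standard significance test. The sketch below uses Pearson's chi-square test on a 2×2 table (occurrences of the word versus all other words, in text A versus text B) without continuity correction, and compares the statistic with 3.841, the critical value for one degree of freedom at the 5% level. The function names are hypothetical, the test is one common choice among several, and the approximation is only reliable when expected cell counts are reasonably large (a usual rule of thumb is at least 5).

```python
CRITICAL_005_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    All row and column totals must be nonzero.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def same_rate(word_in_a, total_a, word_in_b, total_b):
    """Return (statistic, True if the difference in rates is not significant at the 5% level)."""
    stat = chi_square_2x2(word_in_a, total_a - word_in_a, word_in_b, total_b - word_in_b)
    return stat, stat < CRITICAL_005_DF1


# Hypothetical counts: a word occurs 20 times in 100 words of text A, 10 times in 100 words of text B.
stat, same = same_rate(20, 100, 10, 100)
print(round(stat, 3), same)  # 3.922 False
```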
