# Mathematical Linguistics



a mathematical discipline that develops a formal apparatus for describing the structure of natural languages and of some formal languages.

Mathematical linguistics arose in the 1950s as a result of the urgent need to clarify basic concepts in linguistics. It chiefly makes use of algebra, the theory of algorithms, and the theory of automata. Although not a part of linguistics, mathematical linguistics has developed in close relation to it. A field of linguistic investigation that employs mathematics is also sometimes called mathematical linguistics.

The mathematical description of language is founded on F. de Saussure’s concept of language as a mechanism whose functioning is revealed in the speech habits of its users. Speech results in “correct texts,” that is, sequences of speech units that adhere to definite laws, many of which can be described mathematically. The study of methods for mathematically describing correct texts (primarily sentences) is one branch of mathematical linguistics and is called the theory of descriptive methods for syntactic structures. To describe the syntactic structure of a sentence, we may either isolate its constituents, that is, word groups that function as complete syntactic units, or indicate for each word those words (if any) that are directly dependent on it. Thus, in the sentence *loshadi kushaiut oves* (horses eat oats), a description using the first method will yield the following constituents: the entire sentence *I*, each separate word, and the phrase *C* = *kushaiut oves* (eat oats; see Figure 1).

The second method yields the scheme shown in Figure 2. The mathematical means used to describe sentence structure is called a tree of constituents (first method) or a tree of syntactic subordination (second method).
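The two descriptions can be sketched as small data structures. This is an illustrative sketch only: the labels *I* and *C* come from the text, but the figures are not reproduced here, so the choice of the verb as the root of the subordination tree is an assumption, and the helper names `constituents` and `dependents` are hypothetical.

```python
# Tree of constituents (first method): (label, child, child, ...).
# Leaves are the words themselves; "I" and "C" are the labels used in the text.
constituent_tree = ("I", "loshadi", ("C", "kushaiut", "oves"))


def constituents(node):
    """Return every constituent as a list of words, outermost first."""
    if isinstance(node, str):
        return [[node]]  # a single word is itself a constituent
    _label, *children = node
    parts = [constituents(child) for child in children]
    # The first entry of each part is the full word span of that child.
    whole = [word for part in parts for word in part[0]]
    return [whole] + [c for part in parts for c in part]


# Tree of syntactic subordination (second method): word -> word it depends on.
# ASSUMPTION: the verb is the root (None) and both nouns depend on it.
head_of = {"kushaiut": None, "loshadi": "kushaiut", "oves": "kushaiut"}


def dependents(word):
    """Return the words directly dependent on `word`, sorted alphabetically."""
    return sorted(w for w, head in head_of.items() if head == word)


for span in constituents(constituent_tree):
    print(" ".join(span))
print(dependents("kushaiut"))
```

The first structure records which word groups act as units; the second records only word-to-word dependence, so it has no node corresponding to the phrase *C*.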

Another branch of mathematical linguistics, and one that occupies a central place in it, is the theory of formal grammars, whose chief proponent is N. Chomsky. This theory studies methods of describing the lawlike regularities that characterize not only isolated texts but the entire set of correct texts in a given language. These regularities are described by constructing a “formal grammar”: an abstract device that can produce, by means of a uniform procedure, correct texts in a given language and that permits the description of the structure of these texts.

The most widely used type of formal grammar is generative, or Chomskian, grammar. A generative grammar is an ordered system $\Gamma = \langle V, W, I, R \rangle$, where $V$ and $W$ are disjoint finite sets, $I$ is an element of $W$, and $R$ is a finite set of rules of the type $\varphi \to \psi$, where $\varphi$ and $\psi$ are chains (finite sequences) of elements in $V$ and $W$. If $\varphi \to \psi$ is a rule of grammar $\Gamma$ and $\omega_1$ and $\omega_2$ are chains of elements in $V$ and $W$, we say that the chain $\omega_1 \psi \omega_2$ can be immediately derived in $\Gamma$ from $\omega_1 \varphi \omega_2$. If $\xi_0, \xi_1, \ldots, \xi_n$ are chains and the chain $\xi_i$ is immediately derivable from $\xi_{i-1}$ for every $i = 1, \ldots, n$, we say that $\xi_n$ is derivable from $\xi_0$ in $\Gamma$. The set of chains of elements in $V$ derivable in $\Gamma$ from $I$ is called the language that can be generated by the grammar $\Gamma$. If all rules in $\Gamma$ have the form $A \to \psi$, where $A$ is an element of $W$, $\Gamma$ is called a contextless, or context-free, grammar.
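The definitions of immediate derivation and of the generated language can be sketched in code. The following is a minimal illustration, not a standard library API: chains are represented as tuples of symbols, the function names and the toy grammar are hypothetical, and the search is cut off at a maximum chain length. That cutoff is complete only under the assumption that no rule shortens a chain; with shortening rules some chains of the language could be missed.

```python
from collections import deque


def immediate_derivations(chain, rules):
    """Yield every chain w1 + psi + w2 obtainable from chain = w1 + phi + w2."""
    for phi, psi in rules:
        n = len(phi)
        for i in range(len(chain) - n + 1):
            if chain[i:i + n] == phi:
                yield chain[:i] + psi + chain[i + n:]


def generated_language(V, I, rules, max_len):
    """Collect chains over V, of length <= max_len, derivable from I.

    ASSUMPTION: no rule shortens a chain, so pruning by length loses nothing.
    """
    start = (I,)
    seen, queue, language = {start}, deque([start]), set()
    while queue:
        chain = queue.popleft()
        if all(symbol in V for symbol in chain):
            language.add(chain)
        for nxt in immediate_derivations(chain, rules):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return language


def is_context_free(W, rules):
    """True if every rule has the form A -> psi with A a single element of W."""
    return all(len(phi) == 1 and phi[0] in W for phi, _psi in rules)


# Hypothetical toy grammar: V = {a, b}, W = {I}, rules I -> a I b and I -> a b.
toy_V, toy_W = {"a", "b"}, {"I"}
toy_rules = [(("I",), ("a", "I", "b")), (("I",), ("a", "b"))]

print(sorted(generated_language(toy_V, "I", toy_rules, max_len=6), key=len))
print(is_context_free(toy_W, toy_rules))
```

Breadth-first search is used here simply because it visits shorter derivations first; any exhaustive search over immediate derivations would enumerate the same set.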

Interpreted linguistically, the elements of $V$ are generally words, the elements of $W$ are symbols for grammatical categories, and $I$ is the symbol for the category “sentence.” In a context-free grammar, the derivation of a sentence yields a tree of constituents in which each constituent consists of the words that derive from a single element of $W$, so that the grammatical category of each constituent is indicated. Thus, if a given grammar contains the rules

(1) $I \to S_{x,y,\mathrm{nom}}\, V_y$

(2) $V_y \to V_y^t\, S_{x,y,\mathrm{acc}}$

(3) $S_{\mathrm{masc},\mathrm{sing},\mathrm{acc}} \to$ *oves*

(4) $S_{\mathrm{fem},\mathrm{pl},\mathrm{nom}} \to$ *loshadi*

(5) $V_{\mathrm{pl}}^t \to$ *kushaiut*

where $V_y$ denotes the category “verb group in number $y$,” $V_y^t$ denotes the category “transitive verb in number $y$,” and $S_{x,y,z}$ denotes the category “noun of gender $x$, in number $y$, and in case $z$,” the sentence *loshadi kushaiut oves* has the derivation depicted in Figure 3. Formal grammars are used to describe not only natural languages but also formal languages, particularly programming languages.
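The derivation of *loshadi kushaiut oves* from rules (1) through (5) can be traced step by step. The sketch below instantiates the rule schemas only for the categories this sentence needs. It assumes that in rule (2) the gender and number of the noun are chosen independently of the verb's number, which the sentence requires because *oves* is singular while *kushaiut* is plural. The bracketed category names and the function names are illustrative, and the resulting tree is what the rules yield rather than a reproduction of Figure 3.

```python
# Rules (1)-(5), instantiated for this one sentence.
# ASSUMPTION: the noun indices in rule (2) are independent of the verb's number.
RULES = {
    "I": ["S[fem,pl,nom]", "V[pl]"],          # rule (1) with x = fem, y = pl
    "V[pl]": ["Vt[pl]", "S[masc,sing,acc]"],  # rule (2) with y = pl
    "S[masc,sing,acc]": ["oves"],             # rule (3)
    "S[fem,pl,nom]": ["loshadi"],             # rule (4)
    "Vt[pl]": ["kushaiut"],                   # rule (5)
}


def leftmost_derivation(start="I"):
    """Return the chains xi_0, ..., xi_n, rewriting the leftmost category each step."""
    chain = [start]
    steps = [list(chain)]
    while True:
        idx = next((i for i, s in enumerate(chain) if s in RULES), None)
        if idx is None:  # only words remain
            return steps
        chain[idx:idx + 1] = RULES[chain[idx]]
        steps.append(list(chain))


def constituent_tree(symbol="I"):
    """Build the tree of constituents: (category, subtree, ...) or a bare word."""
    if symbol not in RULES:
        return symbol
    return (symbol, *[constituent_tree(s) for s in RULES[symbol]])


for step in leftmost_derivation():
    print(" ".join(step))
print(constituent_tree())
```

Each category here has exactly one rule, so the derivation is deterministic; a grammar covering more of the language would need to choose among alternative rules.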

Mathematical linguistics also deals with the study of analytic models of language. In these models, formal constructions are produced on the basis of the intuitive knowledge of certain speech data (for example, sets of correct sentences); formal constructions provide information about the structure of the language. The application of mathematical linguistics to real languages is part of the study of linguistics.

### REFERENCES

Chomsky, N. *Sintakticheskie struktury*. In *Novoe v lingvistike*, issue 2. Moscow, 1962. (Translated from English.)

Gladkii, A. V., and I. A. Mel’chuk. *Elementy matematicheskoi lingvistiki*. Moscow, 1969.

Marcus, S. *Teoretiko-mnozhestvennye modeli iazykov*. Moscow, 1970. (Translated from English.)

Gladkii, A. V. *Formal’nye grammatiki i iazyki*. Moscow, 1973.

A. V. GLADKII