
the languages of the peoples who inhabit or once inhabited the earth. The total number of the languages of the world is between 2,500 and 5,000; the exact number is impossible to establish because of the arbitrariness of the distinction between different languages and different dialects of the same language.

The following are the most widespread languages of the world (the number of speakers in 1975 is given in millions): Chinese (800), English (350), Russian (240), Spanish (210), Hindi and the closely related Urdu (200), Indonesian (130), Arabic (127), Bengali (125), Portuguese (115), Japanese (111), German (100), French (90), Italian (65), Punjabi (60), Telugu (52), Korean (52), Marathi (48), Tamil (47), and Ukrainian (45).

The languages of the world are divided into language families according to genetic relations. Each family is made up of groups of related languages that were dialects of the same language in the past or that have become part of the same linguistic alliance.

The Indo-European language family (1.86 billion speakers) has been the subject of the most thorough study. It developed from a group of closely related dialects whose speakers in the third millennium B.C. spread southward in Southwest Asia from the northern Black Sea region and the Caspian region. Texts from the second millennium B.C. have provided information about such Indo-European languages of Asia Minor as cuneiform Hittite and other Anatolian languages, including Palaic and Luwian. These Indo-European languages subsequently became extinct; elements of them, however, were preserved in the hieroglyphic Luwian, Lycian, and Lydian languages of the first millennium B.C. The Cretan-Mycenaean Linear B texts, also from the second millennium B.C., have provided information about one of the dialects of Ancient Greek. Inscriptions in the Carian language, the oldest of which dates from the seventh century B.C., do not lend themselves to linguistic study.

An invasion into the Middle East in the second millennium B.C. by speakers of Aryan (Indo-Iranian) Indo-European dialects similar to Greek is evidenced by Mesopotamian Aryan words and names found in Southwest Asian texts. The modern Nuristani, or Kafiri, languages of Afghanistan, which constitute an intermediate group between the two primary groups of the Aryan languages, Indian and Iranian, can be traced back to the ancient dialects of the Aryan tribes. The Indian and Iranian languages, along with the Greek and Armenian languages, make up the eastern group of the Indo-European family; within the group, Greek and Armenian, which became separated from Indo-Iranian before the second millennium B.C., are considered special subgroups.

Early texts in Old Indic were written around the first millennium B.C. The Middle Indic languages, or Prakrits, were derived from Old Indic and led to the development of the Modern Indic languages, which include Hindi, Urdu, Bengali, Marathi, Punjabi, Rajasthani, Gujarati, and Oriya. Closely related to Old Indic are the Ancient Iranian languages of the first millennium B.C., including Old Persian and Avestan, which are historically related to the Middle and Modern Iranian languages. The Eastern Middle Iranian languages include Sogdian (the language of communication among the peoples of Middle Asia), Sakian, and Khwarazmian; the Western Middle Iranian languages include Middle Persian and Parthian. The Modern Iranian languages are also divided into eastern and western branches. The western group includes modern Persian (Farsi), Tadzhik, Kurdish, Baluchi, Tat, and Talysh. Belonging to the eastern group are Pashto; Ossetic, which is historically associated with the Eastern Iranian language of the Scythians; the Pamir languages; and Yagnobi, elements of which derive from Sogdian.

Written texts from the first millennium B.C. document the existence of several languages in the Western Indo-European group. Among them are the Italic languages, which included the extinct, undocumented Osco-Umbrian branch, and Latin, which, along with Faliscan, belonged to the Latino-Faliscan group. The Venetic group was related to the Latino-Faliscan group. After the fall of the Roman Empire, the Romance languages developed from dialects of Latin. The Romance group includes Spanish, Portuguese, Galician (which is closely related to Portuguese), Catalan, French, Provencal, the Rhaeto-Romance languages, Italian, Sardinian, Dalmatian (which became extinct in the late 19th century), and Rumanian, as well as various Balkan-Romance languages and dialects that are related to Rumanian.

Related to the Italic languages are the Celtic languages. Among them are the Gaulish subgroup, including the extinct Gaulish language; the Gaelic subgroup, including Irish, Scottish, and Manx (the language of the Isle of Man); and the Brythonic subgroup, including Breton, Welsh, and the extinct Cornish language. Ancient inscriptions in the Ibero-Celtic (Celtiberian) language were deciphered in Spain in the 1970’s.

The western group of ancient Indo-European languages includes, in addition to the Italic and Celtic languages, the extinct Illyrian language, which has been preserved in inscriptions in the Messapian language found in Italy and in personal names found in Southern and Central Europe. Also belonging to the western group are the Germanic languages, which are made up of three subgroups: the East Germanic languages, which include the extinct Gothic language; the North Germanic, or Scandinavian, languages, to which belong Swedish, Danish, Norwegian, Faroese, and Icelandic; and the West Germanic languages, among which are English, Frisian (very similar to English), Dutch, Boer (Afrikaans), German, and Yiddish.

The Balto-Slavic languages form an intermediate group between the Western Indo-European languages (Celtic, Italic, Germanic, and Illyrian) and the Eastern Indo-European languages (Aryan, Greek, and Armenian). The Baltic languages are divided into two branches: the West Baltic group, which includes the extinct Old Prussian, and the East Baltic group, which includes Lithuanian and Latvian. The Slavic languages are subdivided into three groups. The East Slavic group is made up of Russian, Ukrainian, and Byelorussian. The West Slavic group includes Czech, Slovak, Polish, Wendish, and the extinct Polabian language, which was spoken in the basin of the upper course of the Elbe River, where the river is known as the Labe. The South Slavic group includes Old Church Slavic and the historically related Bulgarian and Macedonian languages, Serbo-Croatian, and Slovene.

The extinct Tocharian languages, which are documented in texts dating from the fifth to eighth centuries A.D. that were found in Central Asia, also belong to the group of ancient Indo-European languages that once occupied an intermediate position between the eastern and western languages. Our knowledge of many extinct Indo-European languages is based on very meager data. Phrygian, for example, is attested by inscriptions found in Asia Minor, where the Phrygians moved from the Balkan Peninsula in the second millennium B.C. Information is equally scarce about a number of other languages. These include Thracian, a Balkan language, which, like Illyrian, is linked to modern Albanian in a special Albanian subgroup of Indo-European; Old Macedonian, which is related to Greek; Philistine, which resembles pre-Greek Indo-European, or Pelasgian; Ligurian; and the language of the Lepontine inscriptions found in northern Italy, which is related to the Celtic languages.

The Hamito-Semitic language family (also known as the Afro-Asiatic or Afrasian language family) (191 million speakers) includes the Semitic, Egyptian and Coptic, and Berber languages. Also in the group are the Cushitic languages, including Somali, and the Chad languages, the most widespread of which is Hausa. The Semitic group consists of two principal branches, Eastern and Western, which are in turn subdivided into Northern (or Western, in the narrow sense of the word) and Southern subgroups. The Eastern Semitic languages are represented by the extinct Accadian language of Assyria and Babylonia. The Northern subgroup of the Western Semitic languages includes the Canaanite and Aramaic languages. Among the Canaanite languages are the extinct Old Canaanite, whose oldest cuneiform texts, found in northern Syria in 1974, date from the second half of the third millennium B.C.; Phoenician, the language of Phoenicia and of the Phoenician settlements in the Mediterranean region, including Carthage, where the Phoenician-Punic language was in use; Moabite; and Hebrew in both its ancient and modern forms. It has also been suggested that the Ugaritic language used in the Ras Shamra texts found on the site of ancient Ugarit belongs to the Canaanite languages. In the early first millennium A.D., Aramaic was the most widespread language in the Middle East; subsequently, however, it was almost completely supplanted by Arabic. Modern Assyrian, or Aisorian, is historically related to the Eastern Aramaic dialects. The Southern subgroup of Western Semitic languages includes Arabic, the South Arabic languages, and the Ethiopic languages, which are related to the South Arabic languages; the most widespread of the Ethiopic languages is Amharic, the official language of Ethiopia.

The Kartvelian (South Caucasian) family (3.7 million speakers) includes Georgian, Mingrelian, Chan (which together with Mingrelian forms the Zan, or Mingrelian-Chan, subgroup), and Svanetian. Some scholars believe that the Kartvelian languages are related to the North Caucasian languages and that, together, these two groups form a Caucasian, or Ibero-Caucasian, family. This hypothesis, however, has not yet been proved.

The North Caucasian languages include the Abkhazo-Adyg languages (900,000 speakers) and the Nakho-Dagestanian, or Northeast Caucasian, languages (2 million speakers). The Abkhazo-Adyg family includes Abkhazian, Abaza, Adygei, Kabardin-Cherkess, and Ubykh. The Nakho-Dagestanian languages are divided into the Veinakh, or Chechen-Ingush, languages, which include Chechen, Ingush, and Bats, and the Dagestan languages, which comprise about 30 mountain languages of Dagestan. These include the Avar-Andi-Dido subgroup, whose most widespread language is Avar; the Darghin-Lak subgroup, made up of Darghin and Lak; and the Lezghian-Tabasaran subgroup, which includes Lezghian and Tabasaran. There are also other classifications of these languages. The Caucasian languages are sometimes joined with the Basque language of Spain and southwestern France in the Euskara-Caucasian family, but the relationship between Basque and the Caucasian languages has not yet been confirmed.

The Finno-Ugric, or Ugro-Finnish, languages (23 million speakers) are divided into two basic subgroups: Finnish and Ugric. The Ugric subgroup includes the Ob’ Ugric languages of Western Siberia: Khanty, or Ostyak, and Mansi, or Vogul. Also included in the Ugric subgroup is Hungarian, whose speakers had already settled far to the west at the end of the first millennium A.D. and found themselves separated from other speakers of Ob’ Ugric languages. The Finnish subgroup includes the Permian languages, among which are Komi-Permiak, Komi (Komi-Zyrian), and Udmurt (Votyak); the Balto-Finnic-Volga languages, among which are the Mordovian languages Erzia-Mordovian and Moksha-Mordovian, Mari, and Lapp, which is spoken in Murmansk Oblast of the USSR and in Scandinavia; and the Balto-Finnic languages, which are made up of Finnish, Estonian, and several less common languages.

The Finno-Ugric languages are related to the Samoyedic languages of the Far North of the USSR (25,000 speakers), which include Nenets, Enets, Nganasan, and Selkup. The Finno-Ugric and Samoyedic languages are often joined in a Uralic, or Finno-Ugric-Samoyedic, family. The disappearing Yukaghir language of northern Siberia has an affinity with the Uralic languages. In the opinion of some scholars, the Uralic and Altaic languages should be combined in a larger Uralic-Altaic family. The Altaic language family (97 million speakers) includes the Turkic, Mongolian, and Manchu-Tungus languages. Some scholars argue that the Korean and Japanese languages also belong to the Altaic family. According to many scholars, the Turkic, Mongolian, and Manchu-Tungus languages constitute not a single family but a linguistic alliance.

The Turkic languages (89 million speakers) include the following groups: the Bulgar group, which includes Chuvash; the Southwestern group, which includes Turkish, Azerbaijani, Turkmen, and several other languages; the Northwestern group, including Tatar, Kazakh, Bashkir, Karaite, Kumyk, Nogai, and Kara Kalpak, as well as Kirghiz, which is joined with the Altaic languages in a special Kirghiz-Kipchak group; the Southeastern group, including Uzbek and modern Uighur; and the Northeastern group, which includes Yakut and various other languages of Siberia and the Altai, as well as the dead Turkic languages with the oldest extant texts—Old Uighur and Old Turkic and the language of the Orkhon-Enisei inscriptions.

The modern Mongolian languages (4.2 million speakers) include Buriat, Mongolian proper, Kalmyk, the Oirat language of Central Asia (which is related to Kalmyk), and Afghan Mongol. The Manchu-Tungus languages (3.6 million speakers) include Manchu, which is gradually going out of use; Evenki; Even, which is closely related to Evenki; and various other languages of Eastern Siberia and the Far East.

Japanese is closely related to the Ryukyuan language of the Ryukyu Islands. Both languages have features in common with the Austronesian languages. The position of Japanese and Ryukyuan among the language families of the world, however, is not yet clearly defined.

A sizable part of the population of India, primarily in the south, speaks languages of the Dravidian family (154 million speakers). In addition to Tamil, the Dravidian family includes Malayalam and Kannada (which are related to Tamil), as well as Telugu, Kui, Gondi, and Brahui, which are spoken in northwestern India. Scholars have hypothesized a relationship between the Dravidian languages and the Uralic languages, as well as a tie between Dravidian and the extinct Elam language, one of the ancient languages of Southwest Asia.

According to a hypothesis formulated by V. M. Illich-Svitych, the Indo-European, Hamito-Semitic, Kartvelian, Uralic, Altaic, and Dravidian languages together form a single Nostratic family, which is sometimes referred to as the Hyperborean or Boreal family. Some scholars also assign to this family the North Caucasian languages and the languages of the Chukchi-Kamchatka group, which includes Chukchi, Koriak, and Itel’men.

J. H. Greenberg has classified the languages spoken by most of the population of Africa south of the Sahara into the Niger-Kordofanian (Congo-Kordofanian), Nilo-Saharan, and Khoisan families. The Niger-Kordofanian family (213 million speakers) is composed of two groups: the Niger-Congo group and the Kordofanian group. Within the Niger-Kordofanian family is the extensive Benue-Congo subgroup, which, in addition to such languages as Tiv and Ibibio, includes the Bantu languages, the most important of which are Swahili, Ruanda, Kirundi, Kikongo, Luba (Kiluba), Luganda, Lingala, Sotho, and Zulu. Also belonging to the Niger-Kordofanian family are the West Atlantic subgroup, including Fulani, Wolof, and Kissi; the Mande subgroup, including Malinke, Bambara, Soninke, and Mende; the Voltaic subgroup, including Mossi, or More, Grusi, and Lobi; and the Kwa subgroup, including Akan, Ewe, Yoruba, and Ibo. Among the languages in the small Kordofanian group are Koalib, Tegali, and Talodi.

The Nilo-Saharan family (23 million speakers) is composed of six groups: Songhai, Saharan, Maba, Fur, Chari-Nile, and Koma. The largest group, Chari-Nile, includes the Nilotic languages, among which are Dinka, Nuer, Joluo, and Nubian. The Saharan group includes the Kanuri and Tubu languages. Each of the other groups contains one language, which bears the group’s name, or several closely related languages.

The Hottentot languages, which are often joined with the Bushman languages in the Khoisan language group (250,000 speakers), are given a special classification in Southern Africa; several East African languages, such as Sandawe and Hatsa, are grouped with the Khoisan languages.

Many linguists divide the Sino-Tibetan family (865 million speakers) into only two groups: the Chinese and Tibeto-Burman groups. Some scholars have also identified Thai, Miao-Yao, and Vietnamese groups. The Chinese group consists of Chinese, with its many dialects divided into seven main groups. The Hui language (Dungan) of China and Middle Asia is historically linked with the northern group, which is the largest. The Tibetan languages, Burmese, and the languages of the Kachin (Chingpaw) subgroup belong to the Tibeto-Burman group of the Sino-Tibetan family.

The classification of the Thai group (52 million speakers) has not been conclusively determined. Some scholars assign it to the Sino-Tibetan family; others group it with the Austroasiatic family. Some linguists argue that the Thai group is an independent family distantly related to the Austroasiatic language family.

A number of scholars, relying on the similarity of many languages of Southeast and Southern Asia, assign to the Austroasiatic family (65 million speakers) the following groups: Vietnamese, Mon-Khmer, Palaung-Wa, Malacca, Khasi, Nicobarese, Munda, and Miao-Yao. Many languages of these groups, which were once widespread in all the countries of Indochina and India, have survived only in regions that are almost inaccessible.

The languages of the Austronesian, or Malayo-Polynesian, family (191 million speakers) are found among most of the peoples of Indonesia, the Philippines, Malaysia, and the countries of Oceania, with the exception of New Guinea. The Austronesian family has been subdivided into four groups: Indonesian, Polynesian, Micronesian, and Melanesian. New classifications have been made, but none of them is generally accepted. Some linguists, on the basis of W. Schmidt’s research, merge the Austroasiatic and Austronesian families and form an Austric superfamily that comprises most of the languages of Southeast Asia and Oceania.

The majority of the languages of the Australian aborigines (more than 100,000 speakers) constitute a single family of languages. Several Papuan languages are related to this Australian family. Greenberg joins a considerable number of the Papuan languages (3.1 million speakers) and the Andamanese languages, which are becoming extinct, into an Indo-Pacific family. The Burushaski language (50,000 speakers) is completely distinct from the other languages of India. The Ainu language of northern Japan (20,000 speakers) is isolated from the other languages of Eastern Asia. Also given a special classification are various other unrelated languages of Northeast Asia that are by convention called Paleo-Asiatic, or, more accurately, Paleosiberian (literally, “ancient Siberian”) languages. Among these languages are Nivkh, or Giliak, and Eskimo, which is closely related to Aleutian, or Unangan. The Eskimo and Aleutian languages are often grouped in a special Eskimo-Aleutian family. Like Ainu, the Paleosiberian languages have several features in common with certain Indian languages of North America. The western Paleosiberian languages are sometimes classified as languages of the Eniseian, or Ket, group, made up of Ket, or Enisei-Ostyak, which is spoken by the inhabitants of several villages in the Enisei basin, and several extinct languages of Western Siberia that are related to Ket, including Kot and Asan. In the opinion of many scholars, the Eniseian, or Ket, languages are related to Tibeto-Burmese.

There is a sizable number of families of Indian languages in North and South America (33 million speakers). According to the classifications of the American scholars Greenberg, E. Sapir, and McQuown, these families form ten large groups, or “superfamilies.” Several scholars, including Sapir, have linked the languages of the Na-Dene group with the Sino-Tibetan languages and other languages of Eurasia. The Na-Dene group includes the Athapaskan languages, the most widespread of which is Navajo, as well as Tlingit and Haida. Some scholars believe that the speakers of these languages came to the Americas later than members of the other language groups.

The other language families of North America are joined in the following basic groups: Algonquian-Mosan, which includes the Algonquian and Salish languages; Hokan-Siouan, which includes the Hokan, Iroquois, and Natchez-Muskogean languages; Tarasco, which is spoken in Mexico; Aztec-Tanoan, or Uto-Aztec-Tanoan, including the Náhuatl language of Mexico, which was the language of the ancient Aztec state; and Penutian. According to the hypothesis of the American scholar B. Whorf, the last two groups are distantly related to the Central American Maya-Zoque group, which includes the Mayan language of the Yucatán Peninsula. According to Greenberg’s classification, the Maya-Zoque languages are included in the Penutian group.

The most important Indian language of Central and South America is Quechuan, which is spoken in Peru, Ecuador, and Bolivia and in neighboring regions of Argentina, Chile, and Colombia. Quechuan was the chief language of the ancient Peruvian Inca state, which had a distinctive, highly developed culture. Quechuan is joined with Aymaran and several language groups, including Araucanian, Arawakan, and Tupi-Guarani, in the Andean-Equatorial group. Guaraní, which is affiliated with this group, is the chief language of Paraguay and is used by various Indian groups in Brazil, Bolivia, and Argentina.

The other language families of Central and South America are classified in the following groups: Macro-Otomanguean, which includes the Otomian-Mixtecan-Zapotecan and Chinantecan families; Macro-Chibchan, which includes the Chibchan and Miskito-Matagalpan families and the Lenca, Paya, and Xinca languages; and Ge-Pano-Carib, which includes the Cariban, Ge, Panoan, Witotoan, Mataco-Mataguayo, Guaycuruan, and Mascoian families and the Mosetene, Nambicuara, Bororo, Carajá, and Botocudo languages.

According to the hypothesis of the American scholar E. Matteson, all the Indian languages of the Americas, which are termed Amerindian languages, are related to one another, with the possible exceptions of the Carib languages in the south and the Na-Dene languages in the north. M. Swadesh, who argued that all the Indian languages of the Americas are related, proposed that all the modern languages of the world developed from dialects of a single language family that existed in the Old World several tens of millennia ago. According to Swadesh, other families also existed at that time, but they disappeared, leaving no evidence of their existence. The methods introduced by Swadesh to determine the time at which a language family broke up are widely used in modern linguistics in the field of glottochronology, or lexical statistics. These methods do not give reliable results, however, when applied to periods of time that are more than 4,000–5,000 years in length. The conclusions made by Swadesh and his predecessor A. Trombetti about the single origin, or monogenesis, of all the languages of the Americas and the Old World are therefore not reliable. It may be assumed that a large number of language families have disappeared without leaving evidence of their existence; the surviving texts of certain ancient languages, in particular, the hieroglyphic languages of Crete, remain undeciphered. Many languages of ancient human cultures have not yet been placed in the genealogical classification of languages.

The extinct languages of Southwest Asia are sometimes grouped under the term “Asianic,” but in reality they belong to different language families. Hurrian, which is preserved in texts from the third and second millennia B.C. found in the areas of Mesopotamia and Syria, was closely related to Urartean, the language of ancient Urartu, which was located near Lake Van. Sumerian, which is preserved in extremely ancient texts from the late fourth millennium B.C., is given a special classification; it may be linked to the languages of Central and Eastern Asia. Hattic, which had already become extinct in Asia Minor by the second millennium B.C., the Hurrian-Urartean family, and Sumerian are sometimes linked with the Caucasian languages, but these relationships are not yet generally recognized. The genetic relationships of Etruscan are uncertain.

Substantial parts of the classification of the languages of the Americas, Africa, and Southeast Asia have not yet been fully developed. The picture of the linguistic history of man that is presented by contemporary scholarship is therefore an approximate one.


