Samiksha Jaiswal (Editor)

Austroasiatic languages

Proto-language:  Proto-Mon–Khmer
Glottolog:  aust1305
ISO 639-5:  aav
Geographic distribution:  South and Southeast Asia
Linguistic classification:  One of the world's primary language families
Subdivisions:  Munda, Khasi–Palaungic, Khmuic, Pakanic, Vietic, Katuic, Bahnaric, Khmer, Pearic, Nicobarese, Aslian, Monic, Shompen?

The Austroasiatic languages, in recent classifications synonymous with Mon–Khmer, are a large language family of continental Southeast Asia, also scattered throughout India, Bangladesh, Nepal and the southern border of China. The name Austroasiatic comes from the Latin words for "south" and "Asia", hence "South Asia". Of these languages, only Vietnamese, Khmer, and Mon have a long-established recorded history, and only Vietnamese and Khmer have official status as modern national languages (in Vietnam and Cambodia, respectively). On the subnational level, Khasi has official status in Meghalaya while Santali, Ho and Mundari are official languages of Jharkhand. In Myanmar, the Wa language is the de facto official language of Wa State. The rest of the languages are spoken by minority groups and have no official status.


Ethnologue identifies 168 Austroasiatic languages. These form thirteen established families (plus perhaps Shompen, which is poorly attested, as a fourteenth), which have traditionally been grouped into two: Mon–Khmer and Munda. However, one recent classification posits three groups (Munda, Nuclear Mon–Khmer and Khasi–Khmuic), while another has abandoned Mon–Khmer as a taxon altogether, making it synonymous with the larger family.

Austroasiatic languages have a disjunct distribution across India, Bangladesh, Nepal and Southeast Asia, separated by regions where other languages are spoken. They appear to be the autochthonous languages of Southeast Asia, with the neighboring Indo-Aryan, Tai–Kadai, Dravidian, Austronesian, and Sino-Tibetan languages being the result of later migrations.


The Austroasiatic languages are well known for having a "sesquisyllabic" pattern, in which basic nouns and verbs consist of a reduced minor syllable plus a full syllable. Many of them also have infixes. They are further characterized by unusually large vowel inventories and by some sort of register contrast, either between modal (normal) voice and breathy (lax) voice or between modal voice and creaky voice. Languages in the Pearic branch and some in the Vietic branch can have a three- or even four-way voicing contrast. However, some Austroasiatic languages have lost the register contrast, either by developing more diphthongs or, in a few cases such as Vietnamese, through tonogenesis. Vietnamese has been so heavily influenced by Chinese that its original Austroasiatic phonological character is obscured and it now resembles the South Chinese languages, whereas Khmer, which was influenced more by Sanskrit, has retained a more typically Austroasiatic structure.


Much work has been done on the reconstruction of Proto-Mon–Khmer in Harry L. Shorto's Mon–Khmer Comparative Dictionary. Little work has been done on the Munda languages, which are not well documented. With the demotion of Munda from a primary branch, Proto-Mon–Khmer becomes synonymous with Proto-Austroasiatic.

Paul Sidwell (2005) reconstructs the consonant inventory of Proto-Mon–Khmer as follows:

This is identical to earlier reconstructions except for . is better preserved in the Katuic languages, which Sidwell has specialized in. Sidwell (2011) suggests that the likely homeland of Austroasiatic is the middle Mekong, in the area of the Bahnaric and Katuic languages (approximately where modern Laos, Thailand, and Cambodia come together), and that the family is not as old as frequently assumed, dating to perhaps 2000 BCE.

Internal classification

Linguists traditionally recognize two primary divisions of Austroasiatic: the Mon–Khmer languages of Southeast Asia, Northeast India and the Nicobar Islands, and the Munda languages of East and Central India and parts of Bangladesh. However, no evidence for this classification has ever been published.

Each of the thirteen established families named below is accepted as a valid clade. By contrast, the relationships between these families within Austroasiatic are debated. In addition to the traditional classification, two recent proposals are given, neither of which accepts traditional "Mon–Khmer" as a valid unit. However, little of the data used for the competing classifications has ever been published, so it cannot be evaluated by peer review.

It has also been suggested that further branches of Austroasiatic might be preserved in substrata of Acehnese in Sumatra (Diffloth), the Chamic languages of Vietnam, and the Land Dayak languages of Borneo (Adelaar 1995).

Sidwell (2009, 2011)

Paul Sidwell (2009a), in a lexicostatistical comparison of 36 languages that are well enough known to exclude loanwords, finds little evidence for internal branching. He did, however, find an area of increased contact between the Bahnaric and Katuic languages: languages of all branches apart from the geographically distant Munda and Nicobarese show greater similarity to Bahnaric and Katuic the closer they are to those branches, without any noticeable innovations common to Bahnaric and Katuic. He therefore takes the conservative view that the thirteen branches of Austroasiatic should be treated as equidistant on current evidence. Sidwell & Blench (2011) discuss this proposal in more detail and note that there is good evidence for a Khasi–Palaungic node, which could possibly also be closely related to Khmuic. If this were the case, Sidwell & Blench suggest that Khasic may have been an early offshoot of Palaungic that spread westward. They also suggest Shompen as an additional branch and consider a Vieto-Katuic connection worth investigating. In general, however, the family is thought to have diversified too quickly for a deeply nested structure to develop, since Sidwell believes that Proto-Austroasiatic speakers radiated out from the central Mekong River valley relatively quickly.

Previously existent branches

Roger Blench (2009) also proposes that there might have been other primary branches of Austroasiatic that are now extinct, based on substrate evidence in modern-day languages.

  • Pre-Chamic languages (the languages of coastal Vietnam prior to the Chamic migrations). Chamic has various Austroasiatic loanwords that cannot be clearly traced to existing Austroasiatic branches (Sidwell 2006).
  • Acehnese substratum (Sidwell 2006). Acehnese has many basic words that are of Austroasiatic origin, suggesting that either Austronesian speakers have absorbed earlier Austroasiatic residents in northern Sumatra, or that words might have been borrowed from Austroasiatic languages in southern Vietnam — or perhaps a combination of both.
  • Bornean substrate languages (Blench 2010). Blench cites Austroasiatic-origin words in modern-day Bornean branches such as Land Dayak (Bidayuh, Dayak Bakatiq, etc.), Dusunic (Central Dusun, Visayan, etc.), Kayan, and Kenyah, noting especially resemblances with Aslian. As further evidence for his proposal, Blench also cites ethnographic evidence such as musical instruments in Borneo shared in common with Austroasiatic-speaking groups in mainland Southeast Asia.
  • Lepcha substratum ("Rongic"). Many words of Austroasiatic origin have been noticed in Lepcha, suggesting a Sino-Tibetan superstrate laid over an Austroasiatic substrate. Blench (2013) calls this branch "Rongic" based on the Lepcha autonym Róng.
  • Other languages with proposed Austroasiatic substrata are:
    • Jiamao, based on evidence from the register system of Jiamao, a Hlai language (Thurgood 1992). Jiamao is known for its highly aberrant vocabulary.
Gérard Diffloth (2005)

Diffloth compares reconstructions of various clades and attempts to classify them on the basis of shared innovations, though, as with the other classifications, the evidence has not been published. In more detail, the proposed tree is:

  • Munda languages (India)
    • Koraput: 7 languages
    • Core Munda languages
      • Kharian–Juang: 2 languages
      • North Munda languages
        • Korku
        • Kherwarian: 12 languages
  • Khasi–Khmuic languages (Northern Mon–Khmer)
    • Khasian: 3 languages of eastern India and Bangladesh
    • Palaungo-Khmuic languages
      • Khmuic: 13 languages of Laos and Thailand
  • Nuclear Mon–Khmer languages
    • Khmero-Vietic languages (Eastern Mon–Khmer)
    • Nico-Monic languages (Southern Mon–Khmer)
      • Nicobarese: 6 languages of the Nicobar Islands, a territory of India.
This family tree is consistent with recent studies of the migration of Y-chromosomal haplogroup O2a1-M95. However, the dates obtained from DNA studies using the Zhivotovsky method are several times older than those given by linguists, and other geneticists criticise the Zhivotovsky method.

Ilia Peiros (2004)

Peiros's is a lexicostatistic classification, based on percentages of shared vocabulary. This means that languages can appear to be more closely related than they actually are because of language contact. Indeed, when Sidwell (2009a) replicated Peiros's study with languages known well enough to account for loans, he did not find the internal (branching) structure below.

  • Nicobarese
  • Munda–Khmer
    • Munda
    • Mon–Khmer
      • Khasi
      • Nuclear Mon–Khmer
        • Mangic (Mang + Palyu) (perhaps in Northern MK)
        • Vietic (perhaps in Northern MK)
        • Northern Mon–Khmer
          • Palaungic
          • Khmuic
        • Central Mon–Khmer
          • Khmer dialects
          • Pearic
        • Asli-Bahnaric
          • Aslian
          • Mon–Bahnaric
            • Monic
            • Katu–Bahnaric
              • Katuic
              • Bahnaric
Diffloth (1974)

Diffloth's widely cited original classification, now abandoned by Diffloth himself, is used in Encyclopædia Britannica and, except for the breakup of Southern Mon–Khmer, in Ethnologue.

  • Munda
    • North Munda
      • Korku
      • Kherwarian
    • South Munda
      • Kharia–Juang
      • Koraput Munda
  • Mon–Khmer
    • Eastern Mon–Khmer
      • Khmer (Cambodian)
      • Pearic
      • Bahnaric
      • Katuic
      • Vietic (includes Vietnamese)
    • Northern Mon–Khmer
      • Khasi (Meghalaya, India)
      • Palaungic
      • Khmuic
    • Southern Mon–Khmer
      • Mon
      • Aslian (Malaya)
      • Nicobarese (Nicobar Islands)
Writing systems

Besides Latin-based alphabets, many Austroasiatic languages are written with the ancient Khmer alphabet and with the Thai and Lao alphabets. Vietnamese, by contrast, once had an indigenous script based on Chinese logographic writing, which was supplanted by the Latin alphabet in the 20th century. The following are examples of scripts, past and present, used for Austroasiatic languages.

  • Chữ Nôm
  • Khmer alphabet
  • Mon script
  • Ol Chiki alphabet (Santali alphabet)
  • Sorang Sompeng alphabet (Sora alphabet)
  • Varang Kshiti (Ho alphabet)
  • Khom script (used for a short period in the early 20th century for indigenous languages in Laos)
Austroasiatic migrations

According to Chaubey et al. (2010), "AA speakers in India today are derived from dispersal from Southeast Asia, followed by extensive sex-specific admixture with local Indian populations." According to Riccio et al. (2011), the Munda people are likely descended from Austroasiatic migrants from Southeast Asia. According to Zhang et al. (2015), Austroasiatic (male) migrations from Southeast Asia into India took place after the Last Glacial Maximum, circa 10,000 years ago.


    Austroasiatic languages Wikipedia