Geographic distribution: Southeast Asia and the Pacific
Linguistic classification: Austronesian; Paiwanic?; Malayo-Polynesian
Subdivisions: Philippine; Batanic (may be a branch of Northern Philippine); Bornean (geographic); Nuclear Malayo-Polynesian
The Malayo-Polynesian languages are a subgroup of the Austronesian languages, with approximately 385.5 million speakers. They are spoken by the Austronesian peoples of the island nations of Southeast Asia and the Pacific Ocean, with a smaller number in continental Asia. Cambodia, Laos, and Vietnam form the northwestern geographic outlier, reaching well into the Malay Peninsula; the northernmost geographic outlier does not pass beyond the north of Pattani, in southern Thailand. Malagasy is spoken on the island of Madagascar, off the eastern coast of Africa in the Indian Ocean. Part of the language family shows strong influence from Sanskrit and particularly Arabic, as the western part of the region has been a stronghold of Buddhism, Hinduism, and, since the 10th century, Islam.
Two morphological characteristics of the Malayo-Polynesian languages are a system of affixation and the use of reduplication (repetition of all or part of a word, such as wiki-wiki) to form new words. Like other Austronesian languages, they have small phonemic inventories, so a text uses few distinct sounds, each of which occurs frequently. The majority also lack consonant clusters (e.g., [str] in English). Most also have only a small set of vowels, five being a common number.
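Reduplication as described above can be illustrated with a short Python sketch. It is a toy model under stated assumptions, not a description of any particular language's grammar: the function names are hypothetical, words are treated as plain strings of single-letter segments, and the vowel set is assumed to be the common five-vowel system mentioned above.

```python
# Assumption: a five-vowel system written with the letters a, e, i, o, u.
VOWELS = set("aeiou")


def full_reduplication(base: str, sep: str = "-") -> str:
    """Repeat the whole base, e.g. 'wiki' -> 'wiki-wiki'."""
    return f"{base}{sep}{base}"


def partial_reduplication(base: str) -> str:
    """Copy the base up to and including its first vowel and prefix the copy.

    This models one common pattern of partial reduplication; real languages
    use several patterns and often combine them with affixes.
    """
    for i, ch in enumerate(base):
        if ch in VOWELS:
            return base[: i + 1] + base
    # No vowel found: fall back to copying the whole base.
    return base + base


if __name__ == "__main__":
    print(full_reduplication("wiki"))
    print(partial_reduplication("sulat"))
```

Here `full_reduplication("wiki")` reproduces the wiki-wiki example from the text, while `partial_reduplication` copies only the first syllable-like chunk of the base.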
The Nuclear Malayo-Polynesian languages are spoken by about 230 million people and include Malay (Indonesian and Malaysian), Sundanese, Javanese, Buginese, Balinese, and Acehnese, as well as the Oceanic languages, including Tolai, Gilbertese, Fijian, and Polynesian languages such as Hawaiian, Māori, Samoan, Tahitian, and Tongan.
The Philippine languages are spoken by around 100 million people and include Tagalog (Filipino), Cebuano, Ilokano, Hiligaynon, Central Bikol, Waray, and Kapampangan, each with at least three million speakers.
In northern Borneo, the most widely spoken language is Kadazan-Dusun, with over 200,000 speakers.
The Malayo-Polynesian languages share several phonological and lexical innovations with the eastern Formosan languages, including the merger of Proto-Austronesian *t and *C as /t/ and of *n and *N as /n/, a shift of *S to /h/, and vocabulary such as *lima "five", which is not attested in other Formosan languages. However, Malayo-Polynesian does not align with any single Formosan branch. A 2008 analysis of the Austronesian Basic Vocabulary Database suggests the closest connection is with Paiwan, though it assigns that connection only a 75% confidence level.
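The three sound changes listed above can be written as a simple segment-by-segment mapping. The following Python sketch is illustrative only: the function name is hypothetical, each reconstructed segment is assumed to be written with a single character (with capital C, N, and S standing for the Proto-Austronesian segments so transcribed), and the form `*CaNaS` is an invented string used to exercise all three rules, not an actual reconstruction.

```python
# Proto-Austronesian segments that change in Malayo-Polynesian, per the text:
# *C merges with *t, *N merges with *n, and *S shifts to h.
SOUND_CHANGES = {"C": "t", "N": "n", "S": "h"}


def to_malayo_polynesian(proto_form: str) -> str:
    """Apply the three changes to a form written one character per segment.

    A leading asterisk (the marker for a reconstructed form) is dropped.
    Segments not listed in SOUND_CHANGES are passed through unchanged.
    """
    segments = proto_form.lstrip("*")
    return "".join(SOUND_CHANGES.get(seg, seg) for seg in segments)


if __name__ == "__main__":
    for form in ["*maCa", "*lima", "*CaNaS"]:  # "*CaNaS" is a made-up test string
        print(form, "->", to_malayo_polynesian(form))
```

Because *t and *n already have their later values, only the capitalized segments need entries in the mapping; a form such as *lima, which contains none of them, passes through unchanged.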
Malayo-Polynesian consists of a large number of small local language clusters. The one exception is Oceanic, the only large group that has been reconstructed and is indisputably valid; all other large groups within Malayo-Polynesian are disputed. The family has traditionally been divided into Western ("Hesperonesian"), Central, and Eastern branches, but there is little support for these groups. The Central MP languages are distinctive because they are typologically Melanesian, owing to substratum effects from the Papuan languages of eastern Indonesia, and the same is true of the Eastern MP languages. The Western branch simply comprises those branches that have not undergone such extensive contact-induced change.
Wouk and Ross (2002) proposed a Nuclear Malayo-Polynesian branch, based on a consistent simplification of the Austronesian alignment in the syntax of the proto-Malayo-Polynesian language, which is found throughout Indonesia apart from much of Borneo and the north of Sulawesi. Because Nuclear MP included some Western MP languages along with Central–Eastern MP, Wouk and Ross split Western MP into an "Inner" group on Sulawesi and the Sunda Islands, which together with Central–Eastern formed Nuclear Malayo-Polynesian, and an "Outer" group on Borneo and the Philippines. Both are remnant groups with negative definitions: Outer WMP (Borneo–Philippines) are those Malayo-Polynesian languages which are not Nuclear MP; while Inner WMP (Sunda–Sulawesi) are those Nuclear languages which are not Central–Eastern MP, which is itself a dubious group. Although Nuclear MP was defined using syntactic data, it finds moderate support from lexical data.