The origins of the field can be traced back to Charles Darwin, who wrote in The Descent of Man:
"When we treat of sexual selection we shall see that primeval man, or rather some early progenitor of man, probably first used his voice in producing true musical cadences, that is in singing, as do some of the gibbon-apes at the present day; and we may conclude from a widely-spread analogy, that this power would have been especially exerted during the courtship of the sexes,—would have expressed various emotions, such as love, jealousy, triumph,—and would have served as a challenge to rivals. It is, therefore, probable that the imitation of musical cries by articulate sounds may have given rise to words expressive of various complex emotions."
This theory of a musical protolanguage has been revived and re-discovered repeatedly, often without attribution to Darwin.
Two major topics for any subfield of evolutionary psychology are the adaptive function (if any) and the phylogenetic history of the mechanism or behavior of interest. For music, the latter includes when music arose in human ancestry and from what ancestral traits it developed. Current debate addresses each of these.
One part of the adaptive function question is whether music constitutes an evolutionary adaptation or an exaptation (i.e. a by-product of evolution). Steven Pinker, for example, argues in his book How the Mind Works that music is merely "auditory cheesecake": it was evolutionarily adaptive to have a preference for fat and sugar, but cheesecake did not play a role in that selection process. This view has been directly countered by numerous music researchers.
Adaptation, on the other hand, is highlighted in hypotheses such as that of Edward Hagen and Gregory Bryant, which posits that human music evolved from animal territorial signals and eventually became a way of signaling a group's social cohesion to other groups in order to form beneficial multi-group alliances.
The evolutionary switch to bipedalism may have influenced the origins of music. The reasoning is that the noise of locomotion and ventilation (breathing) may mask critical auditory information. Human locomotion is likely to produce more predictable sounds than the locomotion of non-human primates. Predictable locomotion sounds may have improved the human capacity for entrainment to external rhythms and for feeling the beat in music. A sense of rhythm could help the brain distinguish among sounds arising from discrete sources and also help individuals synchronize their movements with one another. Synchronization of group movement may improve perception by providing periods of relative silence and by facilitating auditory processing. The adaptive value of such skills to early human ancestors may have been keener detection of prey or stalkers and enhanced communication. Thus, bipedal walking may have influenced the development of entrainment in humans and thereby the evolution of rhythmic abilities.
Early hominids lived and moved around in small groups. The noise generated by the locomotion of two or more individuals can result in a complicated mix of footsteps, breathing, movements against vegetation, echoes, and so on. The ability to perceive differences in pitch, rhythm, and harmonies, i.e. “musicality,” could help the brain distinguish among sounds arising from discrete sources and also help the individual synchronize movements with the group. Endurance and an interest in listening might, for the same reasons, have been associated with survival advantages, eventually resulting in adaptive selection for rhythmic and musical abilities and in reinforcement of such abilities.
Listening to music seems to stimulate the release of dopamine. Rhythmic group locomotion combined with attentive listening in nature may have been reinforced through dopamine release. Through such reinforcement mechanisms, a primarily survival-based behavior may eventually have come to resemble dance and music. Since music may facilitate social cohesion, improve group effort, reduce conflict, facilitate perceptual and motor skill development, and improve trans-generational communication, music-like behavior may at some stage have become incorporated into human culture.
Another proposed adaptive function is the creation of intra-group bonding. In this respect music has been seen as complementary to language: it creates strong positive emotions without carrying a specific message that people may disagree on. Music's ability to cause entrainment (synchronization of the behavior of different organisms by a regular beat) has also been pointed out. A different explanation is that music signals the fitness and creativity of the producer or performer in order to attract mates. Still another is that music may have developed from human mother-infant auditory interactions (motherese), since humans have a very long period of infant and child development, infants can perceive musical features, and some infant-mother auditory interactions resemble music.
Part of the problem in the debate is that music, like any complex cognitive function, is not a holistic entity but rather modular: perception and production of rhythm, melodies, harmony and other musical parameters may thus involve multiple cognitive functions with possibly quite distinct evolutionary histories.
"Musilanguage" is a term coined by Steven Brown to describe his hypothesis about the ancestral human traits that evolved into language and musical abilities. The term names both a model of musical and linguistic evolution and a particular stage in that evolution. Brown argues that both music and human language have origins in a "musilanguage" stage of evolution and that the structural features shared by music and language are not the results of mere chance parallelism, nor are they a function of one system emerging from the other. This model argues that "music emphasizes sound as emotive meaning and language emphasizes sound as referential meaning." The musilanguage model is a structural model of music evolution, meaning that it views music’s acoustic properties as effects of homologous precursor functions. This can be contrasted with functional models of music evolution, which view music’s innate physical properties as determined by its adaptive roles.
The musilanguage evolutionary stage is argued to exhibit three properties found in both music and language: lexical tone, combinatorial phrase formation, and expressive phrasing mechanisms. Many of these ideas have their roots in existing phonological theory in linguistics, but Brown argues that phonological theory has largely neglected the strong mechanistic parallels between melody, phrasing, and rhythm in speech and music.
Joseph Jordania has suggested that music (as well as several other universal elements of contemporary human culture, including dance and body painting) was part of a predator control system used by early hominids. He suggested that loud rhythmic singing and drumming, together with threatening rhythmic body movements and body painting, were the core elements of the ancient "Audio-Visual Intimidating Display" (AVID). In his account, AVID was also a key factor in putting the hominid group into a specific altered state of consciousness, which he calls "battle trance," in which members would not feel fear or pain and would be religiously dedicated to group interests. Jordania suggested that listening and dancing to loud rhythmic rock music, a practice used in many contemporary combat units before combat missions, is directly related to this. Apart from defense against predators, Jordania suggested that this system was the core strategy for obtaining food via confrontational, or aggressive, scavenging.
Apart from loud rhythmic singing, stomping, and dancing, Jordania also suggested that soft humming could have played an important role in early human (hominid) evolution as a contact call. Many social animals produce seemingly haphazard and indistinct sounds (such as the clucking of chickens) when they are going about their everyday business (foraging, feeding). These sounds have two functions: (1) they let group members know that they are among kin and that there is no danger, and (2) their absence signals danger. When any sign of danger appears (suspicious sounds, movements in a forest), the animal that notices it first stops moving, stops producing sounds, and looks in the direction of the danger sign. Other animals quickly follow suit, and very soon the whole group is silent and scanning the environment for the possible danger. Charles Darwin was the first to notice this phenomenon, in wild horses and cattle. Jordania suggested that for humans, as for many social animals, silence can be a sign of danger, which is why gentle humming and musical sounds relax humans (see the use of gentle music in music therapy and lullabies).