Categorical perception

Updated on Nov 20, 2024

Categorical perception is the experience of percept invariances in sensory phenomena that can be varied along a continuum. Multiple views of a face, for example, are mapped onto a common identity, visually distinct objects such as cars are mapped into the same category, and variable speech sounds are perceived as discrete phonemes. Within a particular region of the continuum, the percepts are perceived as the same, with a sharp change in perception at the point on the continuum where the identity changes. Categorical perception is opposed to continuous perception, the perception of different sensory phenomena as being located on a smooth continuum.

How the neural systems in the brain engage in this many-to-one mapping is a major issue in cognitive neuroscience. Categorical perception (CP) can be inborn or can be induced by learning. Initially it was taken to be peculiar to speech and color perception. However, CP turns out to be general: It is related to how neural networks in our brains detect the features that allow us to sort the things in the world into separate categories, "warping" perceived similarities and differences so that some things are compressed into the same category and others are separated into different ones.

An area in the left prefrontal cortex has been identified as the brain region responsible for phonetic categorical perception and possibly other types of categorical perception.

Categorization

A category, or kind, is a set of things. Membership in the category may be (1) all-or-none, as with "bird": Something either is a bird or it isn't a bird; a penguin is 100% bird, a dog is 100% not-bird. In this case we would call the category "categorical." Or membership might be (2) a matter of degree, as with "big": Some things are more big and some things are less big. In this case the category is "continuous" (or rather, degree of membership corresponds to some point along a continuum). There are range or context effects as well: elephants are relatively big in the context of animals, relatively small in the context of bodies in general, if we include planets.

Many categories, however, particularly concrete sensorimotor categories (things we can see and touch), are a mixture of the two: categorical at an everyday level of magnification, but continuous at a more microscopic level. Color categories are an example: Central reds are clearly reds, and not shades of yellow. But in the orange region of the spectral continuum, red versus yellow is a matter of degree, and context and contrast effects can also shift these regions somewhat. Perhaps even with "bird," an artist or genetic engineer could design intermediate cases whose "birdness" was only a matter of degree.

Resolving the "blooming, buzzing confusion"

Categories are important because they determine how we see and act upon the world. As William James noted, we do not see a continuum of "blooming, buzzing confusion" but an orderly world of discrete objects. Some of these categories are "prepared" in advance by evolution: The frog's brain is born already able to detect "flies"; it needs only normal exposure rather than any special learning in order to recognize and catch them. Humans have such innate category-detectors too: The human face itself is probably an example. So too are our basic color categories, although one implication of the Sapir–Whorf hypothesis (Whorf 1956; also called the "linguistic relativity" hypothesis) might be that colors are determined by how culture and language happen to subdivide the spectrum.

But if one opens up a dictionary at random and picks out a content word, chances are that it names a category we have learned to detect, rather than one that our brains were innately prepared in advance by evolution to detect. The generic human face may be an innate category for us, perhaps even the various basic emotions it can express, but surely all the specific people we know and can name are not. "Red" and "yellow" may be inborn, but "scarlet" and "crimson"?

The motor theory of speech perception

And what about the very building blocks of the language we use to name categories: Are our speech sounds, such as /ba/, /da/, and /ga/, innate or learned? The first question we must answer about them is whether they are categorical categories at all, or merely arbitrary points along a continuum. It turns out that if one analyzes the sound spectrograms of /ba/ and /pa/, for example, both are found to lie along an acoustic continuum called "voice-onset-time." With a technique similar to the one used in "morphing" visual images continuously into one another, it is possible to "morph" a /ba/ gradually into a /pa/ and beyond by gradually increasing the voicing parameter.

Alvin Liberman and colleagues reported that when people listen to sounds that vary along the voicing continuum, they hear only /ba/s and /pa/s, nothing in between (although Liberman did not discuss voice onset time in that paper). Liberman dubbed this effect, in which a perceived quality jumps abruptly from one category to another at a certain point along a continuum instead of changing gradually, "categorical perception" (CP). He suggested that CP was unique to speech, that CP made speech special, and, in what came to be called "the motor theory of speech perception," that CP's explanation lay in the anatomy of speech production.
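The strongest form of this claim, that listeners hear only the category labels, makes a quantitative prediction that can be sketched in a few lines of Python. The sketch assumes a logistic identification function along the voice-onset-time continuum (the 25 ms boundary and the slope are illustrative values, not Liberman's data) and a labeling-only listener who can tell two sounds apart only when they receive different labels. Under that assumption, and with X equally likely to match A or B in an ABX trial, predicted accuracy is 0.5 × (1 + (pA - pB)²), where pA and pB are the probabilities of labeling A and B as /pa/.

```python
import math

# Hypothetical identification function: a /ba/-/pa/ boundary at 25 ms of
# voice onset time (VOT) and a steep logistic slope. Values are illustrative.
BOUNDARY_MS = 25.0
SLOPE_PER_MS = 0.8


def p_pa(vot_ms: float) -> float:
    """Probability that a stimulus with this VOT is labeled /pa/."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))


def predicted_abx(vot_a: float, vot_b: float) -> float:
    """ABX accuracy if listeners compare sounds only by their labels
    (X equally likely to be A or B): 0.5 * (1 + (pA - pB) ** 2)."""
    return 0.5 * (1.0 + (p_pa(vot_a) - p_pa(vot_b)) ** 2)


# Three pairs with the same 20 ms physical difference.
within_ba = predicted_abx(0.0, 20.0)    # both mostly heard as /ba/
straddling = predicted_abx(15.0, 35.0)  # pair crosses the boundary
within_pa = predicted_abx(40.0, 60.0)   # both mostly heard as /pa/
print(f"{within_ba:.3f} {straddling:.3f} {within_pa:.3f}")  # 0.500 0.999 0.500
```

Pairs separated by the same 20 ms step are predicted to be discriminated at chance within a category and almost perfectly across the boundary. Real listeners typically do somewhat better than chance within categories, so this labeling-only model is an idealization.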

According to the (now abandoned) motor theory of speech perception, the reason people perceive an abrupt change between /ba/ and /pa/ is that the way we hear speech sounds is influenced by how people produce them when they speak. What is varying along this continuum is voice-onset-time: the "b" in /ba/ is voiced and the "p" in /pa/ is not. But unlike the synthetic "morphing" apparatus, people's natural vocal apparatus is not capable of producing anything in between ba and pa. So when one hears a sound from the voicing continuum, their brain perceives it by trying to match it with what it would have had to do to produce it. Since the only thing they can produce is /ba/ or /pa/, they will perceive any of the synthetic stimuli along the continuum as either /ba/ or /pa/, whichever it is closer to. A similar CP effect is found with ba/da; these too lie along a continuum acoustically, but vocally, /ba/ is formed with the two lips, /da/ with the tip of the tongue and the alveolar ridge, and our anatomy does not allow any intermediates.

The motor theory of speech perception explained how speech was special and why speech-sounds are perceived categorically: sensory perception is mediated by motor production. Wherever production is categorical, perception will be categorical; where production is continuous, perception will be continuous. And indeed vowel categories like a/u were found to be much less categorical than ba/pa or ba/da.

Acquired distinctiveness

If motor production mediates sensory perception, then one would expect this CP effect to be a result of learning to produce speech. Eimas et al. (1971), however, found that infants already have speech CP before they begin to speak. Perhaps, then, it is an innate effect, evolved to "prepare" us to learn to speak. But Kuhl (1987) found that chinchillas also have "speech CP" even though they never learn to speak, and presumably did not evolve to do so. Lane (1965) went on to show that CP effects can be induced by learning alone, with a purely sensory (visual) continuum in which there is no motor production discontinuity to mediate the perceptual discontinuity. He concluded that speech CP is not special after all, but merely a special case of Lawrence's classic demonstration that stimuli to which you learn to make a different response become more distinctive and stimuli to which you learn to make the same response become more similar.

It also became clear that CP was not quite the all-or-none effect Liberman had originally thought it was: It is not that all /pa/s are indistinguishable and all /ba/s are indistinguishable: We can hear the differences, just as we can see the differences between different shades of red. It is just that the within-category differences (pa1/pa2 or red1/red2) sound or look much smaller than the between-category differences (pa2/ba1 or red2/yellow1), even when the size of the underlying physical difference (voicing, wavelength) is actually the same.

The modern definition

This evolved into the contemporary definition of CP, which is no longer peculiar to speech or dependent on the motor theory: CP occurs whenever perceived within-category differences are compressed and/or between-category differences are separated, relative to some baseline of comparison. The baseline might be the actual size of the physical differences involved, or, in the case of learned CP, it might be the perceived similarity or discriminability within and between categories before the categories were learned, compared to after.
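The definition can be made operational with a small calculation. The Python sketch below assumes pairwise perceived distances (for example, dissimilarity ratings) on a common scale before and after category learning; a negative change in mean within-category distance indicates compression, and a positive change in mean between-category distance indicates separation. The stimuli, labels, and the 0.6/1.4 warping factors are invented for illustration, and a real analysis would also need a significance test.

```python
from itertools import combinations
from statistics import mean


def mean_distance(dist, labels, same_category):
    """Mean pairwise distance over pairs that share (or do not share) a label."""
    return mean(
        dist[i][j]
        for i, j in combinations(range(len(labels)), 2)
        if (labels[i] == labels[j]) == same_category
    )


def cp_effect(before, after, labels):
    """Change in mean within- and between-category distance (after - before).
    Negative within change = compression; positive between change = separation."""
    within = mean_distance(after, labels, True) - mean_distance(before, labels, True)
    between = mean_distance(after, labels, False) - mean_distance(before, labels, False)
    return within, between


# Hypothetical data: four equally spaced stimuli in two categories.
labels = ["A", "A", "B", "B"]
n = len(labels)
before = [[abs(i - j) for j in range(n)] for i in range(n)]
# Invented post-learning distances: within-category pairs shrink, between-category pairs grow.
after = [
    [abs(i - j) * (0.6 if labels[i] == labels[j] else 1.4) for j in range(n)]
    for i in range(n)
]

within_change, between_change = cp_effect(before, after, labels)
print(round(within_change, 2), round(between_change, 2))  # -0.4 0.8
```

Here the baseline is the physical spacing of the stimuli; for learned CP the same function can be applied with `before` holding the pre-training judgments instead.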

The typical learned CP experiment would be the following: A set of stimuli is tested (usually in pairs) for similarity or discriminability. In the case of similarity, multidimensional scaling might be used to scale the rated pairwise similarity of the set of stimuli. In the case of discriminability, same/different judgments and signal detection analysis might be used to estimate the pairwise discriminability of a set of stimuli. Then the same subjects or a different set are trained, using trial and error and corrective feedback, to sort the stimuli into two or more categories. After the categorization has been learned, similarity or discriminability is tested again and compared against the untrained baseline. If there is significant within-category compression and/or between-category separation, this is operationally defined as CP.
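For the discriminability measure, the simplest signal detection index is the yes/no d' = z(H) - z(FA), where H and FA are the hit and false-alarm rates and z is the inverse of the standard normal CDF. The sketch below applies it to same/different counts by treating a "different" response on a different-pair trial as a hit and on a same-pair trial as a false alarm. This is a simplifying assumption: dedicated same-different models (such as the differencing model) yield different d' values for the same data. The counts are hypothetical.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Yes/no sensitivity d' = z(H) - z(FA).

    A "different" response to a different pair counts as a hit, and to a
    same pair as a false alarm. Adding 0.5 to each of the four counts
    (log-linear correction) keeps rates of 0 or 1 finite.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Invented counts for one between-category pair, before and after training.
before = d_prime(hits=30, misses=20, false_alarms=20, correct_rejections=30)
after = d_prime(hits=40, misses=10, false_alarms=10, correct_rejections=40)
print(f"d' before: {before:.2f}, after: {after:.2f}")
```

Computing d' for every adjacent pair before and after training, and then comparing within-category and between-category pairs, gives the discriminability version of the compression/separation test.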

The Whorf hypothesis

According to the Sapir–Whorf hypothesis (of which Lawrence's acquired similarity/distinctiveness effects would simply be a special case), colors are perceived categorically only because they happen to be named categorically: Our subdivisions of the spectrum are arbitrary, learned, and vary across cultures and languages. But Berlin & Kay (1969) suggested that this was not so: Not only do most cultures and languages subdivide and name the color spectrum the same way, but even for those that don't, the regions of compression and separation are the same. We all see blues as more alike and greens as more alike, with a fuzzy boundary in between, whether or not we have named the difference. This view has been challenged in a review article by Regier and Kay (2009), who distinguish between the questions "1. Do color terms affect color perception?" and "2. Are color categories determined by largely arbitrary linguistic convention?" They report evidence that linguistic categories, stored in the left hemisphere of the brain for most people, do affect categorical perception, but primarily in the right visual field, and that this effect is eliminated by a concurrent verbal interference task.

Evolved CP

First, back to vowels. The signature of CP is within-category compression and/or between-category separation. The size of the CP effect is merely a scaling factor; it is this compression/separation "accordion effect" that is CP's distinctive feature. In this respect, the "weaker" CP effect for vowels, whose motor production is continuous rather than categorical but whose perception is by this criterion categorical, is every bit as much a CP effect as the ba/pa and ba/da effects. But, as with colors, it looks as if the effect is an innate one: Our sensory category detectors for both color and speech sounds are born already "biased" by evolution, so that our perceived color and speech-sound spectrum is already "warped" with these compressions and separations.

Learned CP

The Lane/Lawrence demonstrations, lately replicated and extended by Goldstone (1994), showed that CP can be induced by learning alone. There are also the countless categories cataloged in our dictionaries that, on this view, are unlikely to be inborn. Nativist theorists such as Fodor (1983) have sometimes seemed to suggest that all of our categories are inborn. Recent demonstrations show that, although the primary color and speech categories may be inborn, their boundaries can be modified or even lost as a result of learning, and weaker secondary boundaries can be generated by learning alone.

In the case of innate CP, our categorically biased sensory detectors pick out their prepared color and speech-sound categories far more readily and reliably than if our perception had been continuous.

Learning is a cognitive process that results in a relatively permanent change in behavior, and it can influence perceptual processing by altering the way an individual perceives a given stimulus on the basis of prior experience or knowledge. In other words, how something is perceived is changed by how it was seen, observed, or experienced before. The effects of learning on categorical perception can be studied by looking at the processes involved.

Learned categorical perception can be divided into different processes through some comparisons. The processes can be divided into between-category and within-category comparisons. Between-category comparisons are those made between two separate sets of objects; within-category comparisons are those made within one set of objects. Between-category comparisons lead to a categorical expansion effect. A categorical expansion occurs when the classifications and boundaries for the category become broader, encompassing a larger set of objects. In other words, a categorical expansion is when the "edge lines" for defining a category become wider. Within-category comparisons lead to a categorical compression effect. A categorical compression effect corresponds to the narrowing of category boundaries to include a smaller set of objects (the "edge lines" are closer together). Therefore, between-category comparisons lead to less rigid group definitions, whereas within-category comparisons lead to more rigid definitions.

Another method of comparison is to look at both supervised and unsupervised group comparisons. Supervised groups are those for which categories have been provided, meaning that the category has been defined previously or given a label; unsupervised groups are groups for which categories are created, meaning that the categories will be defined as needed and are not labeled.

In studying learned categorical perception, themes are important. The presence of themes influences category learning and improves its quality, especially when the existing themes are opposites. In learned categorical perception, themes serve as cues for different categories: They help designate what to look for when placing objects into their categories. For example, when perceiving shapes, angles are a theme. The number of angles and their size provide information about the shape and cue different categories: Three angles would cue a triangle, whereas four might cue a rectangle or a square. Opposite to the theme of angles would be the theme of circularity. The stark contrast between the sharp contour of an angle and the round curvature of a circle makes the distinction easier to learn.

Like themes, labels are important to learned categorical perception. Labels are “noun-like” titles that can encourage categorical processing with a focus on similarities. The strength of a label can be determined by three factors: its affective (emotional) strength, the permeability (the ability to be broken through) of its boundaries, and a judgment of its discreteness (a measure of rigidity). Sources of labels differ and, as with supervised and unsupervised categories, labels are either created or already exist. Labels affect perception regardless of their source: Peers, individuals, experts, cultures, and communities can all create labels, and what appears to matter is not the source but the mere presence of a label. There is a positive correlation between the strength of a label (the combination of the three factors) and the degree to which it affects perception: The stronger the label, the more it affects perception.

Cues used in learned categorical perception can make it easier to recall and access prior knowledge in the process of learning and using categories. An item in a category can be easier to recall if the category has a memory cue. As discussed, labels and themes both function as cues for categories and therefore aid in remembering these categories and the features of the objects belonging to them.

Several brain structures promote learned categorical perception, including neurons in general, the prefrontal cortex, and the inferotemporal cortex. Neurons are involved in all processes in the brain and therefore also facilitate learned categorical perception: They carry messages between brain areas and support the visual and linguistic processing of the category. The prefrontal cortex is involved in “forming strong categorical representations.” The inferotemporal cortex has cells that code for different object categories and are tuned along diagnostic category dimensions, the dimensions that distinguish category boundaries.

The learning of categories and categorical perception can be improved through adding verbal labels, making themes relevant to the self, making more separate categories, and by targeting similar features that make it easier to form and define categories.

Learned categorical perception occurs not only in humans but has also been demonstrated in other animal species. Studies have examined categorical perception in humans, monkeys, rodents, birds, and frogs. These studies have led to numerous findings. They focus primarily on learning the boundaries of categories, where inclusion begins and ends, and they support the hypothesis that categorical perception has a learned component.

Computational and neural models

Computational modeling (Tijsseling & Harnad 1997; Damper & Harnad 2000) has shown that many types of category-learning mechanisms (e.g. both back-propagation and competitive networks) display CP-like effects. In back-propagation nets, the hidden-unit activation patterns that "represent" an input build up within-category compression and between-category separation as they learn; other kinds of nets display similar effects. CP seems to be a means to an end: Inputs that differ among themselves are "compressed" onto similar internal representations if they must all generate the same output; and they become more separate if they must generate different outputs. The network's "bias" is what filters inputs onto their correct output category. The nets accomplish this by selectively detecting (after much trial and error, guided by error-correcting feedback) the invariant features that are shared by the members of the same category and that reliably distinguish them from members of different categories; the nets learn to ignore all other variation as irrelevant to the categorization.
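The compression/separation effect in hidden units can be reproduced with a very small back-propagation network. The NumPy sketch below is not the Tijsseling & Harnad or Damper & Harnad simulation; it is a toy that assumes eight equally spaced stimuli on a one-dimensional continuum, a hypothetical category boundary at the midpoint, and four sigmoid hidden units trained by plain full-batch gradient descent on cross-entropy. As a rough CP index, it compares the hidden-space distance between the two stimuli that straddle the boundary with the mean distance between adjacent stimuli within a category, before and after training.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def boundary_ratio(h):
    """Hidden-space distance across the category boundary divided by the
    mean distance between adjacent stimuli within a category."""
    steps = np.linalg.norm(np.diff(h, axis=0), axis=1)  # 7 adjacent-pair distances
    between = steps[3]                                  # stimuli 3 and 4 straddle x = 0.5
    within = np.delete(steps, 3).mean()
    return between / within


# Toy continuum: 8 equally spaced stimuli, category 1 if x > 0.5 (hypothetical task).
x = np.linspace(0.0, 1.0, 8).reshape(-1, 1)
y = (x > 0.5).astype(float)

rng = np.random.default_rng(0)
n_hidden = 4
W1 = rng.normal(0.0, 1.0, (1, n_hidden))
b1 = rng.normal(0.0, 1.0, n_hidden)
W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
b2 = np.zeros(1)

ratio_before = boundary_ratio(sigmoid(x @ W1 + b1))

lr = 1.0
for _ in range(20000):
    h = sigmoid(x @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) / len(x)            # grad of mean cross-entropy w.r.t. output logit
    d_h = (d_out @ W2.T) * h * (1.0 - h)  # back-propagate through the hidden sigmoids
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * x.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

h = sigmoid(x @ W1 + b1)
accuracy = float(((sigmoid(h @ W2 + b2) > 0.5) == (y > 0.5)).mean())
ratio_after = boundary_ratio(h)
print(f"boundary/within ratio: {ratio_before:.2f} before, {ratio_after:.2f} after")
```

When training produces CP-like warping, the ratio rises above its pre-training value: The hidden units become steep detectors of the boundary, so inputs on the same side are squeezed together while the boundary pair is pulled apart. The exact numbers depend on the random seed, learning rate, and number of epochs.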

Brain basis

Neural data provide correlates of CP and of learning. Differences between event-related potentials recorded from the brain have been found to be correlated with differences in the perceived category of the stimulus viewed by the subject. Neural imaging studies have shown that these effects are localized and even lateralized to certain brain regions in subjects who have successfully learned the category, and are absent in subjects who have not.

Categorical perception has been associated with the left prefrontal cortex, which shows categorical responses to speech units, whereas posterior areas involved earlier in processing, such as areas in the left superior temporal gyrus, do not.

Language-induced

Both innate and learned CP are sensorimotor effects: The compression/separation biases are sensorimotor biases, and presumably had sensorimotor origins, whether during the sensorimotor life-history of the organism, in the case of learned CP, or the sensorimotor life-history of the species, in the case of innate CP. The neural net I/O models are also compatible with this fact: Their I/O biases derive from their I/O history. But when we look at our repertoire of categories in a dictionary, it is highly unlikely that many of them had a direct sensorimotor history during our lifetimes, and even less likely in our ancestors' lifetimes. How many of us have seen a unicorn in real life? We have seen pictures of them, but what had those who first drew those pictures seen? And what about categories we cannot draw or see (or taste or touch)? What about the most abstract categories, such as goodness and truth?

Some of our categories must originate from another source than direct sensorimotor experience, and here we return to language and the Whorf Hypothesis: Can categories, and their accompanying CP, be acquired through language alone? Again, there are some neural net simulation results suggesting that once a set of category names has been "grounded" through direct sensorimotor experience, they can be combined into Boolean combinations (man = male & human) and into still higher-order combinations (bachelor = unmarried & man) which not only pick out the more abstract, higher-order categories much the way the direct sensorimotor detectors do, but also inherit their CP effects, as well as generating some of their own. Bachelor inherits the compression/separation of unmarried and man, and adds a layer of separation/compression of its own.
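The Boolean composition step can be sketched as follows. This Python toy captures only the logical structure of building higher-order categories from grounded ones (man = male & human, bachelor = unmarried & man); it does not model the inherited CP effects described above. The detector functions and feature names are hypothetical stand-ins for category detectors learned from sensorimotor experience.

```python
from typing import Callable, Dict

Thing = Dict[str, bool]
Detector = Callable[[Thing], bool]


# Hypothetical grounded detectors: stand-ins for categories learned from
# direct sensorimotor experience (here they just read precomputed features).
def male(thing: Thing) -> bool:
    return thing["male"]


def human(thing: Thing) -> bool:
    return thing["human"]


def unmarried(thing: Thing) -> bool:
    return thing["unmarried"]


def conjunction(*detectors: Detector) -> Detector:
    """Build a higher-order category as the Boolean AND of existing detectors."""
    return lambda thing: all(detect(thing) for detect in detectors)


man = conjunction(male, human)          # man = male & human
bachelor = conjunction(unmarried, man)  # bachelor = unmarried & man

print(bachelor({"male": True, "human": True, "unmarried": True}))   # True
print(bachelor({"male": True, "human": True, "unmarried": False}))  # False
```

Because `bachelor` is defined only in terms of `unmarried` and `man`, it inherits whatever those detectors pick out; in the simulations the text describes, the same dependency is what lets the higher-order category inherit their compression/separation.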

These language-induced CP effects remain to be directly demonstrated in human subjects; so far only learned and innate sensorimotor CP have been demonstrated. Learned CP shows the Whorfian power of naming and categorization in warping our perception of the world. That is enough to rehabilitate the Whorf Hypothesis from its apparent failure on color terms (and perhaps also from its apparent failure on Eskimo snow terms), but to show that it is a full-blown language effect, and not merely a vocabulary effect, it will have to be shown that our perception of the world can also be warped not just by how things are named but by what we are told about them.

Emotion

Emotions are an important characteristic of the human species. An emotion is an abstract concept that is most easily observed by looking at facial expressions. Emotions and their relation to categorical perception are often studied using facial expressions. Faces contain a large amount of valuable information.

Emotions are divided into categories because they are discrete from one another. Each emotion entails a separate and distinct set of reactions, consequences, and expressions. The feeling and expression of emotions is a natural occurrence, and it is actually a universal occurrence for some emotions. There are six basic emotions that are considered universal to the human species across age, gender, race, country, and culture and that are considered to be categorically distinct. These six basic emotions are: happiness, disgust, sadness, surprise, anger, and fear. According to the discrete emotions approach, people experience one emotion and not others, rather than a blend. Categorical perception of emotional facial expressions does not require lexical categories. Of these six emotions, happiness is the most easily identified.

The perception of emotions from facial expressions reveals slight gender differences based on the definition and boundaries of the categories (essentially, the “edge line” where one emotion ends and the next begins). Anger is perceived more easily and quickly when it is displayed by men, whereas happiness is perceived more easily and quickly when it is displayed by women. These effects occur because the categories of these two emotions (anger and happiness) are more closely associated with other features of the respective genders.

Although emotions are given verbal labels, labels are not required to perceive them categorically. Infants can distinguish emotional expressions before they acquire language, and the categorical perception of emotions relies on a "hardwired mechanism". Further evidence comes from cultures that lack a verbal label for a specific emotion but whose members can still perceive it categorically as its own emotion, discrete and isolated from other emotions. Categorical perception of emotions has also been studied with eye tracking, which measures an implicit response: The eye movement itself is the response, and no subsequent verbal report is required.

The categorical perception of emotions is sometimes a result of joint processing. Other factors may be involved in this perception. Emotional expression and invariable features (features that remain relatively consistent) often work together. Race is one of the invariable features that contribute to categorical perception in conjunction with expression. Race can also be considered a social category. Emotional categorical perception can also be seen as a mix of categorical and dimensional perception. Dimensional perception involves visual imagery. Categorical perception occurs even when processing is dimensional.

References

Categorical perception Wikipedia

(Text) CC BY-SA
