Speech perception


A term broadly used to refer to how an individual understands what others are saying. More narrowly, speech perception is viewed as the way a listener can interpret the sound that a speaker produces as a sequence of discrete linguistic categories such as phonemes, syllables, or words. See Psycholinguistics

Classical work in the 1950s and 1960s concentrated on uncovering the basic acoustic cues that listeners use to hear the different consonants and vowels of a language. It revealed a surprisingly complex relationship between sound and percept. The same physical sound (such as a noise burst at a particular frequency) can be heard as different speech categories depending on its context (as “k” before “ah,” but as “p” before “ee” or “oo”), and the same category can be cued by different sounds in different contexts. Spoken language is thus quite unlike typed or written language, where there is a relatively invariant relationship between the physical stimulus and the perceived category.

The reasons for this complex relationship lie in the way that speech is produced: the sound emerging from the mouth is shaped by a number of continuously moving and largely independent articulators. The complexity has caused great difficulty in programming computers to recognize speech, and it raises a paradox: computers readily recognize the printed word but struggle with speech, whereas human listeners find speech naturally easy to understand yet must be taught to read (often with difficulty). It is possible that humans are genetically predisposed to acquire the ability to understand speech, using specialized perceptual mechanisms usually located in the left cerebral hemisphere. See Hemispheric laterality

Building on the classical research, more recent work has drawn attention to the important contribution that vision makes to normal speech perception; has explored the changing ability of infants to perceive speech and contrasted it with that of animals; and has studied how speech sounds are coded by the auditory system and how speech perception breaks down in listeners with hearing impairment. There has also been substantial research on the perception of words in continuous speech.

Adult listeners are exquisitely sensitive to the differences between sounds that are distinctive in their language. The voicing distinction in English (between “b” and “p”) is cued by the relative timing of two different events (stop release and voice onset). At a difference of around 30 milliseconds, listeners hear an abrupt change from one category to another, so that a shift of only 5 ms can change the percept. On the other hand, a similar change around a different absolute value, where both sounds are heard as the same category, would be imperceptible. The term categorical perception refers to this inability to discriminate two sounds that are heard as the same speech category.
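The abrupt boundary described above can be illustrated with a toy model. The sketch below, in Python, uses a logistic identification function; the 30 ms boundary is taken from the text, but the slope value is illustrative, not fitted to perceptual data:

```python
import math

def identify_voiceless(vot_ms, boundary_ms=30.0, slope=1.5):
    """Probability that a listener labels a stop as voiceless ("p")
    given its voice onset time (VOT), modeled as a steep logistic
    identification function. The slope is illustrative only."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary_ms)))

# A 5 ms shift near the boundary flips the percept, while the same
# 5 ms shift far from the boundary leaves the label unchanged.
near = identify_voiceless(32.5) - identify_voiceless(27.5)
far = identify_voiceless(52.5) - identify_voiceless(47.5)
for vot in (10, 25, 35, 50):
    label = "p" if identify_voiceless(vot) > 0.5 else "b"
    print(f"VOT {vot:2d} ms -> P('p') = {identify_voiceless(vot):.2f} ({label})")
```

In this toy model, the identification change across a 5 ms step near the boundary is much larger than the change across the same step far from it, which is the pattern categorical perception describes.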

Categorical perception can arise for two reasons: it can have a cause that is independent of the listener's language—for instance, the auditory system may be more sensitive to some changes than to others; or it can be acquired as part of the process of learning a particular language. The example described above appears to be language-independent, since similar results have been found in animals such as chinchillas whose auditory systems resemble those of humans. But other examples have a language-specific component. The ability to hear a difference between "r" and "l" is trivially easy for English listeners, but Japanese listeners perform almost at chance unless they are given extensive training. How such language-specific skills develop has become clearer following intensive research on speech perception in infants.

Newborn infants are able to distinguish many of the sounds that are contrasted by the world's languages. Their pattern of sucking on a blind nipple signals a perceived change in a repeated sound. They are also able to hear the similarities between sounds such as those that are the same vowel but have different pitches. The ability to respond to such a wide range of distinctions changes dramatically in the first year of life. By 12 months, infants no longer respond to some of the distinctions that are outside their native language, while infants from language communities that do make those same distinctions retain the ability. Future experience could reinstate the ability, so it is unlikely that low-level auditory changes have taken place; the distinctions, although still coded by the sensory system, do not readily control the infant's behavior.

Although conductive hearing losses can generally be treated adequately by appropriate amplification of sound, sensorineural hearing loss involves a failure of the frequency-analyzing mechanism in the inner ear for which there is as yet no adequate compensation. Not only do sounds need to be louder before they can be heard, but they are not so well separated by the ear into different frequencies. Also, the sensorineurally deaf patient tolerates only a limited range of sound intensities; amplified sounds soon become unbearable (loudness recruitment).

These three consequences of sensorineural hearing loss lead to severe problems in perceiving a complex signal such as speech. Speech consists of many rapidly changing frequency components that normally can be perceptually resolved. The lack of frequency resolution in the sensorineural patient makes it harder for the listener to identify the peaks in the spectrum that distinguish the simplest speech sounds from each other; and the use of frequency-selective automatic gain controls to alleviate the recruitment problem reduces the distinctiveness of different sounds further. These patients may also be less sensitive than people with normal hearing to sounds that change over time, a disability that further impairs speech perception.
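The gain-control idea can be sketched in miniature. The toy Python function below, with hypothetical threshold and ratio values (not clinical settings), shows how a compressive gain squeezes a wide range of input levels into the narrower range a recruiting ear can tolerate, and why this also shrinks the level contrasts that distinguish speech sounds:

```python
def compress_level(level_db, threshold_db=50.0, ratio=3.0):
    """Toy level compressor: levels at or below the threshold pass
    unchanged; above it, each 1 dB of input yields only 1/ratio dB
    of output. Threshold and ratio are illustrative values only."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# A 30 dB spread above threshold (50-80 dB in) is squeezed to 10 dB out,
# so intense sounds stay tolerable but level differences between
# speech sounds are reduced as a side effect.
print(compress_level(80.0) - compress_level(50.0))
```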

Some profoundly deaf patients can identify some isolated words by using multichannel cochlear implants. Sound is filtered into different frequency channels, or different parameters of the speech are automatically extracted, and electrical pulses are then conveyed to different locations in the cochlea by implanted electrodes. The electrical pulses stimulate the auditory nerve directly, bypassing the inactive hair cells of the damaged ear. Such devices cannot reconstruct the rich information that the normal cochlea feeds to the auditory nerve. See Hearing (human), Perception, Psychoacoustics, Speech
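The channel-filtering stage can be sketched abstractly. The toy Python function below estimates the energy in each frequency band of a short signal frame via a direct discrete Fourier transform; the band edges are illustrative, and real implant processors use efficient filters plus per-channel envelope extraction and compression, which this sketch omits:

```python
import math

def channel_energies(signal, sample_rate, band_edges):
    """Crude filterbank sketch: for each band between consecutive
    edges, sum the energy of the DFT bins whose frequencies fall in
    that band. Returns one energy value per band (per electrode)."""
    n = len(signal)
    energies = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        e = 0.0
        for k in range(n // 2):
            freq = k * sample_rate / n
            if lo <= freq < hi:
                re = sum(signal[t] * math.cos(2 * math.pi * k * t / n)
                         for t in range(n))
                im = sum(signal[t] * math.sin(2 * math.pi * k * t / n)
                         for t in range(n))
                e += (re * re + im * im) / (n * n)
        energies.append(e)
    return energies

# A 500 Hz tone deposits its energy in the lowest band, so mainly
# the corresponding (low-frequency) electrode would be stimulated.
tone = [math.sin(2 * math.pi * 500 * t / 8000) for t in range(400)]
print(channel_energies(tone, 8000, [100, 1000, 2000, 4000]))
```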

McGraw-Hill Concise Encyclopedia of Bioscience. © 2002 by The McGraw-Hill Companies, Inc.