eirias | On the perception of intonation from sinusoidal sequences (Reply)

Remez, Robert E., and Rubin, Philip E. (1984). On the perception of intonation from sinusoidal sequences. Perception and Psychophysics, 35(5): 429-440.

There have been a number of studies aimed at isolating important aspects of speech by removing certain acoustic cues (e.g., noise bursts, low-frequency burbles, particular formant transitions) from recorded speech and seeing how people respond to the leftovers. However, the authors note, sine wave speech is not an example of this, because at no point in time do the sinewave stimuli look like actual speech. I'm conceptualizing the difference in this fashion: in the first instance, you have experimenters altering the sounds horizontally (along the time dimension), so that tiny bits and pieces are chopped out and left in, whereas in the second instance, the sounds are altered, for lack of a better word, "vertically" - at all points, all frequencies except for the center frequencies of the formants are removed.
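To make the "vertical" picture concrete, here is a minimal sketch of how a sinewave replica can be put together: one sinusoid per formant, each following that formant's center frequency over time, summed into a single signal. This is an illustration under stated assumptions and not the authors' synthesis procedure. The function name `sinewave_replica`, the 10 ms frame length, the 8000 Hz sample rate, and the frequency and amplitude values are all made up for the example.

```python
import math

def sinewave_replica(tracks, amps, frame_dur=0.01, sample_rate=8000):
    """Sum one sinusoid per formant track (hypothetical helper).

    tracks: list of per-tone frequency lists in Hz, one value per frame.
    amps:   list of per-tone amplitude lists, same shape as tracks.
    """
    samples_per_frame = int(round(frame_dur * sample_rate))
    n_frames = len(tracks[0])
    # A running phase per tone keeps each tone continuous across frames.
    phases = [0.0] * len(tracks)
    out = []
    for frame in range(n_frames):
        for _ in range(samples_per_frame):
            value = 0.0
            for k, (freqs, levels) in enumerate(zip(tracks, amps)):
                phases[k] += 2.0 * math.pi * freqs[frame] / sample_rate
                value += levels[frame] * math.sin(phases[k])
            out.append(value)
    return out

# Made-up center-frequency tracks for Tones 1-3 (assumed values, in Hz).
tone1 = [500, 550, 600, 650]
tone2 = [1500, 1450, 1400, 1350]
tone3 = [2500, 2500, 2550, 2600]
tracks = [tone1, tone2, tone3]
# Assumed amplitudes, with Tone 1 the loudest.
amps = [[1.0] * 4, [0.5] * 4, [0.25] * 4]

signal = sinewave_replica(tracks, amps)
```

Nothing in the output carries a fundamental frequency: the only pitch information present is in the three tone contours and their amplitudes.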

This being the case, one of the odd things about sinewave speech is that there's no fundamental frequency of phonation (need to read up on that), and yet people are still able to perceive intonation in the speech, albeit wacky intonation. This article details four experiments done to figure out whence people are getting their concepts of intonation.

Experiment 1
There are three obvious possible sources of intonation: 1) Listeners may reconstruct a kind of "fundamental contour" based on the amplitude changes in the sound samples (which, in natural speech, are related to fundamental frequency changes); 2) Listeners may reconstruct a more ordinary fundamental contour based on the harmonic relationships between the three sine-wave pitches; 3) Listeners may base their judgments on one of the three existing sine-wave contours. This was tested by constructing examples of all five possible contours - Tone 1, Tone 2, Tone 3, Amplitude-Fundamental, and Harmonic-Fundamental - and having subjects compare the intonation pattern in the full sine-wave sound to each. Long story short, subjects paid lots of attention to Tone 1 and very little to any of the other cues. However, Tone 1 was louder than any of the other tones. Hence...

Experiment 2
In this experiment, the three tones were first made equal in amplitude and then inverted in terms of amplitude dominance, so that Tone 1 was no longer the loudest. Tone 1 still won. However, it was still possible that it was not Tone 1 itself that caused this similarity, but rather some sort of induction process using the higher tones, which just happened to bear the most resemblance to Tone 1. Hence...

Experiment 3
This time, the experimenters had two comparison standards: one sine-wave sentence with all three tones, and another with only the top two tones. Subjects were again asked to compare the three possible tones to the standards and say which contour was the most similar. As expected, when compared against the three-tone sine-wave standard, Tone 1 won. However, when compared against the two-tone standard, Tone 2 won. If induction between Tones 2 & 3 were to blame for Tone 1's superiority, then its absence from the standard wouldn't change its similarity to the whole. Since the similarity ratings did change, it's clear that it really was Tone 1 that people were listening to.

The conclusion the authors would like to draw from all of this is that Tone 1 is important because it falls within a range of frequencies that the human ear pays a lot of attention to (400 - 1000 Hz). This range corresponds roughly to the third through the fifth harmonics of the average fundamental frequency in spoken language. Unfortunately, the existing data didn't rule out the possibility that subjects were just paying attention to the lowest-frequency tone in the bunch, so they decided to bring us...
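The harmonic arithmetic behind that claim can be checked quickly. The sketch below lists which harmonics of a given fundamental land inside the 400 - 1000 Hz region. The function name `harmonics_in_region` and the sample fundamentals of 100, 150, and 200 Hz are my own assumptions for illustration and are not taken from the article.

```python
def harmonics_in_region(f0, low=400.0, high=1000.0, max_harmonic=10):
    """Return the harmonic numbers n for which n * f0 lies in [low, high]."""
    return [n for n in range(1, max_harmonic + 1) if low <= n * f0 <= high]

# Assumed example fundamentals in Hz, not values from the article.
for f0 in (100, 150, 200):
    print(f0, harmonics_in_region(f0))
```

For each of these assumed fundamentals, the third through fifth harmonics sit inside or very near the region, while the fundamental itself (harmonic 1) falls well below it.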

Experiment 4
In this final study, the authors inserted a fourth tone into the sinusoidal complex - a plausible fundamental frequency tone, well below the "dominance region" of 400 - 1000 Hz - and asked subjects which of the four component tones bore the most resemblance to the whole. Yet again, Tone 1 beat all other comers, including the new Tone 0. This provides evidence that it is the pitches in the range of certain harmonics of the fundamental, and not the fundamental itself, which give us most of our impression of the intonation contour of spoken language.