The stream of speech
Feb. 18th, 2002 06:14 pm
Remez, Robert E., and Rubin, Philip E. (1983). The stream of speech. Scandinavian Journal of Psychology, 24: 63-66.
What is it that makes speech speech? I asked my coworker this a couple of months ago, and she laughed and told me nobody's quite sure. This article takes a stab at that question.
A natural first guess - that each speech sound, or phoneme, is represented by a single set of prominent bands of frequencies, or formants - turns out to be quite wrong. If you record the words "dog" and "dig," for instance, and try to chop off the /d/s from each and do a frequency analysis on them, you'll find they don't look the same at all. This is because the vowel that comes after the consonant influences how the consonant itself sounds.
Okay, said researchers, then perhaps the secret is in the broadband transitions between the different phonemes, or the harmonic relationships between them. This was a plausible next guess and it was widely held throughout the sixties and seventies - in fact I think there are people who still believe this. However, Remez, Rubin, and some other colleagues don't think this holds water, because they were able to synthesize sounds that didn't have any of the traditional speech-recognition cues - instead of harmonically related broadband formants, they just had three single-frequency sinewaves modulating in similar patterns - and yet the subjects were still able to hear these sounds as speech.
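For the curious, here's a rough sketch of what this kind of three-sinewave synthesis might look like in Python. It's my own illustration, not the authors' actual procedure: the names `sinewave_speech` and `glide`, the sample rate, the frame length, and all three frequency tracks are made up, standing in for frequency patterns you'd really have to measure from a recorded utterance.

```python
import math

def sinewave_speech(tracks, sample_rate=8000, frame_dur=0.01):
    """Sum one time-varying sinusoid per track.

    tracks: list of (freqs_hz, amps) pairs, each a per-frame list.
    Returns a list of float samples.
    """
    frame_len = round(sample_rate * frame_dur)   # samples per frame
    n_frames = len(tracks[0][0])
    out = [0.0] * (n_frames * frame_len)
    for freqs, amps in tracks:
        phase = 0.0
        for i in range(len(out)):
            frame = i // frame_len
            # Accumulate phase so the tone stays continuous when
            # the frequency changes from one frame to the next.
            phase += 2.0 * math.pi * freqs[frame] / sample_rate
            out[i] += amps[frame] * math.sin(phase)
    return out

def glide(start_hz, end_hz, n_frames):
    """Linear frequency glide, one value per frame."""
    step = (end_hz - start_hz) / (n_frames - 1)
    return [start_hz + step * k for k in range(n_frames)]

# Hypothetical tracks (invented numbers, not measured from real speech):
# three tones gliding the way the lowest three formants might.
N_FRAMES = 30
tracks = [
    (glide(300.0, 700.0, N_FRAMES), [1.0] * N_FRAMES),
    (glide(2000.0, 1200.0, N_FRAMES), [0.5] * N_FRAMES),
    (glide(2600.0, 2500.0, N_FRAMES), [0.25] * N_FRAMES),
]
signal = sinewave_speech(tracks)
```

The thing to notice is what's missing: no harmonics, no broadband energy, just three pure tones whose frequencies move around. With invented glides like these you'd only hear whistles; the interesting part of the result is that tones following the patterns of a real utterance can be heard as speech.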
There are half a dozen articles by these guys on their synthetic speech methods, some of which I'll probably review later. The point of this particular article was to compare these barebones yet interpretable speech stimuli with some visual work done by G. Johansson in the 70's. Without having read his work, I think the gist was this: This guy stuck lights on people's joints and had subjects watch these people walk around in a dark room. The subjects couldn't find any sense in the dots when the actors stood still, but once they started moving, a pattern emerged ("Hey, why's that idiot running around in the dark?!"). Remez & Rubin contend that their sine wave speech stimuli work under the same principle - namely, "the value of each element is established only by virtue of the coherent configuration to which it belongs."
Bonus points to Remez and Rubin for usage of the words "terpsichoric" and "dotty" in the same sentence.
(no subject)
Date: 2002-02-18 12:43 pm (UTC)
Essentially, because of a couple nutcase friends of mine, I'm having a hard time swallowing that things as diverse as social skills and math skills can come from the same wellspring, modified by specific factors. Unless, of course, the factors are so darn big that "g" becomes pretty irrelevant. Which is counterproductive to "g" explaining anything (not to say that other theories seem to do any better with these guys).
Or maybe I just know really, really stunted people.
(no subject)
Date: 2002-02-19 03:44 am (UTC)
My psychology professors have told me that as a rule, IQ correlates positively with things like self-esteem and popularity. I don't know whether this works in the upper tail or not. The guy I worked for was pretty suspicious of any form of "intelligence" per se other than g simply because there isn't good evidence for it. It's tempting to look at IQ subtests and say, "Oh, this child is exceptionally good at spatial thinking, but sucks rocks at learning vocabulary," but those small-scale measures don't actually correlate with anything we care about. This is in large part why I'm skeptical of things like Gardner's theory of multiple intelligences - it seems like a case of "nice idea, where's your proof?" I will say that I haven't read his work myself, I've only heard other people talk about it, so it's possible that they just haven't done it justice.
I do think it's quite possible that you know a disproportionate number of maladjusted people. Few people's social circles would make good random samples, and in my own experience it seems that people with emotional issues tend to hang together - not that you're "stunted," as you put it ;), but if you have a critical mass of friends that are, they may be more likely to draw in other less well-adjusted people to the circle, etc. It's just a guess. I'm pretty sure that I know a disproportionate number of maladjusted people, myself (though not necessarily on the social domain).
(no subject)
Date: 2002-02-19 05:30 am (UTC)
You should talk to the cute person about speech recognition software - he seems to know a lot about it, and presumably it has some workable models of speech. 'Course, they may not be the models *we* use, but they might shed some light on stuff.