Deep within a New Haven laboratory, there is a tiny room meant to eat sounds up. It has low ceilings and huge gray cones sticking out from the walls, like a funhouse. When I talk, my words sound flat and far away, as if I have just stepped off an airplane and the world is still muffled, distant.
As I touch my ears, Ken Pugh, president of the Yale-affiliated Haskins Laboratories that houses this cone-filled, claustrophobic room, smiles a bit. He tells me that my reaction is not unique.
“When I bring most people in here, they say ‘I’m sorry, I must have a cold — my voice sounds so congested,’” he says, explaining that the chamber eliminates echoes, swallowing sound to create a silent baseline.
While this space is crucial for technical sound experiments, the lab’s studies go beyond isolated tests. Founded in 1935 by scientists Caryl Haskins and Franklin Cooper, Haskins brings together neuroscientists, psychologists, linguists and engineers from across the world to understand human language.
It’s not an easy task. Haskins must draw from many disciplines — speech is not simply a sound. It’s intertwined with the faces we see, the words we read, the ways we move our mouths.
Haskins isn’t just about silence in a walled-off room of cones. It’s about the roar of the real, ever-shifting human voices that stream down the white-walled hallways of Haskins and spill into the streets below. It’s about words, spoken and written. It’s about connection and communication, and about the moments when these connections crack.
I was eight when I started speech therapy. I used to cram my tongue into the side of my mouth when I made “s” sounds so they came out gargled, as if I were holding in a gulp of water as I spoke. It was subtle enough that no one thought to fix it, but as I got older people started to look at me strangely. For two years, I moved between three different speech therapists who gave me thick binders of exercises and colorful stickers but could not help me. Then I met Kathy.
Kathy was a large woman with short blonde hair who smelled of radishes and rain. She came every Thursday with her red canvas bag and sat with me as I read sentences about slippery snakes sliding down sidewalks. Kathy told me that whenever I wanted to say an “s” word, I should make a “t” sound and slide it down from there. So for a year, I went to Tsammy’s house and played tsoccer in the fall. After a while I started going to Sammy’s house. I started playing soccer.
It is not until I speak with Jonathan Preston, a Haskins scientist and former speech therapist, that I realize my initial struggle to improve through traditional speech therapy was not unique. As a speech therapist in the public schools of Rochester, New York, he found that many of the kids he worked with had “been in and out of speech therapy for years but were just not making any progress.”
At Haskins, Preston has been testing a novel technology that might help many who struggle in a society that puts so much value on clear speech: ultrasound visual feedback. Kids can observe real-time images of their tongue as they make sounds. This allows them to visualize exactly what’s going wrong and make adjustments.
“It’s sort of like a video game for them — we point to areas on the screen and tell them to try to hit them with their tongue,” Preston tells me. “And we’ve had some really stirring success stories. There’s been some kids who had been in speech therapy for four to five years, and in a relatively short period of time after we started them on ultrasound visual feedback, we got them to the point where they didn’t need speech therapy anymore. That’s huge.”
While Preston acknowledges that the technology is not a cure-all, he hopes that it could be integrated into therapy settings to help those who are struggling – those who don’t have a Kathy.
Julia Irwin, a language researcher at Haskins, finds herself fascinated by faces. As a Ph.D. student, while everyone else was studying infant crying, she remembers peering down at the babies, startled not by their sounds, but rather by their expressions.
“Understanding faces is so important in understanding communication,” Irwin says, sitting across from me at a long, gleaming table at Haskins. “Faces mark our identity. They communicate whether we are happy or sad. And they give us speech information.”
Recently, Irwin has moved beyond babies to study how seeing faces influences speech perception. An example: let’s say you listen to the syllable “ba.” Then you watch a video clip in which the same sound is dubbed over a person whose mouth is saying “ga.” Strangely, most of us hear neither “ba” nor “ga” — we hear “da.” Known as the McGurk effect, it’s an illusion. The brain meshes audio and visual inputs.
As Irwin worked with toddlers in separate studies, she began to notice that certain kids weren’t talking as readily. They would make repetitive motions, refusing to make eye contact. She identified these features as pre-diagnosis signs of autism. And it seemed children with autism had more trouble reading speech information from faces: in the McGurk experiment, they were much more likely to hear a sound closer to the audio “ba,” instead of the melded “da.”
Irwin believes this deficit could be due to years of avoiding looking at faces and mouths, a common characteristic of autism. She reasoned that if autistic children engaged visually with speaking faces, some of their social skills might improve.
From this hope, the iPad app “Listening to Faces” was born. Children watch a speaking face say a particular word. Unlike McGurk, the face and sound do match up. Yet the word is one easily confused with others when guessed from sound alone. After the face says the word, the children are given several cartoon options to match it: a sleek orange fox, a pair of purple-and-pink striped socks, a man getting an electric shock. If they get two wrong in a row, they are prompted to “Look to the Mouth!”
In an initial study, autistic children showed improvement in audiovisual speech perception.
“It’s very exciting because ultimately, we do all this basic scientific work with the purpose of intervening,” Irwin says. “We have a mandate to use science to give back.”
Like most children, I learned by sounding out words: the word “tulip” was a series of separate, stumbling sounds before it became the flowers I ran to in our backyard each morning.
Pugh says this is how children typically learn: since the written word is a series of individual speech sounds, children must first distinguish those that make up “tulip” and then connect them. When awareness of speech sounds is muddled, reading problems can result. Research at Haskins suggests that disorders like dyslexia can be predicted by early problems in distinguishing individual speech sounds. Yet a recent study by Pugh showed that dyslexic individuals process certain kinds of visual-spatial information more efficiently — many dyslexic individuals excel in art, architecture, and design.
“Many children with dyslexia have gone through their whole life being told that they are not smart,” Pugh says. “The results of this study could direct them toward careers and bring out enthusiasm and self-confidence.”
As I walk with Pugh through the halls of the laboratories, past Buzz Lightyear posters meant to keep children occupied during brain scans, past the world’s first speech synthesizer in a glass case and a room studded with tiny cameras that capture micro-movements of the mouth, he calls the lab a “strange little utopian community.” I understand what he means.
Everything that seems to hum through Haskins, from the dance of ultrasound tongues to that bizarre “da” in McGurk experiments, refuses to exist in isolation. We don’t hear speech just with our ears – we also hear with our eyes and mouths. I thought reading was just about understanding text, but reading, too, is woven into how we talk, how we listen.
Sometimes I find this deep interdependence unnerving. Can I trust myself if what I hear a person say is changed by the way their mouth moves? If how I read may be tinged with how I learned to say “s” at age nine, sitting with Kathy at the kitchen table, reading about the slippery snake?
And yet at other moments, it seems somehow complete. When I finally step out of the muffled room of cones, my voice sounds deep and startling and very close.