Speech Startup VocaliD Creates Personalized Voices With Crowd’s Help
[Updated, 10am. See below.] Losing the ability to speak is harrowing, and particularly poignant in conditions such as amyotrophic lateral sclerosis (known as ALS), stroke, or cerebral palsy. Yet many who can’t speak still can make some sounds with their voice—think gleeful yelps of joy or sorrowful shrieks of pain.
Those sounds are key to the work of Rupal Patel and her team at Belmont, MA-based startup VocaliD, who say they can now recreate a person’s voice, digitally.
For years, people with speaking difficulties have used computer-generated voices to communicate. One of the most emblematic users is renowned physicist Stephen Hawking, who has ALS. Users type words into a computer or select buttons that represent words, and a machine speaks for them. The result is usually a robotic-sounding voice, generic and monotone, that doesn’t account for individuality. And for children with speech problems, it can feel awkward to take on a replacement voice that sounds like an adult, as most robotic voices do.
VocaliD creates a computer voice that sounds like the user once did or, in the case of someone who has never spoken, like the voice they would have had. Patel, a speech clinician and scientist who is a tenured professor at Northeastern University, records the small noises, such as a laugh or a hum, that a person who can’t speak is still able to make. Patel and her colleagues then run the recording through their own computer software, which analyzes it for pitch, loudness, and rhythm, among myriad other factors.
“My thought was that those sounds have some kind of specificity,” Patel says. “We started trying to figure out if there’s a way we can take whatever is left over in their voice, and create a unique computerized voice that would sound like them.”
Because the data from a client’s sample is limited, the company also gathers recordings from volunteers who offer to “donate” their own voices as the foundation for building a new one. VocaliD asks each volunteer to visit its website and record approximately 3,500 short sentences (only a few hundred at a time). The company then matches volunteers with its clients based on age, height, gender, and other details that affect the way a voice sounds.
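The matching step described above can be pictured, very roughly, as a search for the donor whose attributes are closest to the client's. The Python below is a minimal sketch under that assumption and is not VocaliD's actual method: the `Speaker` fields, the weights, the gender penalty, and the `match_donor` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    age: int          # years
    height_cm: float
    gender: str


def mismatch(client: Speaker, donor: Speaker) -> float:
    """Lower score means a closer match. The weights are arbitrary placeholders."""
    score = abs(client.age - donor.age) / 10.0
    score += abs(client.height_cm - donor.height_cm) / 10.0
    if client.gender != donor.gender:
        score += 5.0  # hypothetical penalty for a gender mismatch
    return score


def match_donor(client: Speaker, donors: list[Speaker]) -> Speaker:
    """Return the donor with the smallest mismatch score."""
    return min(donors, key=lambda d: mismatch(client, d))
```

In this sketch, a nine-year-old client would be paired with a donor of similar age and height rather than with an adult, which reflects the article's point about children being given adult-sounding voices.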
With those donor recordings, VocaliD’s software studies the donor’s speech characteristics—in essence, the way their lips move to enunciate vowels and consonants. For some who can’t speak, like an ALS patient, the main thing restricting speaking is an inability to move the tongue and lips, which filter sound and allow for speech.
VocaliD blends the two data sets to create a new, unique voice, Patel says. It is careful not to incorporate too much of the donor’s sound characteristics—an important point, since there is far more detailed data from people with fuller speaking abilities, she says.
“I want to make a blend,” she says. “I want to make it sound different enough from the donor, yet as clear as the donor.”
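One simple way to picture such a blend is a weighted average of measurements taken from each speaker, with a limit on how much the donor contributes. This is only an illustrative sketch: the feature names, the default weight, and the `blend` function are hypothetical and do not describe VocaliD's software.

```python
def blend(client: dict[str, float], donor: dict[str, float],
          donor_weight: float = 0.3) -> dict[str, float]:
    """Mix the features both speakers share; donor_weight must lie in [0, 1]."""
    if not 0.0 <= donor_weight <= 1.0:
        raise ValueError("donor_weight must be between 0 and 1")
    shared = client.keys() & donor.keys()
    return {
        k: (1.0 - donor_weight) * client[k] + donor_weight * donor[k]
        for k in shared
    }
```

Keeping `donor_weight` low mirrors the stated goal of a voice that stays distinct from the donor's, although a real system would likely treat clarity and identity as separate properties rather than averaging them together.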
It’s still very early, but Patel has found definite interest in the technology. Since launching an Indiegogo campaign last month, the startup has raised about $92,000, exceeding its $70,000 goal. That includes four people who have contributed $10,000 each to be the first to receive a VocaliD voice in 2015. (One of the four is waiting to complete their own $10,000 campaign on Indiegogo Life, Indiegogo’s personal fundraising arm.)
A VocaliD voice won’t typically be that costly, according to the company. Around 30 others have bought a voice for the pre-order price of $1,000, Patel says, which is lower than the cost of actually creating it. The company expects to make money through a subscription service, instead. Those pre-orders should be delivered in 2016, she says.
The subscription covers some of the costs of building the device that VocaliD didn’t recoup in the original purchase price, as well as updates, Patel says. People can use the voice on specialized tablets made for voice communication, which allow users to select words or images. It can also be loaded onto a user’s own computer or smartphone, where the person can type in sentences.
During the pre-order period, the company is charging $20 per month, though that price may change depending on demand, Patel says. [An earlier version of this article had information about future pricing, which was erroneous. That sentence has been removed. Eds.]
Though she began work on the technology seven years ago, Patel didn’t make VocaliD a business until May 2014. She received a $150,000 grant from the National Science Foundation in December. Meanwhile, the startup went through the MassChallenge accelerator program last year.
VocaliD began the Indiegogo campaign to prove that there is demand in the market. The funds from the crowdfunding campaign will help the company fulfill any pre-orders for 2015, and Patel expects she’ll need to raise a few hundred thousand dollars more. In addition to grant funds, she may pursue angel investors or further crowdfunding. The company is also working to improve its technology and reduce the cost of producing the voices, Patel says.
Patel is talking to organizations like the ALS Association about potentially working together. ALS patients may be able to store their voice in the company’s Voicebank, where VocaliD stores donors’ recordings, before the disease takes a patient’s ability to speak, Patel says. More than 10,000 people are actively working on or have completed donating their voice, she says.
All of which speaks, quite literally, to Patel’s greater goal. “I would love to save the voices of people who are losing them,” she says.