MIT Spinout Affectiva Adds Voice Analysis to Its Emotion-Sensing Tech

Xconomy Boston — 

Machines are getting better at understanding human speech. Soon, they could also be able to perceive our emotions from the way we talk—and respond accordingly.

The latest step toward that longstanding vision comes from Affectiva, an MIT Media Lab spinout that has been developing emotion-sensing technology and products since 2009. In recent years, Boston-based Affectiva has focused on a vision-based approach, using webcams and optical sensors to analyze people’s facial expressions and non-verbal cues, for applications in areas such as advertising, marketing, and video games.

Now, Affectiva is expanding into voice analysis. Today, at the company’s Emotion AI Summit, it announced the release of a cloud-based software product that it says can measure emotion in speech. In a press release, Affectiva says the product observes changes in “speech paralinguistics, tone, volume, speed, and voice quality to distinguish anger, laughter, arousal, and the speaker’s gender.”

“We’ve been primarily focused on facial expressions,” says Affectiva co-founder and CEO Rana el Kaliouby (pictured above) in a phone interview. “But our vision is to build this multi-modal emotion A.I. platform that senses emotions the way humans do.”

Programming computers to understand human emotion has been talked about for decades, but the field has started moving closer to its ultimate goal in recent years, thanks in part to advances in computer vision, speech recognition, deep learning, and related technologies.

Affectiva isn’t the only company working on this, of course. Other emotion-sensing companies include Emotient, a San Diego-based startup reportedly acquired by Apple last year, and Eyeris, based in the Bay Area; both of them are focused on facial analysis. Meanwhile, speech-based emotion analysis companies include EMOSpeech and Vokaturi.

But few companies have attempted to combine facial and speech analysis for measuring emotions, el Kaliouby says. (IDAvatars, based in Mequon, WI, is one of them.) It’s a complex task, she adds.

“We have to be very thoughtful about how we fuse them together,” she says.

Often, the emotions conveyed by the voice and facial expressions will match—you might clearly hear anger in the voice and see it on the person’s face. But sometimes the vocal and facial signals “may disagree,” el Kaliouby says. “And in that case, you need to make sure the system has learned how to handle these types of situations,” she adds.
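One simple way to think about that fusion step is confidence weighting: each modality produces per-emotion scores plus a confidence, and the system blends them. The sketch below is purely illustrative—the labels, weights, and averaging rule are hypothetical, not Affectiva’s method, which would use learned models rather than a fixed formula.

```python
# Illustrative confidence-weighted fusion of two modality scores.
# Labels, confidences, and the averaging rule are hypothetical examples,
# not Affectiva's actual approach.

def fuse_emotions(face_scores, voice_scores, face_conf, voice_conf):
    """Combine per-emotion scores from face and voice, weighting each
    modality by its confidence. A production system would use a learned
    policy for disagreements; here we simply take a weighted average."""
    labels = set(face_scores) | set(voice_scores)
    total = face_conf + voice_conf
    return {
        label: (face_conf * face_scores.get(label, 0.0)
                + voice_conf * voice_scores.get(label, 0.0)) / total
        for label in labels
    }

# Example of the "disagreement" case el Kaliouby describes:
# the face looks mostly neutral, but the voice sounds angry.
face = {"anger": 0.1, "joy": 0.2, "neutral": 0.7}
voice = {"anger": 0.8, "joy": 0.1, "neutral": 0.1}
result = fuse_emotions(face, voice, face_conf=0.4, voice_conf=0.6)
print(max(result, key=result.get))  # prints "anger"
```

With the voice modality weighted slightly higher, the fused result leans toward anger (0.52) over neutral (0.34)—exactly the kind of conflict a real system must learn to resolve rather than hard-code.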

Affectiva—which has raised more than $26 million from investors such as Kleiner Perkins Caufield & Byers and Horizons Ventures—is generating revenue mainly by selling its facial-analysis technology for market research and testing advertisements, el Kaliouby says. More than 1,400 brands worldwide use Affectiva’s technology for those purposes. She declined to share revenue figures, but says the 50-person company is not profitable.

Speech analysis could open up more business opportunities, she says. Affectiva’s technology could help make personal virtual assistants like Siri, Cortana, and Alexa more emotionally aware, which could enable them to have more engaging conversations and “build relationships with the user,” el Kaliouby says. That could, in turn, help the companies behind those virtual assistants establish a “deeper connection with their consumers,” she adds.

Call centers are another potential customer group for Affectiva. Voice-based emotion sensing could help automated customer service agents recognize that a caller is upset, and adapt to that, el Kaliouby says. The technology could also aid call center operations by analyzing the stress level of human workers, she adds. (Cogito, another Boston company, already sells voice analysis software to call centers. El Kaliouby says she sees Cogito as a potential partner, rather than a competitor.)

Affectiva is also getting interest from car companies, el Kaliouby says. Imagine a virtual assistant embedded in the car’s dashboard, making suggestions to the driver about routes to take to avoid traffic, or possible places the person might want to stop and visit. The software “needs to know if it’s annoying you, confusing you, or overloading you,” she says. “You could imagine your car making a suggestion, you scowl at it. ‘Sorry, you didn’t like that, I’ll try something different.’ We’re definitely on that path.”

And as automakers make progress on autonomous vehicles, they need to think about the kind of “in-car experience” they want to create for people, el Kaliouby says.

“What kind of branding do they want?” she says. “Is it exciting? Is it calming? Is it fun? [Does it allow] you to be productive? Understanding the sentiment of the occupants in the car is going to be really critical.”

Affectiva has been working on the speech software for about a year, el Kaliouby says. The company hired Taniya Mishra last year to be its lead speech scientist. She previously worked at Franklin, MA-based Interactions, which develops virtual assistants that can handle customer service requests.

Affectiva’s new offering is an application programming interface (API) that other companies can integrate into their devices and other software-based products. Affectiva will charge customers licensing fees to use the software, or perhaps set up revenue-sharing agreements, el Kaliouby says.

The API is a beta version that can be used to analyze speech recordings. The plan is to release a version of the software in the next few months that can analyze speech in real time, making calculations on the device rather than in the cloud, el Kaliouby says.
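To make the batch use case concrete, here is a sketch of how a client—say, a call-center dashboard—might consume per-segment results from such a service. The response schema, field names, and threshold below are all invented for illustration; Affectiva has not published its API format in this article.

```python
import json

# Hypothetical response shape for a cloud speech-emotion API. The field
# names ("segments", "anger", "laughter", "arousal", "gender") are
# assumptions for illustration, not Affectiva's documented schema.
sample_response = json.dumps({
    "segments": [
        {"start": 0.0, "end": 2.5, "anger": 0.05, "laughter": 0.70,
         "arousal": 0.60, "gender": "female"},
        {"start": 2.5, "end": 5.0, "anger": 0.85, "laughter": 0.02,
         "arousal": 0.90, "gender": "female"},
    ]
})

def flag_angry_segments(raw, threshold=0.5):
    """Return (start, end) time spans where the anger score crosses a
    threshold, e.g. so a dashboard could alert a supervisor that a
    caller is upset."""
    data = json.loads(raw)
    return [(seg["start"], seg["end"])
            for seg in data["segments"] if seg["anger"] >= threshold]

print(flag_angry_segments(sample_response))  # prints [(2.5, 5.0)]
```

Analyzing recordings after the fact like this fits the beta API; the planned on-device version would instead run the same kind of scoring in real time, without the round trip to the cloud.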

Part of the idea with the initial rollout is that early adopters of the software will share data with Affectiva so it can improve the technology, she says.

“Data is really critical because it allows us to improve the accuracy but also customize [the technology] for certain use cases,” she says.