New Speech Recognition Engine Under the Hood at Vlingo; Startup Dumps IBM and Nuance for AT&T

Vlingo, the Cambridge, MA-based startup that makes a suite of speech-to-text applications used by millions of iPhone, BlackBerry, and Nokia mobile device owners, is about to get a brain transplant of sorts. It said today that it will largely abandon a core speech-recognition engine developed by IBM and maintained by Nuance Communications in favor of a system from AT&T Labs in New Jersey.

As part of the shift, says Vlingo CEO Dave Grannan, Vlingo and AT&T have agreed to a long-term strategic alliance. Vlingo’s speech scientists will be able to modify and improve the source code for the AT&T technology, called Watson, while AT&T will take a minority ownership stake in Vlingo. All of Vlingo’s applications will be running on top of the AT&T speech-recognition system by the first quarter of 2010, Grannan says.

Vlingo’s own speech scientists have developed software that exploits information collected from users (the way a Bostonian’s pronunciation of a dictated phrase like “I parked my car” might differ from a New Yorker’s, for example) to build statistical models that help improve speech-recognition accuracy over time. These models provide supplemental input that helps to guide a core speech-recognition engine as it transforms speech sounds into text. Vlingo didn’t build its own core engine; it has long licensed that part of its system from IBM.

The switch from IBM’s engine to AT&T’s is a “best of all worlds” situation for Vlingo, in Grannan’s words. For one thing, he says, the Watson technology simply works better than the IBM recognizer. “Watson is superior on speed and base-level accuracy,” he says. Once the transition is complete, users of Vlingo’s iPhone, BlackBerry, and Nokia apps should notice fewer wrong guesses in the transcriptions of their utterances. Grannan says they’ll also see a few new features, such as automatic punctuation, that Vlingo can now add because it will be able to tinker with Watson’s innards.

But just as important, the switch will help Vlingo disentangle itself from its strained relationship with Nuance.

Burlington, MA-based Nuance (NASDAQ: NUAN) is one of the Boston area’s biggest high-tech firms, and it is the world’s largest specialized provider of speech-related technologies. It offers software for mobile speech recognition that competes directly with Vlingo’s. In June 2008, after losing out to Vlingo on a contract to supply Yahoo with speech-recognition technology for its oneSearch service, Nuance hit Vlingo with a lawsuit alleging that the startup’s technology for improving the accuracy of computerized speech recognition over time overlaps with a 2004 Nuance patent.

Then, early in 2009, IBM assigned the rights to most of its speech-recognition assets to Nuance. This put Vlingo in the awkward position of relying on a technology that’s maintained and supported by its main rival, and of paying royalty checks to the same company it’s battling in court.

Vlingo isn’t breaking its three-year contract with IBM, and may actually continue to use the IBM speech recognizer in simple deployments, Grannan says. But by moving its main products to the AT&T technology, “We now have what we think is a much more strategic partner in the space,” he says.

But Grannan says he’s under no illusion that the AT&T deal will make the Nuance lawsuit go away. “Once we migrate everything to AT&T, Nuance can continue to sue us if they want for alleged infringement that happened with the IBM recognizer that they themselves are licensing to us,” he says. “I don’t know exactly how that works in their minds, but I don’t think the legal angle has ever been the primary motivation.”

Meanwhile, Vlingo—which is the first company to license AT&T’s Watson technology for commercial use—will be able to build on the new core engine to do some nifty new stuff, Grannan says. By applying Vlingo’s own language models, “We got great performance out of the IBM recognizer, and it’s going to get even better with Watson,” he says. “And at the feature level, there will be lots of gee-whizzy things that we’ll be able to do quickly because of our low-level access [to Watson], like automatic punctuation in e-mails and text messages.”

Having AT&T’s core engine under the hood, in other words, “will take our industry-leading position up a notch,” Grannan says. He says that Vlingo and AT&T may also explore new markets for the technology, such as voice-recognition systems for cable set-top boxes and automobiles.

Wade Roush is a freelance science and technology journalist and the producer and host of the podcast Soonish.
