Redstart Systems’ Voice Command Software Replaces the Keyboard and Mouse—and Not Just for Dictation

If you want to dictate notes into your computer without typing, speech recognition software like Dragon Naturally Speaking, from Burlington, MA-based Nuance Communications (NASDAQ: NUAN), works surprisingly well these days. Even without training, dictation software can hit accuracy rates of 99 percent; once it learns your personal speech patterns, it’s nearly flawless.

But using speech commands to do almost anything else on your computer is far more difficult. Nuance says the latest version of Dragon Naturally Speaking can be used to control applications such as Microsoft Word, Outlook Express, and Internet Explorer, and the software even includes “voice shortcuts” that let users interact with search engines using natural-language utterances like “Search the Web for global warming articles.” But for complex, oft-repeated command-and-control operations, like opening or closing windows or moving blocks of text in a document, using natural language commands can be tedious. It also tends to be slower than using mouse and keyboard commands, since the software has to spend a good deal of time figuring out what you meant before it acts. For the large group of computer users who turn to speech recognition software because of repetitive strain injuries (RSI)—and who aren’t supposed to touch their computers at all, lest they aggravate their condition—that’s a dangerous situation.

That’s the problem—and the opportunity—that Boston-based Redstart Systems has set out to address. After nearly 15 years of behind-the-scenes software development, the tiny, self-funded startup today launched a program called Utter Command that vastly speeds up command-and-control operations for Windows computer users who already have Dragon Naturally Speaking Professional.

The secret to Utter Command isn’t speech recognition—it depends on Naturally Speaking as its speech engine—but rather its ability to parse “stacked,” shorthand commands. For example, instead of laboriously saying, “Move the cursor to the end of the sentence, select the last three words of the sentence, and delete them,” an Utter Command user would simply say “End 3 befores delete.” (In this example, “befores” is shorthand for “words before”—and is a good example of the way Utter Command clips things down.)
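To make the idea of "stacked" commands concrete, here is a minimal Python sketch of how a short, fixed grammar could be parsed into a sequence of editing steps. It is an illustration only, not Redstart's implementation: the word lists, the `Step` type, and the `parse_command` function are hypothetical names invented for this example, and only the phrase "End 3 befores delete" comes from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One primitive editing step produced from a spoken command."""
    action: str
    count: int = 1
    unit: str = ""


# Hypothetical vocabulary, assumed for illustration. The real Utter Command
# grammar is not described in enough detail in the article to reproduce.
MOVES = {"end": "move_to_end", "home": "move_to_start"}
SELECTORS = {"befores": ("select_before", "word"),
             "afters": ("select_after", "word")}
ACTIONS = {"delete": "delete", "copy": "copy", "cut": "cut"}


def parse_command(utterance: str) -> List[Step]:
    """Split a stacked command into steps, left to right."""
    tokens = utterance.lower().split()
    steps: List[Step] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in MOVES:
            steps.append(Step(MOVES[tok]))
            i += 1
        elif tok.isdigit():
            # A number must be followed by a selector such as "befores".
            if i + 1 >= len(tokens) or tokens[i + 1] not in SELECTORS:
                raise ValueError(f"expected a selector after {tok!r}")
            action, unit = SELECTORS[tokens[i + 1]]
            steps.append(Step(action, int(tok), unit))
            i += 2
        elif tok in ACTIONS:
            steps.append(Step(ACTIONS[tok]))
            i += 1
        else:
            raise ValueError(f"unknown word: {tok!r}")
    return steps


if __name__ == "__main__":
    for step in parse_command("End 3 befores delete"):
        print(step)
```

Because every word belongs to a small closed vocabulary, the parser never has to guess at intent; an utterance either matches the grammar or is rejected outright.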

At $395, the new program isn’t cheap. But it may be a worthwhile investment for people who really can’t touch their computers. And if you view it as a powerful add-on that makes up for features missing in Naturally Speaking Pro, which retails for $899, the price tag seems even more reasonable.

Redstart president and founder Kimberly Patch, a science writer who has worked at PC Week and Technology Research News, says she first conceived the software in the mid-1990s when she developed a repetitive strain injury from typing on her computer. “I started out using Dragon Dictate 1.0, but I got frustrated with it and started writing macros to speed things up,” Patch says. “But I’d forget half the macros I wrote, and then I’d have to rewrite them. I realized it was easier to remember standardized commands. I was writing about things like cognition and linguistics, and it turned out that this made sense according to the cognitive studies; there are MRI studies that show that certain things are easier to say than others.”

What Patch was discovering (as she explains in a series of papers on the Redstart website) was that sticking to a small set of commands, and arranging them according to a precise grammar, might actually create a lower cognitive load on a user than trying to speak to a computer as if it were a person. It would probably ease the load on the computer, too, since the software wouldn’t have to anticipate all the different ways a person might phrase a command in natural language.

Patch started writing down the commands in her grammar to make sure she was using them consistently. A bit later on, she found a programmer to help her incorporate the commands and the grammar into an application. And about five years ago, she decided to turn the application into a product.

But getting it working the way she wanted and writing up the documentation “took a lot longer than I thought it would,” she says. On top of that, there was an ethical concern. “With RSI, you can type, but you don’t want to, and if you get frustrated with something, you will hurt yourself,” she says. “We wanted to make sure our system was not going to tempt someone to hurt themselves, so we wanted to fill in all the gaps—including a good way to move the mouse, which was the last piece.”

In Naturally Speaking, moving the mouse pointer is an awkward matter of calling up a set of onscreen quadrants, choosing the one where the pointer should appear, calling up a smaller set of quadrants, choosing one, and so forth. In Utter Command, the user can simply bring up a set of rulers that provide a coordinate system spanning the whole screen. For example, saying “50 by 60” will move the mouse pointer to a location 50 horizontal units to the right of the screen’s upper left corner and 60 vertical units below it. A YouTube video illustrates the process.
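The ruler scheme described above amounts to a simple coordinate conversion, which can be sketched in Python as follows. This is a rough illustration under stated assumptions, not Redstart's code: the article does not say how many units the rulers span, so the sketch assumes 100 units along each axis, and `ruler_to_pixels` and `RULER_UNITS` are hypothetical names.

```python
import re
from typing import Tuple

# Assumption: each ruler runs from 0 to 100 across the screen.
# The article does not give the real scale.
RULER_UNITS = 100


def ruler_to_pixels(command: str, screen_width: int, screen_height: int,
                    units: int = RULER_UNITS) -> Tuple[int, int]:
    """Convert a spoken "X by Y" ruler command into pixel coordinates.

    The origin is the upper left corner of the screen; X grows to the
    right and Y grows downward, as in the article's example.
    """
    match = re.fullmatch(r"\s*(\d+)\s+by\s+(\d+)\s*", command.lower())
    if match is None:
        raise ValueError(f"not a ruler command: {command!r}")
    x_units, y_units = int(match.group(1)), int(match.group(2))
    if x_units > units or y_units > units:
        raise ValueError("coordinates fall outside the ruler range")
    # Integer arithmetic maps 0..units onto the valid pixel range
    # 0..width-1 and 0..height-1.
    x = x_units * (screen_width - 1) // units
    y = y_units * (screen_height - 1) // units
    return x, y


if __name__ == "__main__":
    print(ruler_to_pixels("50 by 60", 1920, 1080))  # (959, 647)
```

A single utterance names the target directly, whereas the quadrant approach needs one spoken choice per level of subdivision.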

From videos like this one, you begin to get a sense of how fast and efficient speech interfaces can be, at least for expert users. I was floored by Patch’s facility with the program during a live WebEx demo last week. “If you give me any one program, I can speed it up using speech,” she says. “There might be a few things that will be a little bit slower, but on average, we’ve come up with one speech command for every 2.3 mouse and keyboard commands, whether you’re running a PowerPoint presentation or working on an Excel spreadsheet.”

Unfortunately, I’m a Mac guy, and Redstart Utter Command—like Naturally Speaking—is only available for Windows. And in truth, the price tag for the combined system is too high for casual users like me who are merely intrigued by speech-driven command and control. But it may make a lot of sense for people with injuries that make it difficult or impossible to type—and there are 35 million people in the United States who fit that description, according to Patch.

“We think of this as a spring-loaded market,” she says. “There are a lot of people who have RSI and have tried speech software, but just aren’t comfortable with it because the cognitive load is too high, and because it makes them go a little slower. But if you change that and make the cognitive load a lot lower and make things work faster, then there is a reason to pick it up again.”

Wade Roush is the producer and host of the podcast Soonish and a contributing editor at Xconomy.
