Speech Recognition - Ready for Prime Time?
by Jarred Walton on April 21, 2006 9:00 AM EST
The machine had been delivered two days ago on her first adult birthday. She had said, "But father, everybody - just everybody in the class who has the slightest pretensions to being anybody has one. Nobody but some old drips would use hand machines - "
The salesman had said, "There is no other model as compact on the one hand and as adaptable on the other. It will spell and punctuate correctly according to the sense of the sentence. Naturally, it is a great aid to education since it encourages the user to employ careful enunciation and breathing in order to make sure of the correct spelling, to say nothing of demanding a proper and elegant delivery for correct punctuation."
Even then her father had tried to get one geared for type-print as if she were some dried-up, old-maid teacher. But when it was delivered, it was the model she wanted - obtained perhaps with a little more wail and sniffle than quite went with the adulthood of fourteen - and copy was turned out in a charming and entirely feminine handwriting, with the most beautifully graceful capitals anyone ever saw. Even the phrase, "Oh, golly." somehow breathed glamour when the Transcriber was done with it.
--Isaac Asimov, Second Foundation - 1953
Here at AnandTech, we do our best to cover the topics that interest our readers. Naturally, some topics appeal to the vast majority of readers, while others target a more limited audience. At first glance, this article falls squarely into the latter category. However, when we look at where computers started, where they are now, and where they appear to be heading, the user interface clearly has to play a substantial part in making computers easier to use for a larger portion of the population. Manual typewriters gave way to keyboards; text interfaces have (mostly) been replaced by GUIs; and we now have mice, trackballs, touchpads, and WYSIWYG interfaces. Unfortunately, we have yet to realize the vision of Isaac Asimov and other science fiction writers, in which computers can fully understand human speech.
Why does any of this really matter? I mean, we're all basically familiar with using keyboards and mice, and they seem to get the job done quite well. Certainly, it's difficult to imagine speech recognition becoming the preferred way of playing games (well, some types of games at least). There are also people who can type at 140 wpm or faster -- wouldn't they just be slowed down by dictating to the computer instead of typing?
There are plenty of seemingly valid concerns, and change can be a difficult process. However, think back for a moment to the first time you saw Microsoft's new wheel mouse. I don't know how other people reacted, but when I first saw one, I thought it was the stupidest gimmick I had ever seen. I already had a three-button mouse, and while the right button was generally useful, the middle button served little purpose. How could turning the middle button into a wheel possibly make anything better? Fast forward to today, and it irritates me to no end if I have to use a mouse without a wheel. In fact, once I finally tried the wheel mouse, it took only about two hours of use before I was hooked, and I've heard the same thing from many other people. In other words, don't assume something is worthless just because it's different or you haven't tried it before.
There are a couple of areas in which speech recognition can be extremely useful. For one, there are handicapped people who don't have proper control over their arms and hands but can speak easily. Given how pervasive computers have become in everyday life, flat-out denying access to these people would be unconscionable. Many businesses are finding speech recognition useful as well -- or, more precisely, voice recognition. (The difference is that voice recognition generally only has to deal with a limited vocabulary.) For example, warehousing job functions require a relatively small vocabulary of around 400 words, and letting a computer system interface with the user via earphones and a microphone frees up the hands for other tasks. The end result is increased productivity and fewer errors, which in turn yields better profitability.
FrankyJunior - Sunday, April 30, 2006
For anyone who wants to try Dragon, I just noticed that the Preferred version is in the CompUSA ad today for $99. I never would have looked twice at it if I hadn't read this article yesterday.
NullSubroutine - Thursday, April 27, 2006
Are we at the day when I say "computer" and it does what I want? And when I time travel by going around the sun, will I be confused when they hand me a mouse and keyboard to use a computer?
JarredWalton - Thursday, April 27, 2006
Almost. And if you go around the sun *backwards* you can travel through time in the other direction. :D
quanta - Tuesday, April 25, 2006
How about a review of VoiceBox Technologies (http://www.voicebox.com) products? Their software was demonstrated on the Discovery Channel, and it seems to work without extensive voice training and actually _understands_ human speech. The Discovery Channel segment can be found here: http://www.exn.ca/dailyplanet/view.asp?date=3/13/2...
rico - Tuesday, April 25, 2006
Where did you find Dragon Pro for $160? I thought it usually cost about $800. Thanks.
JarredWalton - Tuesday, April 25, 2006
Heh, sorry - got "Preferred" and "Professional" mixed up. I'm not entirely sure what Pro includes beyond the description "Comes with a full set of network deployment tools." Trying to surf through Nuance's site is a bit tricky, and finding prices takes some effort as well. I think the only difference between Standard and Preferred is that Preferred can transcribe recordings - can anyone confirm for sure? I asked Nuance and didn't get a reply.
Tabah - Sunday, April 23, 2006
Excellent article/review. Here's something I've been wondering about. Personally, I use DNS for blogging and generally anything that requires a lot of typing. A friend of mine, on the other hand, swears by IBM ViaVoice. Any chance we could get a comparison article/review at a later date?
JarredWalton - Tuesday, April 25, 2006
I will try to get in touch with IBM. I'm sure they wouldn't mind participating in a follow-up article.
Tabah - Tuesday, April 25, 2006
Oddly enough, ViaVoice is licensed by Nuance, so you might have a better chance talking to them. The main reason I'd like to see a comparison between VV and DNS isn't so much that they're made/released by the same company, but the cost difference between them. Like I said before, I really like DNS, but VV at the high end (VV Pro USB vs. DNS Pro) is still a few hundred dollars cheaper.
Poser - Sunday, April 23, 2006
Listening to the dictation files, I was amazed that all the punctuation was spoken. I would have expected that it would (or could) be replaced with a non-speech sound, something along the lines of a click of the tongue for a comma. There are a good number of distinct sounds you can make with your tongue that we don't have words for but that anyone could recognize and make. Think of "The Gods Must Be Crazy" and the language used by the Kalahari bushmen for an extreme example.
Also, thanks for the article; it was really interesting and potentially very helpful! I'll hold off until Vista hits and I see some comparisons, but I'm certain now that I'll end up using one of the two.