Speech Recognition - Ready for Prime Time?
by Jarred Walton on April 21, 2006 9:00 AM EST
The machine had been delivered two days ago on her first adult birthday. She had said, "But father, everybody - just everybody in the class who has the slightest pretensions to being anybody has one. Nobody but some old drips would use hand machines - "
The salesman had said, "There is no other model as compact on the one hand and as adaptable on the other. It will spell and punctuate correctly according to the sense of the sentence. Naturally, it is a great aid to education since it encourages the user to employ careful enunciation and breathing in order to make sure of the correct spelling, to say nothing of demanding a proper and elegant delivery for correct punctuation."
Even then her father had tried to get one geared for type-print as if she were some dried-up, old-maid teacher. But when it was delivered, it was the model she wanted - obtained perhaps with a little more wail and sniffle than quite went with the adulthood of fourteen - and copy was turned out in a charming and entirely feminine handwriting, with the most beautifully graceful capitals anyone ever saw. Even the phrase, "Oh, golly." somehow breathed glamour when the Transcriber was done with it.
--Isaac Asimov, Second Foundation - 1953
Here at AnandTech, we do our best to cover the topics that will interest our readers. Naturally, some topics are of interest to the vast majority of readers, while others target a more limited audience. At first glance, this article falls squarely into the latter category. However, when we think about where computers started and where they are now, and then try to extrapolate where they are heading, the user interface clearly has to play a substantial part in making computers easier to use for a larger portion of the population. Manual typewriters gave way to keyboards; text interfaces have been replaced by GUIs (mostly); and we now have mice, trackballs, touchpads, and WYSIWYG interfaces. Unfortunately, we have yet to realize the vision of Isaac Asimov and other science fiction writers, in which computers can fully understand human speech.
Why does any of this really matter? I mean, we're all basically familiar with using keyboards and mice, and they seem to get the job done quite well. Certainly, it's difficult to imagine speech recognition becoming the preferred way of playing games. (Well, some types of games at least.) There are also people in the world who can type at 140 wpm or faster -- wouldn't they just be slowed down by trying to dictate to the computer instead of typing?
There are plenty of seemingly valid concerns, and change can be a difficult process. However, think back for a moment to the first time you saw Microsoft's new wheel mouse. I don't know how other people reacted, but the first time I saw one I thought it was the stupidest gimmick I had ever seen. I already had a three-button mouse, and while the right mouse button was generally useful, the middle mouse button served little purpose. How could turning the middle mouse button into a wheel possibly make anything better? Fast forward to today, and it irritates me to no end if I have to use a mouse that doesn't have a wheel. In fact, when I finally tried out the wheel mouse, it only took about two hours of use before I was hooked. I've heard the same thing from many other people. In other words, just because something is different or you haven't tried it before, don't assume that it's worthless.
There are a couple of areas in which speech recognition can be extremely useful. For one, there are handicapped people who don't have proper control over their arms and hands, and yet they can speak easily. Given how pervasive computers have become in everyday life, flat-out denying access to certain people would be unconscionable. Many businesses are finding speech recognition to be useful as well -- or more appropriately, voice recognition. (The difference between speech recognition and voice recognition is that voice recognition generally only has to deal with a limited vocabulary.) As an example, warehousing job functions only require a relatively small vocabulary of around 400 words, and allowing a computer system to interface with the user via earphones and a microphone can free up the hands to do other things. The end result is increased productivity and reduced errors, which in turn yields better profitability.
38 Comments
JarredWalton - Friday, April 21, 2006
That's definitely true -- if you look at how accuracy scales with CPU usage, doubling and even tripling the processor time comes with only incremental increases in accuracy. I do have to say that I noticed it being a little sluggish on my single core system when I was multitasking, but obviously I push my computers a little harder than a lot of people. Depending on what you're willing to live with in terms of speed, I'm sure both Dragon and Microsoft speech recognition can work on a Pentium III level system.
LanceM - Friday, April 21, 2006
So is that selection typical Asimov? If so, it has convinced me to never bother reading any of his works.
His ideas/plots/etc. may be interesting, but I don't think I could handle phrases like, "as if she were some dried-up, old-maid teacher." Give me Joseph Conrad or William Faulkner.
Dfere - Monday, April 24, 2006
Asimov is classic Sci-Fi pulp, which usually had a gritty detective-novel appeal. His works are in large part murder mystery type novels. You have to understand the nature of the literature, the history and the author. I don't think a critique is deserved until then.
Most Sci Fi writers of any ability first master imaginative concepts and apply them, even Drake and Stirling.
I give Kudos to the staff for including literary comments; the poster who said this should not be a book of the month club lives a very one-dimensional life.
Shoal07 - Friday, April 21, 2006
What makes Asimov special is that many of his ideas in science fiction are coming true today or are at least on the horizon. Asimov shaped the way many of us picture the future.
goinginstyle - Friday, April 21, 2006
Why does the Anandtech staff revert to literary quotes in their reviews now? This is a computer website, not a book club.
JarredWalton - Friday, April 21, 2006
I read Asimov's Foundation series as a teenager, and I loved it. He gave me lots of fanciful dreams about where technology might go in the future, and even though some of the writing styles have changed over the years, I still find a lot of these old sci-fi books to be entertaining. You should try reading War of the Worlds if you think that quote was bad. LOL
Sorry if some of you didn't like the quote. Everyone has their own dislikes and likes, but in the end it's just an introduction. I hope to one day be able to yell at my computer and have it properly understand what I say, as well as the context (i.e., yelling means something is going wrong, and maybe it can help me out). Will we ever get there? Probably some day, but whether it happens in our lifetimes or not is anyone's guess.
NegativeEntropy - Saturday, April 22, 2006
I like the use of quotes -- though it does remind me a bit of being in English/writing class ("Always do something in the introduction to get your audience's attention...").
On the subject of "classic" Sci-fi writers, I also still enjoy old school Heinlein. Though his characters can get a bit repetitive across his pile of works, many of the science ideas are still valid (and I share much of his apparent personal philosophy).
On the actual article -- thanks for doing it. I have been curious where this technology was at in terms of every day usage and hardware requirements.
Regarding CPU usage, it's possible DNS attempts to use whatever resources are available based on preferences. i.e. on minimum, it attempts to impact the system minimally, regardless of the CPU resources available; say 25% on min, 50% on med and 95% on max with the percentage staying relatively consistent from a P3 1GHz to an A64 2.6GHz. This would explain its reported good scaling from system to system. If you want to test it, underclock your A64 system to half its frequency and compare utilization at the medium setting.
kristof007 - Friday, April 21, 2006
Here at Anandtech you can always count on finding something else. Great article! I tried out speech recognition a few years back and I got frustrated with it over one thing or another so I just dropped it and went back to typing. I've been typing for about 8 years now. I never learned the "proper" way to type where every finger has a spot. Anyway I hope Vista will make speech recognition WAAY better so that it could be used around the OS AND for speech recognition.
Thanks for the article!