Speech Recognition: Instead of Typing and Clicking, Talk and Command

Why should educators care about speech recognition technology? One reason is that it will eventually be so widespread as to be practically inescapable. From IVR (Interactive Voice Response) systems that include "homework hotlines," where parents call in to check their student's progress or schedule appointments with teachers, to speech therapy software such as IBM's Speech Viewer, speech recognition technology is already making inroads into the educational environment.

While the technology to bring speech recognition into the classroom certainly exists, many obstacles must still be overcome. To understand these problems and better appreciate the benefits this technology brings, a little background is in order.(*)

The Challenges

Voice and speech recognition has been around since the early 1970s, when DARPA (Defense Advanced Research Projects Agency) conducted research on these technologies. Commercial applications existed in the 1980s and early 1990s, but cost and technical limitations kept them out of reach for most users. Today's technology, though, has brought voice and speech recognition out of the laboratories once and for all.

However, some of the problems that plagued early pioneers attempting to enable consistent, reliable speech recognition still remain. For example, every person speaks differently, with various noises or disturbances in their speech. Pausing, clearing the throat, coughing or using sounds like "uh," "um" and "ah" all conspire to send the "listening" computer into confusion. Fast talkers tend to run their words together even more than speakers with normal pacing. Quite often, background noise "pollutes" incoming voice signals, making it difficult for the computer to accurately identify sounds. And many words sound alike, which puts the burden of working out the intended meaning on the computer. That task belongs to natural language processing (NLP), in which computers not only recognize speech but also understand what the words mean.

All of these challenges have been met to some extent, as a look at current speech recognition products will verify. But before we look at what's available, we should understand the types of speech recognition systems that exist.

One or Multiple Persons

Speaker-dependent speech recognition systems have been pretty much the norm for commercial applications. This type of system is trained through repetition to recognize a vocabulary of words (up to a few thousand) from a particular user and is based on a template representation of speech. Users train a speaker-dependent system on their voice patterns by speaking samples of the words that must be recognized. The computer then stores these templates, or voice prints, on the system. Later, when speech recognition is enabled, the system compares each spoken command with the stored voice prints. When a spoken command matches a voice print, the system instructs the computer to execute the command.
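
The train-then-match cycle of a template-based, speaker-dependent system can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: each utterance is assumed to have already been reduced to a short list of numbers (for example, loudness per time slice), the comparison uses dynamic time warping (a classic way to compare two sequences spoken at different speeds, chosen here for illustration), and the names `TemplateRecognizer`, `train`, `recognize` and `threshold` are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    Lets one sequence stretch or compress in time to line up with the other,
    so the same word spoken faster or slower still scores as similar.
    """
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


class TemplateRecognizer:
    """Hypothetical speaker-dependent recognizer built on stored templates."""

    def __init__(self, threshold):
        self.threshold = threshold  # largest distance still counted as a match
        self.templates = {}         # command -> list of stored "voice prints"

    def train(self, command, features):
        """Store one spoken sample (already converted to features) for a command."""
        self.templates.setdefault(command, []).append(list(features))

    def recognize(self, features):
        """Return the closest trained command, or None if nothing is close enough."""
        best_command, best_score = None, inf
        for command, samples in self.templates.items():
            for sample in samples:
                score = dtw_distance(features, sample)
                if score < best_score:
                    best_command, best_score = command, score
        return best_command if best_score <= self.threshold else None


# Training: the user repeats each command; the feature values are made up.
rec = TemplateRecognizer(threshold=3.0)
rec.train("open", [1, 3, 5, 3, 1])
rec.train("open", [1, 3, 5, 5, 3, 1])
rec.train("save", [5, 4, 2, 1, 1])

print(rec.recognize([1, 1, 3, 5, 3, 1]))  # a slower "open"
print(rec.recognize([9, 9, 9, 9]))        # resembles nothing that was trained
```

The threshold is what lets the system reject input that matches no stored template instead of always picking the nearest command. In practice it would have to be tuned for each user and microphone.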

Other types of speaker-dependent systems include those that match phonemes, multiple words or triphones, and are used for larger vocabularies. Most available software packages that integrate speech recognition are based on speaker-dependent systems.

Speaker-independent speech recognition systems are more like what is typically envisioned in science fiction works. These systems can recognize speech regardless of who is speaking. As you can imagine, such systems were quite rare until recently and can still be difficult to create, as they must be able to accurately recognize words from any speaker.

Stop and Start Speaking or Keep on Talking?

Speaker-dependent and speaker-independent speech recognition systems can be further divided into two types, based on the kind of speech signal they accept as input: discrete and continuous.

Discrete speech recognition systems make users separate each spoken word with a pause. …