Methods of Teaching Speech Recognition

Article excerpt

Abstract

Objective: This article introduces the history and development of speech recognition, addresses its role in the business curriculum, outlines related national and state standards, describes instructional strategies, and discusses the assessment of student achievement in speech recognition classes. Methods: Research methods included a synthesis of historical research, survey research, a Delphi study, and an analysis of instruments for assessing speed and accuracy. Results: A synthesis of the literature revealed best practices for teaching speech recognition, training in the software, teaching students with poor reading skills, and assessing achievement in speech recognition. Conclusions: Speech recognition students should be able to produce mailable documents at more than 100 words a minute with 95% accuracy. Student achievement in speech recognition should be assessed by production/mailability, quizzes/paper tests, speed on straight copy, accuracy on straight copy, technique, and attendance/work habits. Production work should be assessed by criterion-referenced grading, and speed should be assessed by gross words a minute (GWAM) on 1-minute timings on straight copy with a syllabic intensity of 1.4. Words should be counted as 1.4-syllable standard words. Accuracy should be assessed by the percentage of correct words, calculated as 100% minus the percentage of errors.
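The speed and accuracy measures named in the abstract can be illustrated with a short calculation. The following is a minimal sketch, not the authors' scoring instrument: the function names are hypothetical, the syllable and error counts are assumed to be supplied by the assessor, and the abstract does not say how errors are identified, so each error is assumed here to be one incorrect word.

```python
# One standard word is 1.4 syllables, as stated in the abstract.
STANDARD_WORD_SYLLABLES = 1.4


def syllabic_intensity(syllables: int, actual_words: int) -> float:
    """Average number of syllables per actual word in the copy."""
    if actual_words <= 0:
        raise ValueError("actual_words must be positive")
    return syllables / actual_words


def gwam(syllables: int, minutes: float = 1.0) -> float:
    """Gross words a minute, counted in 1.4-syllable standard words."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    standard_words = syllables / STANDARD_WORD_SYLLABLES
    return standard_words / minutes


def accuracy(total_words: int, error_words: int) -> float:
    """Proportion of correct words: one minus the proportion of errors."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    if not 0 <= error_words <= total_words:
        raise ValueError("error_words must be between 0 and total_words")
    return 1 - error_words / total_words


# Hypothetical 1-minute timing: 110 words dictated (154 syllables), 4 errors.
speed = gwam(syllables=154, minutes=1.0)
acc = accuracy(total_words=110, error_words=4)
print(f"{speed:.0f} GWAM at {acc:.1%} accuracy")  # 110 GWAM at 96.4% accuracy
```

In this made-up timing the copy has a syllabic intensity of exactly 1.4, so the standard-word count equals the actual word count; on copy with a different intensity the two counts would diverge.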

Introduction

"Speech recognition" refers to the ability of computers to recognize and respond to spoken commands. This technology allows individuals to input data into a computer without using a keyboard. Speech recognition software is used for "voice writing," which permits the user to create and format documents by dictating text into a microphone. This article introduces the history and development of speech recognition, addresses its role in the curriculum, outlines related national and state standards, describes appropriate instructional strategies, and discusses the assessment of student achievement in speech recognition classes.

The History and Development of Speech Recognition

Speech recognition is a technology that has evolved over many years, beginning in 1936, when Bell Laboratories began conducting research on automatic speech recognition and transcription ("History of Speech," n.d.). In 1950 Bell Laboratories developed the first technology that could recognize spoken numbers ("The History of Voice Recognition," 2004), and in 1964 IBM demonstrated its "shoe box recognizer" for spoken digits at the World's Fair (IBM Research, 2007). In 1984 IBM introduced a speech recognition system on a huge 4341 mainframe computer connected to a user interface on an Apollo computer; the system could recognize a 5,000-word vocabulary with 95% accuracy. In 1992 IBM released the Personal Dictation System, which used an IBM personal computer to take dictation at 80 words a minute with 95% accuracy. In 1995 Dragon Systems and Kurzweil introduced two additional speech recognition products, and in 1997 Dragon released the first "continuous speech recognition" (CSR) software, NaturallySpeaking, which did not require the user to pause between words when dictating ("Speech Recognition Software," n.d.).

CSR technology requires three components: a computer with sufficient memory and a high-quality sound card, a noise cancellation headset for dictation into the computer, and speech recognition software (Kirriemuir, 2003). In CSR systems, complex algorithms compare the user's speech to a model of spoken language that is based on three types of input data: acoustic (sound patterns of the human voice), linguistic (how words are grouped together in patterns), and lexical (the words in the "vocabulary" or system dictionary) ("Understanding the Use," 1999). CSR systems are "speaker dependent" because the software is trained to the user's unique acoustic patterns (The Nifty 59, 2006).

Distributed speech recognition (DSR), or "speaker independent," systems use a somewhat different technology in which the software is installed on a company's server and is used primarily for specialized operations such as call centers (Aurora Distributed Speech Recognition, 2007; The Nifty 59, 2006). …
