Magazine article Computers in Libraries

Choosing and Using Text-to-Speech Software

Article excerpt

Imagine choosing the voice you'd like to read the morning news or a book aloud to you as you drive to work. Imagine not being able to tell whether it is a natural human voice or a synthetic, computer-generated one. Believe it or not, the technology to give you your choice of voice is already available. Synthetic voices used to sound like computers: they were difficult to understand and tiresome to listen to for long periods. Now there is a variety of high-quality voices, female and male, with a choice of accents, pitches, and speeds. Text-to-speech (TTS) software is ready for widespread use by libraries, other organizations, and individual users.

The Voder (Voice Operation Demonstrator), the first electronic speech synthesis machine, grew out of vocoder ("voice coder") research that Bell Labs began in the late 1920s, and it was demonstrated to the public at the New York World's Fair and at the San Francisco Golden Gate International Exposition in 1939. If we heard its voice today, we might chuckle, and we would have trouble making out what it was saying. Still, electronic speech synthesis went on to open up a whole world of literature to people with little or no sight.

Ray Kurzweil was the principal developer of the first print-to-speech reading machine for the blind. (He was also the principal developer of the first CCD flat-bed scanner, the first text-to-speech synthesizer, the first music synthesizer capable of re-creating the grand piano and other orchestral instruments, and the first commercially marketed large-vocabulary speech recognition application.) The first prototype of the Kurzweil reading machine was completed in 1975, and the original units cost $30,000 to $50,000. Although the price has dropped considerably since then, stretched budgets and the small percentage of patrons who need such a device mean that most libraries cannot justify purchasing a Kurzweil machine.

What Text-to-Speech Can Do

TTS software offers an affordable way to turn almost any electronic text that is not image-based into synthesized speech. TTS can provide an audible substitute for, or complement to, visual reading. It also forms the basis of screen reader software, which greatly improves the accessibility of electronic information for people who are blind or who have temporary or permanent low vision. Interest in TTS software is growing among libraries and library patrons. If your library has not yet investigated or implemented TTS solutions, now is a great time to do so.

TTS software can benefit more than just the blind and low-vision members of the community your library serves. Children and adults who are learning to read for the first time can benefit from hearing a book or passage read aloud, either alongside visual reading or as preparation for or reinforcement of it. Reluctant readers of all ages can also benefit from being able to turn TTS on and off as their needs dictate. People learning a second language can benefit too, especially if the software is sophisticated enough to reproduce accurately how that language sounds when spoken, perhaps even in several dialects.

TTS software can also make image-based information more accessible and informative to all. For example, the Illinois State Library has provided Library Services and Technology Act (LSTA) funds to the Alliance Library System to help teams of librarians working on digital imaging projects across Illinois develop and deploy TTS-based audio descriptions. The digital imaging teams are writing brief descriptions of selected images they have digitized. They will then use TTS software from NeoSpeech to create computer-generated audio renditions of these descriptions. The result will be an audio file that is smaller than a human-recorded narration, as well as less expensive and faster to produce. In addition to making the audio available on each library's Web site and in CONTENTdm, we will build a central best practices site featuring the 10 to 12 images selected from each library's digital image collection, each with a link to its TTS-generated audio file. …
