Although test subjects still prefer announcements recorded from natural speech when forced to choose, the results of Studies 1 and 2 suggest that AT&T's new text-to-speech (TTS) technology has clearly advanced beyond the traditional TTS currently found in many telephone services. While these results suggest that work remains to be done, it is not clear that the goal of TTS needs to be announcements indistinguishable from natural speech. One can argue that user acceptance of TTS is a function not only of its fidelity to natural speech, but also of the types of advanced and personalized services that such technology affords. With this in mind, the question facing telephony service developers becomes not "Do we have a TTS technology that is indistinguishable from natural speech?" but "Is our TTS natural-sounding enough, given the types of features and services we are able to provide, that callers will accept it?" This is not to say that the practice of employing recorded natural speech can be abandoned. Callers do, after all, prefer naturally recorded speech, as the results of Study 1 demonstrate. There are certainly telephone services for which naturally recorded speech would be most appropriate and others for which AT&T's NextGen TTS would be (e.g., 800CALLATT vs. an Email Reader service). The trick will be to determine which one to choose for a given service.
Beutnagel, M., Conkie, A., Schroeter, J., Stylianou, Y., & Syrdal, A. (1999). "The AT&T Next Generation Text-to-Speech System". Joint Meeting of ASA/EAA/DAGA in Berlin, Germany.
Diehl, R. L., Souther, A. F., & Convis, C. L. (1980). "Conditions on rate normalization in speech perception". Perception & Psychophysics, 27, 435-443.
Martin, J. G. (1972). "Rhythmic (hierarchical) versus serial structure in speech and other behavior". Psychological Review, 79, 487-509.
Miller, J. L. (1981). "Effects of speaking rate on segmental distinctions". In P. D. Eimas & J. Miller (Eds.), Perspectives on the study of speech (pp. 39-74). Hillsdale, NJ: Erlbaum.