Yeah, but Can You Search Audio?
Convery, Anna, Online
Numerous technological advances over the past 10 to 15 years have allowed information professionals to become quite skilled at organizing and retrieving information--when it's in textual or numeric form. While there will always be room for improvement, recent progress in managing these kinds of data has been quite remarkable. However, as organizations increasingly archive a wide range of recorded audio and video data, such as conference calls, voice mail, radio and television broadcasts, training sessions, focus groups, speeches, and sales presentations, a new challenge is emerging for information professionals. You simply cannot search these with traditional tools.
Into this void steps the rapidly maturing field of speech analytics, once known as audio mining. Traditionally, organizations used manual means to search their recorded audio, if indeed they felt up to the task of searching it at all. Two methods have emerged in recent years to automate this process: speech-to-text, in which the audio is first transcribed to text and then scanned for the requested search terms, and phonetic search, in which searches are performed directly on the audio, meaning search terms need not match letter for letter, but simply sound for sound.
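To make the "letter for letter" versus "sound for sound" distinction concrete, here is a toy Python sketch. It is only an illustration under stated assumptions: the pronunciation table, the phoneme symbols, and the sample recording are all hypothetical, and a real phonetic engine derives its phoneme stream from the audio itself and tolerates imperfect matches rather than requiring an exact one.

```python
# Hypothetical pronunciation table (ARPAbet-style symbols), for illustration only.
PRONUNCIATIONS = {
    "kris": ["K", "R", "IH", "S"],
    "chris": ["K", "R", "IH", "S"],
    "merger": ["M", "ER", "JH", "ER"],
}


def text_search(transcript, term):
    """Speech-to-text style: the term must match a transcribed word letter for letter."""
    return term.lower() in transcript.lower().split()


def phonetic_search(audio_phonemes, term):
    """Phonetic style: return every position where the term's sounds occur in the audio."""
    target = PRONUNCIATIONS[term.lower()]
    n = len(target)
    return [
        i
        for i in range(len(audio_phonemes) - n + 1)
        if audio_phonemes[i:i + n] == target
    ]


# Assumed sample: the speaker says "call Kris about the merger",
# but the transcription engine wrote the more common spelling "chris".
transcript = "call chris about the merger"
audio_phonemes = [
    "K", "AO", "L",            # call
    "K", "R", "IH", "S",       # Kris
    "AH", "B", "AW", "T",      # about
    "DH", "AH",                # the
    "M", "ER", "JH", "ER",     # merger
]

text_hit = text_search(transcript, "Kris")               # spelling differs, so no match
phonetic_hits = phonetic_search(audio_phonemes, "Kris")  # sounds match at one position
```

In this sketch the text search misses the name because the transcript spelled it differently, while the phonetic search finds it because both spellings share the same sounds.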
The need for organizations to search their constantly expanding audio archives, to improve the speed and accuracy levels of these searches, and to support multiple languages drives speech analytics technology. Demand for speech analytics comes from within any organization that accumulates large amounts of recorded audio--from extremely poor-quality audio all the way up to crystal-clear broadcast audio. A common trait shared by these organizations is the understanding that a wealth of information exists in recorded audio and that, if harnessed, this knowledge can contribute tremendously to complementing text-based research, improving customer satisfaction, and uncovering new revenue opportunities.
CAN YOU HEAR ME NOW?
Before pursuing a speech analytics solution for your organization, you must first address what this technology needs to do. It is generally agreed that an effective speech analytics product must, at a minimum, be able to search potentially thousands of hours of audio and return results that are both accurate and timely. While speech-to-text products can search faster than their phonetics-based counterparts, phonetic solutions are able to convert the audio into a searchable form up to 50 times faster than speech-to-text, often delivering quicker results from start to finish. The speed of rendering audio searchable is a key hurdle in making speech analytics viable, and the advances made within the past year in optimizing the speed of phonetic search indexing represent a major breakthrough in the field.
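The start-to-finish trade-off can be worked through with a little arithmetic. In the Python sketch below, only the 50-to-1 indexing ratio comes from the discussion above; the archive size, the real-time transcription speed, the per-query search times, and the number of queries are made-up figures chosen purely for illustration, not measurements of any product.

```python
def total_hours(index_hours, seconds_per_query, num_queries):
    """Start-to-finish time: one-off indexing plus the time spent running queries."""
    return index_hours + num_queries * seconds_per_query / 3600


# Every figure below is a hypothetical assumption except the 50x ratio.
AUDIO_HOURS = 1000                            # assumed archive size
STT_INDEX_HOURS = AUDIO_HOURS * 1.0           # assumed: transcription runs at real-time speed
PHONETIC_INDEX_HOURS = STT_INDEX_HOURS / 50   # "up to 50 times faster" indexing
STT_QUERY_SECONDS = 1                         # assumed: text search is the faster search
PHONETIC_QUERY_SECONDS = 60                   # assumed: phonetic search is the slower search
NUM_QUERIES = 100                             # assumed workload

stt_total = total_hours(STT_INDEX_HOURS, STT_QUERY_SECONDS, NUM_QUERIES)
phonetic_total = total_hours(PHONETIC_INDEX_HOURS, PHONETIC_QUERY_SECONDS, NUM_QUERIES)

print(f"speech-to-text: {stt_total:.1f} hours, phonetic: {phonetic_total:.1f} hours")
```

Under these assumed numbers, indexing time dominates the total, so the approach with the slower search still finishes first. With a far larger number of queries against the same archive, the balance would eventually tip the other way.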
You must also consider the accuracy of search when evaluating a speech analytics solution. Accuracy means obtaining results that are of interest, in other words, relevant results. For example, if you search on the word bomb, you will receive many results referencing bomb as in an incendiary device, which is perhaps what you want, but you will also get results where the speaker said, "That film just bombed." This example shows why it is also important for speech analytics tools to provide capabilities for rules-based queries that help eliminate irrelevant results. Consider another example: Suppose a company is interested in understanding why customers close accounts. If it simply searches on the word close, it will get some relevant data, a good deal of irrelevant data, and, worst of all, it will completely miss those instances in which a customer used terms such as "cancel" or "shut down" to indicate closing his or her account. The ability to structure sophisticated queries will help ensure that the searcher captures such instances.
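A rules-based query of the kind described above might look something like the following Python sketch. It is a hypothetical illustration, not the query language of any particular product: the `Rule` class and the sample segments are invented here, and short text snippets stand in for what would really be hits located within recorded audio.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical query rule combining synonyms, required context, and exclusions."""
    any_of: list                                   # at least one of these phrases must occur
    all_of: list = field(default_factory=list)     # every one of these must also occur
    none_of: list = field(default_factory=list)    # none of these may occur

    def matches(self, segment):
        text = segment.lower()
        return (
            any(phrase in text for phrase in self.any_of)
            and all(phrase in text for phrase in self.all_of)
            and not any(phrase in text for phrase in self.none_of)
        )


# Invented sample segments standing in for passages of recorded audio.
segments = [
    "I would like to close my account today",
    "please cancel my account",
    "how do I shut down the account",
    "the branch is close to my office",
    "that film just bombed",
    "they found a bomb in the building",
]

# Catch synonyms for closing an account, but only when an account is mentioned.
close_rule = Rule(any_of=["close", "cancel", "shut down"], all_of=["account"])
# Catch "bomb" unless the speaker is talking about a film.
bomb_rule = Rule(any_of=["bomb"], none_of=["film", "movie"])

closures = [s for s in segments if close_rule.matches(s)]
bombs = [s for s in segments if bomb_rule.matches(s)]
```

Here the first rule picks up the "cancel" and "shut down" calls that a plain search on close would miss while skipping the remark about the nearby branch, and the second rule drops the comment about the film.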
LISTENING TO HIS MASTER
So just how do these solutions work? The speech-to-text approach is dictionary-dependent, meaning that a decision must be made on whether the searched term can be found in the dictionary and whether it has been correctly transcribed. …