Numerous technological advances over the past 10 to 15 years have allowed information professionals to become quite skilled at organizing and retrieving information--when it's in textual or numeric form. While there will always be room for improvement, recent progress in managing these kinds of data has been quite remarkable. However, as organizations increasingly archive a wide range of recorded audio and video data, such as conference calls, voice mail, radio and television broadcasts, training sessions, focus groups, speeches, and sales presentations, a new challenge is emerging for information professionals. You simply cannot search these with traditional tools.
Into this void steps the rapidly maturing field of speech analytics, once known as audio mining. Traditionally, organizations used manual means to search their recorded audio, if indeed they felt up to the task of searching it at all. Two methods have emerged in recent years to automate this process: speech-to-text, in which the audio is first transcribed to text then scanned for requested search terms, and phonetic search, in which searches are performed directly on the audio, meaning search terms need not match letter for letter, but simply sound for sound.
The need for organizations to search their constantly expanding audio archives, to improve the speed and accuracy levels of these searches, and to support multiple languages drives speech analytics technology. Demand for speech analytics comes from within any organization that accumulates large amounts of recorded audio--from extremely poor-quality audio all the way up to crystal-clear broadcast audio. A common trait shared by these organizations is the understanding that a wealth of information exists in recorded audio and that, if harnessed, this knowledge can contribute tremendously in complementing text-based research, improving customer satisfaction, and uncovering new revenue opportunities.
CAN YOU HEAR ME NOW?
Before pursuing a speech analytics solution for your organization, you must first address what this technology needs to do. It is generally agreed that an effective speech analytics product must, at a minimum, be able to search potentially thousands of hours of audio and return results that are both accurate and timely. While speech-to-text products can search faster than their phonetics-based counterparts, phonetic solutions are able to convert the audio into a searchable form up to 50 times faster than speech-to-text, often delivering quicker results from start to finish. The speed of rendering audio searchable is a key hurdle in making speech analytics viable, and the advances made within the past year in optimizing the speed of phonetic search indexing represent a major breakthrough in the field.
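To see why indexing speed can outweigh search speed, it helps to add the two together. The short Python sketch below does that arithmetic. Every number in it is an illustrative placeholder rather than a measurement: the real-time indexing rate for speech-to-text, the per-search times, and the workload are assumptions, and the factor of 50 is simply the upper bound quoted above.

```python
def end_to_end_hours(audio_hours, index_speed, num_searches, search_seconds):
    """Total hours to make an archive searchable and then run a batch of searches.

    index_speed is hours of audio indexed per hour of processing
    (1.0 means the audio is processed in real time).
    """
    indexing_hours = audio_hours / index_speed
    searching_hours = num_searches * search_seconds / 3600
    return indexing_hours + searching_hours


# Hypothetical workload: 1,000 hours of audio and 100 searches.
AUDIO_HOURS = 1000
NUM_SEARCHES = 100

# Assumed figures, for illustration only.
speech_to_text = end_to_end_hours(AUDIO_HOURS, index_speed=1,
                                  num_searches=NUM_SEARCHES, search_seconds=1)
phonetic = end_to_end_hours(AUDIO_HOURS, index_speed=50,
                            num_searches=NUM_SEARCHES, search_seconds=10)

print(f"speech-to-text: {speech_to_text:.1f} hours")
print(f"phonetic:       {phonetic:.1f} hours")
```

Under these assumptions, the slower individual searches of the phonetic approach barely register next to the time saved on indexing. With your own archive size and search volume the balance may differ, which is why both figures belong in any evaluation.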
You must also consider the accuracy of search when evaluating a speech analytics solution. Accuracy equates to obtaining results that are of interest; in other words, relevant results. For example, if you search on the word bomb, you will receive many results referencing bomb as in an explosive device, which is perhaps what you want, but you will also get results where the speaker said, "That film just bombed." This example shows why it is also important for speech analytics tools to provide capabilities for rules-based queries that help eliminate irrelevant results. Consider another example: Suppose a company is interested in understanding why customers close accounts. If it simply searches on the word close, it will get some relevant data, a good deal of irrelevant data, and, worst of all, it will completely miss those instances in which a customer used terms such as "cancel" or "shut down" to indicate closing his or her account. The ability to structure sophisticated queries will help ensure that the searcher captures such instances.
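The account-closing example can be sketched as a simple rule: match any of several equivalent terms, but only when the word account appears nearby. The Python below is a minimal illustration that runs over plain text snippets, such as transcripts. The function names, the five-word window, and the sample calls are all assumptions made for the example, not features of any particular product.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def phrase_positions(tokens, phrase):
    """Return every index at which the (possibly multi-word) phrase starts."""
    words = phrase.lower().split()
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def matches_rule(text, any_of, near, window=5):
    """True if any term in any_of starts within `window` words of `near`."""
    tokens = tokenize(text)
    near_positions = phrase_positions(tokens, near)
    for term in any_of:
        for i in phrase_positions(tokens, term):
            if any(abs(i - j) <= window for j in near_positions):
                return True
    return False


# Hypothetical call snippets.
calls = [
    "I would like to close my account today",
    "Please cancel the account, I am moving",
    "We need to shut down our account",
    "The store was close to my house",
]

closing_terms = ["close", "cancel", "shut down"]
relevant = [c for c in calls if matches_rule(c, closing_terms, near="account")]
for call in relevant:
    print(call)
```

A plain search on close would return the first and last snippets. The rule keeps the first three and drops the last, which is the behavior the example above calls for.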
LISTENING TO HIS MASTER
So just how do these solutions work? The speech-to-text approach is dictionary-dependent: A search term can be found only if it is in the dictionary and was transcribed correctly. Dictionaries can be trained to support out-of-dictionary terms such as proper names, brand names, industry slang, and jargon, but this process is cumbersome. Phonetic search, by contrast, is based on the probability that the search term has a good phonetic match within the audio files. Since there is no dictionary dependence, any term or phrase is searchable, including terms and phrases outside the language model. The rule of thumb is, "If you can sound it out, you can search for it." Since phonetic searching is based on matching the phonetic string of the search term to the audio file, accuracy and relevancy improve with the length of the phonetic search string.
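The idea of matching sound for sound can be illustrated with a toy Python example. It is a deliberately simplified sketch, not how any commercial engine works: It assumes the audio has already been reduced to a sequence of phoneme symbols, uses a tiny hand-written pronunciation table, and scores a match as the fraction of phonemes that line up, whereas a real system would work with acoustic probabilities. All names and data are hypothetical.

```python
# Hand-written pronunciations using ARPAbet-style symbols (assumed for illustration).
PRONUNCIATIONS = {
    "close": ["K", "L", "OW", "Z"],
    "account": ["AH", "K", "AW", "N", "T"],
}


def to_phonemes(phrase):
    """Turn a search phrase into one flat phoneme sequence."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def phonetic_search(query, audio_phonemes, threshold=0.75):
    """Slide the query over the audio; return (start, score) for good matches.

    The score is the fraction of query phonemes that agree with the audio
    at that position, so an imperfect match can still be reported.
    """
    n = len(query)
    hits = []
    for start in range(len(audio_phonemes) - n + 1):
        window = audio_phonemes[start:start + n]
        matched = sum(q == a for q, a in zip(query, window))
        score = matched / n
        if score >= threshold:
            hits.append((start, score))
    return hits


# Hypothetical phoneme stream for "I want to close my account",
# with the final sound of "close" heard as S instead of Z.
audio = ["AY", "W", "AA", "N", "T", "T", "UW", "K", "L", "OW", "S",
         "M", "AY", "AH", "K", "AW", "N", "T"]

print(phonetic_search(to_phonemes("close"), audio))    # [(7, 0.75)]
print(phonetic_search(to_phonemes("account"), audio))  # [(13, 1.0)]
```

The slightly mispronounced close is still found because three of its four sounds line up, and no dictionary lookup of the audio is involved. A longer query has more phonemes that must agree, so it is less likely to match by chance, which is the reason longer search strings tend to give more relevant results.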
While people know that recorded audio contains a wealth of information, and that it is within these files that users will find key nuggets of information, what is not so widely understood is the complexity this introduces into searching and analyzing those files. This is another area in which the two speech analytics methodologies differ. To support an additional language, speech-to-text must be trained on a new dictionary and grammar patterns, whereas the phonetic approach is trained on a new acoustic model (a phonetic representation of that language) and can support a language even when no written form of it exists.
With speech-to-text and phonetic search representing two different ways of mining recorded audio, you should ask, "Which method makes the most sense for my organization?"
First, any organization interested in using and deploying speech analytics software should thoroughly test its recordings with the prospective solution, taking care to ensure that its audio sample is of searchable quality. As with any technology, it is important to verify that the solution is compatible with your environment and that you are comfortable with the application, can work easily with it, and are able to change search terms and rules without having to recruit the vendor's professional services support team. Key functional areas to focus on with a speech analytics solution should include speed, relevancy of results, and, of course, ease of use. When evaluating speed, it's important to consider not only how quickly searches are conducted but also the speed at which the audio is rendered searchable, since this goes a long way toward determining the real total cost of ownership and how easily the solution can scale to your needs.
SPEAK INTO THE MICROPHONE
Here are some other less-obvious areas that prospective users of speech analytics solutions should be aware of:
* Some offerings pre-categorize and limit search terms and phrases to offset scalability issues. These restrictions may limit your ability to extract all of the relevant information stored within your audio.
* We live in a global economy, a fact that becomes more obvious every day. Make sure that your current and future language needs can be easily supported.
* Many speech analytics solutions use original equipment manufacturer (OEM) technology that has been developed by other vendors. This is a common practice across the software industry, but if being close to the source of the core technology and benefiting quickly from technology advances is important to your organization, then you may wish to also conduct due diligence on the core technology and the relationship between the organizations.
* Analyze the out-of-the-box applications for reports and features (such as ad-hoc search) that are of immediate benefit to you.
* And, as always, check customer and partner references.
The need to efficiently search recorded audio and unlock key information residing within these assets is as important to organizations as the ability to search text--and, in some cases, more important. By understanding the different methodologies and properly evaluating the various offerings in the speech analytics field, you can select the solution that drives the best results for your organization.
Anna Convery [email@example.com] is the SVP of marketing & product management for Nexidia.
Comments? E-mail letters to the editor to firstname.lastname@example.org.…