Hey, Can Anybody Read This?
Castelluccio, Michael, Strategic Finance
As globalization and the Internet combine to shrink the space between the lines, both longitudinal and latitudinal, the sounds of other languages become more common in our neighborhood. And with the addition of new languages there's a new problem.
Until now, on the Internet at least, if you knew English, you could get by very well. Although only 8% of the world's population are native speakers of English, 56.5% of Internet users have English as their native language. But that's changing. New Internet users who don't speak English now outnumber new English-speaking users, and Dan DePalma, a principal analyst at Forrester Research, says that due to this growth, "U.S. businesses will have to learn to translate their Web sites and Internet services into other languages." (Global Internet statistics by language are available at www.euromktg.com/globstats.)
Commerce depends on communication, and anything that inhibits the exchange of information is a problem. So what do you do? You can keep a pool of translators to handle e-mail, agreements, contracts, and Web site content, but that can be very expensive. Nevertheless, if you want to broadcast worldwide, you can no longer rely solely on English.
Why Not Let IBM Do It?
After all, haven't computers done a great job translating foreign currencies? Even the Euro is just another set of routines folded into the program, and as long as the information is updated, the exchanges are just fine. Just teach the same machines to translate the e-mail.
Well, with words there are a couple of extra steps. Computers understand only numbers - when they handle letters and words, they see them as strings of numbers. You can represent a word numerically by giving each letter its ASCII equivalent, and then get the computer to match that number pattern to a stored definition of the word taken from the dictionary. But that's just the beginning.
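To make that first step concrete, here is a minimal sketch in Python of a word turned into its ASCII numbers and matched against a stored definition. The sample word, the one-entry dictionary, and the function names are assumptions made up for illustration, not part of any real translation product.

```python
def to_codes(word):
    """Spell a word as the ASCII code of each of its letters."""
    return tuple(ord(ch) for ch in word.lower())

# Hypothetical stored dictionary, keyed by number pattern rather than by letters.
DEFINITIONS = {
    to_codes("euro"): "currency unit",
}

def look_up(word):
    """Match the word's number pattern to a stored definition, if one exists."""
    return DEFINITIONS.get(to_codes(word))

print(to_codes("euro"))   # (101, 117, 114, 111)
print(look_up("euro"))    # currency unit
print(look_up("peso"))    # None, because no pattern is stored for it
```

The computer never sees "euro" here, only the pattern 101-117-114-111, and it can find a definition only if someone has already stored that exact pattern.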
Let's take the example of teaching a computer first to understand a word in your own language before giving it the foreign equivalent. Take a word as simple as dog. Let's give that the number 8-7-3 for the three letters. Store that in the computer, and now check the dictionary. Four-legged mammal, right? OK, associate that with your number and you're done. Well, not exactly, because to dog is a verb that means to persistently pursue. What numbers do we assign to that dog? Or the investment not worth its price, or the ugly person, or ruin as in going to the dogs? We'll have to enter all of these dogs before we begin to associate them with their foreign equivalents.
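The dog problem can be sketched in a few lines of Python. This is an illustration only: it uses the real ASCII codes for d-o-g (100-111-103) rather than the arbitrary 8-7-3 above, and the sense table, part-of-speech labels, and function names are assumptions invented for the example.

```python
def to_codes(word):
    """Spell a word as the ASCII code of each of its letters."""
    return tuple(ord(ch) for ch in word.lower())

# Illustrative sense table: one number pattern, several stored meanings.
SENSES = {
    to_codes("dog"): [
        ("noun", "four-legged mammal"),
        ("verb", "to persistently pursue"),
        ("noun", "investment not worth its price"),
        ("noun", "ugly person"),
        ("idiom", "ruin, as in 'going to the dogs'"),
    ],
}

def senses_for(word):
    """Return every stored meaning that shares the word's number pattern."""
    return SENSES.get(to_codes(word), [])

candidates = senses_for("dog")
print(len(candidates), "candidate meanings for", to_codes("dog"))
for part_of_speech, gloss in candidates:
    print(f"  {part_of_speech}: {gloss}")
```

A single number pattern comes back with five candidates, and nothing in the lookup itself says which one the writer meant. Each of those candidates would also need its own foreign equivalent.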
That's the first level of problems, and it only deals with word-for-word translations - the poorest and most inaccurate translations. A good translation program will look at the structure of the sentence first in order to determine the use and meaning of the words. On this level things really get confusing.
Numbers are relatively easy to use when plugged into formulas that are based on logic. The rules of language, grammar and usage, usually aren't based on reason but on conventional acceptance. And because many of those choices (where does the verb go, what words can be contractions, which can't) were made too long ago to have been documented, we just memorize the usage and forget the reasons. …