Software: Lost in Translation; Search-Engine Translations Are Still Imperfect at Best--But New Statistical Methods Could Help Raze the Tower of Babel
Byline: Mac Margolis and Jonathan Adams, With Andrew Ehrenkranz in Paris
Frank Sinatra's rendition of the tune "The Girl From Ipanema" might not have been a hit if it weren't for Norman Gimbel, the songwriter who translated the Portuguese lyrics into singable English. Had Sinatra run "The Girl" through Google instead, the tune never would have made it onto vinyl. As the saying goes: Traduttore, traditore--the translator is a traitor.
That holds especially for machine translation, or MT--the software that translates Web pages for the likes of Google and AltaVista. In recent years a handful of search engines have come to dominate the globe, even though at best they do a poor job of translating Web pages. This performance gap is keeping millions of non-English-speaking people from getting access to English-language Web pages--which currently account for about 35 percent of the billions of Web pages now available via search engines. If engineers can solve some of the more vexing problems of machine translation--and many think they eventually will--it could transform the competitive landscape for search-engine firms.
It will also be the biggest advance in MT since U.S. Sovietologists used computers to make sense out of Russian-language documents during the Cold War. They made swift advances, but couldn't crack the tougher problems--the ambiguities of meaning and the complexities of grammar, to name two. With the advent of the Internet and powerful computer chips, their technology made its way to the common man, warts and all. In recent years MT firms like Systran of San Diego, California, which currently supplies Google and AltaVista, and Language Weaver of Marina del Rey, California, have incorporated advances in linguistics and statistics to render texts in languages from Croatian to Mapudungun, spoken by the Mapuches in Chile. …