English rules the Internet, which can be a frustrating thing for
the world's 1.3 billion Chinese and 322 million Spanish speakers.
They outnumber Anglophones. Even online, two-thirds of users speak
something other than English at home.
So when someone promises a smoother and easier translation
program, people around the world tend to perk up their ears. It's a
step closer to a truly "worldwide" Web where every page would be
available for everyone to read in his or her own language.
The latest step comes later this month when the National
Institute of Standards and Technology (NIST), an arm of the United
States government, announces results of its tests of several machine-
translation systems. The agency is expected to give top honors, not
to the linguistically savvy programs at universities and elsewhere, but
to a newcomer: Internet search company Google. Google's apparent
success suggests that a new approach to translation - fancy math
rather than linguistic know-how - may be the way forward in a field
that has struggled with the nuance and ambiguity of human language.
"Nobody in my team is able to read Chinese characters," says
Franz Och, who heads Google's machine-translation (MT) effort. Yet
his team is producing ever more accurate translations into and out of
Chinese - and several other languages as well.
To demonstrate the software's prowess, Mr. Och displayed an
Arabic newspaper headline at a recent media tour of Google's
headquarters in Mountain View, Calif. One commercially available MT
program translated it: "Alpine white new presence tape registered
for coffee confirms Laden." Then he displayed the translation from
Google's prototype, which made considerably more sense: "The White
House Confirmed the Existence of a New Bin Laden tape."
Of course, every MT program can point to strengths in its
approach versus weaknesses in others', experts say. The key is whether
statistical systems have become powerful enough to outperform the
intensive, rules-based systems now available.
"These translations were impossible a few years ago," Och says.
But the advent of ever-cheaper and faster data-crunching and the
mushrooming number of online documents have changed the equation.
Google has improved the algorithms for its MT program, he says, by
feeding its computers the equivalent of 1 million books of text,
using sources such as parallel translations of United Nations
documents.
Google's MT system is still under development and not available
to the public. Talking about it at an event for journalists and
industry analysts may mean that at least a test version will be
coming in the next few months, observers speculate.
"The results were very impressive, not the stupid machine
translation you see on the Internet, which isn't really good," says
Philipp Lenssen, who's been writing about Google in his online blog,
Google Blogoscoped, since May 2003.
"This opens up a lot of new possibilities because you don't
really want to read machine translation at the moment," Mr. Lenssen
says. He speculates that it could be a perfect part of a Google Web
browser, should the company decide to release one. A user might
search the entire Web in his native language and have pages returned
to him already translated. "You can apply it to so many situations,"
he says.
Many translations, one root
Today, nearly every translation service offered on the Web - AOL,
AltaVista, Babel Fish, even Google's - is powered by translation
technology developed by Systran. The company, based in San Diego and
Paris, has been involved in MT for more than 30 years. Each day, it
translates more than 25 million Web pages.
MT involves years of hard work creating rules for translation
between a pair of languages, says Dimitris Sabatakakis, chief
executive officer of Systran. Using statistical methods, as
Google does, is a well-known technique. …