Speaking Louder Than Words with Pictures across Languages

It has been said that an old Chinese proverb placed a value of 10,000 words on a single picture, and a similar Japanese proverb devalues this to only 100 words. In most languages the current consensus seems to be that a picture is worth 1,000 words. Whatever its true worth in words, a good picture is capable of conveying meaning (sometimes quite complex meaning) clearly and without the need for language. Show a picture of an elephant to speakers of two different languages, and most likely they will both understand exactly what it means. A picture can in effect ground the meaning in an object or concept in the real world and act as a convenient bridge over language barriers.

Picture Books

Our idea originally stemmed from the rise in popularity of picture book translation aids in Japan. These books are a modern interpretation of the traditional phrase book, and they improve on it by adding image annotations and by allowing users to compose their own phrases from sentence fragments that appear together on the same page. Figure 1 illustrates the process of communication using a picture book. The process is simple: the user points at pictures or text on the pages of the book in a particular order. In this case let's assume the user is a Japanese person wanting to communicate with an English speaker. The user first points to "I want to go to the ~". Here the ~ is a placeholder for a number of possible filler items that appear on the same page. In this example we give two possible filler items, restaurant and cinema, and the user chooses restaurant.
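The pointing process described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any actual picture book or of our system: the Page class, the compose function, and the Japanese strings are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Dict

PLACEHOLDER = "~"  # slot marker, as in "I want to go to the ~"


@dataclass
class Page:
    """One page of a hypothetical picture book (illustrative only)."""
    # Phrase templates: text shown to the user -> text shown to the other party.
    templates: Dict[str, str]
    # Filler items printed on the same page, in both languages.
    fillers: Dict[str, str]


def compose(page: Page, template: str, filler: str) -> str:
    """Combine a template and a filler that the user pointed at in order.

    Both items must come from the same page, mirroring the restriction
    that pictures are designed to be combined only within a page.
    """
    if template not in page.templates:
        raise KeyError(f"template not on this page: {template}")
    if filler not in page.fillers:
        raise KeyError(f"filler not on this page: {filler}")
    return page.templates[template].replace(PLACEHOLDER, page.fillers[filler])


# Example page with made-up content matching the scenario in the text.
page = Page(
    templates={"~に行きたいです": "I want to go to the ~"},
    fillers={"レストラン": "restaurant", "映画館": "cinema"},
)

message = compose(page, "~に行きたいです", "レストラン")
print(message)
```

Pointing at an item that is not on the page raises an error in this sketch, which mirrors the way a printed book only supports combinations of fragments found on the same page.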

The picture book is a powerful idea because the communication process is easy for users to understand, and because the pictures that support the words in the book not only aid the visual search for phrases but also assist the communication itself. However, the picture book has limits by virtue of its being a book, namely: the number of pictures contained in the book is limited; complex expressions cannot be constructed; the search for the appropriate pictures can be laborious; and pictures are designed to be combined only with other pictures on the same page. Combining pictures with others not designed to be used with them may not make sense.

The aim of our research was to find a way to create a process of visual communication similar in form to the picture books, but within the framework of an intelligent interactive information system capable of mediating and facilitating the communication.

Machine Translation

If a machine is going to lend a hand in the communication process between two people, perhaps the most obvious way it can contribute is by providing an automatic translation of the natural language expressing what is intended to be communicated. Machine-translation (MT) systems already exist on mobile devices, for example the VoiceTra and TextTra mobile applications, which take their input from speech and from text, respectively. Machine translation, however, is not without problems either. First, neither of these two input methods is perfect for use on mobile devices. Textual input is very cumbersome on small mobile devices, and speech-recognition systems frequently make errors that are hard for users to correct. Second, the MT systems themselves can make errors. Sometimes nonsense is generated, or, if the MT system is particularly skilful, very fluent output can be produced that carries totally the wrong meaning. The users may have no idea what has been communicated to the other party, and in some cases users may believe they understand perfectly what was expressed when in fact they are gravely mistaken.

Our Idea

Our idea is a very simple one: use pictures as the user input method. The users should be able to input the gist of what they wish to say in the form of a sequence of picture icons and then let the machine work out what they intend to express and provide a translation of this in the other language. …

