Alex Pentland, Deb Roy, Chris Wren
M.I.T. Media Laboratory
E15-387, 20 Ames St., Cambridge MA 02139
In the language of cognitive science, perceptual intelligence is the ability to solve the frame problem: to classify the current situation, so that you know which variables are important and can take appropriate action. Once a computer has the perceptual intelligence to know who, what, when, where, and why, then statistical learning methods are usually sufficient for the computer to determine which aspects of the situation are significant, and to react appropriately.
A key idea in building interfaces based on perceptual intelligence is that they be adaptive both to the overall situation and to the individual user. Thus the interface must learn user behaviors, and how they vary as a function of the situation. For instance, we have built systems that learn a user's driving behavior, allowing the automobile to anticipate the driver's actions (Pentland and Liu 1999), and systems that learn typical pedestrian behaviors, allowing them to detect unusual events (Oliver et al. 1998).
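The driving work cited above used dynamic models of maneuver sequences; as a minimal sketch of the underlying idea, a first-order Markov model over observed maneuvers can already anticipate the most likely next action. The maneuver labels and training sequences below are hypothetical illustrations, not data from the actual system.

```python
from collections import Counter, defaultdict

# Hypothetical training data: sequences of observed driving maneuvers.
sequences = [
    ["cruise", "brake", "turn_left", "cruise"],
    ["cruise", "brake", "turn_left", "accelerate"],
    ["cruise", "accelerate", "cruise", "brake"],
]

# Learn first-order transition counts between consecutive maneuvers.
transitions = defaultdict(Counter)
for seq in sequences:
    for prev, nxt in zip(seq, seq[1:]):
        transitions[prev][nxt] += 1

def anticipate(current):
    """Return the most likely next maneuver given the current one."""
    counts = transitions[current]
    return counts.most_common(1)[0][0] if counts else None

print(anticipate("brake"))  # -> turn_left
```

The actual system modeled driver state with hidden Markov models over continuous control signals; this sketch keeps only the core intuition that learned transition statistics let the interface predict what the user will do next.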
Most recently we have built audiovisual systems that learn word meanings from natural audio and visual input (Roy and Pentland 1998). A significant problem in designing effective interfaces is the difficulty of anticipating a person's word choice and associated intent. Our system addresses this problem by learning the vocabulary of each user together with its visual grounding.
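The core of such cross-modal learning is associating spoken word tokens with the visual categories they co-occur with. As a toy sketch under assumed inputs (the word lists, object labels, and scoring function below are illustrative, not the paper's actual audio-visual features or its mutual-information criterion), one can score each word by how often and how exclusively it accompanies an object:

```python
from collections import Counter, defaultdict

# Hypothetical paired episodes: (spoken word tokens, visual category in view).
episodes = [
    (["look", "a", "ball"], "ball"),
    (["the", "red", "ball"], "ball"),
    (["see", "the", "cup"], "cup"),
    (["a", "blue", "cup"], "cup"),
]

word_given_object = defaultdict(Counter)  # co-occurrence counts per object
word_totals = Counter()                   # overall word frequencies

for words, obj in episodes:
    for w in words:
        word_given_object[obj][w] += 1
        word_totals[w] += 1

def grounded_word(obj):
    """Word most strongly associated with an object: squared co-occurrence
    count normalized by the word's overall frequency, a crude stand-in for
    a mutual-information criterion. Favors words that are both frequent
    with the object and rare elsewhere."""
    return max(word_given_object[obj],
               key=lambda w: word_given_object[obj][w] ** 2 / word_totals[w])

print(grounded_word("ball"))  # -> ball
print(grounded_word("cup"))   # -> cup
```

Even this crude score separates content words ("ball", "cup") from function words ("the", "a") that occur across all contexts, which is the same pressure the learned vocabulary exploits.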