In the 1970s, some AI leaders predicted that we would soon see all manner of artificially intelligent entities in our daily lives. Unfortunately, in the interim, this has come true mostly in the realm of science fiction. Recently, however, pioneering researchers have been bringing together advances in many subfields of AI, such as robotics, computer vision, natural language and speech processing, and cognitive modeling, to create the first generation of robots and avatars that illustrate the true potential of combining these technologies. The purpose of this article is to highlight a few of these projects and to draw some conclusions from them for future research.
We begin with a short discussion of scope and terminology. Our focus here is on how robots and avatars interact with humans, rather than with the environment. Obviously, this cannot be a sharp distinction, since humans form part of the environment for such entities. However, we are interested primarily in how new interaction capabilities enable robots and avatars to enter into new kinds of relationships with humans, such as hosts, advisors, companions, and jesters.
We will not try to define robot here, but we do want to point out that our focus is on humanoid robots (although we stretch the category a bit to include a few animal-like robots that illustrate the types of interaction we are interested in). Industrial automation robotics, while economically very important and a continual source of advances in sensor and effector technology for humanoid robots, will continue to be more of a behind-the-scenes contributor to our everyday lives.
The meaning of the term avatar is currently in flux. Its original and narrowest use is to refer to the graphical representation of a person (user) in a virtual reality system. Recently, however, the required connection to a real person has been loosened and the term avatar has been used to refer to NPCs (nonplayer characters) in three-dimensional computer games and to synthetic online sales representatives, such as Anna at ikea.com. We hope this broader usage will catch on and displace the term embodied conversational agent, which is somewhat confusing, especially in the same discussion as robots, since it is, after all, robots, not graphical agents, that have real bodies. We will therefore use the term avatar in this article to refer to intelligent graphical agents in general.
Human Interaction Capabilities
There are four key human interaction capabilities that characterize the new generation of robots and avatars: engagement, emotion, collaboration, and social relationship. These capabilities are listed roughly in order from "low-level" (closer to the hardware and with tighter real-time constraints) to "high-level" (more cognitive), but as we will see, there are many interdependencies among the capabilities.
Engagement is the process by which two or more participants in an interaction initiate, maintain, and terminate their perceived connection to one another (Sidner et al. 2005). In natural human interactions, engagement constitutes an intricately timed physical dance with tacit rules for each phase of an interaction.
In copresent interaction, engagement indicators include where you look, when you nod your head, when you speak, how you gesture with your hands, how you orient your body, and how long you wait for a response before trying to reestablish contact. Strategies for initiating an interaction involve, for example, catching your potential interlocutor's eye and determining whether his or her current activity is interruptible. The desire to end an interaction (terminate engagement) is often communicated through culturally mediated conventions involving looking, body stance (for example, bowing), and hand gestures. Careful empirical and computational analysis of these rules and conventions in human …