Animations using Vision-Based Gestures
It has been only six years since the publication of Neal Stephenson's "Snow Crash", and much of the technology predicted in this techno-fiction already exists. Realistic 3D scenery and avatars with human-like motion can be animated on a PC, along with continuous speech recognition, speech synthesis, and 3D sound. This could become the medium of the next century for interactive entertainment, education, business, and social gatherings on the web. However, one part is still missing -- a simple and natural interface to 3D animations. Conventional input devices such as a mouse or joystick control only two parameters and are counter-intuitive in 3D interaction. Newer devices such as sensing gloves or body suits are too cumbersome for general use.
A technology that promises a natural and unconstrained spatial interface is direct gesture input, where gestures and control values are extracted from hand images acquired by video cameras [1, 3, 4]. Such optical interfaces promise multiple degrees of freedom for control, high precision, speed, and ultimately very low cost.
Research on video-based gesture recognition at Bell Labs started ten years ago. Our first real-time gesture recognizer was operating in 1990 at four frames/sec, using a system of thirty processors. A recently developed practical gesture interface system, GestureVR, runs at a rate of 60 Hz on a PC. It recognizes gestures reliably, computing up to twenty simultaneous parameters from two hands. This paper describes several variants of GestureVR and shows how this system can be used as an intuitive input to spatial applications.
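To make the notion of "twenty simultaneous parameters from two hands" concrete, the per-frame output of such a recognizer could be organized as in the following minimal C++ sketch. The type names, gesture classes, and field layout here are illustrative assumptions for exposition, not GestureVR's actual interface:

    #include <array>

    // Hypothetical gesture classes; the set actually recognized by
    // GestureVR may differ.
    enum class Gesture { None, Point, Reach, Click };

    // State reported for one hand in a single video frame.
    struct HandState {
        Gesture gesture;                 // recognized gesture class
        std::array<double, 10> params;   // up to 10 pose parameters
                                         // (e.g. fingertip position,
                                         // orientation angles)
        int numValid;                    // valid entries in params
    };

    // One report per video frame, delivered at 60 Hz: two hands,
    // hence up to twenty simultaneous control parameters.
    struct FrameReport {
        HandState hands[2];              // index 0 = left, 1 = right
    };

An application would poll one such report per frame and map the recognized gesture and its parameters onto application controls, such as camera motion or object manipulation.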
The basic component of GestureVR is a planar gesture recognition unit. It acquires hand images from a camera, recognizes gestures and estimates