Skinput is an input technology that uses bio-acoustic sensing to localize finger taps on the skin. When augmented with a pico-projector, the device can provide a direct-manipulation graphical user interface on the body. The technology was developed by Chris Harrison, Desney Tan, and Dan Morris at Microsoft Research's Computational User Experiences Group. Skinput represents one way to decouple input from electronic devices, with the aim of allowing devices to become smaller without simultaneously shrinking the surface area on which input can be performed. While other systems, such as SixthSense, have attempted this with computer vision, Skinput employs acoustics, taking advantage of the human body's natural sound-conductive properties (e.g., bone conduction). This allows the body to be annexed as an input surface without the need for the skin to be invasively instrumented with sensors, tracking markers, or other items. Microsoft has not commented on the future of the project, other than to say that it is under active development. It has been reported that the technology may not appear in commercial devices for at least two years.
Definition of Skinput Technology
Microsoft Research has developed Skinput, a technology that appropriates the human body for acoustic transmission, allowing the skin to be used as an input surface. In particular, it resolves the location of finger taps on the arm and hand by analyzing mechanical vibrations that propagate through the body. These signals are collected using a novel array of sensors worn as an armband. This approach provides an always-available, naturally portable, on-body finger input system. The researchers assessed the capabilities, accuracy, and limitations of the technique through a two-part, twenty-participant user study, and illustrated its utility with several proof-of-concept applications.
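The idea of resolving a tap location from armband sensor signals can be illustrated with a small sketch. The text above does not say which classifier Skinput uses, so this example assumes a simple nearest-centroid classifier purely for illustration. The feature vectors (one peak amplitude per sensor channel), the three-channel layout, the location labels, and the function names `train_centroids` and `classify_tap` are all hypothetical, and the numbers are made up rather than measured.

```python
from math import dist
from statistics import fmean


def train_centroids(samples):
    """Average the feature vectors of each tap location.

    samples: list of (label, feature_vector) pairs.
    Returns a dict mapping each label to its mean feature vector.
    """
    grouped = {}
    for label, features in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify_tap(centroids, features):
    """Return the label whose centroid is closest (Euclidean) to features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training taps: peak amplitude on three sensor channels.
# These values are invented for illustration, not taken from any study.
TRAINING = [
    ("wrist", [0.9, 0.4, 0.1]),
    ("wrist", [0.8, 0.5, 0.2]),
    ("forearm", [0.3, 0.9, 0.4]),
    ("forearm", [0.2, 0.8, 0.5]),
    ("palm", [0.1, 0.3, 0.9]),
    ("palm", [0.2, 0.2, 0.8]),
]

centroids = train_centroids(TRAINING)
print(classify_tap(centroids, [0.85, 0.45, 0.15]))  # prints "wrist"
```

A real system would need richer acoustic features and a more capable model; the sketch only shows the overall shape of the task: train on labeled taps, then assign each new tap to the most similar known location.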
Introduction of Skinput Technology
The primary goal of Skinput is to provide an always-available mobile input system, that is, an input system that does not require a user to carry or pick up a device. A number of alternative approaches have been proposed that operate in this space. Techniques based on computer vision are popular. These, however, are computationally expensive and error prone in mobile scenarios (where, e.g., non-input optical flow is prevalent). Speech input is a logical choice for always-available input, but is limited in its precision in unpredictable acoustic environments, and suffers from privacy and scalability issues in shared environments. Other approaches have taken the form of wearable computing.
This typically involves a physical input device built in a form considered to be part of one's clothing. For example, glove-based input systems allow users to retain most of their natural hand movements, but are cumbersome, uncomfortable, and disruptive to tactile sensation. Post and Orth present a "smart fabric" system that embeds sensors and conductors into fabric, but taking this approach to always-available input necessitates embedding technology in all clothing, which would be prohibitively complex and expensive. The SixthSense project proposes a mobile, always-available input/output capability by combining projected information with a color-marker-based vision tracking system. This approach is feasible, but suffers from serious occlusion and accuracy limitations. For example, determining whether a finger has tapped a button or is merely hovering above it is extraordinarily difficult.
Skinput leverages the natural acoustic conduction properties of the human body to provide an input system, and is thus related to previous work in the use of biological signals for computer input. Signals traditionally used for diagnostic medicine, such as heart rate and skin resistance, have been appropriated for assessing a user's emotional state. These features are generally subconsciously driven and cannot be controlled with sufficient precision for direct input. Similarly, brain sensing technologies such as electroencephalography (EEG) and functional near-infrared spectroscopy (fNIR) have been used by HCI researchers to assess cognitive and emotional state; this work also primarily looked at involuntary signals. In contrast, brain signals have been harnessed as a direct input for use by paralyzed patients, but direct brain-computer interfaces (BCIs) still lack the bandwidth required for everyday computing tasks, and require levels of focus, training, and concentration that are incompatible with typical computer interaction.
There has been less work relating to the intersection of finger input and biological signals. Researchers have harnessed the electrical signals generated by muscle activation during normal hand movement through electromyography (EMG). At present, however, this approach typically requires expensive amplification systems and the application of conductive gel for effective signal acquisition, which would limit its acceptability for most users. The input technology most closely related to Skinput is that of Amento et al., who placed contact microphones on a user's wrist to assess finger movement. However, this work was never formally evaluated, and is constrained to finger motions in one hand.
The Hambone system employs a similar setup and, through a hidden Markov model (HMM), yields classification accuracies around 90% for four gestures (e.g., raise heels, snap fingers). Performance of false positive rejection remains untested in both systems at present. Moreover, both techniques require the placement of sensors near the area of interaction (e.g., the wrist), increasing the degree of invasiveness and visibility. Finally, bone conduction microphones and headphones, now common consumer technologies, represent an additional bio-sensing technology that is relevant to the present work. These leverage the fact that sound frequencies relevant to human speech propagate well through bone.
Bone conduction microphones are typically worn near the ear, where they can sense vibrations propagating from the mouth and larynx during speech. Bone conduction headphones send sound through the bones of the skull and jaw directly to the inner ear, bypassing transmission of sound through the air and outer ear, leaving an unobstructed path for environmental sounds.
Despite being a Microsoft Research internal project, Skinput has been demonstrated publicly several times. The first public appearance was at Microsoft's TechFest 2010, where the recognition model was trained live on stage during the presentation, followed by an interactive walkthrough of a simple mobile application with four modes: music player, email inbox, Tetris, and voice mail. A similar live demo was given at the ACM CHI 2010 conference, where the academic paper received a "Best Paper" award. Attendees were allowed to try the system. Numerous media outlets have covered the technology, with several featuring live demos.