Multimodal Interfaces

Definition: Multimodal interfaces process two or more combined user input modes, such as speech, pen, touch, manual gestures, and gaze, in a coordinated manner with multimedia system output.

They are a new class of emerging systems that aim to recognize naturally occurring forms of human language and behavior, with the incorporation of one or more recognition-based technologies (e.g., speech, pen, vision). Multimodal interfaces represent a paradigm shift away from conventional graphical user interfaces. They are being developed largely because they offer a relatively expressive, transparent, efficient, robust, and highly mobile form of human-computer interaction. They represent users’ preferred interaction style, and they support users’ ability to flexibly combine modalities or to switch from one input mode to another that may be better suited to a particular task or setting.

Multimodal systems have evolved rapidly during the past decade, with steady progress toward building more general and robust systems. Major developments have occurred in the hardware and software needed to support key component technologies incorporated in multimodal systems, in techniques for fusing parallel input streams, in natural language processing (e.g., unification-based integration), and in time-sensitive and hybrid architectures. To date, most multimodal systems are bimodal, with the two most mature types involving speech and pen or touch input (Oviatt et al., 2000), and audio-visual input (e.g., speech and lip movements; Potamianos et al., 2004). However, these systems have diversified to include new modality combinations such as speech and manual gesturing, and gaze tracking and manual input. Multimodal applications also range from map-based and virtual reality systems for simulation and training, to multi-biometric person identification/verification systems for security purposes, to medical, educational, and web-based transaction systems (Oviatt et al., 2000).

Given the complex nature of users’ multimodal interaction, cognitive science has played an essential role in guiding the design of robust multimodal systems. The development of well-integrated multimodal systems that blend input modes synergistically depends heavily on accurate knowledge of the properties of different input modes and the information content they carry, how multimodal input is integrated and synchronized, how users’ multimodal language is organized, what individual differences exist in users’ multimodal communication patterns, and similar issues (Oviatt, 2003). Prototyping of new multimodal systems has also depended heavily on the use of high-fidelity simulation methods.

Commercial activity on multimodal systems has included PDAs, cell phones, in-vehicle systems, and desktop applications for CAD and other application areas. Commercial systems typically do not include parallel processing of two semantically rich input modes delivered together by the user; they are not fusion-based systems but instead process just one mode alternative at a time. In contrast, for over a decade research-level multimodal systems have included fusion-based processing of two parallel input streams that each convey rich semantic information (Oviatt et al., 2000). Such systems can support mutual disambiguation of input signals. Mutual disambiguation involves recovery from unimodal recognition errors within a multimodal architecture, because semantic information from each input mode supplies partial disambiguation of the other mode (Oviatt, 2003).
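
The idea of mutual disambiguation can be illustrated with a minimal sketch. It is not the architecture of any system cited here. It assumes each recognizer returns an n-best list of (hypothesis, confidence) pairs, and that a small table states which spoken object types are semantically compatible with which gesture shapes. The vocabulary, the confidence values, the `COMPATIBLE` table, and the `fuse` function are all hypothetical, and the product-of-confidences score is only one simple choice of joint score.

```python
from itertools import product

# Hypothetical n-best lists: (hypothesis, recognizer confidence).
# The speech recognizer wrongly ranks "lake" above the intended "lane".
speech_nbest = [("lake", 0.50), ("lane", 0.45)]
gesture_nbest = [("line", 0.60), ("area", 0.40)]

# Hypothetical semantic constraints: gesture shapes that can denote each object.
COMPATIBLE = {
    "lake": {"area"},
    "lane": {"line"},
}


def fuse(speech, gesture, compatible):
    """Return the best semantically compatible (speech, gesture, score) triple.

    Every cross-modal pair is scored by the product of the two confidences;
    pairs that violate the compatibility table are discarded. Returns None
    when no compatible pair exists.
    """
    candidates = []
    for (s_hyp, s_conf), (g_hyp, g_conf) in product(speech, gesture):
        if g_hyp in compatible.get(s_hyp, set()):
            candidates.append((s_conf * g_conf, s_hyp, g_hyp))
    if not candidates:
        return None
    score, s_hyp, g_hyp = max(candidates)
    return s_hyp, g_hyp, score


result = fuse(speech_nbest, gesture_nbest, COMPATIBLE)
print(result)
```

In this toy input the top-ranked speech hypothesis ("lake") is only compatible with the lower-ranked "area" gesture, so the joint interpretation "lane" drawn as a "line" scores higher and the speech error is recovered. The same mechanism can work in the other direction when the gesture recognizer's top choice is the wrong one.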

One particularly advantageous feature of such multimodal systems is their superior error handling. The error suppression achievable with a bimodal system, compared with a unimodal one, can be in excess of 40%. Trimodal multi-biometric systems have also been demonstrated to perform more robustly than bimodal ones. In addition to improving the average system reliability, a multimodal interface can perform in a more stable manner for challenging user groups (e.g., accented speakers) and in real-world settings (e.g., mobile contexts).
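
As a worked illustration of the arithmetic, the sketch below assumes that "error suppression" means the relative reduction in error rate when moving from a unimodal to a multimodal system. That reading is an assumption, and the error rates used are hypothetical values chosen only to show the calculation, not measured results.

```python
def relative_error_reduction(unimodal_error_rate, multimodal_error_rate):
    """Fraction of the unimodal errors eliminated by the multimodal system."""
    if unimodal_error_rate <= 0:
        raise ValueError("unimodal_error_rate must be positive")
    return (unimodal_error_rate - multimodal_error_rate) / unimodal_error_rate


# Hypothetical rates: 20% of inputs misrecognized unimodally, 12% multimodally.
reduction = relative_error_reduction(0.20, 0.12)
print(f"{reduction:.0%}")
```

Under this reading, a drop from a 20% to a 12% error rate removes 8 of every 20 errors, which is a 40% relative reduction even though the absolute difference is only 8 percentage points.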

Research has begun to design more tangible, adaptive, and collaborative multimodal systems for a variety of application areas (e.g., meetings, education). Recent progress in cognitive science, neuroscience, and biology is also contributing to the theoretical foundations needed to design future multimodal interfaces with greater flexibility, generality and power.
