Multimedia Web Information Systems - World Wide Web History, Multimedia on the World Wide Web

Lambros Makris and Michael G. Strintzis
Informatics and Telematics Institute, Thessaloniki, Greece

Definition: Multimedia Web information systems represent a whole new breed of information systems intended to process, store, retrieve, and disseminate information through the Internet using Web protocols.

Since its inception the World Wide Web (WWW) has revolutionized the way people interact, learn and communicate. A whole new breed of information systems has been developed to process, store, retrieve, and disseminate information through the Internet using Web protocols. These systems use hypertext to link documents and present any kind of information, including multimedia, to the end user through a Web browser. Web-based information systems are often called Web Information Systems (WIS). A WIS that uses multimedia as its key component is a Multimedia Web Information System.

World Wide Web History

The notion of hypertext, small pieces of information interlinked with one another through active words or hotspots, was well defined before the introduction of the WWW. In 1945 Vannevar Bush described the Memex, a photo-electro-mechanical device for memory extension which could make and follow links between documents on microfiche. In 1962 Doug Engelbart laid the foundation for an “oNLine System” (NLS) which allowed browsing and editing of interlinked information. Ted Nelson first used the word “hypertext” in 1965 to describe the ideas behind his Xanadu system. But it was not until 1989 that Tim Berners-Lee, while working at CERN – the European Organization for Nuclear Research located in Geneva, Switzerland – described a system of “linked information systems” which he called the World Wide Web. Berners-Lee built a prototype, the WorldWideWeb browser (Figure 1), together with a special hypertext server, to demonstrate the capabilities of his system.

The wide acceptance of the WWW came with NCSA Mosaic, a Web browser developed by Marc Andreessen and Eric Bina at the National Center for Supercomputing Applications, University of Illinois, in 1993. NCSA Mosaic was originally written for the X Window System but was later ported to the Windows and Macintosh platforms, making Mosaic the first multi-platform Web browser. The first commercial browsers, e.g. Netscape Navigator and Microsoft Internet Explorer, traced their lineage to NCSA Mosaic but were heavily rewritten while caught in the “Browser Wars” of mid-1995 to late 1996. By 1998 Netscape’s market share, which had once been more than 90%, had evaporated, leaving Internet Explorer the unchallenged leader for more than five years.

Only recently has a new round begun. Mozilla Firefox, the open-source descendant of Netscape Communicator, offers standards compliance, a better security model and various other refinements that make life easier for designers and users. Firefox, together with other browsers such as Opera and Safari, now challenges the lead position of Internet Explorer, eroding its market share and probably starting a new round of technological advancement for the WWW.

Multimedia on the World Wide Web

HTTP is built on top of TCP (Transmission Control Protocol), which guarantees correct delivery of data packets using acknowledgements, timeouts and retries. While this behavior may be desirable for documents and images, it is not suitable for time-sensitive information such as video and audio. TCP imposes its own flow control and windowing schemes on the data stream, destroying the temporal relations between video frames and audio packets. On the other hand, reliable message delivery is not required for video and audio, which can tolerate frame losses. For this reason the Real-time Transport Protocol (RTP), along with the Real Time Streaming Protocol (RTSP), was designed to stream media over computer networks.

Applications typically run RTP on top of UDP (User Datagram Protocol) to make use of its multiplexing and checksum services; however, RTP may be used with other suitable underlying network or transport protocols. RTP, despite its name, does not in itself provide for real-time delivery of multimedia data or other quality-of-service (QoS) guarantees, but relies on lower-layer services to do so. RTP combines its data transport with a control protocol (RTCP), which makes it possible to monitor delivery so that the receiver can detect packet loss and compensate for delay jitter. Besides RTP, other implementations also exist which offer streaming audio and video for both unicast and multicast transmission.
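
To make the RTP-over-UDP division of labor concrete, the following Python sketch packs the fixed 12-byte RTP header defined in RFC 3550 and sends a single packet over UDP. It is purely illustrative: the destination address, payload type and payload are hypothetical placeholders, and a real application would rely on a complete RTP/RTCP stack.

    # Minimal sketch: pack an RTP header (RFC 3550) and send it over UDP.
    import socket
    import struct

    def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=96):
        """Build the fixed 12-byte RTP header followed by the payload."""
        byte0 = 2 << 6                 # version=2, padding=0, extension=0, CC=0
        byte1 = payload_type & 0x7F    # marker bit=0, dynamic payload type
        header = struct.pack('!BBHII', byte0, byte1, seq, timestamp, ssrc)
        return header + payload

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    packet = build_rtp_packet(seq=1, timestamp=0, ssrc=0x12345678,
                              payload=b'\x00' * 160)  # e.g. 20 ms of 8 kHz audio
    sock.sendto(packet, ('127.0.0.1', 5004))          # hypothetical receiver

Because UDP neither retransmits nor reorders, the sequence number and timestamp in the header let the receiver restore timing and detect loss, which is exactly the information TCP would otherwise obscure.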

Web browsers, in general, do not have the inherent ability to play audio and video streams. This is accomplished through special plug-ins, which establish their own connections to servers and present multimedia within the flow of the hypertext document or in a separate window. Typically, these plug-ins use some form of the aforementioned RTP family of protocols to control their communication with the server and low-bit-rate compression schemes to encode the audio and video streams. As network bandwidth to the end user increases, applications such as video on demand, multimedia presentations and two-way multimedia communication are becoming commonplace.

Another area which benefits from the ability of browsers to host third-party plug-ins is that of 3D applications (Figure 2).

Extensible 3D (X3D), the successor to the Virtual Reality Modeling Language (VRML), is an XML-based 3D file format which enables real-time communication of 3D data both for stand-alone and networked applications. X3D builds on VRML, offering XML conformance and easier integration with other XML-based applications, a componentized approach which allows developers to support subsets of the specification (profiles), and better support not only for 3D data but also for text, audio and video, which can thus participate in complex 3D scenes. VRML and X3D scenes can be combined with Web applications to form sophisticated systems that add a new dimension to human-computer interaction.
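
Since X3D is plain XML, scenes can be produced or transformed with ordinary XML tooling. The following Python sketch, an invented example rather than output from any real application, uses standard X3D node names (Scene, Shape, Appearance, Material, Box) to emit a minimal one-box scene:

    # Minimal sketch: generate a one-box X3D scene with standard XML tooling.
    import xml.etree.ElementTree as ET

    x3d = ET.Element('X3D', version='3.2')
    scene = ET.SubElement(x3d, 'Scene')
    shape = ET.SubElement(scene, 'Shape')
    appearance = ET.SubElement(shape, 'Appearance')
    ET.SubElement(appearance, 'Material', diffuseColor='1 0 0')  # red surface
    ET.SubElement(shape, 'Box', size='2 2 2')                    # 2x2x2 box

    ET.ElementTree(x3d).write('scene.x3d', encoding='utf-8',
                              xml_declaration=True)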

The Scalable Vector Graphics (SVG) specification, on the other hand, deals with interactive and animated 2D data. Graphics created in SVG can be scaled without loss of quality across various platforms and devices. SVG is XML-based and scriptable through the SVG Document Object Model (DOM). Web applications using SVG allow programmers to produce personalized graphics on the fly for users, who in turn can modify the data on the client side or input their own. Since text in an SVG graphic remains text rather than bitmap information, it can be searched normally and localized with minimal effort. Finally, the combination of SVG with the Synchronized Multimedia Integration Language (SMIL) can produce animations of the SVG objects, with characteristics such as motion paths, fade-in or fade-out effects, and objects that grow, shrink, spin or change color.
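
The on-the-fly generation mentioned above can be sketched in a few lines of server-side Python; the badge layout and greeting text here are invented placeholders:

    # Minimal sketch: produce a personalized SVG graphic on the fly.
    import xml.etree.ElementTree as ET

    def personalized_badge(name):
        svg = ET.Element('svg', xmlns='http://www.w3.org/2000/svg',
                         width='200', height='60')
        ET.SubElement(svg, 'rect', x='0', y='0', width='200', height='60',
                      fill='navy', rx='8')        # rounded background panel
        label = ET.SubElement(svg, 'text', x='100', y='38', fill='white')
        label.set('text-anchor', 'middle')        # center the label
        label.text = 'Hello, ' + name             # real, searchable text
        return ET.tostring(svg, encoding='unicode')

    print(personalized_badge('Maria'))

Because the greeting is stored as a text node rather than rasterized pixels, a search engine can index it and a translator can replace it without touching the artwork.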

Search and Retrieval of Multimedia for the World Wide Web

As the amount of multimedia data stored on the Web keeps increasing, the problem arises of devising efficient methods for retrieving it and employing it in the context of complex activities. The coexistence of information in various formats (i.e. image, video, audio, and text), and the fact that pieces of data relevant to a user’s needs may exist in any of these forms, potentially associated with each other, make evident the importance of information systems that support querying and accessing of multimedia content.

Multimedia content-based access would put less of a burden on the user, who would not have to repeat a query using a different tool for each desired media type, while at the same time allowing possible associations between information of various types to be exploited, thus maximizing the relevance of the returned results. Nevertheless, one should not neglect the possibility that the most efficient search for one type of media may require a totally different approach than that required for another; an attempt to integrate query methodologies may therefore lead either to poor performance, due to the compromises made, or to excessive system or graphical-user-interface complexity that makes the system hardly usable by the non-expert. Another technical barrier results from the user’s expectations for searching and filtering. Evidence has shown that users think in terms of high-level semantic concepts when looking for specific multimedia content. Typical high-level semantics include objects, people, places, scenes, events, actions and so forth, which are difficult to derive automatically from the multimedia data.

The very first attempts at multimedia retrieval were based on exploiting existing captions and surrounding text to classify visual and aural data into predetermined classes or to create a restricted vocabulary. Although relatively simple and computationally efficient, this approach has several restrictions: it neither allows for unanticipated queries nor can be extended easily. Additionally, such keyword-based approaches assume either the preexistence of textual annotation (e.g. captions) or that annotation using the predetermined vocabulary is performed manually. In the latter case, inconsistency of the keyword assignments among different indexers can also hamper performance.
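
A toy Python sketch illustrates the approach and its central restriction; the captions are invented, and any query word outside this small vocabulary simply returns nothing:

    # Minimal sketch: caption-based (keyword) indexing and retrieval.
    from collections import defaultdict

    CAPTIONS = {
        'img001.jpg': 'sunset over the sea',
        'img002.jpg': 'portrait of a child',
        'img003.jpg': 'sea birds at sunset',
    }

    # Inverted index: keyword -> items whose caption contains it.
    index = defaultdict(set)
    for item, caption in CAPTIONS.items():
        for word in caption.lower().split():
            index[word].add(item)

    def keyword_query(*words):
        """Return items whose captions contain every query word."""
        hits = [index.get(w.lower(), set()) for w in words]
        return set.intersection(*hits) if hits else set()

    print(keyword_query('sunset', 'sea'))  # {'img001.jpg', 'img003.jpg'}
    print(keyword_query('beach'))          # set(): unanticipated query fails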

To overcome the limitations of the keyword-based approach, the use of low-level indexing features has been proposed. Relevant items are retrieved by comparing the low-level features of each item in the database with those of a user-supplied sketch or, more often, a key-item that is either selected from a restricted set or supplied by the user. This retrieval methodology is known as Query-by-Example. One of the first attempts to realize this scheme was the Query by Image Content (QBIC) system (Figure 3). Other systems also exist, which use various proprietary indexing feature-sets.
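
In its simplest form, Query-by-Example reduces every item to a feature vector and ranks the database by distance to the example’s vector. The Python sketch below uses an invented four-bin color histogram as the feature and the L1 distance as the similarity measure; real systems such as QBIC combine several, far richer features:

    # Minimal sketch: Query-by-Example over low-level color features.
    DATABASE = {
        'beach.jpg':  [0.10, 0.20, 0.30, 0.40],   # invented 4-bin histograms
        'forest.jpg': [0.05, 0.60, 0.30, 0.05],
        'desert.jpg': [0.50, 0.30, 0.15, 0.05],
    }

    def l1_distance(a, b):
        """Sum of absolute differences between two histograms."""
        return sum(abs(x - y) for x, y in zip(a, b))

    def query_by_example(example, k=2):
        """Return the k items whose features are closest to the example's."""
        ranked = sorted(DATABASE.items(),
                        key=lambda kv: l1_distance(kv[1], example))
        return ranked[:k]

    # A key-image whose histogram resembles 'beach.jpg'.
    print(query_by_example([0.12, 0.18, 0.32, 0.38]))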

MPEG-7, on the other hand, aims to standardize the description of multimedia content by defining a set of Descriptors and corresponding Description Schemes (DSs), thus supporting a wide range of applications. The MPEG-7 DSs provide a way to describe in XML the important concepts related to audio-visual data in order to facilitate its searching, indexing and filtering. The MPEG-7 reference software, known as XM (eXperimentation Model), implements a set of general descriptors (i.e. color, texture, shape, motion) which can be readily used for multimedia search and retrieval. The SCHEMA Reference System is an example of a Web-based system which uses the MPEG-7 XM for content-based information retrieval.
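
Since MPEG-7 descriptions are XML documents, they can be consumed with generic XML tooling. The Python sketch below parses a simplified, MPEG-7-style description; the snippet is illustrative only, not schema-valid MPEG-7, and the element layout and coefficient values are assumptions made for the example:

    # Minimal sketch: read a color descriptor from MPEG-7-style XML.
    import xml.etree.ElementTree as ET

    DESCRIPTION = '''
    <Mpeg7>
      <Image id="img001">
        <ScalableColor>
          <Coeff>12 4 0 7 3 1 0 9</Coeff>
        </ScalableColor>
      </Image>
    </Mpeg7>
    '''

    root = ET.fromstring(DESCRIPTION)
    for image in root.iter('Image'):
        coeff = image.find('./ScalableColor/Coeff')
        histogram = [int(v) for v in coeff.text.split()]
        print(image.get('id'), histogram)  # feed into a matcher as above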
