
World Wide Web Protocols and Standards

The WWW was originally based on three simple ideas: the HyperText Transfer Protocol (HTTP), which is used to transmit information between a server (Web server) and a client (Web browser); the Uniform Resource Locator (URL), which identifies any file or document available on the Internet; and the HyperText Markup Language (HTML), which is used to structure and format hypertext documents. It was this simplicity that made the WWW stand out from its competition (Gopher, Hyper-G, and WAIS, to name a few) and survive in those early days of the information revolution.

HTTP is a simple protocol that allows easy exchange of documents, much like the File Transfer Protocol (FTP). HTTP defines how messages are formatted and transmitted between the server and the client, and what actions should be taken in response to the various commands. HTTP is a stateless protocol, which means that each request is completely independent, with no relationship whatsoever to the requests that preceded it. This is not usually a problem for viewing static pages, but for applications that generate dynamic content (for example, a shopping cart) it is extremely important to know what has happened previously. The cookie mechanism came to the rescue. Cookies allow a Web server to store small pieces of information on a user’s machine and retrieve them later. They travel in the headers of the HTTP requests and responses flowing back and forth, identifying each request as coming from the same user.
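The way a cookie restores continuity to a stateless protocol can be illustrated with a minimal sketch. It does not open any network connection: `handle_request` merely simulates a server handling one request at a time, and the names `CARTS`, `session_id`, `handle_request`, and `cookie_header_from` are assumptions made for this illustration, not part of HTTP itself. Only the `Cookie` and `Set-Cookie` header names come from the protocol.

```python
from http.cookies import SimpleCookie
import uuid

# Hypothetical in-memory store: session id -> shopping cart contents.
CARTS = {}


def handle_request(request_headers, item=None):
    """Simulate one stateless HTTP request.

    Returns (response_headers, cart). The server recognises a returning
    user only through the Cookie header sent with the request.
    """
    cookie = SimpleCookie(request_headers.get("Cookie", ""))
    response_headers = {}

    if "session_id" in cookie and cookie["session_id"].value in CARTS:
        # Known user: the cookie links this request to earlier ones.
        session_id = cookie["session_id"].value
    else:
        # Unknown user: create a session and ask the browser to store its id.
        session_id = uuid.uuid4().hex
        CARTS[session_id] = []
        out = SimpleCookie()
        out["session_id"] = session_id
        response_headers["Set-Cookie"] = out["session_id"].OutputString()

    if item is not None:
        CARTS[session_id].append(item)
    return response_headers, list(CARTS[session_id])


def cookie_header_from(response_headers):
    """Mimic a browser: turn a Set-Cookie response header into a Cookie request header."""
    jar = SimpleCookie(response_headers["Set-Cookie"])
    return {"Cookie": "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())}


# The first request carries no cookie, so the server issues one.
first_headers, cart = handle_request({}, item="book")
# The second request returns the cookie, so the same cart is found again.
_, cart = handle_request(cookie_header_from(first_headers), item="pen")
```

A request sent without the cookie would start with an empty cart again, which is exactly the stateless behaviour described above.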

Information on the WWW can be anything from HTML files to audio and video. Each piece of data is pinpointed by its URL, the global address of documents and other resources on the WWW. URLs are strings that specify how to access network resources, such as HTML documents. A URL has, in general, the following form:

<scheme>://<user>:<password>@<host>:<port>/<url-path>

where <scheme> is the protocol that should be used to access the resource (e.g. ftp, http, file), <user> and <password> are used for resources that need authentication, <host> is the name of the server which hosts the resource, <port> is the port on which that server is listening, and <url-path> is data specific to the scheme (e.g. for http or ftp, <url-path> may be the complete path and filename of a file).
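The general form above can be taken apart with Python's standard `urllib.parse` module. The following is a small sketch; the function name `describe_url` and the example address are made up for illustration.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a URL into the components named in the general form above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when the URL carries no credentials
        "password": parts.password,  # None when the URL carries no credentials
        "host": parts.hostname,
        "port": parts.port,          # None when the scheme's default port is implied
        "url_path": parts.path,      # urlsplit keeps the leading "/" in the path
    }


# Hypothetical address used only for illustration.
example = describe_url("ftp://alice:secret@ftp.example.org:2121/pub/readme.txt")
```

Most URLs encountered in practice omit the user, password, and port; for those, the corresponding entries are simply `None`.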

The URL solved a perpetual problem of information systems in networked environments. After its introduction, each and every file, executable, and resource could be accurately addressed, which made it possible to develop interlinked documents that were not bound to a particular system but could refer to resources served over any widely used protocol. To tie all of this together, Berners-Lee introduced HTML, a language which could be used to produce and present structured documents but, most of all, a language which could include links to other resources: the foundation of the WWW.
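How links inside an HTML document point to resources on other systems and protocols can be sketched with Python's standard `html.parser` module. The class name `LinkCollector`, the sample page, and the example addresses are assumptions made for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the targets of all <a href="..."> links in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links are resolved against the URL of the document itself.
                self.links.append(urljoin(self.base_url, value))


# Hypothetical page: one link to the same server, one to an FTP resource.
page = """<html><body>
<p>See the <a href="/docs/intro.html">introduction</a> or
fetch the <a href="ftp://ftp.example.org/pub/readme.txt">readme</a>.</p>
</body></html>"""

collector = LinkCollector("http://www.example.org/index.html")
collector.feed(page)
```

After the call to `feed`, `collector.links` holds one absolute URL per link, regardless of which protocol each link uses.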

HTML was not something new. It was actually an application of SGML (Standard Generalized Markup Language, ISO 8879), which is an international standard for defining structured document types and markup languages. HTML avoided the complexity of SGML by defining a small set of rules which were easy to learn and could initially deliver what it promised: a truly universal way of building hypertext documents. But, as in every other aspect of human life, people got greedy. They wanted to get more out of its simplicity, and as a result the language was heavily abused. Tables were used to fake document layout, GIF graphics were used to create white space and typographic effects, and tags which were only meant to convey the meaning of the document were chosen according to the way browsers rendered them. On top of that, HTML expanded without a plan, driven by the supply and demand of the market, which consequently led to the balkanization of the language. Users are familiar with browser incompatibilities which break document layout and often cause much frustration or even make a particular document or site inaccessible.

To battle this situation, the World Wide Web Consortium (W3C) was formed in 1994 as an industry consortium dedicated to building consensus around Web technologies. In late 1997 HTML 4.0 was released as a W3C Recommendation, bringing an end to vendors’ “inventions” and hacks. HTML 4.0 added new features, such as the ability to include Cascading Style Sheets (CSS) for separation of presentation from content and structure, and the definition of the Document Object Model (DOM), which enabled more interactive pages through JavaScript. HTML 4.0 also included important features to make content more accessible to some users with disabilities. But standards conformance and accessibility, even now, suffer from badly written browsers and design tools, “traditional” designers, and the millions of obsolete pages which are already in place.

A step forward is now being prepared as committed researchers are laying the foundations for a Semantic Web, an extension of the current Web which will better convey information and meaning to both computers and people. XML (Extensible Markup Language) is used throughout this effort, since it provides a basic syntax that is extensible and can describe any resource or data structure. The Semantic Web will provide the tools necessary for the development of breakthrough applications which will rely more on automated procedures than on human interaction.

 
